# Fine-Tuner Component
The Fine-Tuner is the core component responsible for training language models using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA (Low-Rank Adaptation).
## Overview

The `FineTune` class in `app/finetuner.py` handles the complete fine-tuning pipeline:
- Model Loading: Downloads and prepares the base model
- Data Processing: Converts datasets to the required format
- Training: Executes the fine-tuning process
- Saving: Saves the model locally and to Hugging Face Hub
## Key Features
- ✅ LoRA Fine-Tuning: Memory-efficient training with Low-Rank Adaptation
- ✅ 4-bit/8-bit Quantization: Reduce memory usage without sacrificing quality
- ✅ Chat Template Support: Automatic formatting for conversational models
- ✅ Weights & Biases Integration: Comprehensive experiment tracking
- ✅ Hub Integration: Automatic model publishing to Hugging Face
- ✅ Gradient Checkpointing: Memory optimization for large models
- ✅ Response-Only Training: Train only on assistant responses
## Architecture

```mermaid
graph TD
    A[Configuration] --> B[Model Loading]
    B --> C[Dataset Processing]
    C --> D[LoRA Setup]
    D --> E[Training Loop]
    E --> F[Model Saving]
    E --> G[W&B Logging]
    F --> H[Hub Upload]
```
## Usage

### Basic Usage

```python
from app.finetuner import FineTune

# Initialize with default configuration
tuner = FineTune()

# Run the fine-tuning process
trainer_stats = tuner.run()
```
### Custom Configuration

```python
from app.config_manager import ConfigManager
from app.finetuner import FineTune

# Load custom configuration
config_manager = ConfigManager("custom_config.toml")
tuner = FineTune(config_manager=config_manager)

# Run training
trainer_stats = tuner.run()
```
### Command Line Usage

```bash
# Basic usage with API keys
uv run app/finetuner.py --hf-key "your_token" --wandb-key "your_key"

# Using environment variables
export HF_TOKEN="your_token"
export WANDB_TOKEN="your_key"
uv run app/finetuner.py
```
## Core Methods

### `load_base_model_and_tokenizer()`

Loads the pre-trained model and tokenizer from Hugging Face Hub or a local path.

Features:

- Automatic dtype selection (bfloat16 on supported hardware)
- 4-bit/8-bit quantization support
- Gated model authentication
- Memory-efficient loading
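A rough sketch of this step with plain `transformers` (the component may use an Unsloth-optimized loader instead; the model name is a placeholder):

```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder base model
hf_token = os.environ.get("HF_TOKEN")            # required for gated models

# Automatic dtype selection: bfloat16 on supported hardware, float16 otherwise
dtype = torch.bfloat16 if (torch.cuda.is_available() and torch.cuda.is_bf16_supported()) else torch.float16

tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    device_map="auto",  # memory-efficient placement across available devices
    token=hf_token,
)
```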
### `get_peft_model()`

Converts the base model to a Parameter-Efficient Fine-Tuning (PEFT) model using LoRA.

Configuration:

- Rank and alpha values for adaptation strength
- Target modules for LoRA application
- Dropout and bias handling
- RSLoRA and LoftQ support
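A minimal sketch with the `peft` library (values and target modules are illustrative, not the project's defaults):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                     # rank: capacity of the adaptation
    lora_alpha=16,            # scaling factor for the LoRA updates
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    use_rslora=False,         # rank-stabilized LoRA variant
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)  # `model` from the loading step
peft_model.print_trainable_parameters()
```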
### `convert_a_data_row_to_conversation_format()`

Transforms dataset rows into conversation format for chat-based training.

Output Format:

```json
{
  "conversations": [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User question"},
    {"role": "assistant", "content": "Assistant response"}
  ]
}
```
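As an illustration only, a conversion helper might look like the following (the column names `"question"` and `"answer"` and the default system prompt are hypothetical; the real mapping comes from the configuration):

```python
def to_conversation(row: dict, system_prompt: str = "You are a helpful assistant.") -> dict:
    """Map a flat dataset row onto the conversation structure shown above."""
    return {
        "conversations": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": row["question"]},
            {"role": "assistant", "content": row["answer"]},
        ]
    }


print(to_conversation({"question": "What is LoRA?", "answer": "A low-rank adaptation method."}))
```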
### `apply_chat_template_to_conversations()`

Applies the model's chat template to format conversations for training.

Features:

- Automatic template detection
- Consistent formatting across models
- Support for various chat formats (ChatML, Llama, etc.)
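A typical way to do this with `transformers` is the tokenizer's built-in `apply_chat_template`; a minimal example (the conversation content is illustrative):

```python
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LoRA?"},
    {"role": "assistant", "content": "A low-rank adaptation method."},
]

# Renders the conversation with whatever template the tokenizer ships with
# (ChatML, Llama, etc.), without tokenizing.
text = tokenizer.apply_chat_template(conversation, tokenize=False)
print(text)
```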
## Training Process

### 1. Initialization Phase

```
[1/6] Loading configuration from config.toml
[2/6] Authenticating with Hugging Face Hub
[3/6] Loading base model and tokenizer
[4/6] Converting to PEFT model with LoRA
[5/6] Processing training dataset
[6/6] Initializing trainer and W&B logging
```
### 2. Training Phase

```
Training Progress:
Epoch 1/3: ████████████████████ 100% [Loss: 1.234]
Epoch 2/3: ████████████████████ 100% [Loss: 0.987]
Epoch 3/3: ████████████████████ 100% [Loss: 0.765]
```
### 3. Saving Phase

```
[1/3] Saving model to ./models/fine_tuned/
[2/3] Uploading to Hugging Face Hub
[3/3] Finalizing W&B logging
```
## Memory Management
The Fine-Tuner implements several memory optimization strategies:
### Quantization

```python
# 4-bit quantization (most memory efficient)
load_in_4bit = True
load_in_8bit = False

# 8-bit quantization (balanced)
load_in_4bit = False
load_in_8bit = True

# Full precision (highest quality, most memory)
load_in_4bit = False
load_in_8bit = False
```
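With plain `transformers`, these flags typically map onto a `BitsAndBytesConfig`; a sketch (quantization settings and model id are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```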
### Gradient Checkpointing

```python
# Unsloth optimized checkpointing (recommended)
use_gradient_checkpointing = "unsloth"

# Standard checkpointing
use_gradient_checkpointing = True

# No checkpointing (fastest, most memory)
use_gradient_checkpointing = False
```
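For a standard `transformers` model, checkpointing is toggled like this (the `"unsloth"` variant applies only to Unsloth-loaded models):

```python
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache must be disabled during checkpointed training
```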
### Batch Size Optimization

```python
# Effective batch size = device_batch_size * grad_accumulation
device_train_batch_size = 2  # Per-device batch size
grad_accumulation = 8        # Accumulation steps

# Effective batch size = 2 * 8 = 16
```
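If the trainer builds on Hugging Face `TrainingArguments`, the two knobs map directly onto it; a sketch (other arguments omitted):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./models/fine_tuned",  # matches the save path used in the saving phase
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,     # effective batch size: 2 * 8 = 16 per device
)
```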
## Data Processing Pipeline

### Input Formats

The Fine-Tuner supports multiple dataset formats; the column mapping settings in the configuration adapt each dataset to the expected schema.
### Processing Steps

1. Column Mapping: Maps dataset columns to expected format
2. Conversation Conversion: Transforms to chat format
3. Template Application: Applies model-specific chat template
4. Response Filtering: Optionally trains only on responses (see the sketch below)
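Response-only training can be implemented, for example, with TRL's `DataCollatorForCompletionOnlyLM`; the response marker below assumes a ChatML-style template and will differ for other chat formats:

```python
from trl import DataCollatorForCompletionOnlyLM

collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant\n",  # template-specific marker
    tokenizer=tokenizer,
)
# Passed to the trainer, this masks the loss on everything except assistant responses.
```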
## Monitoring and Logging

### Weights & Biases Integration
The Fine-Tuner automatically logs:
- Training Metrics: Loss, learning rate, gradients
- System Metrics: GPU utilization, memory usage
- Model Metadata: Configuration, hyperparameters
- Artifacts: Model checkpoints, configuration files
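A minimal sketch of how this kind of tracking is set up with the `wandb` client (the project name is a placeholder; the component derives the run name automatically, as shown in the console output below):

```python
import os

import wandb

wandb.login(key=os.environ.get("WANDB_TOKEN"))
run = wandb.init(
    project="fine-tuning",                    # placeholder project name
    name="fine-tuned-model-20250629-143022",  # run name derived from config + timestamp
)
# With report_to="wandb" in the trainer arguments, training and system metrics
# are logged to this run automatically.
```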
### Console Output

```
--- ✅ Login to Hugging Face Hub successful. ---
--- ✅ Training dataset loaded: username/dataset ---
--- ✅ Model and tokenizer loaded successfully. ---
--- ✅ Data preprocessing completed. ---
Run name set to: fine-tuned-model-20250629-143022
--- ✅ Weights & Biases setup completed. ---
--- ✅ Trainer initialized successfully. ---
--- ✅ Starting training... ---
--- ✅ Training completed with stats: {...} ---
--- ✅ Model saved and uploaded successfully. ---
```
## Error Handling
The Fine-Tuner includes comprehensive error handling:
### Common Errors

```
# Model loading errors
FileNotFoundError: Model not found at specified path
HfHubError: Authentication failed or model access denied

# Configuration errors
ValueError: Invalid configuration parameter
TypeError: Incorrect data type for parameter

# Training errors
RuntimeError: CUDA out of memory
DataError: Dataset format not supported

# Upload errors
ConnectionError: Failed to upload to Hugging Face Hub
```
### Recovery Strategies

- Automatic retry for network errors
- Checkpoint saving for recovery from interruptions
- Memory optimization suggestions for OOM errors (see the sketch below)
- Configuration validation before training starts
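For the OOM case, a caller might wrap the run like this (an illustrative pattern, not the component's built-in handler; `tuner` comes from the Basic Usage example):

```python
import torch

try:
    trainer_stats = tuner.run()
except torch.cuda.OutOfMemoryError:
    torch.cuda.empty_cache()
    raise RuntimeError(
        "CUDA out of memory: lower device_train_batch_size, enable 4-bit "
        "quantization, or turn on gradient checkpointing."
    ) from None
```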
## Performance Optimization

### Speed Optimizations
- Sequence Packing: Pack multiple short sequences per batch
- Flash Attention: Use optimized attention implementation
- Mixed Precision: Automatic selection of optimal precision
- Data Loading: Parallel dataset processing
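Two of these, sequence packing and Flash Attention, are typically enabled as sketched below (flags are illustrative and depend on the installed libraries):

```python
from transformers import AutoModelForCausalLM
from trl import SFTConfig

# Flash Attention: request the optimized attention kernel at load time
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",       # placeholder model id
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
)

# Sequence packing: let the SFT trainer pack short examples into full-length sequences
sft_config = SFTConfig(output_dir="./models/fine_tuned", packing=True)
```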
### Quality Optimizations
- LoRA Rank: Higher rank for better adaptation
- Learning Rate: Optimal scheduling and warmup
- Validation: Monitor overfitting with validation set
- Response-Only Training: Focus learning on responses
## Integration with Other Components

### Inferencer Integration

```python
# Fine-tuner saves model with consistent naming
model_path = f"{hf_user_id}/{run_name}"

# Inferencer automatically loads the model
inferencer = Inferencer(model_path=model_path)
```
### Evaluator Integration

```python
# Fine-tuner outputs can be evaluated
evaluator = Evaluator(
    model_outputs="inferencer_output.jsonl",
    ground_truth="test_dataset.json",
)
```
## Advanced Usage

### Custom Training Loop

```python
from app.finetuner import FineTune


class CustomFineTune(FineTune):
    def custom_training_step(self, batch):
        # Custom training logic
        return super().training_step(batch)

    def on_epoch_end(self, epoch, logs):
        # Custom epoch-end logic
        self.save_intermediate_checkpoint(epoch)
```
### Multi-GPU Training

- Automatic multi-GPU detection and usage
- Data parallel training across available GPUs
- Gradient synchronization and accumulation
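Under data-parallel training, the effective batch size from the Batch Size Optimization section also scales with the number of visible GPUs; a small illustrative helper (not part of the component):

```python
import torch


def effective_batch_size(device_train_batch_size: int, grad_accumulation: int) -> int:
    """Effective batch size under data-parallel training."""
    n_gpus = max(torch.cuda.device_count(), 1)
    return device_train_batch_size * grad_accumulation * n_gpus


print(effective_batch_size(2, 8))  # 16 on one GPU, 32 on two, ...
```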
## Best Practices

### Configuration
- Use version control for configurations
- Document parameter choices
- Test with small datasets first
### Training
- Monitor loss curves in W&B
- Use validation sets to prevent overfitting
- Save checkpoints regularly
### Memory Management
- Start with 4-bit quantization
- Adjust batch size based on GPU memory
- Use gradient checkpointing for large models
### Quality
- Use appropriate learning rates
- Ensure data quality and format consistency
- Validate model outputs after training