Fine-Tuner Configuration¶
The [fine_tuner] section controls all aspects of the model fine-tuning process. This page documents all available configuration options.
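At a minimum, the section needs the keys marked as required on this page. A minimal sketch (the dataset name is a placeholder):
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
training_data_id = "your-username/training-dataset"
question_column = "question"
ground_truth_column = "answer"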
Model Configuration¶
Base Model Settings¶
[fine_tuner]
# Required: Hugging Face model ID or local path
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
# Maximum sequence length for training
max_sequence_length = 4096
# Data type for model weights ("null" for auto-detection)
dtype = "null" # Options: "float16", "bfloat16", "null"
# Quantization settings (choose one)
load_in_4bit = true
load_in_8bit = false
# Whether to use full fine-tuning instead of LoRA
full_finetuning = false
Model Recommendations¶
Use Case | Recommended Model | Memory Requirement |
---|---|---|
Quick Testing | unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit | 2GB |
Development | unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit | 4GB |
Production | unsloth/Llama-3.2-3B-Instruct-bnb-4bit | 8GB |
Large Scale | unsloth/Llama-3.1-8B-Instruct-bnb-4bit | 16GB |
LoRA Configuration¶
Low-Rank Adaptation (LoRA) settings for parameter-efficient fine-tuning:
[fine_tuner]
# LoRA rank - higher values = more trainable parameters
rank = 16 # Typical values: 8, 16, 32, 64
# LoRA alpha - scaling factor for LoRA updates
lora_alpha = 16 # Usually equal to rank
# Dropout rate for LoRA layers
lora_dropout = 0.1 # Range: 0.0 - 0.3
# Target modules for LoRA adaptation
target_modules = [
"q_proj", "k_proj", "v_proj", "o_proj", # Attention layers
"gate_proj", "up_proj", "down_proj" # Feed-forward layers
]
# Bias handling
bias = "none" # Options: "none", "all", "lora_only"
# Advanced LoRA options
use_rslora = false # Rank-Stabilized LoRA
loftq_config = "null" # LoFTQ configuration
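In standard LoRA, the low-rank update is scaled by lora_alpha / rank, so the defaults above (rank = 16, lora_alpha = 16) give a scaling factor of 16 / 16 = 1.0; if you raise rank without also raising lora_alpha, each update is scaled down accordingly.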
LoRA Performance Guide¶
Rank | Parameters | Speed | Quality | Use Case |
---|---|---|---|---|
8 | ~0.5M | Fast | Good | Quick prototyping |
16 | ~1M | Medium | Better | General purpose |
32 | ~2M | Slower | High | Quality-focused |
64 | ~4M | Slowest | Highest | Research/production |
Dataset Configuration¶
Data Sources¶
[fine_tuner]
# Required: Training dataset
training_data_id = "your-username/training-dataset"
# Optional: Validation dataset
validation_data_id = "your-username/validation-dataset" # or "null"
# Number of processes for dataset loading
dataset_num_proc = 4
Column Mapping¶
Map your dataset columns to the expected format:
[fine_tuner]
# Required columns
question_column = "question" # Input/instruction column
ground_truth_column = "answer" # Target/response column
# Optional system prompt
system_prompt_column = "system" # System prompt column (or "null")
# Override system prompt for all examples
system_prompt_override_text = "null" # Custom system prompt (or "null")
Dataset Format Examples¶
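As an illustration of what the column mapping above expects (assuming the default column names question, answer, and system), a single training record in JSON Lines form might look like:
{"question": "What is the capital of France?", "answer": "The capital of France is Paris.", "system": "You are a helpful geography assistant."}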
Training Parameters¶
Basic Training Settings¶
[fine_tuner]
# Number of training epochs
epochs = 3
# Learning rate
learning_rate = 0.0002 # Typical range: 1e-5 to 5e-4
# Batch sizes
device_train_batch_size = 4 # Per-device batch size
device_validation_batch_size = 4 # Validation batch size
grad_accumulation = 4 # Gradient accumulation steps
# Warmup and scheduling
warmup_steps = 5 # Learning rate warmup
lr_scheduler_type = "linear" # Options: "linear", "cosine", "constant"
# Optimization
optimizer = "paged_adamw_8bit" # Memory-efficient optimizer
weight_decay = 0.01 # L2 regularization
# Random seed for reproducibility
seed = 42
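With these defaults, the effective batch size per device is device_train_batch_size × grad_accumulation = 4 × 4 = 16 examples per optimizer step; keep that product roughly constant when trading one setting against the other.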
Advanced Training Options¶
[fine_tuner]
# Memory optimization
use_gradient_checkpointing = "unsloth" # Options: true, false, "unsloth"
use_flash_attention = true # Flash attention for efficiency
packing = false # Pack multiple short sequences into one training sequence
# Training on responses only (for chat models)
train_on_responses_only = true
question_part = "<|im_start|>user\n" # Question template
answer_part = "<|im_start|>assistant\n" # Answer template
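With train_on_responses_only = true, only the tokens after answer_part contribute to the loss. As a sketch, assuming the ChatML format used by Qwen-style instruct models, a formatted training example looks roughly like this (loss is computed on the assistant turn only):
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is LoRA?<|im_end|>
<|im_start|>assistant
LoRA adds small low-rank adapter matrices on top of the frozen base weights.<|im_end|>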
Logging and Monitoring¶
Weights & Biases Integration¶
[fine_tuner]
# W&B project name
wandb_project_name = "fine-tuning-project"
# Logging configuration
log_steps = 10 # Log metrics every N steps
log_first_step = true # Log the first step
report_to = "wandb" # Reporting backend: "wandb", "tensorboard", "none"
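Reporting to W&B generally requires authenticating first, for example with wandb login or by setting the WANDB_API_KEY environment variable.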
Model Saving¶
[fine_tuner]
# Checkpoint saving
save_steps = 20 # Save checkpoint every N steps
save_total_limit = 3 # Maximum number of checkpoints to keep
# Hugging Face Hub integration
push_to_hub = true # Push final model to Hub
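Pushing to the Hub likewise requires a Hugging Face access token, typically supplied via huggingface-cli login or the HF_TOKEN environment variable.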
Run Naming¶
Control how your training runs are named:
[fine_tuner]
# Run name configuration
run_name = "null" # Custom run name (or "null" for auto)
run_name_prefix = "" # Prefix for auto-generated names
run_name_suffix = "" # Suffix for auto-generated names
Run Name Examples¶
Configuration | Generated Name |
---|---|
run_name = "my-model" | my-model |
run_name_prefix = "exp-" | exp-20250629-143022 |
run_name_suffix = "-v1" | 20250629-143022-v1 |
Memory Optimization Guide¶
For 4GB GPU (RTX 3060)¶
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 1024
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 16
use_gradient_checkpointing = "unsloth"
rank = 8
For 8GB GPU (RTX 3070)¶
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit"
max_sequence_length = 2048
load_in_4bit = true
device_train_batch_size = 2
grad_accumulation = 8
use_gradient_checkpointing = "unsloth"
rank = 16
For 12GB+ GPU (RTX 3080 Ti/4070 Ti)¶
[fine_tuner]
base_model_id = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"
max_sequence_length = 4096
load_in_4bit = true
device_train_batch_size = 4
grad_accumulation = 4
use_gradient_checkpointing = "unsloth"
rank = 32
Performance Tuning¶
Speed Optimization¶
[fine_tuner]
# Packing can give up to ~5x speedup on datasets of short sequences
packing = true
# Use flash attention
use_flash_attention = true
# Optimize data loading
dataset_num_proc = 8 # Match your CPU cores
# Efficient precision
dtype = "null" # Auto-select best precision
Quality Optimization¶
[fine_tuner]
# Higher LoRA rank for better quality
rank = 64
lora_alpha = 32
# More training epochs
epochs = 5
# Lower learning rate for stability
learning_rate = 0.0001
# Add validation dataset
validation_data_id = "your-username/validation-dataset"
Common Configuration Patterns¶
Research/Experimentation¶
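A plausible sketch for this pattern, assuming the goal is fast iteration on a small model rather than final quality (all keys are documented above; adjust to taste):
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
epochs = 1
max_sequence_length = 1024
device_train_batch_size = 4
save_total_limit = 1
push_to_hub = false
report_to = "none"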
Production Training¶
[fine_tuner]
epochs = 5
device_train_batch_size = 8
push_to_hub = true
save_steps = 100
save_total_limit = 5
Memory-Constrained¶
[fine_tuner]
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 32
use_gradient_checkpointing = "unsloth"
max_sequence_length = 1024
Troubleshooting¶
Out of Memory Errors¶
- Reduce device_train_batch_size
- Increase grad_accumulation to maintain the same effective batch size
- Reduce max_sequence_length
- Enable use_gradient_checkpointing
- Use a smaller model or more aggressive quantization (e.g., load_in_4bit = true)
Slow Training¶
- Enable packing = true
- Enable use_flash_attention = true
- Increase dataset_num_proc
- Use a larger device_train_batch_size if memory allows
- Consider using a smaller model for prototyping
Poor Quality Results¶
- Increase rank and lora_alpha
- Add a validation dataset
- Increase epochs
- Lower learning_rate
- Check data quality and format