Fine-Tuner Configuration¶
The [fine_tuner] section controls all aspects of the model fine-tuning process. This page provides comprehensive documentation of all available configuration options.
Model Configuration¶
Base Model Settings¶
[fine_tuner]
# Required: Hugging Face model ID or local path
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
# Maximum sequence length for training
max_sequence_length = 4096
# Data type for model weights ("null" for auto-detection)
dtype = "null" # Options: "float16", "bfloat16", "null"
# Quantization settings (choose one)
load_in_4bit = true
load_in_8bit = false
# Whether to use full fine-tuning instead of LoRA
full_finetuning = false
Model Resource Recommendations¶
Recommended Model | Memory Requirement |
---|---|
unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit | 2GB |
unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit | 4GB |
unsloth/Qwen2.5-3B-Instruct-bnb-4bit | 8GB |
unsloth/Qwen2.5-7B-Instruct-bnb-4bit | 16GB |
LoRA Configuration¶
Low-Rank Adaptation (LoRA) settings for parameter-efficient fine-tuning:
[fine_tuner]
# LoRA rank - higher values = more trainable parameters
rank = 16 # Typical values: 8, 16, 32, 64, ...
# LoRA alpha - scaling factor for LoRA updates
lora_alpha = 16 # Usually equal to rank
# Dropout rate for LoRA layers
lora_dropout = 0.1 # Typical range: 0.0 - 0.3
# Target modules for LoRA adaptation
target_modules = [
"q_proj", "k_proj", "v_proj", "o_proj", # Attention layers
"gate_proj", "up_proj", "down_proj" # Feed-forward layers
]
# Bias handling
bias = "none" # Options: "none", "all", "lora_only"
# Advanced LoRA options
use_rslora = false # Rank-Stabilized LoRA
loftq_config = "null" # LoFTQ configuration
LoRA Performance Guide¶
Rank | Parameters | Speed | Quality | Use Case |
---|---|---|---|---|
8 | ~0.5M | Fast | Good | Quick prototyping |
16 | ~1M | Medium | Better | General purpose |
32 | ~2M | Slower | High | Quality-focused |
64 | ~4M | Slowest | Highest | Research/Production |
Dataset Configuration¶
Data Sources¶
[fine_tuner]
# Required: Training dataset
training_data_id = "your-huggingface-username/training-dataset"
# Optional: Validation dataset
validation_data_id = "your-huggingface-username/validation-dataset" # or "null"
# Number of processes for dataset loading
dataset_num_proc = 4
Column Mapping¶
Map your dataset columns to the expected format:
[fine_tuner]
# Required columns
question_column = "question" # Input/instruction column
ground_truth_column = "answer" # Target/response column
# Optional system prompt
system_prompt_column = "system" # System prompt column (or "null")
# Override system prompt for all examples
system_prompt_override_text = "null" # Custom system prompt (or "null")
Dataset Format Examples¶
A typical record for domain adaptation tasks contains a question, a ground-truth answer, and optionally a system prompt.
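For example, with the default column mapping described below, a record might look like this (the values are illustrative only):
question | answer | system |
---|---|---|
What does the load_in_4bit option do? | It loads the base model weights in 4-bit quantized form to reduce GPU memory usage. | You are a helpful assistant for this project's documentation. |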
Training Parameters¶
Basic Training Settings¶
[fine_tuner]
# Number of training epochs
epochs = 30
# Learning rate
learning_rate = 0.0002 # Typical range: 1e-5 to 5e-4
# Batch sizes
device_train_batch_size = 4 # Per-device batch size
device_validation_batch_size = 4 # Validation batch size
grad_accumulation = 4 # Gradient accumulation steps
# Warmup and scheduling
warmup_steps = 5 # Learning rate warmup
lr_scheduler_type = "linear" # Options: "linear", "cosine", "constant"
# Optimization
optimizer = "paged_adamw_8bit" # Memory-efficient optimizer
weight_decay = 0.01 # L2 regularization
# Random seed for reproducibility
seed = 42
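With the settings above, the effective batch size per optimizer step is device_train_batch_size × grad_accumulation = 4 × 4 = 16 per device. If you lower the per-device batch size to save memory, increase grad_accumulation to keep this product roughly constant.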
Advanced Training Options¶
[fine_tuner]
# Memory optimization
use_gradient_checkpointing = "unsloth" # Options: true, false, "unsloth"
use_flash_attention = true # Flash attention for efficiency
packing = false # Pack multiple short sequences into one training sequence
# Training on responses only
train_on_responses_only = true
question_part = "<|im_start|>user\n" # Question template
answer_part = "<|im_start|>assistant\n" # Answer template
Training on Responses Only
By default, the loss is computed over the entire formatted example, including the system prompt and the question. With train_on_responses_only enabled, the loss is restricted to the assistant's response tokens, which typically results in higher accuracy on the target task.
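For reference, the question_part and answer_part markers above assume a ChatML-style chat template (as used by Qwen models). A formatted training example then looks roughly like the sketch below (illustrative content); with train_on_responses_only = true, only the text after the assistant marker contributes to the loss:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What does the load_in_4bit option do?<|im_end|>
<|im_start|>assistant
It loads the base model weights in 4-bit quantized form to reduce GPU memory usage.<|im_end|>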
Logging and Monitoring¶
MLflow Integration¶
[fine_tuner]
# Logging configuration
log_steps = 10 # Log metrics every N steps
log_first_step = true # Log the first step
report_to = "mlflow" # Reporting backend: "wandb", "tensorboard", "mlflow", "none"
[mlflow]
# MLflow tracking URI and experiment settings
tracking_uri = "https://your-mlflow-tracking-uri"
experiment_name = "your-experiment-name"
run_name = "your-run-name" # Custom run name or "null" for auto. Recommended to use a versioning scheme like "0.0.1"
Model Saving¶
[fine_tuner]
# Checkpoint saving
save_steps = 20 # Save checkpoint every N steps
save_total_limit = 3 # Maximum number of checkpoints to keep (older ones will be deleted)
# Hugging Face Hub integration
push_to_hub = true # Push final model to Hub
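Note that push_to_hub requires Hugging Face authentication with write access to the target repository.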
Run Naming¶
Control how your training runs are named:
[mlflow]
# Run name configuration
run_name = "0.0.1" # Custom run name (or "null" for auto)
run_name_prefix = "" # Prefix for auto-generated names
run_name_suffix = "" # Suffix for auto-generated names
Recommendation
Use a versioning scheme like 0.0.1 for run_name to easily track changes across runs. You can also use prefixes and suffixes to add context, e.g., the prefix exp- for experiments or a suffix like -alpha for further versioning (e.g., exp-0.0.1-alpha).
Run Name Examples¶
Configuration | Generated Name |
---|---|
run_name = "my-model" |
my-model |
run_name_prefix = "exp-" |
exp-20250629-143022 |
run_name_suffix = "-v1" |
20250629-143022-v1 |
Memory Optimization Guide¶
For 4GB GPU (e.g., GTX 1650)¶
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 1024
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 16
use_gradient_checkpointing = "unsloth"
rank = 8
For 8GB GPU (RTX 3070)¶
[fine_tuner]
base_model_id = "unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit"
max_sequence_length = 2048
load_in_4bit = true
device_train_batch_size = 2
grad_accumulation = 8
use_gradient_checkpointing = "unsloth"
rank = 16
For 12GB+ GPU (RTX 3080 Ti/4070 Ti)¶
[fine_tuner]
base_model_id = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"
max_sequence_length = 4096
load_in_4bit = true
device_train_batch_size = 4
grad_accumulation = 4
use_gradient_checkpointing = "unsloth"
rank = 32
Performance Tuning¶
Speed Optimization¶
[fine_tuner]
# Enable packing for 5x speed improvement on short sequences
packing = true
# Use flash attention
use_flash_attention = true
# Optimize data loading
dataset_num_proc = 8 # Match your CPU cores
# Efficient precision
dtype = "null" # Auto-select best precision
Quality Optimization¶
[fine_tuner]
# Higher LoRA rank for better quality
rank = 64
lora_alpha = 32
# More training epochs
epochs = 50
# Lower learning rate for stability
learning_rate = 0.0001
# Add validation dataset
validation_data_id = "your-username/validation-dataset"
Common Configuration Patterns¶
Research/Experimentation¶
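A quick-iteration setup keeps runs short and skips publishing intermediate models. The values below are an illustrative sketch, not project defaults:
[fine_tuner]
epochs = 3
device_train_batch_size = 2
grad_accumulation = 4
learning_rate = 0.0002
save_steps = 50
save_total_limit = 1
push_to_hub = false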
Production Training¶
[fine_tuner]
epochs = 30
device_train_batch_size = 8
push_to_hub = true
save_steps = 100
save_total_limit = 5
Memory-Constrained¶
[fine_tuner]
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 32
use_gradient_checkpointing = "unsloth"
max_sequence_length = 1024
Troubleshooting¶
Out of Memory Errors¶
- Reduce device_train_batch_size
- Increase grad_accumulation to maintain the effective batch size
- Reduce max_sequence_length
- Enable use_gradient_checkpointing
- Use a smaller model or more aggressive quantization
Slow Training¶
- Enable packing = true
- Enable use_flash_attention = true
- Increase dataset_num_proc
- Use larger device_train_batch_size if memory allows
- Consider using a smaller model for prototyping
Poor Quality Results¶
- Increase rank and lora_alpha
- Add validation dataset
- Increase epochs
- Lower learning_rate
- Check data quality and format