
Fine-Tuner Configuration

The [fine_tuner] section controls all aspects of the model fine-tuning process. This page provides comprehensive documentation of all available configuration options.

Model Configuration

Base Model Settings

[fine_tuner]
# Required: Hugging Face model ID or local path
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"

# Maximum sequence length for training
max_sequence_length = 4096

# Data type for model weights ("null" for auto-detection)
dtype = "null"  # Options: "float16", "bfloat16", "null"

# Quantization settings (choose one)
load_in_4bit = true
load_in_8bit = false

# Whether to use full fine-tuning instead of LoRA
full_finetuning = false
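
These settings are consumed when the base model is loaded. As a rough sketch, assuming the tool wraps Unsloth's FastLanguageModel (which the recommended model IDs suggest); the exact call in this project may differ:

# Sketch only: loading the base model with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit",  # base_model_id
    max_seq_length=4096,                                  # max_sequence_length
    dtype=None,              # "null" in the config means auto-detect precision
    load_in_4bit=True,       # load_in_4bit
)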

Model Recommendations

| Use Case | Recommended Model | Memory Requirement |
| --- | --- | --- |
| Quick Testing | unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit | 2GB |
| Development | unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit | 4GB |
| Production | unsloth/Llama-3.2-3B-Instruct-bnb-4bit | 8GB |
| Large Scale | unsloth/Llama-3.1-8B-Instruct-bnb-4bit | 16GB |

LoRA Configuration

Low-Rank Adaptation (LoRA) settings for parameter-efficient fine-tuning:

[fine_tuner]
# LoRA rank - higher values = more trainable parameters
rank = 16  # Typical values: 8, 16, 32, 64

# LoRA alpha - scaling factor for LoRA updates
lora_alpha = 16  # Usually equal to rank

# Dropout rate for LoRA layers
lora_dropout = 0.1  # Range: 0.0 - 0.3

# Target modules for LoRA adaptation
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # Attention layers
    "gate_proj", "up_proj", "down_proj"       # Feed-forward layers
]

# Bias handling
bias = "none"  # Options: "none", "all", "lora_only"

# Advanced LoRA options
use_rslora = false      # Rank-Stabilized LoRA
loftq_config = "null"   # LoFTQ configuration
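
These options correspond closely to the arguments of Unsloth's LoRA adapter setup. A hedged sketch, continuing from the loading example above (the project's internal wiring may differ):

# Sketch only: attaching LoRA adapters with the options above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                        # rank
    lora_alpha=16,               # lora_alpha
    lora_dropout=0.1,            # lora_dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",                 # bias
    use_rslora=False,            # use_rslora
    loftq_config=None,           # "null" means LoFTQ is disabled
)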

LoRA Performance Guide

| Rank | Parameters | Speed | Quality | Use Case |
| --- | --- | --- | --- | --- |
| 8 | ~0.5M | Fast | Good | Quick prototyping |
| 16 | ~1M | Medium | Better | General purpose |
| 32 | ~2M | Slower | High | Quality-focused |
| 64 | ~4M | Slowest | Highest | Research/production |

Dataset Configuration

Data Sources

[fine_tuner]
# Required: Training dataset
training_data_id = "your-username/training-dataset"

# Optional: Validation dataset
validation_data_id = "your-username/validation-dataset"  # or "null"

# Number of processes for dataset loading
dataset_num_proc = 4
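
Dataset IDs follow the Hugging Face Hub naming shown above. A minimal sketch of how they are typically loaded, assuming the standard datasets library (the format_example helper is illustrative):

# Sketch only: loading the configured datasets.
from datasets import load_dataset

train_ds = load_dataset("your-username/training-dataset", split="train")
val_ds = load_dataset("your-username/validation-dataset", split="train")

# dataset_num_proc is typically forwarded to preprocessing .map() calls:
# train_ds = train_ds.map(format_example, num_proc=4)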

Column Mapping

Map your dataset columns to the expected format:

[fine_tuner]
# Required columns
question_column = "question"        # Input/instruction column
ground_truth_column = "answer"      # Target/response column

# Optional system prompt
system_prompt_column = "system"     # System prompt column (or "null")

# Override system prompt for all examples
system_prompt_override_text = "null"  # Custom system prompt (or "null")

Dataset Format Examples

Question/answer with an optional system prompt:

{
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "system": "You are a geography expert."
}

Instruction-style format:

{
    "instruction": "Summarize the following text:",
    "input": "Long text to summarize...",
    "output": "Brief summary of the text."
}

Conversational (chat) format:

{
    "conversations": [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there!"}
    ]
}
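
The column mapping controls how each row is turned into a chat-formatted training example. A hedged sketch using the tokenizer's standard chat template and the default column names from the first example above (the helper is illustrative, not this project's exact code):

# Sketch only: building one chat-formatted example from the mapped columns.
def to_chat_text(example, tokenizer):
    messages = []
    if example.get("system"):                                             # system_prompt_column
        messages.append({"role": "system", "content": example["system"]})
    messages.append({"role": "user", "content": example["question"]})     # question_column
    messages.append({"role": "assistant", "content": example["answer"]})  # ground_truth_column
    return tokenizer.apply_chat_template(messages, tokenize=False)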

Training Parameters

Basic Training Settings

[fine_tuner]
# Number of training epochs
epochs = 3

# Learning rate
learning_rate = 0.0002  # Typical range: 1e-5 to 5e-4

# Batch sizes
device_train_batch_size = 4        # Per-device batch size
device_validation_batch_size = 4   # Validation batch size
grad_accumulation = 4              # Gradient accumulation steps

# Warmup and scheduling
warmup_steps = 5                   # Learning rate warmup
lr_scheduler_type = "linear"       # Options: "linear", "cosine", "constant"

# Optimization
optimizer = "paged_adamw_8bit"     # Memory-efficient optimizer
weight_decay = 0.01                # L2 regularization

# Random seed for reproducibility
seed = 42
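
The effective batch size is device_train_batch_size × grad_accumulation (× number of GPUs), so the defaults above train with an effective batch of 16. These options map onto standard transformers training arguments, roughly as sketched below (argument names are the standard library ones; output_dir is illustrative):

# Sketch only: the rough mapping to transformers TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",                # illustrative
    num_train_epochs=3,                  # epochs
    learning_rate=2e-4,                  # learning_rate
    per_device_train_batch_size=4,       # device_train_batch_size
    per_device_eval_batch_size=4,        # device_validation_batch_size
    gradient_accumulation_steps=4,       # grad_accumulation
    warmup_steps=5,                      # warmup_steps
    lr_scheduler_type="linear",          # lr_scheduler_type
    optim="paged_adamw_8bit",            # optimizer
    weight_decay=0.01,                   # weight_decay
    seed=42,                             # seed
)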

Advanced Training Options

[fine_tuner]
# Memory optimization
use_gradient_checkpointing = "unsloth"  # Options: true, false, "unsloth"
use_flash_attention = true              # Flash attention for efficiency
packing = false                         # Pack multiple short sequences into one example

# Training on responses only (for chat models)
train_on_responses_only = true
question_part = "<|im_start|>user\n"      # Question template
answer_part = "<|im_start|>assistant\n"   # Answer template
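
With train_on_responses_only enabled, loss is computed only on the assistant's answer tokens; question_part and answer_part mark where the user turn ends and the assistant turn begins (the defaults above match Qwen's ChatML-style template). Unsloth ships a helper for this; a sketch, assuming the project uses it:

# Sketch only: mask everything except assistant responses.
from unsloth.chat_templates import train_on_responses_only

# trainer: an already-constructed SFT trainer (e.g. TRL's SFTTrainer).
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",     # question_part
    response_part="<|im_start|>assistant\n",   # answer_part
)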

Logging and Monitoring

Weights & Biases Integration

[fine_tuner]
# W&B project name
wandb_project_name = "fine-tuning-project"

# Logging configuration
log_steps = 10          # Log metrics every N steps
log_first_step = true   # Log the first step
report_to = "wandb"     # Reporting backend: "wandb", "tensorboard", "none"
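
report_to, log_steps, and log_first_step correspond to the trainer's standard logging arguments, and the W&B project is commonly selected via the WANDB_PROJECT environment variable. A minimal sketch, assuming the standard wandb integration:

# Sketch only: pointing the W&B integration at the configured project.
import os

os.environ["WANDB_PROJECT"] = "fine-tuning-project"   # wandb_project_name
# report_to="wandb", logging_steps=10, logging_first_step=True are then
# passed through as standard transformers TrainingArguments.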

Model Saving

[fine_tuner]
# Checkpoint saving
save_steps = 20           # Save checkpoint every N steps
save_total_limit = 3      # Maximum number of checkpoints to keep

# Hugging Face Hub integration
push_to_hub = true        # Push final model to Hub

Run Naming

Control how your training runs are named:

[fine_tuner]
# Run name configuration
run_name = "null"           # Custom run name (or "null" for auto)
run_name_prefix = ""        # Prefix for auto-generated names
run_name_suffix = ""        # Suffix for auto-generated names

Run Name Examples

| Configuration | Generated Name |
| --- | --- |
| run_name = "my-model" | my-model |
| run_name_prefix = "exp-" | exp-20250629-143022 |
| run_name_suffix = "-v1" | 20250629-143022-v1 |
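
When run_name is "null", a timestamped name is generated and the prefix/suffix are attached around it. A hypothetical sketch of that logic (helper name and exact timestamp format are illustrative, based on the examples above):

# Hypothetical helper: reproduces the naming pattern shown in the table.
from datetime import datetime

def build_run_name(run_name="null", prefix="", suffix=""):
    if run_name and run_name != "null":
        return run_name
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")   # e.g. 20250629-143022
    return f"{prefix}{stamp}{suffix}"

build_run_name(prefix="exp-")   # -> "exp-20250629-143022"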

Memory Optimization Guide

For 4GB GPU (RTX 3060)

[fine_tuner]
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 1024
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 16
use_gradient_checkpointing = "unsloth"
rank = 8

For 8GB GPU (RTX 3070)

[fine_tuner]
base_model_id = "unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit"
max_sequence_length = 2048
load_in_4bit = true
device_train_batch_size = 2
grad_accumulation = 8
use_gradient_checkpointing = "unsloth"
rank = 16

For 12GB+ GPU (RTX 3080 Ti/4070 Ti)

[fine_tuner]
base_model_id = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"
max_sequence_length = 4096
load_in_4bit = true
device_train_batch_size = 4
grad_accumulation = 4
use_gradient_checkpointing = "unsloth"
rank = 32

Performance Tuning

Speed Optimization

[fine_tuner]
# Enable packing; can speed up training substantially (up to ~5x) on short sequences
packing = true

# Use flash attention
use_flash_attention = true

# Optimize data loading
dataset_num_proc = 8  # Match your CPU cores

# Efficient precision
dtype = "null"  # Auto-select best precision

Quality Optimization

[fine_tuner]
# Higher LoRA rank for better quality
rank = 64
lora_alpha = 32

# More training epochs
epochs = 5

# Lower learning rate for stability
learning_rate = 0.0001

# Add validation dataset
validation_data_id = "your-username/validation-dataset"

Common Configuration Patterns

Research/Experimentation

[fine_tuner]
epochs = 1
device_train_batch_size = 1
push_to_hub = false
report_to = "none"

Production Training

[fine_tuner]
epochs = 5
device_train_batch_size = 8
push_to_hub = true
save_steps = 100
save_total_limit = 5

Memory-Constrained

[fine_tuner]
load_in_4bit = true
device_train_batch_size = 1
grad_accumulation = 32
use_gradient_checkpointing = "unsloth"
max_sequence_length = 1024

Troubleshooting

Out of Memory Errors

  1. Reduce device_train_batch_size
  2. Increase grad_accumulation to keep the effective batch size constant (see the example after this list)
  3. Reduce max_sequence_length
  4. Enable use_gradient_checkpointing
  5. Use a smaller model or more aggressive quantization (e.g., load_in_4bit = true)
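
For example, dropping the per-device batch size from 4 to 1 while raising gradient accumulation from 4 to 16 keeps the effective batch size at 16:

[fine_tuner]
device_train_batch_size = 1   # was 4
grad_accumulation = 16        # was 4; effective batch size stays 1 x 16 = 16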

Slow Training

  1. Enable packing = true
  2. Enable use_flash_attention = true
  3. Increase dataset_num_proc
  4. Use larger device_train_batch_size if memory allows
  5. Consider using a smaller model for prototyping

Poor Quality Results

  1. Increase rank and lora_alpha
  2. Add validation dataset
  3. Increase epochs
  4. Lower learning_rate
  5. Check data quality and format