Advanced Configuration¶

This tutorial covers advanced configuration options and techniques for optimizing your Fine-Tune Pipeline for specific use cases and performance requirements.

Prerequisites¶

Before diving into advanced configuration, ensure you have:

✅ Completed the Basic Fine-Tuning Tutorial
✅ Understanding of your hardware limitations
✅ Familiarity with your specific use case requirements
✅ Access to validation datasets for optimization

Advanced LoRA Configuration¶

High-Rank LoRA for Maximum Quality¶

For maximum adaptation capacity at the cost of more parameters:

[fine_tuner]
# High-capacity LoRA setup
rank = 128
lora_alpha = 64
lora_dropout = 0.05

# Target more modules for comprehensive adaptation
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",          # Attention
    "gate_proj", "up_proj", "down_proj",             # MLP
    "embed_tokens", "lm_head"                        # Embeddings and output head
]

# Advanced LoRA techniques
use_rslora = true     # Rank-Stabilized LoRA
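
To see what `use_rslora` changes, compare the factor that scales the low-rank update. Standard LoRA uses lora_alpha / rank, while Rank-Stabilized LoRA uses lora_alpha / sqrt(rank), so the update does not shrink as the rank grows. The sketch below is illustrative only; `lora_scaling` is a hypothetical helper and not part of the pipeline.

```python
import math


def lora_scaling(rank: int, lora_alpha: float, use_rslora: bool = False) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # rsLoRA divides by sqrt(rank) instead of rank.
    denominator = math.sqrt(rank) if use_rslora else rank
    return lora_alpha / denominator


# Values taken from the high-rank configuration above.
standard = lora_scaling(rank=128, lora_alpha=64)
stabilized = lora_scaling(rank=128, lora_alpha=64, use_rslora=True)
print(f"standard: {standard:.3f}, rsLoRA: {stabilized:.3f}")
```

With rank = 128 and lora_alpha = 64, the rsLoRA factor is roughly eleven times larger than the standard one, so a learning rate tuned without rsLoRA may need to be revisited.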

LoFTQ Integration¶

LoFTQ (LoRA-Fine-Tuning-aware Quantization) for better quantized fine-tuning:

[fine_tuner]
rank = 32
lora_alpha = 32

# LoFTQ configuration
loftq_config = { loftq_bits = 4, loftq_iter = 1 }

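# Intended for 4-bit quantized fine-tuning.
# Note: PEFT's LoftQ initialization normally expects an unquantized base model,
# so confirm that your pipeline version accepts this combination.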
load_in_4bit = true

Dynamic LoRA Scaling¶

[fine_tuner]
# Start with lower rank, increase if needed
rank = 16
lora_alpha = 32  # Higher alpha for stronger adaptation

# Dropout is fixed for the whole run; start higher and lower it in later runs if the model underfits
lora_dropout = 0.1

Memory Optimization Strategies¶

Extreme Memory Constraints (4GB GPU)¶

[fine_tuner]
# Ultra-efficient setup
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 512
load_in_4bit = true

# Minimal batch processing
device_train_batch_size = 1
grad_accumulation = 32  # Maintain effective batch size of 32

# Memory-saving techniques
use_gradient_checkpointing = "unsloth"
packing = false  # Disable for memory savings
dataset_num_proc = 1  # Reduce parallel processing

# Conservative LoRA
rank = 8
lora_alpha = 16
target_modules = ["q_proj", "v_proj"]  # Minimal targets

High-Memory Systems (24GB+ GPU)¶

[fine_tuner]
# Take advantage of available memory
base_model_id = "unsloth/Llama-3.1-8B-Instruct-bnb-4bit"
max_sequence_length = 8192
device_train_batch_size = 8
grad_accumulation = 2

# High-capacity LoRA
rank = 64
lora_alpha = 32
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "embed_tokens", "lm_head"
]

# Performance optimizations
packing = true
use_flash_attention = true
dataset_num_proc = 8
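
The two memory profiles above differ mainly in rank and in how many modules are adapted. As a rough sketch of how those choices drive the number of trainable parameters: each adapted weight of shape (d_in, d_out) adds rank * (d_in + d_out) LoRA parameters. The layer shapes and layer count below are hypothetical, chosen only for illustration, and `lora_param_count` is not a pipeline function. The embedding and output head modules are left out for simplicity.

```python
def lora_param_count(rank, target_modules, layer_shapes, num_layers):
    """Each adapted (d_in, d_out) weight adds rank * (d_in + d_out) parameters."""
    per_layer = sum(
        rank * (layer_shapes[name][0] + layer_shapes[name][1])
        for name in target_modules
    )
    return per_layer * num_layers


# Hypothetical (d_in, d_out) shapes; read the real ones from your base model's config.
SHAPES = {
    "q_proj": (1024, 1024), "k_proj": (1024, 1024),
    "v_proj": (1024, 1024), "o_proj": (1024, 1024),
    "gate_proj": (1024, 4096), "up_proj": (1024, 4096),
    "down_proj": (4096, 1024),
}

# Conservative profile: rank 8, attention query/value only.
minimal = lora_param_count(8, ["q_proj", "v_proj"], SHAPES, num_layers=24)
# High-capacity profile: rank 64, all attention and MLP projections.
full = lora_param_count(64, list(SHAPES), SHAPES, num_layers=24)
print(f"minimal: {minimal:,} parameters, full: {full:,} parameters")
```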

Training Optimization¶

Learning Rate Scheduling¶

[fine_tuner]
# Sophisticated learning rate schedule
learning_rate = 0.0003
warmup_steps = 100
lr_scheduler_type = "cosine"

# Longer training with decay
epochs = 5
weight_decay = 0.01

# Monitor for overfitting
save_steps = 50
eval_steps = 50  # If validation set available
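
To get a feel for what `warmup_steps = 100` combined with `lr_scheduler_type = "cosine"` does to the learning rate, the sketch below computes a linear warmup followed by cosine decay to zero. It mirrors the usual shape of such schedules; the trainer's exact implementation may differ, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step, total_steps, base_lr=3e-4, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps=1000):.6f}")
```

The rate peaks at the end of warmup and is halved at the midpoint of the decay phase, which is why short runs with long warmups spend little time near the configured `learning_rate`.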

Curriculum Learning¶

Start with easier examples, progress to harder ones:

[fine_tuner]
# Phase 1: Easy examples (short, simple)
training_data_id = "username/easy-examples"
epochs = 2
learning_rate = 0.0005

# Phase 2: Medium difficulty
# Update config and continue training. TOML does not allow a key to be
# defined twice in one table, so apply each phase as a separate edit.
# training_data_id = "username/medium-examples"
# epochs = 2
# learning_rate = 0.0002

# Phase 3: Full dataset
# training_data_id = "username/full-dataset"
# epochs = 1
# learning_rate = 0.0001
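
The per-phase datasets have to come from somewhere. One simple way to build them, sketched below, is to rank samples by a difficulty proxy and cut the ranking into thirds. Using combined question and answer length as the proxy is an assumption made for illustration, as are the `question` and `answer` field names; `split_by_difficulty` is a hypothetical helper.

```python
def split_by_difficulty(samples, difficulty=None):
    """Sort samples by a difficulty proxy and cut them into three buckets."""
    if difficulty is None:
        # Assumed proxy: longer question + answer means harder.
        difficulty = lambda s: len(s["question"]) + len(s["answer"])
    ranked = sorted(samples, key=difficulty)
    third = len(ranked) // 3
    return {
        "easy": ranked[:third],
        "medium": ranked[third:2 * third],
        "hard": ranked[2 * third:],  # Takes any remainder
    }


samples = [
    {"question": "q" * n, "answer": "a" * n} for n in (50, 5, 20, 10, 40, 30)
]
buckets = split_by_difficulty(samples)
print({name: [len(s["answer"]) for s in group] for name, group in buckets.items()})
```

Each bucket can then be uploaded as its own dataset and referenced through `training_data_id` for the matching phase.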

Multi-Stage Training¶

# Stage 1: Quick adaptation
[fine_tuner]
rank = 16
learning_rate = 0.0005
epochs = 1

# Stage 2: Quality refinement
# rank = 32  # Increase capacity
# learning_rate = 0.0001
# epochs = 3

Data Engineering¶

Advanced Data Processing¶

[fine_tuner]
# Sophisticated prompt engineering
system_prompt_override_text = """You are an expert assistant with the following capabilities:
1. Provide accurate, well-researched information
2. Cite sources when applicable
3. Admit uncertainty when appropriate
4. Use clear, concise language

Instructions: {task_specific_instructions}
Context: {domain_context}"""

# Dynamic prompt templates
question_part = "<|im_start|>user\n{context}\n\nQuestion: "
answer_part = "<|im_start|>assistant\nLet me think about this step by step.\n\n"

Data Augmentation¶

Create variations of your training data:

# Example data augmentation script
import random

def augment_qa_pair(question, answer):
    """Create one randomly chosen variation of a Q&A pair."""

    # Strip trailing punctuation so the templates do not produce "??" or ".?".
    # Note: lower() also lowercases acronyms and proper nouns.
    topic = question.strip().rstrip("?.!").lower()

    # Question variations
    question_templates = [
        f"Can you explain {topic}?",
        f"What is your understanding of {topic}?",
        f"Please describe {topic}.",
        f"Help me understand {topic}."
    ]

    # Answer style variations
    answer_styles = [
        f"Certainly! {answer}",
        f"Here's my explanation: {answer}",
        f"Let me break this down: {answer}",
        f"To answer your question: {answer}"
    ]

    return random.choice(question_templates), random.choice(answer_styles)

Quality Filtering¶

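def is_coherent_text(text):
    """Placeholder coherence check; replace with your own language-quality test."""
    words = text.split()
    # Reject empty text and text dominated by a few repeated words
    return bool(words) and len(set(words)) / len(words) > 0.3

def filter_high_quality_samples(dataset, min_length=50, max_length=1000):
    """Filter dataset for quality samples."""

    filtered = []
    for sample in dataset:
        question = sample['question']
        answer = sample['answer']

        # Length filters (measured in characters)
        if not (min_length <= len(answer) <= max_length):
            continue

        # Quality heuristics
        if answer.count('.') < 2:  # Too short/simple
            continue

        # Repetitive: the answer merely echoes the question
        if question and question.lower() in answer.lower():
            continue

        # Language quality
        if not is_coherent_text(answer):
            continue

        filtered.append(sample)

    return filtered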

Multi-GPU Training¶

Data Parallel Training¶

[fine_tuner]
# Configure for multi-GPU setup
device_train_batch_size = 4  # Per GPU
grad_accumulation = 2        # Per GPU

# Total effective batch size = num_gpus * batch_size * grad_accumulation
# Example: 2 GPUs * 4 batch * 2 accumulation = 16 effective batch size

# Optimize for multi-GPU
dataset_num_proc = 16  # More parallel processing
save_steps = 100       # Less frequent saves
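
The batch-size arithmetic in the comments above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the step count ignores details such as dropped last batches or packing, which change the real number.

```python
import math


def effective_batch_size(num_gpus, device_train_batch_size, grad_accumulation):
    """Samples consumed per optimizer step across all GPUs."""
    return num_gpus * device_train_batch_size * grad_accumulation


def optimizer_steps_per_epoch(num_samples, num_gpus, device_train_batch_size,
                              grad_accumulation):
    """Approximate number of optimizer steps needed to see every sample once."""
    batch = effective_batch_size(num_gpus, device_train_batch_size, grad_accumulation)
    return math.ceil(num_samples / batch)


# The multi-GPU example: 2 GPUs * 4 batch * 2 accumulation
print(effective_batch_size(2, 4, 2))
# The 4GB single-GPU example: 1 GPU * 1 batch * 32 accumulation
print(effective_batch_size(1, 1, 32))
print(optimizer_steps_per_epoch(10_000, 2, 4, 2))
```

Knowing the steps per epoch helps when choosing `save_steps`, `eval_steps`, and `warmup_steps`, since all three are expressed in optimizer steps.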

Mixed Precision Training¶

[fine_tuner]
# Automatic mixed precision
dtype = "null"  # Auto-select best precision

# Manual precision control
# dtype = "bfloat16"  # For Ampere or newer GPUs (e.g. A100, H100)
# dtype = "float16"   # For older GPUs

Domain-Specific Optimizations¶

Code Generation¶

[fine_tuner]
# Optimized for code tasks
base_model_id = "unsloth/CodeLlama-7B-Instruct-bnb-4bit"
max_sequence_length = 4096  # Longer for code

# Code-specific prompting
system_prompt_override_text = """You are an expert programmer. Provide:
1. Clean, well-commented code
2. Explanation of logic
3. Best practices and conventions
4. Error handling where appropriate"""

# Training parameters for code
learning_rate = 0.0001  # Lower for code precision
epochs = 3
train_on_responses_only = true  # Focus on code generation

Mathematical Reasoning¶

[fine_tuner]
# Math-focused setup
system_prompt_override_text = """You are a mathematics tutor. Always:
1. Show step-by-step solutions
2. Explain mathematical concepts clearly
3. Check your work for accuracy
4. Use proper mathematical notation"""

# Longer sequences for detailed explanations
max_sequence_length = 2048
max_new_tokens = 512

# Conservative training to maintain accuracy
learning_rate = 0.00005
epochs = 2

Conversational AI¶

[fine_tuner]
# Optimized for dialogue
system_prompt_override_text = """You are a helpful, empathetic assistant. You:
1. Listen carefully to user concerns
2. Provide thoughtful, personalized responses
3. Ask clarifying questions when needed
4. Maintain context throughout conversations"""

# Dialogue-specific parameters
temperature = 0.8           # More creative responses
repetition_penalty = 1.2    # Avoid repetitive responses
train_on_responses_only = true

Validation and Monitoring¶

Advanced Validation Setup¶

[fine_tuner]
# Comprehensive validation
validation_data_id = "username/validation-set"
eval_steps = 25             # Frequent evaluation
save_strategy = "steps"
evaluation_strategy = "steps"

# Best-checkpoint selection (reload the checkpoint with the lowest eval_loss when training ends)
load_best_model_at_end = true
metric_for_best_model = "eval_loss"
greater_is_better = false

Custom Metrics Monitoring¶

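import math

import wandb
from transformers import TrainerCallback

# Assumes the pipeline's trainer accepts Hugging Face TrainerCallback objects.
class CustomCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        """Custom evaluation callback."""
        eval_loss = (metrics or {}).get('eval_loss')
        if eval_loss is None:  # No loss reported, nothing to derive
            return

        # Custom metrics: perplexity is exp(mean cross-entropy loss)
        perplexity = math.exp(eval_loss)

        # Log to Weights & Biases
        wandb.log({
            'eval/perplexity': perplexity,
            'eval/custom_score': self.compute_custom_score()
        })

    def compute_custom_score(self):
        """Placeholder: replace with your own task-specific metric."""
        return 0.0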
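
Perplexity is the exponential of the mean per-token cross-entropy loss. When you only have per-batch losses, average them weighted by token count before exponentiating, since averaging per-batch perplexities gives a different number. The helper below is a hypothetical sketch of that calculation, not a pipeline function.

```python
import math


def perplexity_from_batches(batch_losses, batch_token_counts):
    """Token-weighted mean loss across batches, converted to perplexity."""
    total_tokens = sum(batch_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    weighted_loss = sum(
        loss * count for loss, count in zip(batch_losses, batch_token_counts)
    )
    return math.exp(weighted_loss / total_tokens)


# Two batches of different sizes with the same mean per-token loss.
ppl = perplexity_from_batches([math.log(4), math.log(4)], [5, 15])
print(f"perplexity: {ppl:.2f}")
```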

Performance Monitoring¶

Comprehensive Logging¶

[fine_tuner]
# Detailed logging setup
log_steps = 5
logging_first_step = true
report_to = "wandb"

# Custom logging configuration
wandb_project_name = "advanced-fine-tuning"
run_name_prefix = "exp-"

# Additional metrics
dataloader_drop_last = false
include_inputs_for_metrics = true

Resource Monitoring¶

import psutil
import GPUtil
import wandb

def monitor_resources():
    """Monitor system resources during training."""

    # CPU and Memory
    cpu_percent = psutil.cpu_percent()
    memory = psutil.virtual_memory()

    # GPU monitoring: collect three metrics per GPU
    # (a dict comprehension can only emit one key/value pair per item)
    gpu_stats = {}
    for i, gpu in enumerate(GPUtil.getGPUs()):
        gpu_stats[f'gpu_{i}_utilization'] = gpu.load * 100
        gpu_stats[f'gpu_{i}_memory'] = gpu.memoryUtil * 100
        gpu_stats[f'gpu_{i}_temperature'] = gpu.temperature

    # Log to Weights & Biases (requires an active wandb run)
    wandb.log({
        'system/cpu_percent': cpu_percent,
        'system/memory_percent': memory.percent,
        **gpu_stats
    })

Troubleshooting Advanced Setups¶

Memory Issues¶

# Monitor GPU memory usage
nvidia-smi -l 1

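# Print allocated and reserved GPU memory.
# A fresh interpreter reports 0, so run the two print lines inside the
# training process at intervals to spot memory that keeps growing.
python -c "
import torch
print(f'Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB')
print(f'Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB')
"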

Training Instability¶

[fine_tuner]
# Stabilize training
learning_rate = 0.00005      # Lower learning rate
weight_decay = 0.01          # L2 regularization
grad_clip_norm = 1.0         # Gradient clipping

# Use warmup
warmup_steps = 100
lr_scheduler_type = "linear"

Convergence Issues¶

[fine_tuner]
# Improve convergence
epochs = 10                  # More training
eval_steps = 50             # Frequent evaluation
patience = 3                # Early stopping patience

# Data quality
max_sequence_length = 2048  # Consistent length
packing = false             # Avoid sequence mixing
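
The `patience` setting above is commonly interpreted as the number of consecutive evaluations without improvement that are tolerated before training stops. The sketch below shows that logic in isolation. How the pipeline itself applies `patience` is an assumption here, and `EarlyStopper` is a hypothetical class written for illustration.

```python
class EarlyStopper:
    """Stop after `patience` consecutive evaluations without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss):
        if eval_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Simulated eval losses, one per evaluation (every `eval_steps` steps).
losses = [1.00, 0.80, 0.75, 0.76, 0.77, 0.78, 0.60]
stopper = EarlyStopper(patience=3)
stopped_at = next(
    (i for i, loss in enumerate(losses) if stopper.should_stop(loss)), None
)
print(f"stopped at evaluation {stopped_at}, best loss {stopper.best_loss}")
```

In this run, training stops before the final, lower loss is ever reached, which illustrates the trade-off: a small patience saves compute but can cut off a run that would have recovered.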

Production Considerations¶

Model Versioning¶

[fine_tuner]
# Systematic versioning
run_name_prefix = "v2-production-"
push_to_hub = true
save_total_limit = 5

# Detailed metadata
tags = ["production", "v2.0", "optimized"]

Reproducibility¶

[fine_tuner]
# Ensure reproducibility
seed = 42
deterministic = true

# Version control
log_model_config = true
save_training_args = true
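
If you run your own preprocessing or augmentation scripts alongside the pipeline, they need seeding too, or the data will differ between runs even when the training seed is fixed. A minimal sketch follows, assuming NumPy and PyTorch are optional; `set_seed` here is a hypothetical helper and not the pipeline's own function.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Seeding alone does not guarantee bit-identical results on GPU, because some kernels are non-deterministic; it does make data shuffling and augmentation repeatable.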

Automated Hyperparameter Tuning¶

import optuna

# `config` and `FineTune` are assumed to be loaded from the pipeline beforehand.

def objective(trial):
    """Optuna objective for hyperparameter optimization."""

    # Suggest hyperparameters
    rank = trial.suggest_int('rank', 8, 64, step=8)
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-3, log=True)
    epochs = trial.suggest_int('epochs', 1, 5)

    # Update configuration
    config.rank = rank
    config.learning_rate = learning_rate
    config.epochs = epochs

    # Run training
    tuner = FineTune(config=config)
    stats = tuner.run()

    # Return metric to optimize
    return stats.eval_loss

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

Next Steps¶

After mastering advanced configuration:

Experiment Tracking: Set up comprehensive experiment management
A/B Testing: Compare different configurations systematically
Production Deployment: Scale your optimized models
Continuous Learning: Implement online learning workflows

Your advanced configuration skills will enable you to squeeze maximum performance from your fine-tuning pipeline while efficiently managing computational resources.