Basic Fine-Tuning Tutorial¶

This tutorial will guide you through your first fine-tuning project using the Fine-Tune Pipeline. We'll fine-tune a small language model on a question-answering dataset.

Prerequisites¶

Before starting this tutorial, ensure you have:

✅ Installed the pipeline
✅ Set up your environment
✅ API keys for Hugging Face and Weights & Biases
✅ A GPU with at least 4GB VRAM (or CPU for slower training)

Tutorial Overview¶

In this tutorial, we will:

Prepare a custom dataset
Configure the pipeline
Run fine-tuning
Test the fine-tuned model
Evaluate results

Estimated Time: 30-45 minutes

Step 1: Prepare Your Dataset¶

We'll create a simple question-answering dataset about programming concepts.

Create a Dataset File¶

Create a file called programming_qa.jsonl:

{"question": "What is a variable in programming?", "answer": "A variable is a named storage location in memory that holds a value which can be modified during program execution."}
{"question": "What is a function?", "answer": "A function is a reusable block of code that performs a specific task and can accept inputs (parameters) and return outputs."}
{"question": "What is object-oriented programming?", "answer": "Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which contain data (attributes) and code (methods)."}
{"question": "What is an algorithm?", "answer": "An algorithm is a step-by-step procedure or set of instructions designed to solve a specific problem or perform a particular task."}
{"question": "What is debugging?", "answer": "Debugging is the process of finding, analyzing, and fixing errors or bugs in computer programs to ensure they work correctly."}

Upload to Hugging Face Hub¶

Create a new dataset repository on Hugging Face Hub
Upload your dataset:

# Install Hugging Face CLI if not already installed
pip install huggingface_hub[cli]

# Login to Hugging Face
huggingface-cli login

# Create and upload dataset
huggingface-cli repo create your-username/programming-qa-dataset --type dataset
huggingface-cli upload your-username/programming-qa-dataset programming_qa.jsonl

Alternatively, you can use the web interface to upload your file.

Step 2: Configure the Pipeline¶

Edit your config.toml file with the following configuration:

[fine_tuner]
# Use a small, efficient model for this tutorial
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 2048

# Memory-efficient settings
load_in_4bit = true
load_in_8bit = false
full_finetuning = false

# LoRA configuration - conservative settings for first run
rank = 16
lora_alpha = 16
lora_dropout = 0.1
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
bias = "none"

# Your dataset
training_data_id = "your-username/programming-qa-dataset"
validation_data_id = "null"  # No validation set for this tutorial

# Dataset configuration
dataset_num_proc = 2
question_column = "question"
ground_truth_column = "answer"
system_prompt_column = "null"
system_prompt_override_text = "You are a helpful programming tutor. Provide clear and concise explanations."

# Training parameters - short training for tutorial
epochs = 2
learning_rate = 0.0003
device_train_batch_size = 2
device_validation_batch_size = 2
grad_accumulation = 4

# Optimization settings
warmup_steps = 10
optimizer = "paged_adamw_8bit"
weight_decay = 0.01
lr_scheduler_type = "linear"
seed = 42

# Logging and saving
log_steps = 5
log_first_step = true
save_steps = 50
save_total_limit = 2
push_to_hub = true

# Weights & Biases
wandb_project_name = "programming-qa-tutorial"
report_to = "wandb"

# Run naming
run_name = "null"
run_name_prefix = "programming-qa-"
run_name_suffix = ""

# Advanced settings
packing = false
use_gradient_checkpointing = "unsloth"
use_flash_attention = true
use_rslora = false
loftq_config = "null"

# Response-only training for chat models
train_on_responses_only = true
question_part = "<|im_start|>user\n"
answer_part = "<|im_start|>assistant\n"

Configuration Explanation¶

Parameter	Value	Explanation
`base_model_id`	`unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit`	Small model for quick training
`epochs`	`2`	Short training for tutorial
`rank`	`16`	Moderate LoRA rank for good quality
`device_train_batch_size`	`2`	Small batch size for memory efficiency
`system_prompt_override_text`	Custom prompt	Specialized for programming help

Step 3: Run Fine-Tuning¶

Now let's start the fine-tuning process:

# Navigate to your project directory
cd Fine-Tune-Pipeline

# Ensure dependencies are synced
uv sync

# Run fine-tuning with your API keys
uv run app/finetuner.py --hf-key "your_hf_token" --wandb-key "your_wandb_key"

What to Expect¶

The training should take approximately 10-15 minutes on a modern GPU. You'll see output like:

--- ✅ Login to Hugging Face Hub successful. ---
--- ✅ Training dataset loaded: your-username/programming-qa-dataset ---
--- ✅ No validation dataset provided. Skipping validation. ---
--- ✅ Model and tokenizer loaded successfully. ---
--- ✅ Data preprocessing completed. ---
Run name set to: programming-qa-20250629-143022
--- ✅ Weights & Biases setup completed. ---
--- ✅ Trainer initialized successfully. ---
--- ✅ Starting training... ---

Training Progress:
Epoch 1/2: ██████████ 100% | Loss: 1.234
Epoch 2/2: ██████████ 100% | Loss: 0.987

--- ✅ Training completed successfully ---
--- ✅ Model saved to ./models/fine_tuned locally ---
--- ✅ Model uploaded to Hugging Face Hub ---

Step 4: Test Your Fine-Tuned Model¶

Let's test the model with inference:

4.1 Create Test Data¶

Create a test file test_questions.jsonl:

{"question": "What is a loop in programming?", "answer": ""}
{"question": "What is recursion?", "answer": ""}
{"question": "What is a data structure?", "answer": ""}

4.2 Update Inferencer Configuration¶

Add this section to your config.toml:

[inferencer]
max_sequence_length = 2048
dtype = "null"
load_in_4bit = true
load_in_8bit = false

# Your test dataset
testing_data_id = "your-username/test-questions"  # Upload test_questions.jsonl first

# Column configuration
question_column = "question"
ground_truth_column = "answer"
system_prompt_column = "null"
system_prompt_override_text = "You are a helpful programming tutor. Provide clear and concise explanations."

# Generation parameters
max_new_tokens = 150
use_cache = true
temperature = 0.7
min_p = 0.1

# Model location
hf_user_id = "your-username"
run_name = "null"  # Will use the latest model

4.3 Run Inference¶

# Upload test dataset first
huggingface-cli upload your-username/test-questions test_questions.jsonl

# Run inference
uv run app/inferencer.py --hf-key "your_hf_token"

Step 5: Evaluate Results¶

Let's evaluate how well our model performs:

5.1 Prepare Ground Truth¶

Create ground_truth.jsonl with expected answers:

{"question": "What is a loop in programming?", "answer": "A loop is a programming construct that repeats a block of code multiple times until a certain condition is met."}
{"question": "What is recursion?", "answer": "Recursion is a programming technique where a function calls itself to solve a problem by breaking it down into smaller, similar subproblems."}
{"question": "What is a data structure?", "answer": "A data structure is a way of organizing and storing data in a computer so that it can be accessed and manipulated efficiently."}

5.2 Configure Evaluator¶

Add this to your config.toml:

[evaluator]
# Evaluation metrics
metrics = ["bleu_score", "rouge_score", "semantic_similarity"]

# LLM for evaluation (optional, remove if no OpenAI key)
llm = "gpt-4o-mini"
embedding = "text-embeddings-3-small"

# Run configuration
run_name = "null"
run_name_prefix = "eval-programming-qa-"
run_name_suffix = ""

5.3 Run Evaluation¶

# Run evaluation (with OpenAI key for LLM-based metrics)
uv run app/evaluator.py --openai-key "your_openai_key"

# Or without LLM-based metrics (only BLEU and ROUGE)
uv run app/evaluator.py

Step 6: Review Results¶

Training Metrics¶

Weights & Biases Dashboard:
Go to wandb.ai
Open your project: programming-qa-tutorial
Review loss curves, learning rate, and system metrics
Local Model:
Model saved in ./models/fine_tuned/
Hugging Face Hub: your-username/programming-qa-YYYYMMDD-HHMMSS

Inference Results¶

Check inferencer_output.jsonl:

cat inferencer_output.jsonl

Expected output:

{"question": "What is a loop in programming?", "answer": "A loop is a programming construct that allows you to repeat a block of code multiple times...", "metadata": {...}}

Evaluation Metrics¶

Check the evaluation results:

# Summary metrics
cat evaluator_output_summary.json

# Detailed results
# Open evaluator_output_detailed.xlsx in Excel/LibreOffice

Understanding the Results¶

Training Loss¶

Good Training: Loss should decrease over epochs
Overfitting: Loss stops decreasing or increases
Underfitting: Loss remains high throughout training

BLEU Score¶

Range: 0-100 (higher is better)
Good Score: 20+ for this task
Interpretation: Measures word overlap with reference

ROUGE Score¶

Range: 0-1 (higher is better)
Good Score: 0.2+ for this task
Interpretation: Measures summary quality

Semantic Similarity¶

Range: 0-1 (higher is better)
Good Score: 0.7+ for this task
Interpretation: Meaning similarity using embeddings

Next Steps¶

Congratulations! You've completed your first fine-tuning project. Here's what you can do next:

Improve the Model¶

More Data: Add more diverse programming questions
Longer Training: Increase epochs (3-5)
Higher LoRA Rank: Try rank=32 for better quality
Validation Set: Add validation data to monitor overfitting

Advanced Techniques¶

Advanced Configuration: Explore more options
CI/CD Integration: Automate your pipeline
Multi-GPU Training: Scale to larger models and datasets

Experiment with Different Tasks¶

Text Summarization: Train on summarization datasets
Code Generation: Fine-tune for programming tasks
Translation: Create multilingual models
Classification: Adapt for classification tasks

Troubleshooting¶

Common Issues¶

Out of Memory Error:

# Reduce batch size and sequence length
device_train_batch_size = 1
max_sequence_length = 1024

Slow Training:

# Enable optimizations
packing = true
use_flash_attention = true

Poor Quality Results:

# Increase model capacity
rank = 32
lora_alpha = 32
epochs = 3

Dataset Loading Issues: - Check dataset format (JSONL) - Verify column names match configuration - Ensure dataset is public or you have access

Conclusion¶

You've successfully:

✅ Created and uploaded a custom dataset
✅ Configured the fine-tuning pipeline
✅ Fine-tuned a language model
✅ Generated predictions with inference
✅ Evaluated model performance

This foundation will help you tackle more complex fine-tuning projects. The same process scales to larger models, datasets, and more sophisticated tasks.