Quick Start Guide

This guide will walk you through running your first fine-tuning job with the Fine-Tune Pipeline in just a few minutes.

Prerequisites

Before you begin, make sure you have:

  1. Access to the pipeline's GitHub repository and permission to create branches
  2. A Hugging Face account, since the pipeline pulls datasets from and pushes models and results to the Hugging Face Hub
  3. Access to an MLflow tracking server for experiment logging
  4. The API keys the pipeline needs (for example, a Hugging Face access token)

Step 1: Navigate to the GitHub repository and branch

First, go to the GitHub repository of the pipeline and switch to the branch that corresponds to the model you are fine-tuning. For example, if you are working with the Qwen2.5 model, switch to the lora-qwen2.5 branch.

If such a branch does not exist, create one from the lora-dev branch and name it after the model you are working with, e.g., lora-model_XYZ.

Step 2: Understanding the Default Configuration

In the repository you will find config.toml, which ships with a pre-configured setup. Let's look at the key settings:

[fine_tuner]
# Model settings
base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"
max_sequence_length = 4096

# Training data
training_data_id = "your-huggingface-username/your-training-dataset"
validation_data_id = "your-huggingface-username/your-validation-dataset"  # Optional

# Training parameters
epochs = 3
learning_rate = 0.0002
device_train_batch_size = 4

[inferencer]
# Model settings
max_sequence_length = 4096
max_new_tokens = 512
temperature = 0.7
min_p = 0.1

# Hugging Face user ID
hf_user_id = "your-huggingface-username"

[evaluator]
# Metrics settings
metrics = ["bleu_score", "rouge_score", "factual_correctness"]

# Hugging Face user ID
hf_user_id = "your-huggingface-username"

[mlflow]
# MLflow settings
tracking_uri = "https://your-mlflow-tracking-uri"
experiment_name = "your-experiment-name"
run_name = "0.0.1"  # Increment this for each run

First Run Recommendation

The default configuration is designed for a quick first run. It uses a small model and dataset that should complete training in 10-15 minutes on a modern GPU.

Step 3: Run Your First Fine-Tuning Job

Make a small change to the config.toml file. For example, increment the run_name under the [mlflow] section (e.g., from "0.0.1" to "0.0.2").

Committing and pushing this change triggers the pipeline, which runs in three stages: fine-tuning, inference, and evaluation.

1. Fine-tuning

1.1 What Happens During Fine-Tuning

  1. Model Loading: Downloads and loads the base model (Qwen2.5-0.5B)
  2. Data Processing: Downloads and processes the training dataset
  3. LoRA Setup: Configures Low-Rank Adaptation for efficient fine-tuning
  4. Training: Runs 3 epochs of training with progress tracking
  5. Saving: Saves the model locally and pushes to Hugging Face Hub
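
The base model ID in config.toml points to an Unsloth 4-bit checkpoint, and step 3 above configures LoRA adapters on top of it. As a rough illustration rather than the pipeline's actual code, LoRA setup with the Hugging Face peft library looks roughly like this (the rank, alpha, and target modules below are placeholder values):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "unsloth/Qwen2.5-0.5B-Instruct-bnb-4bit"  # from [fine_tuner]
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")  # loading the 4-bit weights needs bitsandbytes and accelerate
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# LoRA trains small low-rank adapter matrices instead of all model weights
lora_config = LoraConfig(
    r=16,                      # placeholder rank
    lora_alpha=16,             # placeholder scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable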

1.2 Expected Output

You should see a final output similar to this in the GitHub Actions log:

--- ✅ Fine-tuning completed successfully. ---

2. Inference

2.1 What Happens During Inference

After training, the pipeline will automatically run inference. This involves:

  1. Model Loading: Loads the fine-tuned model
  2. Data Preparation: Downloads and processes the test dataset for inference
  3. Inference Execution: Runs inference with the configured parameters in config.toml
  4. Output Generation: Saves results in JSONL format
  5. Pushing Results: Uploads inference results to Hugging Face Hub
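
Conceptually, each test example is run through the fine-tuned model with the sampling settings from the [inferencer] section, and the result is appended to a JSONL file. A minimal sketch follows; the model ID and prompt are placeholders, and min_p sampling requires a recent transformers release:

import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-huggingface-username/your-finetuned-model"  # placeholder Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Your test question here"  # one example from the test dataset
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,   # from [inferencer]
    do_sample=True,
    temperature=0.7,      # from [inferencer]
    min_p=0.1,            # from [inferencer]
)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

with open("inference_results.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")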

2.2 Expected Output

You should see a final output similar to this in the GitHub Actions log:

--- ✅ Inference completed successfully. ---

3. Evaluation

3.1 What Happens During Evaluation

After inference, the pipeline will automatically run evaluation. This includes:

  1. Loading Results: Loads the inference output
  2. Evaluation Metrics: Computes metrics such as Factual Correctness and Answer Accuracy using RAGAS
  3. Reporting: Generates detailed reports in Excel and JSON formats
  4. Logging: Saves evaluation metrics to MLflow
  5. Pushing Results: Uploads evaluation results to Hugging Face Hub
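
The RAGAS-based metric computation is driven by the metrics list in [evaluator]; the reporting and logging steps boil down to writing the scores out and sending them to the MLflow server configured in [mlflow]. A simplified sketch with placeholder scores:

import mlflow
import pandas as pd

scores = {"bleu_score": 0.41, "rouge_score": 0.55, "factual_correctness": 0.78}  # placeholder values

# Reports in Excel and JSON formats (Excel export needs the openpyxl package)
report = pd.DataFrame([scores])
report.to_excel("evaluation_report.xlsx", index=False)
report.to_json("evaluation_report.json", orient="records")

# Log the metrics to the MLflow server configured in [mlflow]
mlflow.set_tracking_uri("https://your-mlflow-tracking-uri")
mlflow.set_experiment("your-experiment-name")
with mlflow.start_run(run_name="0.0.1"):
    mlflow.log_metrics(scores)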

3.2 Expected Output

You should see a final output similar to this in the GitHub Actions log:

--- ✅ Evaluation completed successfully. ---

Next Steps

Congratulations! 🎉 You've successfully run your first fine-tuning pipeline. Here's what you can do next:

Customize Your Training

  1. Use Your Own Data: Replace training_data_id and testing_data_id with your own datasets
  2. Try Different Models: Experiment with larger models such as Llama or Gemma by changing base_model_id
  3. Adjust Hyperparameters: Modify the learning rate, batch size, number of epochs, etc.
  4. Explore Advanced Features: Check out the Advanced Configuration guide

See Also

  1. Advanced Configuration - Explore all configuration options
  2. CI/CD Integration - Set up automated training pipelines
  3. API Reference - Deep dive into the codebase

Troubleshooting

If you encounter issues:

  1. Check the Troubleshooting Guide
  2. Verify your API keys are correct
  3. Ensure you have sufficient GPU memory
  4. Check the console output for specific error messages

Common First-Run Issues

Out of Memory

If you get CUDA out of memory errors, reduce the batch size:

device_train_batch_size = 2  # Reduce from 4
grad_accumulation = 8        # Increase to maintain effective batch size

The effective batch size is device_train_batch_size × grad_accumulation, so halving one while doubling the other keeps it constant.

Dataset Not Found

If the dataset fails to load, check:

- Your internet connection
- The dataset ID is correct
- You have access to the dataset (some require authentication)
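
A quick way to check the last two points is to try loading the dataset directly (after authenticating with huggingface-cli login if the dataset is gated):

from datasets import load_dataset

# Fails immediately if the ID is wrong or you lack access
dataset = load_dataset("your-huggingface-username/your-training-dataset", split="train")
print(dataset)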

Training Too Slow

For faster training:

learning_rate = 0.0005  # Increase learning rate
epochs = 2              # Reduce number of epochs
device_train_batch_size = 8  # Increase batch size if GPU allows

Happy fine-tuning! 🚀