unsloth-finetuning
Fine-tune LLMs 2x faster with 80% less memory using Unsloth. Use when the user wants to fine-tune models like Llama, Mistral, Phi, or Gemma. Handles model loading, LoRA configuration, training, and model export.
| name | unsloth-finetuning |
| description | Fine-tune LLMs 2x faster with 80% less memory using Unsloth. Use when the user wants to fine-tune models like Llama, Mistral, Phi, or Gemma. Handles model loading, LoRA configuration, training, and model export. |
Expert guidance for fine-tuning Large Language Models using Unsloth's optimized library.
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
Supported models include the Llama, Mistral, Phi, and Gemma families.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (8-64)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,  # Scaling factor
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    max_seq_length=2048,
)
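To get a feel for what `r` and `target_modules` cost, the number of trainable LoRA parameters can be estimated by hand. The sketch below is illustrative only: the layer dimensions are assumed values for a Llama-3.2-1B-style model and are not read from the checkpoint, and `lora_param_count` is a hypothetical helper, not part of Unsloth.

```python
# Rough count of trainable LoRA parameters.
# Each adapted linear layer (d_in -> d_out) gains two small matrices:
# A with shape (rank, d_in) and B with shape (d_out, rank).

def lora_param_count(rank, module_shapes, num_layers):
    """Return the total number of LoRA parameters across all layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * num_layers

# Assumed dimensions for a Llama-3.2-1B-style model (not read from the model).
HIDDEN, KV_DIM, MLP_DIM, LAYERS = 2048, 512, 8192, 16

MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

trainable = lora_param_count(16, MODULE_SHAPES, LAYERS)
scaling = 16 / 16  # standard LoRA scales the update by lora_alpha / r

print(f"{trainable:,} trainable parameters, update scaling {scaling}")
```

Under these assumptions, halving the rank halves the adapter size, and dropping the three MLP projections from `target_modules` removes most of the adapter parameters.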
from trl import SFTTrainer, SFTConfig
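# `dataset` is assumed to be a Hugging Face DatasetDict with a "train" split, loaded beforehand
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,  # newer TRL releases name this argument processing_class
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=100,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="./output",
        optim="adamw_8bit",
        seed=3407,
    ),
)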
trainer.train()
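The batch settings above interact: gradients from several small batches are accumulated before each optimizer step. A minimal sketch of that arithmetic follows; `training_budget` is a hypothetical helper written for this illustration, and a single GPU is assumed by default.

```python
# How many examples each optimizer step sees, and how many the whole run sees.

def training_budget(per_device_batch, grad_accum, max_steps, num_devices=1):
    """Return (effective batch size, total examples processed)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return effective_batch, effective_batch * max_steps

# Values from the configuration above: batch 2, accumulation 4, 100 steps.
effective, total_examples = training_budget(2, 4, 100)
print(f"effective batch size: {effective}, examples seen: {total_examples}")

# Halving the batch and doubling accumulation keeps the effective batch unchanged,
# which is why that trade is a common response to out-of-memory errors.
low_memory_effective, _ = training_budget(1, 8, 100)
print(f"low-memory effective batch size: {low_memory_effective}")
```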
# GGUF format
model.save_pretrained_gguf(
    "model",
    tokenizer,
    quantization_method="q4_k_m",
)
# Hugging Face format
model.save_pretrained("./hf_model")
tokenizer.save_pretrained("./hf_model")
Out of Memory? Try:
- Set per_device_train_batch_size to 1
- Set gradient_accumulation_steps to 8
- Set max_seq_length to 1024
Training too slow? Check:
- GPU utilization with nvidia-smi
- load_in_4bit=True
- use_gradient_checkpointing="unsloth"
Poor results? Adjust:
- max_steps to 500-1000
# Minimal setup for fast experimentation
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.2-1B-bnb-4bit",
    max_seq_length=1024,  # Shorter for speed
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=8)  # Lower rank
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    args=SFTConfig(
        per_device_train_batch_size=2,
        max_steps=50,  # Few steps
        learning_rate=2e-4,
        output_dir="./quick_test",
    ),
)
# Full setup for best results
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Standard rank
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=500,  # More steps
        learning_rate=2e-4,
        warmup_steps=10,
        logging_steps=10,
        save_steps=100,
        output_dir="./production_model",
    ),
)
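The `warmup_steps`, `max_steps`, and `learning_rate` values above together determine the learning rate at each step. The sketch below assumes the trainer's default linear schedule (linear warmup followed by linear decay to zero); `linear_schedule_lr` is a hypothetical helper that mirrors that behaviour for illustration and is not the library's own implementation.

```python
# Learning rate under linear warmup followed by linear decay to zero.

def linear_schedule_lr(step, base_lr, warmup_steps, max_steps):
    """Return the learning rate used at a given optimizer step."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)

# Values from the configuration above: 2e-4 peak, 10 warmup steps, 500 total steps.
for step in (0, 5, 10, 255, 500):
    print(step, linear_schedule_lr(step, 2e-4, 10, 500))
```

With only 10 warmup steps out of 500, almost the entire run is spent in the decay phase, so the average learning rate is roughly half of the configured peak.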
# Special settings for very large models
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Fewer targets
    use_gradient_checkpointing="unsloth",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    args=SFTConfig(
        per_device_train_batch_size=1,  # Keep at 1 to fit in memory
        gradient_accumulation_steps=8,  # Compensate for the small batch
        max_steps=200,
        learning_rate=1e-4,  # Lower LR
        output_dir="./large_model",
    ),
)
Solution:
# Reduce memory usage
per_device_train_batch_size = 1
max_seq_length = 1024
gradient_accumulation_steps = 8
# Or use smaller model
Solution:
export HF_TOKEN=your_token
Solution:
# Adjust hyperparameters
learning_rate = 5e-4 # Try higher
max_steps = 500 # Train longer
# Or check dataset quality
nvidia-smi
Unsloth works with Hugging Face datasets. Example format:
{
"text": "### Instruction: Explain quantum computing\n### Response: Quantum computing uses quantum bits..."
}
Or instruction format:
{
"instruction": "Explain quantum computing",
"input": "",
"output": "Quantum computing uses quantum bits..."
}
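The two formats are interchangeable: an instruction-format record can be flattened into the single `text` field shown first. The following is a minimal sketch under stated assumptions; `to_text_record` is a hypothetical helper, and the `### Input:` label for non-empty inputs is an assumption, since the example above only shows an empty `input`.

```python
# Convert an instruction-format record into a single-field "text" record.

def to_text_record(example):
    """Flatten instruction/input/output fields into one prompt string."""
    instruction = example["instruction"].strip()
    extra_input = example.get("input", "").strip()
    output = example["output"].strip()

    prompt = f"### Instruction: {instruction}\n"
    if extra_input:
        # Assumed label; the source only shows records with an empty input.
        prompt += f"### Input: {extra_input}\n"
    return {"text": f"{prompt}### Response: {output}"}

record = {
    "instruction": "Explain quantum computing",
    "input": "",
    "output": "Quantum computing uses quantum bits...",
}
print(to_text_record(record)["text"])
```

With a Hugging Face dataset, a function of this shape can typically be applied to every row via `dataset.map(to_text_record)`.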
| Model | VRAM | Speed (vs standard) | Memory Reduction |
|---|---|---|---|
| Llama 3.2 1B | ~2GB | 2x faster | 80% less |
| Llama 3.2 3B | ~4GB | 2x faster | 75% less |
| Llama 3.1 8B | ~6GB | 2x faster | 70% less |
| Llama 3.3 70B | ~40GB | 2x faster | 75% less |
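As a sanity check on the VRAM column, the memory taken by the weights alone can be estimated from the parameter count and the bits stored per parameter. This is a rough sketch: the parameter counts are nominal round numbers assumed for illustration, `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations, LoRA adapters, optimizer state, and quantization overhead, so real usage will be higher.

```python
# Back-of-the-envelope memory for model weights only.

def weight_memory_gb(num_params, bits_per_param):
    """Return the weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Nominal parameter counts (assumed round numbers, not exact model sizes).
NOMINAL_PARAMS = {
    "Llama 3.2 1B": 1e9,
    "Llama 3.2 3B": 3e9,
    "Llama 3.1 8B": 8e9,
    "Llama 3.3 70B": 70e9,
}

for name, params in NOMINAL_PARAMS.items():
    four_bit = weight_memory_gb(params, 4)
    sixteen_bit = weight_memory_gb(params, 16)
    print(f"{name}: ~{four_bit:.1f} GB in 4-bit vs ~{sixteen_bit:.1f} GB in 16-bit")
```

Storing weights in 4 bits instead of 16 cuts weight memory by 75%, which is in the same range as the reductions listed in the table.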
For more advanced topics, see: