Gist: https://gist.github.com/evacchi/7afc3613407597287644ea89b83fd716
Colab: https://colab.research.google.com/gist/evacchi/7afc3613407597287644ea89b83fd716/blockkit-gemma3-finetune.ipynb
Specs:
- Parameters: 270 million
- Memory: ~2 GB
- Speed: Very fast inference
- Quality: Good for simple structured outputs
Training Config:
- Quantization: None (full precision)
- LoRA rank: 128
- Batch size: 4
- Gradient accumulation: 2
- Effective batch size: 8
- Epochs: 2
- Learning rate: 2e-4
- Training time: ~15 minutes on T4
Use Cases:
- ✅ Fast prototyping
- ✅ Resource-constrained environments
- ✅ Simple JSON generation tasks
- ✅ Edge deployment
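A minimal Unsloth sketch of the 270M recipe above. Model name and hyperparameters come from this section; `max_seq_length`, `lora_alpha`, the `"text"` field name, and the output directory are assumptions, and the full dataset preparation follows the linked notebook.

```python
from unsloth import FastModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Load the 270M instruct model at full precision.
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,   # assumption: long enough for the Block Kit samples
    load_in_4bit=False,    # no quantization needed at this size
)

# Attach LoRA adapters with the higher rank used for the small model.
model = FastModel.get_peft_model(
    model,
    r=128,
    lora_alpha=128,        # assumption: alpha matched to rank
)

# The shared training file (see the training-data section below).
# The notebook formats each example with the Gemma chat template first.
dataset = load_dataset("json", data_files="blockkit-training-1k.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",       # assumption: formatted prompts live here
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,   # effective batch size 8
        num_train_epochs=2,
        learning_rate=2e-4,
        output_dir="outputs-270m",       # hypothetical output path
    ),
)
trainer.train()
```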
Gist: https://gist.github.com/evacchi/5ad3e771234d9a305c942c92eeda1f90
Colab: https://colab.research.google.com/gist/evacchi/5ad3e771234d9a305c942c92eeda1f90/blockkit-gemma3-1b-finetune.ipynb
Specs:
- Parameters: 1 billion
- Memory: ~6 GB (with 4-bit quantization)
- Speed: Fast inference
- Quality: Better understanding and generation
Training Config:
- Quantization: 4-bit (for memory efficiency)
- LoRA rank: 64
- Batch size: 2
- Gradient accumulation: 4
- Effective batch size: 8 (same as 270M)
- Epochs: 2
- Learning rate: 2e-4
- Training time: ~25-30 minutes on T4
Use Cases:
- ✅ Better quality outputs
- ✅ More complex JSON structures
- ✅ Better handling of edge cases
- ✅ More robust to varied inputs
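The 1B run differs mainly in how the model is loaded and in the LoRA/batch settings. A hedged sketch of just those deltas, under the same API assumptions as the 270M sketch above:

```python
from unsloth import FastModel

# Load the 1B instruct model in 4-bit so it fits the free T4.
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,   # assumption
    load_in_4bit=True,     # required for free Colab memory limits
)

# Lower LoRA rank to save memory; halve the batch size and double
# gradient accumulation so the effective batch size stays at 8.
model = FastModel.get_peft_model(model, r=64, lora_alpha=64)  # alpha assumed

# SFTConfig deltas vs. the 270M run:
#   per_device_train_batch_size=2, gradient_accumulation_steps=4
```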
| Feature | 270M | 1B |
|---|---|---|
| Base Model | unsloth/gemma-3-270m-it | unsloth/gemma-3-1b-it |
| Quantization | None | 4-bit |
| LoRA Rank | 128 | 64 |
| Batch Size | 4 | 2 |
| Memory Usage | ~2-3 GB | ~6-7 GB |
| Training Speed | Faster | Slower |
| Output Quality | Good | Better |
| Inference Speed | Very Fast | Fast |
270M:

```python
load_in_4bit = False             # No quantization needed
r = 128                          # Higher LoRA rank
per_device_train_batch_size = 4
```

1B:

```python
load_in_4bit = True              # Required for free Colab
r = 64                           # Lower rank to save memory
per_device_train_batch_size = 2
```

Choose 270M if:
- You need fast inference
- You're deploying on edge devices
- Training time is critical
- Your task is straightforward JSON generation
Choose 1B if:
- You want better quality outputs
- You can afford slightly slower inference
- Your task involves complex reasoning
- You need better handling of varied inputs
Both models use the same training data:
- File: `blockkit-training-1k.jsonl`
- Examples: 992
- Format: ISO timestamps + natural language
- Size: 5.7 MB
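A quick way to sanity-check the shared file before training. The field names depend on how the notebook emits each example, so the sketch prints them rather than assuming a schema:

```python
from datasets import load_dataset

# Load the shared JSONL file and inspect one example.
dataset = load_dataset(
    "json",
    data_files="blockkit-training-1k.jsonl",
    split="train",
)
print(len(dataset))          # expected: 992 examples
print(dataset.column_names)  # inspect the actual fields
print(dataset[0])            # one ISO-timestamp + natural-language sample
```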
Both configurations are optimized for free Tesla T4 GPUs:
- 270M: Uses ~40% GPU memory
- 1B: Uses ~80% GPU memory (with 4-bit quantization)
Both should complete training in under 1 hour on the free tier.
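If you want to verify the headroom figures on your own Colab session, here is a small PyTorch check; the numbers it prints are illustrative, not guarantees.

```python
import torch

# Compare reserved GPU memory against the T4's total capacity,
# before and after loading the model.
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3                      # ~15 GB on a T4
reserved_gb = torch.cuda.max_memory_reserved(0) / 1024**3    # peak so far
print(f"{props.name}: {reserved_gb:.1f} / {total_gb:.1f} GB reserved "
      f"({100 * reserved_gb / total_gb:.0f}%)")
```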