lewtun
"""
Simple RL training script for teaching a model to add.
Demonstrates REINFORCE and GRPO algorithms in a minimal implementation.
To run this script, place it in nanochat/scripts/ and run:
python -m scripts.simple_rl
Before running, add "matplotlib>=3.9.0" to pyproject.toml and run 'uv sync'.
I wrote a separate script to download the weights for the model:
willccbb / grpo_demo.py
Last active December 6, 2025 04:07
GRPO Llama-1B
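As background for the GRPO named in this title, here is a minimal sketch of how a group-relative advantage differs from the raw-reward weighting of vanilla REINFORCE. It is not taken from either gist: `addition_reward`, `reinforce_advantages`, and `grpo_advantages` are hypothetical helper names, the binary addition reward is an assumed toy task, and the mean/std normalization is one common GRPO formulation rather than necessarily the one these scripts use.

```python
from statistics import mean, pstdev


def addition_reward(a, b, completion):
    """Hypothetical binary reward: 1.0 if the completion parses to a + b, else 0.0."""
    try:
        return 1.0 if int(completion.strip()) == a + b else 0.0
    except ValueError:
        # Non-numeric completions earn no reward.
        return 0.0


def reinforce_advantages(rewards):
    """Vanilla REINFORCE (no baseline): each sample is weighted by its raw reward."""
    return list(rewards)


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: normalize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt ("2 + 3 =") with a group of four sampled completions (made-up data).
completions = ["5", "6", "5", "five"]
rewards = [addition_reward(2, 3, c) for c in completions]

print(reinforce_advantages(rewards))
print(grpo_advantages(rewards))
```

In this sketch the group-relative version pushes wrong completions down (negative advantage) instead of ignoring them, and a group whose samples all receive the same reward contributes zero advantage.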
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},