🤠

lewtun

Cowboy post-training @ Hugging Face

1.5k followers · 0 following

dwarkeshsp / nanochat_simple_rl.py

Created October 21, 2025 00:13

	"""
	Simple RL training script for teaching a model to add.
	Demonstrates REINFORCE and GRPO algorithms in a minimal implementation.

	If you want to run this script, put it inside of nanochat/scripts/ and run it with:
	python -m scripts.simple_rl

	First, add "matplotlib>=3.9.0" to pyproject.toml and run "uv sync".

	I wrote a separate script to download the weights for the model:

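The preview above names REINFORCE and GRPO but is cut off before it shows where they differ. The sketch below is a minimal illustration under stated assumptions, not the gist's code. It assumes each prompt (for example, an addition problem) is sampled several times and that each completion gets one scalar reward, such as 1.0 for a correct sum and 0.0 otherwise. Plain REINFORCE weights each completion's log-probability by its raw reward. GRPO instead normalizes rewards within the group of samples for the same prompt. The function names are hypothetical. This version uses the population standard deviation and omits the PPO-style ratio clipping and KL penalty that full GRPO implementations usually add.

```python
import math
from typing import List


def reinforce_advantages(rewards: List[float]) -> List[float]:
    """Plain REINFORCE (no baseline): the raw reward weights each log-prob."""
    return list(rewards)


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantages for ONE prompt's samples.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std
    (an assumption; some implementations use the sample std instead).
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in group_rewards) / n)
    return [(r - mean) / (std + eps) for r in group_rewards]


def policy_gradient_loss(logprobs: List[float], advantages: List[float]) -> float:
    """Surrogate loss -mean(A_i * log pi(y_i | x)); minimizing it follows the policy gradient."""
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(logprobs)


if __name__ == "__main__":
    # Hypothetical rewards for 4 samples of one prompt: two correct, two wrong.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(reinforce_advantages(rewards))  # raw rewards, all non-negative
    print(grpo_advantages(rewards))       # roughly [1.0, -1.0, -1.0, 1.0]
```

Because of the group normalization, wrong answers receive a negative weight even when every reward is non-negative. A group in which every sample earns the same reward produces all-zero advantages and contributes no gradient.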
willccbb / grpo_demo.py

Last active March 8, 2026 10:23

GRPO Llama-1B

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},