@willccbb
willccbb / grpo_demo.py
Last active December 14, 2025 11:02
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
"""
citation:
@misc{brown2025grpodemo,
title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
author={Brown, William},
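The citation title points at granular format rewards; a minimal sketch of what such reward functions can look like under GRPO is below. The <reasoning>/<answer> tags and the partial-credit weights are illustrative assumptions, not necessarily the gist's exact scheme.

# Minimal sketch of granular format rewards for GRPO (tags and weights are illustrative assumptions).
import re

def format_reward(completion: str) -> float:
    """Give partial credit for each piece of the expected <reasoning>/<answer> scaffold."""
    score = 0.0
    if re.search(r"<reasoning>.*?</reasoning>", completion, re.DOTALL):
        score += 0.25
    if re.search(r"<answer>.*?</answer>", completion, re.DOTALL):
        score += 0.25
    # Extra credit when the completion is exactly the scaffold and nothing else.
    if re.fullmatch(r"\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*", completion, re.DOTALL):
        score += 0.5
    return score

def correctness_reward(completion: str, answer: str) -> float:
    """Reward an extracted answer that matches the reference; assumed scheme, not the gist's exact one."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == answer.strip() else 0.0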
@vgel
vgel / r1.py
Last active August 14, 2025 13:13
script to run deepseek-r1 with a min-thinking-tokens parameter, replacing </think> with a random continuation string to extend the model's chain of thought
import argparse
import random
import sys
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache
import torch
parser = argparse.ArgumentParser()
parser.add_argument("question", type=str)
parser.add_argument(
    "--min-thinking-tokens", type=int, default=512
)  # flag name and default assumed from the description above; the full gist defines more arguments
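The description above is the whole trick: watch for the token that closes the chain of thought and, if the model has not "thought" long enough yet, splice in a continuation string and keep sampling. A minimal sketch of that loop follows, reusing the imports above; the threshold, the continuation phrases, and the single-token "</think>" assumption are illustrative, not the gist's exact values.

# Minimal sketch of the min-thinking-tokens sampling loop (illustrative only).
# It recomputes the full forward pass each step (no KV cache) and assumes
# "</think>" encodes to a single token, as it does for the R1 tokenizers.
def generate_with_min_thinking(model, tokenizer, prompt, min_thinking_tokens=512, max_new_tokens=4096):
    replacements = ["\nWait, let me double-check that.", "\nHmm, thinking about it more,"]  # assumed phrases
    end_think_id = tokenizer.encode("</think>", add_special_tokens=False)[0]
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    generated, thinking_tokens = [], 0
    while len(generated) < max_new_tokens:
        logits = model(input_ids).logits[0, -1]
        next_id = torch.multinomial(torch.softmax(logits.float(), dim=-1), 1).item()
        if next_id == end_think_id and thinking_tokens < min_thinking_tokens:
            # The model tried to stop thinking too early: splice in a continuation string instead.
            cont_ids = tokenizer.encode(random.choice(replacements), add_special_tokens=False)
            generated += cont_ids
            thinking_tokens += len(cont_ids)
            next_tensor = torch.tensor([cont_ids], device=model.device)
        else:
            generated.append(next_id)
            thinking_tokens += 1
            if next_id == tokenizer.eos_token_id:
                break
            next_tensor = torch.tensor([[next_id]], device=model.device)
        input_ids = torch.cat([input_ids, next_tensor], dim=1)
    return tokenizer.decode(generated)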
@egeozcan
egeozcan / index.html
Created October 18, 2024 08:40
image viewer
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Fullscreen Image Viewer</title>
<style>
body {
font-family: Arial, sans-serif;
}
@scpedicini
scpedicini / transcribe.py
Last active September 23, 2025 21:15
Python Dictation Transcription Application
# This script will transcribe an audio file (mp3, wav, etc.) to text and then clean the text using a local LLM model via Ollama. Technically, this script will work with any LLM that supports the standard OpenAI bindings with minor adjustments.
# GETTING STARTED:
# 1. Install required python packages (pip install openai python-dotenv)
# 2. Git clone a copy of ggerganov/whisper (https://github.com/ggerganov/whisper.cpp)
# 3. Build the whisper binary (see the whisper.cpp README for instructions)
# 4. Download one of the whisper models (large-v2 is the most accurate across languages, though the base model works reasonably well for English).
# 5. Install ffmpeg (brew install ffmpeg on macOS, apt-get install ffmpeg)
# 6. Install ollama (https://ollama.com/download)
# 7. Download an LLM model (https://ollama.com/library)
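Putting the steps together, a minimal sketch of the pipeline looks like the following: convert the audio with ffmpeg, transcribe it with the whisper.cpp binary, then clean the transcript through Ollama's OpenAI-compatible endpoint. The binary path, model paths, and model name are assumptions; adjust them to your setup.

# Minimal sketch of the transcribe-then-clean pipeline (paths and model names are assumptions).
import subprocess
from openai import OpenAI

WHISPER_BIN = "./whisper.cpp/main"                        # assumed path to the built whisper.cpp binary
WHISPER_MODEL = "./whisper.cpp/models/ggml-base.en.bin"   # assumed model path

def transcribe(audio_path: str) -> str:
    # whisper.cpp expects 16 kHz mono WAV input.
    subprocess.run(["ffmpeg", "-y", "-i", audio_path, "-ar", "16000", "-ac", "1", "audio.wav"], check=True)
    subprocess.run([WHISPER_BIN, "-m", WHISPER_MODEL, "-f", "audio.wav", "-otxt", "-of", "transcript"], check=True)
    with open("transcript.txt") as f:
        return f.read()

def clean(transcript: str) -> str:
    # Ollama exposes an OpenAI-compatible endpoint; any model pulled with `ollama pull` works here.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    resp = client.chat.completions.create(
        model="llama3",  # assumed model name
        messages=[
            {"role": "system", "content": "Clean up this raw dictation: fix punctuation and obvious transcription errors without changing the meaning."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(clean(transcribe("memo.mp3")))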
@adamsmith
adamsmith / gist:2a22b08d3d4a11fb9fe06531aea4d67c
Created December 23, 2023 01:07
voice-memo transcript → organized markdown text, using LLMs
There are two prompts that chain together. The first prompt does most of the work, and the second prompt organizes the sections. I found that, because of how LLMs write, I couldn't get a single prompt to avoid jumping back and forth between topics.
Prompt 1, which takes as input a raw transcript and generates a structured-text version...
"""# Instructions
A transcript is provided below of a voice memo I recorded as a "note to self". please extract all the points made or thoughts described, and put them in bullet-point form. use nested bullet points to indicate structure, e.g. a top-level bullet for each topic area and sub-bullets underneath. use multi-level nesting as appropriate to organize the thinking logically. use markdown formatting with `*` instead of `-` for bullet points.
DO NOT OMIT ANY POINTS MADE. This is not a summarization task — your only goal is to structure the thoughts there so they are logically organized and easy to read. Be concise because the reader is busy, but again DO NOT omit any
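A minimal sketch of the chaining is below; PROMPT_1 stands for the instruction block quoted above, PROMPT_2 (the section-organizing pass) is paraphrased since it is not shown here, and the model name is an assumption.

# Minimal sketch of the two-prompt chain; PROMPT_2's wording and the model name are assumptions.
from openai import OpenAI

client = OpenAI()
PROMPT_1 = "..."  # the "# Instructions" block quoted above
PROMPT_2 = "Group the bullet points below under section headers, preserving every point."  # assumed wording

def organize_memo(raw_transcript: str) -> str:
    def ask(prompt: str, body: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",  # assumed model
            messages=[{"role": "user", "content": prompt + "\n\n" + body}],
        )
        return resp.choices[0].message.content
    bullets = ask(PROMPT_1, raw_transcript)   # pass 1: transcript -> nested bullet points
    return ask(PROMPT_2, bullets)             # pass 2: bullets -> organized sections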
@ChrisHayduk
ChrisHayduk / merge_qlora_with_quantized_model.py
Last active September 27, 2025 08:22
Merging QLoRA weights with quantized model
"""
The code below combines approaches published by both @eugene-yh and @jinyongyoo on Github.
Thanks for the contributions guys!
"""
import torch
import peft
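The preview cuts off after the imports; the full gist dequantizes the 4-bit base weights and folds the LoRA deltas into them by hand. A simpler sketch that reaches the same end state, by reloading the base model in fp16 and using peft's merge_and_unload, is below; the model and adapter paths are placeholders.

# Simpler sketch of the same end result: reload the base model at full precision,
# apply the LoRA adapter, and merge it in. Paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-path", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "qlora-adapter-path")
merged = model.merge_and_unload()  # folds B @ A * scaling into the base weights
merged.save_pretrained("merged-model-path")
AutoTokenizer.from_pretrained("base-model-path").save_pretrained("merged-model-path")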
@adrienbrault
adrienbrault / llama2-mac-gpu.sh
Last active April 8, 2025 13:49
Run Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference. Uses 10GB RAM. UPDATE: see https://twitter.com/simonw/status/1691495807319674880?s=20
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
@younesbelkada
younesbelkada / finetune_llama_v2.py
Last active July 1, 2025 23:14
Fine tune Llama v2 models on Guanaco Dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
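The preview above stops inside the license header; as a pointer to what the script does, here is a compressed sketch of a QLoRA fine-tune on the Guanaco dataset with trl's SFTTrainer, assuming the trl/peft APIs of that era. The model id, dataset id, and hyperparameters are assumptions, and arguments like dataset_text_field have since moved into SFTConfig in newer trl releases.

# Compressed sketch of a QLoRA fine-tune on the Guanaco dataset (ids and hyperparameters are assumptions).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed; any Llama-2 checkpoint works
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM"),
    dataset_text_field="text",          # older trl API; newer releases take this via SFTConfig
    tokenizer=tokenizer,
    max_seq_length=512,
    args=TrainingArguments(output_dir="./results", per_device_train_batch_size=4, max_steps=500),
)
trainer.train()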

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument that not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@crucialfelix
crucialfelix / 2023-01-13
Last active April 10, 2025 12:08
Get Chrome history for a single day and create a markdown file summarizing browsing activity
# [[2023-01-13]] log
## URLs
- **www.amazon.de**
  - [Prime Video - Video on Demand - Online-Videothek: Filme und Serien online ansehen oder als Einzelabruf online leihen oder kaufen](https://www.amazon.de/Amazon-Video/b/?node=3010075031&ref=atv_surl_aiv&redirectToCMP=1) /Amazon-Video/b/
- **www.youtube.com**
  - [Parwal vs Kundru | कुंदरु या परवल | Pointed Gourd Vs Ivy Gourd | Everyday Life # 267 - YouTube](https://www.youtube.com/watch?v=6v4XD9T9-Rg&themeRefresh=1) /watch
  - [YouTube](https://www.youtube.com/) /
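For reference, a minimal sketch of how such a log can be generated is below: Chrome keeps browsing history in a SQLite database, so the script can select one day's rows from the urls table and group them by domain into markdown. The profile path is the macOS default and an assumption; timezone handling is omitted (Chrome stores times in UTC, as microseconds since 1601-01-01).

# Minimal sketch: query Chrome's History SQLite database for one day and emit a markdown log.
# The profile path is an assumption (macOS default); timestamps are microseconds since 1601-01-01 UTC.
import datetime
import os
import shutil
import sqlite3
from collections import defaultdict
from urllib.parse import urlparse

HISTORY = "~/Library/Application Support/Google/Chrome/Default/History"

def day_log(day: datetime.date) -> str:
    # Work on a copy: Chrome keeps the live database locked while it is running.
    db = shutil.copy(os.path.expanduser(HISTORY), "/tmp/History")
    epoch = datetime.datetime(1601, 1, 1)
    start = int((datetime.datetime.combine(day, datetime.time.min) - epoch).total_seconds() * 1_000_000)
    end = start + 86_400 * 1_000_000
    rows = sqlite3.connect(db).execute(
        "SELECT url, title FROM urls WHERE last_visit_time BETWEEN ? AND ? ORDER BY last_visit_time",
        (start, end),
    ).fetchall()
    by_domain = defaultdict(list)
    for url, title in rows:
        by_domain[urlparse(url).netloc].append((title, url))
    lines = [f"# [[{day}]] log", "## URLs"]
    for domain, pages in by_domain.items():
        lines.append(f"- **{domain}**")
        lines += [f"  - [{title}]({url}) {urlparse(url).path}" for title, url in pages]
    return "\n".join(lines)

print(day_log(datetime.date(2023, 1, 13)))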