Skip to content

Instantly share code, notes, and snippets.

View haozheji's full-sized avatar
:shipit:

Haozhe Ji haozheji

:shipit:
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@neubig
neubig / dispatch_openai_requests.py
Last active February 19, 2024 17:55
A simple script to get results from the OpenAI Asynchronous API
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import openai
import asyncio
from typing import Any
@akshaychawla
akshaychawla / cscheduler.py
Last active December 22, 2023 11:56
Learning rate schedulers for PyTorch. (1) Cosine annealing with warmup and (2) Linear with warmup
"""
Useful learning rate schedulers
Warmup
CosineAnnealingLRWarmup
"""
import torch
import math
import functools
def _cosine_decay_warmup(iteration, warmup_iterations, total_iterations):
@hosackm
hosackm / colorlog.py
Created July 28, 2020 01:39
Colored logger module using Colorama
import logging
from colorama import init, Fore, Back
init(autoreset=True)
class ColorFormatter(logging.Formatter):
# Change this dictionary to suit your coloring needs!
COLORS = {
"WARNING": Fore.RED,
@donglixp
donglixp / rouge_perl_setup.sh
Last active May 10, 2022 04:39
Setup ROUGE-1.5.5
# install XML::DOM plugin, instructions https://web.archive.org/web/20171107220839/www.summarizerman.com/post/42675198985/figuring-out-rouge
cpan App::cpanminus
cpanm XML::DOM
# test fails with XLM::Parser missing error, install it
sudo apt-get install libexpat1-dev
cpanm XML::Parser
cd /mnt/data/
git clone https://github.com/andersjo/pyrouge.git
@thomwolf
thomwolf / top-k-top-p.py
Last active October 25, 2025 20:25
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
""" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
Args:
logits: logits distribution shape (vocabulary size)
top_k >0: keep only top k tokens with highest probability (top-k filtering).
top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
"""
assert logits.dim() == 1 # batch size 1 for now - could be updated for more but the code would be less clear
top_k = min(top_k, logits.size(-1)) # Safety check
@HarshTrivedi
HarshTrivedi / pad_packed_demo.py
Last active November 7, 2025 15:47 — forked from Tushar-N/pad_packed_demo.py
Minimal tutorial on packing (pack_padded_sequence) and unpacking (pad_packed_sequence) sequences in pytorch.
import torch
from torch import LongTensor
from torch.nn import Embedding, LSTM
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
## We want to run LSTM on a batch of 3 character sequences ['long_str', 'tiny', 'medium']
#
# Step 1: Construct Vocabulary
# Step 2: Load indexed data (list of instances, where each instance is list of character indices)
@W4ngatang
W4ngatang / download_glue_data.py
Last active October 21, 2025 02:22
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
@aculich
aculich / wikipedia-infoboxes-in-pandas.ipynb
Last active March 21, 2024 04:51
How to extract Wikipedia infoboxes and wikitables using Pandas
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@simme
simme / Install_tmux
Created October 19, 2011 07:55
Install and configure tmux on Mac OS X
# First install tmux
brew install tmux
# For mouse support (for switching panes and windows)
# Only needed if you are using Terminal.app (iTerm has mouse support)
Install http://www.culater.net/software/SIMBL/SIMBL.php
Then install https://bitheap.org/mouseterm/
# More on mouse support http://floriancrouzat.net/2010/07/run-tmux-with-mouse-support-in-mac-os-x-terminal-app/