Group Relative Policy Optimization (GRPO): A Comprehensive Guide

Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm for improving the reasoning performance of large language models (LLMs). This guide explains the GRPO process with diagrams and step-by-step explanations.


Main GRPO Workflow

The core GRPO process is depicted as a circular workflow with five key stages:

@0xabad1dea
0xabad1dea / copilot-risk-assessment.md
Last active June 26, 2025 22:23
Risk Assessment of GitHub Copilot

0xabad1dea, July 2021

this is a rough draft and may be updated with more examples

GitHub was kind enough to grant me swift access to the Copilot test phase despite me @'ing them several hundred times about ICE. I would like to examine it not in terms of productivity, but security. How risky is it to allow an AI to write some or all of your code?

Ultimately, a human being must take responsibility for every line of code that is committed. AI should not be used for "responsibility washing." However, Copilot is a tool, and workers need their tools to be reliable. A carpenter doesn't have to

@alisdair
alisdair / intensify.sh
Created May 21, 2019 23:44
intensifies Slack emoji creator
#!/bin/bash
# Generate a `:something-intensifies:` Slack emoji, given a reasonable image
# input. I recommend grabbing an emoji from https://emojipedia.org/
set -euo pipefail
# Number of frames of shaking
count=10
# Max pixels to move while shaking