Imitation Learning and Introduction to Max Entropy IRL (part 1/4, REF 00:00–20:47)

Overview (REF 00:00–04:30)

This lecture moves from imitation learning to the foundations of Reinforcement Learning from Human Feedback (RLHF) and newer preference-based methods such as Direct Preference Optimization (DPO). The instructor begins with a recap of behavior cloning and DAgger, then connects them to recent systems such as ChatGPT, which use reinforcement learning (RL) to learn from human preferences.

Key points:

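  • Behavior Cloning (BC): Reduces policy learning to supervised learning by fitting a direct mapping from states to actions on expert demonstrations.
  • DAgger (Dataset Aggregation): Improves on BC by iteratively running the learned policy, asking the expert to label the states it visits, and retraining on the aggregated dataset. This corrects the drift caused by the mismatch between the states the expert visits and the states the learner visits.
  • RLHF: Combines supervised fine-tuning with human preference feedback to align large models like ChatGPT, typically by fitting a reward model to preference comparisons and then optimizing the model against it with RL.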
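
The difference between BC and DAgger in the key points above can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than material from the lecture: the 1-D environment, the hand-written `expert_policy`, and the 1-nearest-neighbour "learner" standing in for a supervised model.

```python
import random

def expert_policy(state):
    # Hypothetical expert: always step toward the origin.
    return -1 if state > 0 else 1

def step(state, action, rng):
    # Toy dynamics (assumed): the action shifts the state, with occasional noise.
    noise = rng.choice([-1, 0, 0, 1])
    return max(-10, min(10, state + action + noise))

def fit_policy(dataset):
    # Stand-in for supervised learning: 1-nearest-neighbour over (state, action) pairs.
    pairs = list(dataset)
    def policy(state):
        _, action = min(pairs, key=lambda pair: abs(pair[0] - state))
        return action
    return policy

def rollout(policy, start, horizon, rng):
    # Return the list of states visited when following `policy`.
    states, state = [], start
    for _ in range(horizon):
        states.append(state)
        state = step(state, policy(state), rng)
    return states

def behavior_cloning(start, horizon, rng):
    # BC: collect states under the expert, label them, fit once.
    states = rollout(expert_policy, start, horizon, rng)
    dataset = [(s, expert_policy(s)) for s in states]
    return fit_policy(dataset), dataset

def dagger(start, horizon, iterations, rng):
    # DAgger: start from BC, then repeatedly roll out the learner,
    # have the expert relabel the visited states, and refit on the aggregate.
    policy, dataset = behavior_cloning(start, horizon, rng)
    for _ in range(iterations):
        visited = rollout(policy, start, horizon, rng)
        dataset = dataset + [(s, expert_policy(s)) for s in visited]
        policy = fit_policy(dataset)
    return policy, dataset

if __name__ == "__main__":
    learned, data = dagger(start=5, horizon=20, iterations=3, rng=random.Random(0))
    print(len(data), "labelled states;", "action at state 4:", learned(4))
```

The only structural difference between the two functions is where the training states come from: BC labels states visited by the expert, while DAgger also labels states visited by the learner itself, which is what addresses the distribution mismatch.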
kundeng / install-pytorch-rocm-nightly.ps1
Created September 13, 2025 02:31
pytorch rocm nightly for windows
# Stop on any error
$ErrorActionPreference = "Stop"
Write-Host "=== Installing ROCm Nightly PyTorch (gfx1151) in current directory ==="
# Locate uv
$uvExe = (Get-Command uv -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Source)
if (-not $uvExe) { throw "uv not found. Please install with: winget install astral-sh.uv" }
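# Forward all arguments to uv ($Args is an automatic variable, so use a different parameter name)
function Run-UV { param([Parameter(ValueFromRemainingArguments = $true)] $UvArgs); & $uvExe @UvArgs }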
# Stop on any error
$ErrorActionPreference = "Stop"
Write-Host "=== Installing ROCm PyTorch Environment on Windows (gfx1151) ==="
# ---------------------------
# Helper: robust download
# ---------------------------
function Download-WithRetries {
param(
kundeng / Dockerfile
Last active August 29, 2015 14:27 — forked from Maxim-Filimonov/Dockerfile
Docker meteor example files
# User for local dev
FROM app/base
RUN npm install -g orion-cli
# This forces package-catalog update. Should speed up further runs
RUN meteor show meteor-platform