ritwikraha / Pretraining-LLM.md
Last active November 1, 2025 15:09
Pretraining of Large Language Models

Pretraining


A Map for Studying Pretraining in LLMs

  • Data Collection
    • General Text Data
    • Specialized Data
  • Data Preprocessing
    • Quality Filtering
    • Deduplication