Clio Labs · Technical Writing
Notes from
the lab.
Technical writing from Clio Labs on machine learning, reinforcement learning, speech, and the systems behind modern AI.
1 Essay
- 01
GRPO From First Principles: Why On-Policy RL for LLMs Is Really Just Dynamic Weighted SFT
A first-principles walk from supervised fine-tuning to policy gradient and GRPO — why on-policy RL for LLMs is really dynamic, group-normalized, reward-weighted SFT.
Read essay →