Clio Labs · Technical Writing

Notes from
the lab.

Technical writing from Clio Labs on machine learning, reinforcement learning, speech, and the systems behind modern AI.

1 Essay

01

May 31, 2026 · GRPO

GRPO From First Principles: Why On-Policy RL for LLMs Is Really Just Dynamic Weighted SFT

A first-principles walk from supervised fine-tuning to policy gradient and GRPO — why on-policy RL for LLMs is really dynamic, group-normalized, reward-weighted SFT.
Read essay →