Dwarkesh Podcast · Eric Jang – Building AlphaGo from scratch

Why AlphaGo’s Training Is Stable

2:20:28–2:21:27 · 59s

Eric contrasts AlphaGo’s loop of supervised learning on better labels with high-variance RL, explaining how AlphaGo’s approach avoids zero-signal exploration and stays stable as models scale.
