Why AlphaGo’s Training Is Stable
2:20:28–2:21:27 · 59s
Eric contrasts AlphaGo’s supervised-learning-on-better-labels loop with high-variance RL, explaining how it avoids zero-signal exploration and stays stable as models scale.