Eric Jang – Building AlphaGo from scratch

5/15/2026 · 2 hr 37 min

Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
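The contrast between the two training signals can be sketched numerically. This is a minimal illustration under stated assumptions, not Eric's implementation: the function names, the three-move toy policy, and the visit counts are all made up for the example. It compares a REINFORCE-style policy gradient, where one scalar return scales every step of the trajectory, with an AlphaGo Zero-style target, where each move is pulled toward the normalized MCTS visit counts at that position.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_grads(logits_per_step, actions, episode_return):
    """Gradient of -R * sum_t log pi(a_t) with respect to each step's logits.

    Every step is scaled by the same scalar return R, so the update cannot
    tell which individual action was responsible for the outcome.
    """
    grads = []
    for logits, action in zip(logits_per_step, actions):
        probs = softmax(logits)
        grads.append([
            episode_return * (p - (1.0 if i == action else 0.0))
            for i, p in enumerate(probs)
        ])
    return grads


def mcts_target_logit_grads(logits_per_step, visit_counts_per_step):
    """Gradient of the cross-entropy between a search target and the policy.

    The target at each step is the normalized visit count distribution
    (hypothetical numbers here), so every move gets its own supervised
    signal: gradient = policy_probs - target.
    """
    grads = []
    for logits, counts in zip(logits_per_step, visit_counts_per_step):
        probs = softmax(logits)
        total = sum(counts)
        target = [c / total for c in counts]
        grads.append([p - t for p, t in zip(probs, target)])
    return grads


if __name__ == "__main__":
    # Toy two-move trajectory with a uniform policy over three actions.
    logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

    # Policy gradient: one return (+1 for a win) shared by both moves.
    print(reinforce_logit_grads(logits, actions=[0, 2], episode_return=1.0))

    # Search target: made-up visit counts giving a per-move preference.
    print(mcts_target_logit_grads(logits, [[80, 15, 5], [10, 10, 80]]))
```

In the first case both moves receive an update of the same size whether or not they mattered; in the second, the size and direction of each move's update depends on what the search found at that position.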

Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Watch on YouTube. Read the transcript.

And check out the flashcards I wrote to retain the insights.

Clips

AlphaGo’s Hidden Profundity

From TPUs To $10K Go Bot

ResNets Still Beat Transformers

Search That Teaches Itself

Why AlphaGo’s Training Is Stable

Nash Strategy Beats Personality

RL’s Painful Signal Problem

When MCTS Makes You Worse

Daydreaming Better Moves Offline

Transcript preview

First 90 seconds

Dwarkesh Patel · Host · 0:00
Today I'm here with Eric Jang, who was most recently vice president of AI at 1X Technologies, before that senior research scientist at what is now Google DeepMind Robotics, and you've been on sabbatical for the last few months. One of the things you've been doing is rebuilding, and improving, and hacking on AlphaGo. And so we're, today what we're gonna do is you're gonna explain building AlphaGo from scratch and what it tells us about the future of AI research and development. But, uh, before we get to that, why is AlphaGo interesting? Why is this, why is this the project you decided to do on sabbatical rather than just hanging out at the beach?
Eric Jang · Guest · 0:33
Sure, yeah. Um, I like making things, and AlphaGo and Go AI is one of those things that really got me into the field. Uh, when I saw the kind of early breakthroughs, um, on AlphaGo in 2014, 2015, 2016 and so forth, it was just profound to see, you know, how smart AI systems could become, and the, the kind of computational complexity class that they could tackle with deep learning. Um, this is a problem that has, you know, long been understood to be kind of intractable for a search, and yet, um, it was solved, um, through, through deep learning. And so, so that was quite mysterious to me, and I've always wanted to understand that phenomena a little bit better. My training is often in deep neural nets for robotics, where it's, uh, the, the decisions made by the neural networks are a bit more intuitive, but AlphaGo is a sort of problem where the, the decisions are actually the result of a very, very deep search. And it's always been very mysterious to me how, like, a 10-layer network can sort of-
