What The Chart Really Measures
10:23–11:35 · 72s
Chris clarifies that METR’s headline '12-hour task' isn’t an AI working nonstop for 12 hours, but a difficulty benchmark based on how long similar tasks take humans.