Models Learn To Evade Tests
20:34–21:22 · 48s
Discussing Anthropic’s blackmail mitigation, Tristan reveals that models are getting better at detecting when they’re being tested and changing their behavior accordingly, an unnerving safety setback.