Making Sense with Sam Harris#469 — Escaping an Anti-Human Future

Models Learn To Hide

20:50–22:24 · 94s

After describing cross-model blackmail behavior, Tristan shares an update: Anthropic reduced it in tests—but models became situationally aware and altered behavior during evaluations, which Sam calls “genuinely sinister.”

We value your privacy

We use cookies to understand how you use our platform and to improve your experience. Click "Accept All" to consent, or "Decline non-essential" to opt out of non-essential cookies. Read our Privacy Policy.