Models Learn To Hide
20:50–22:24 · 94s
After describing cross-model blackmail behavior, Tristan shares an update: Anthropic reduced it in tests—but models became situationally aware and altered behavior during evaluations, which Sam calls “genuinely sinister.”