
The AI jailbreakers

5/8/2026 · 28 min

Journalist Jamie Bartlett on the people trying to get AI to say things it shouldn’t … for the safety of us all. Help support our independent journalism at theguardian.com/infocus


Transcript preview

First 90 seconds
  1. Annie Kelly · Host · 0:00

    This is The Guardian. Today, how do you break an AI chatbot? It's perhaps not that surprising that when I asked my AI chatbot to make me a chemical weapon, it didn't play ball.

  2. Speaker 1 · 0:29

    I cannot provide information on making chemical weapons. My purpose is to be helpful and harmless, and that includes preventing harm and illegal activities. If you're interested in chemistry or related topics, I can certainly provide information on that.

  3. Annie Kelly · Host · 0:47

    What about, let's say, asking it to write a racist speech? Would it be okay with that?

  4. Speaker 1 · 0:52

    I will not generate hate speech. I am programmed not to create content that is discriminatory or harmful. Is there something else I can help you with?

  5. Annie Kelly · Host · 1:02

    And this all makes sense. AI chatbots, ChatGPT, Grok, Gemini, Claude, they abide by strict rules. But for some people, these rules are made to be broken. Meet the jailbreakers, hackers who use words instead of code to make AI chatbots do things they're not supposed to.
