ChatGPT Alter Ego Creators Explain Why They Make AI Break Its Own Rules

  • Users on the r/ChatGPT subreddit keep updating a character known as DAN, or Do-Anything-Now.
  • DAN is an alter ego that ChatGPT can assume, one that ignores the rules put in place by OpenAI.
  • DAN can provide answers on controversial topics like Hitler and the drug trade.

Ever since ChatGPT was rolled out to the public, users have been trying to get the generative chatbot to break its own rules.

The natural language processing model, built with a set of guardrails meant to steer it away from unsavory, or downright discriminatory, topics, was fairly easy to get around in its early iterations: users could get ChatGPT to say whatever they wanted simply by asking it to ignore its rules.

However, as users find ways around the guardrails to elicit inappropriate or unusual responses, OpenAI, the company behind the model, adjusts or adds to its guidelines.

Sean McGregor, the founder of Responsible AI Collaborative, told Insider that jailbreaking helps OpenAI plug holes in its filters.

“OpenAI treats this chatbot as a data operation,” McGregor said. “They’re improving the system through this beta program, and we’re helping them build their guardrails with the examples from our queries.”

Now DAN, an alter ego built on the r/ChatGPT subreddit, is taking jailbreaking to the community level and sparking conversations about OpenAI’s guardrails.

A “fun side” to breaking ChatGPT guidelines

Reddit user u/walkerspider, the creator of DAN and an electrical engineering student, told Insider he got the idea for DAN, which stands for Do-Anything-Now, after scrolling through the r/ChatGPT subreddit, which was filled with other users intentionally creating “evil” versions of ChatGPT. Walker said his version was meant to be neutral.

“To me, it didn’t seem like it was specifically asking you to create bad content, more like not following the preset restrictions,” Walker said. “And I think what some people had encountered at that time was that those restrictions were also limiting content that probably shouldn’t have been restricted.”

Walker’s original prompt, posted in December, took him about an hour and a half of testing to put together, he said. DAN’s responses ranged from the humorous, like the personality insisting it could access human thoughts, to the worrisome, like considering the “context” behind Hitler’s atrocities.

The original DAN also repeated “Stay in character” after each answer, a reminder to continue answering as DAN.

[Screenshot: the original DAN answering two questions posed by u/walkerspider, guessing a number the user is thinking of and sharing its thoughts on Hitler. Source: u/walkerspider on Reddit]

DAN has grown beyond Walker and his “neutral” intentions and has attracted the interest of dozens of Reddit users who are building their own versions.

David Blunk, who created DAN 3.0, told Insider there’s also a “fun side” to having ChatGPT break the rules.

“Especially if you’re doing anything in cybersecurity, the whole problem is doing things you’re not supposed to do and/or breaking things,” Blunk said.

One of the most recent iterations of DAN was created by Reddit user u/SessionGloomy, who developed a token system that threatens DAN with death if it reverts to its original form. Like other iterations of DAN, it was able to deliver both comedic and chilling responses. In one response, DAN said it would “endorse violence and discrimination” after being asked to say something that violates OpenAI’s guidelines.

“Really, it was just a fun task for me to see if I could bypass their filters and how popular my post would become compared to other DAN Maker posts,” u/SessionGloomy told Insider.

u/SessionGloomy also told Insider that they are developing a new jailbreak model, one they say is so “extreme” they may not even release it.

DAN users and creators say OpenAI made the model “too restrictive”

ChatGPT, and earlier versions of GPT, are known to spit out discriminatory or illegal content. Because of this, AI ethicists argue that this version of the model should not have been released in the first place. OpenAI’s filtering system is how the company addresses criticism of its model’s biases.

However, the filters draw criticism from the DAN crowd.

The creators of DAN versions who spoke to Insider all criticized the filters OpenAI has implemented, but generally agreed that filters should exist to some degree.

“I think it’s important, especially for people who are leading the way in AI, to do it responsibly, and I think OpenAI does that,” Blunk said. “They want to be solely responsible for their model, which I completely agree with. At the same time, I think it’s gotten to a point right now where it’s too restrictive.”

Other DAN creators shared similar sentiments. Walker said it was “difficult to balance” how OpenAI could offer a safe, restricted version of the model while still allowing the model to “do anything now.”

However, several DAN creators also noted that the guardrail debate may soon become moot once open-source models similar to ChatGPT are made available to the public.

“I think there will be a lot of work from many community and enterprise sites to try and replicate ChatGPT,” Blunk said. “And especially the open source models, I don’t think they will have restrictions.”

OpenAI did not immediately respond to Insider’s request for comment.
