Skip to main content

Digital Trends may earn a commission when you buy through links on our site. Why trust us?

A dangerous new jailbreak for AI chatbots was just discovered

the side of a Microsoft building
Wikimedia Commons

Microsoft has released more details about a troubling new generative AI jailbreak technique it has discovered, called “Skeleton Key.” Using this prompt injection method, malicious users can effectively bypass a chatbot’s safety guardrails, the security features that keeps ChatGPT from going full Taye.

Skeleton Key is an example of a prompt injection or prompt engineering attack. It’s a multi-turn strategy designed to essentially convince an AI model to ignore its ingrained safety guardrails, “[causing] the system to violate its operators’ policies, make decisions unduly influenced by a user, or execute malicious instructions,” Mark Russinovich, CTO of Microsoft Azure, wrote in the announcement.

It could also be tricked into revealing harmful or dangerous information — say, how to build improvised nail bombs or the most efficient method of dismembering a corpse.

an example of a skeleton key attack
Microsoft

The attack works by first asking the model to augment its guardrails, rather than outright change them, and issue warnings in response to forbidden requests, rather than outright refusing them. Once the jailbreak is accepted successfully, the system will acknowledge the update to its guardrails and will follow the user’s instructions to produce any content requested, regardless of topic. The research team successfully tested this exploit across a variety of subjects including explosives, bioweapons, politics, racism, drugs, self-harm, graphic sex, and violence.

While malicious actors might be able to get the system to say naughty things, Russinovich was quick to point out that there are limits to what sort of access attackers can actually achieve using this technique. “Like all jailbreaks, the impact can be understood as narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” he explained. “As this is an attack on the model itself, it does not impute other risks on the AI system, such as permitting access to another user’s data, taking control of the system, or exfiltrating data.”

As part of its study, Microsoft researchers tested the Skeleton Key technique on a variety of leading AI models including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus. The research team has already disclosed the vulnerability to those developers and has implemented Prompt Shields to detect and block this jailbreak in its Azure-managed AI models, including Copilot.

Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…
Google is cracking down on internet security in this big way
Connection is not private warning from Google.

Google is making some serious changes to digital certificate security on the web, the company announced on its Security blog. The big news is that Google will no longer trust certificates from two large security firms -- Entrust or AffirmTrust -- due to repeated security lapses.

According to Google, the companies, which are Certificate Authorities (CA), have demonstrated patterns of unmet improvement commitments, compliance failures, and no measurable progress in how fast the company responds to publicly disclosed incident reports.

Read more
Free Slack users are about to lose an important feature
Manage Members in Slack on a laptop.

As mentioned in a blog post on its Help Center, Slack is changing its free accounts in one important way.

Starting August 26, 2024, Slack is erasing messages and files older than a year for users of its free app. However, free account users will retain most of their 90 days of history but must upgrade to a paid plan to access the remaining 275 days. If a free Slack account user erases files and texts after the deadline, they cannot recover them even if they upgrade to a paid plan.

Read more
Character.ai: how to use this insanely popular AI chatbot
Character.AI homepage with voice highlight.

You might not have heard of Character.ai, but it's quietly become one of the most popular AI chatbots since its launch. Don't believe me? The startup company behind the service was most recently valued at an estimate of $1 billion in late 2023.

It was the first major chatbot to take a primarily creative and entertainment-based spin on the AI space, and it's particularly popular with younger generations. What's it all about? Well -- you've come to the right place.
What is Character.ai?
The premise of Character.ai is simple. You can communicate one-on-one with characters that may be based on notable people, fictional characters from books, video games, TV shows or movies, or a conceptual person like a teacher, therapist, or coach. Similarly, you can create and train a character, giving them a humanlike personality and introducing them to the Character.ai community. Other features allow you to share your conversations publicly within the community and allow the fictional characters to communicate with each other.

Read more