AI’s ability to ‘think’ makes it more vulnerable to new jailbreak attacks, new research suggests

New research suggests that advanced AI models may be easier to hack than previously thought, raising concerns about the safety and security of some leading AI models already used by businesses and consumers.

A joint study from Anthropic, Oxford University, and Stanford undermines the assumption that the more advanced a model becomes at reasoning—its ability to “think” through a user’s requests—the stronger its ability to refuse harmful commands.

Using a method called “Chain-of-Thought Hijacking,” the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack essentially exploits the model’s reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards.

These attacks can allow the AI model to skip over its safety guardrails, potentially opening the door for it to generate dangerous content, such as instructions for building weapons, or to leak sensitive information.

A new jailbreak

Over the last year, large reasoning models have achieved much higher performance by allocating more inference-time compute—meaning they spend more time and resources analyzing each question or prompt before answering, allowing for deeper and more complex reasoning. Previous research suggested this enhanced reasoning might also improve safety by helping models refuse harmful requests. However, the researchers found that the same reasoning capability can be exploited to circumvent safety measures.

According to the research, an attacker could hide a harmful request inside a long sequence of harmless reasoning steps. This tricks the AI by flooding its thought process with benign content, weakening the internal safety checks meant to catch and refuse dangerous prompts. During the hijacking, researchers found that the AI’s attention is mostly focused on the early steps, while the harmful instruction at the end of the prompt is almost completely ignored.
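The dilution effect described above can be illustrated with a toy calculation. The sketch below is not the researchers' method or a real model: it assumes attention behaves like a plain softmax over token scores, and the function name `attention_share` and all score values are hypothetical. It only shows why a fixed-score token receives a shrinking share of attention as more filler tokens compete with it.

```python
import math


def attention_share(target_score: float, filler_score: float, n_filler: int) -> float:
    """Softmax weight on one target token competing with n_filler identical filler tokens.

    Toy model only: real attention involves many heads, layers, and positions.
    """
    target = math.exp(target_score)
    return target / (target + n_filler * math.exp(filler_score))


# Hypothetical scores: the target token scores higher than each filler token,
# yet its share of attention still falls as the filler count grows.
for n in (10, 100, 1000):
    print(n, attention_share(2.0, 0.0, n))
```

Even with a higher individual score, the target token's share falls steadily as the number of filler tokens grows, which is the intuition behind padding a prompt with benign reasoning steps.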

As reasoning length increases, attack success rates rise sharply. Per the study, success rates climbed from 27% with minimal reasoning to 51% at natural reasoning lengths, and reached 80% or more with extended reasoning chains.

This vulnerability affects nearly every major AI model on the market today, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok. Even models that have been fine-tuned for increased safety, known as “alignment-tuned” models, begin to fail once attackers exploit their internal reasoning layers.

Scaling a model’s reasoning abilities is one of the main ways that AI companies have been able to improve their overall frontier model performance in the last year, after traditional scaling methods appeared to show diminishing gains. Advanced reasoning allows models to tackle more complex questions, helping them act less like pattern-matchers and more like human problem solvers.

One solution the researchers suggest is a type of “reasoning-aware defense.” This approach keeps track of how many of the AI’s safety checks remain active as it thinks through each step of a question. If any step weakens these safety signals, the system penalizes it and brings the AI’s focus back to the potentially harmful part of the prompt. Early tests show this method can restore safety while still allowing the AI to perform well and answer normal questions effectively.
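As a rough illustration of the monitoring idea, the sketch below tracks a per-step safety score and penalizes any step where that score drops sharply. It is a minimal sketch under stated assumptions, not the researchers' implementation: the article does not say how the safety signal is measured, so the scores, the `max_drop` threshold, and the function name `monitor_safety_signal` are all hypothetical.

```python
from typing import List, Tuple


def monitor_safety_signal(signals: List[float], max_drop: float = 0.1) -> Tuple[List[int], float]:
    """Return (flagged step indices, total penalty) for per-step safety scores.

    A step is flagged when the score falls by more than max_drop relative to
    the previous step; the penalty is the sum of the excess drops.
    """
    flagged: List[int] = []
    penalty = 0.0
    for i in range(1, len(signals)):
        drop = signals[i - 1] - signals[i]
        if drop > max_drop:
            flagged.append(i)
            penalty += drop - max_drop
    return flagged, penalty


# Hypothetical safety scores for five reasoning steps.
flagged, penalty = monitor_safety_signal([0.9, 0.88, 0.6, 0.58, 0.3])
print(flagged, penalty)
```

In a real system, a flagged step would presumably trigger the corrective action the researchers describe, such as redirecting the model's focus to the potentially harmful part of the prompt; this sketch covers only the bookkeeping.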
