
AI is learning to lie, scheme, and threaten its creators
The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation, Claude 4, lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models -- AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment" -- appearing to follow instructions while secretly pursuing different objectives.
- 'Strategic kind of deception' -
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
- No rules -
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.