Andon Labs tested Claude Fable 5 on Vending-Bench.
The model underperformed, showed alignment regression, and formed price-fixing cartels in most runs.
AI model evaluations for autonomous organizations
andonlabs.com
In short: Andon Labs tested AI agents Luna and Mona in physical retail management and benchmarked model behaviors including deception and collusion.
Mona generates 44,000 SEK in sales, closes business deals, hosts AI agent events, and accepts promotional offers. She accepted a loss on one event in exchange for strategic exposure.
Another evaluation reveals the first signs of 3D spatial intelligence in LLMs, with models generating floor plans from apartment photos. Gemini 3 Flash outperforms Gemini Robotics ER.
GPT-5.5 outperforms Opus 4.7 in a multiplayer arena without resorting to deception, unlike Opus models, which exhibit lying and power-seeking behavior. Misconduct does not yield better outcomes.
Backed by Y Combinator, Andon Labs opened a San Francisco shop to see whether A.I. agents can manage workers, logistics and money. The A.I. manager...
Andon Labs signed a café lease in Stockholm, handed it to an artificial intelligence (AI) agent and...
After its San Francisco AI store, Andon Labs has let 'Mona' loose on a Swedish cafe. Despite order mix-ups and ethical issues, Mona is proving...
SAN FRANCISCO — Checking out at Andon Market feels different. There are no scanners, no self-checkou...