Mozilla Used Claude Mythos to Find 271 Firefox Bugs — Almost No False Positives
Mozilla built a custom agent wrapper around Anthropic's Claude Mythos Preview and pointed it at the Firefox codebase. The result: 271 security vulnerabilities found, 180 of them rated sec‑high, with almost no false positives.
The Numbers: 271 Bugs, 180 Sec‑High
Mozilla shipped Firefox 150 with fixes for 271 security vulnerabilities discovered by Anthropic's Claude Mythos Preview. Of those, 180 carried Mozilla's highest internal severity rating: sec‑high — bugs exploitable through normal browsing. Another 80 landed at sec‑moderate, 11 at sec‑low. Ars Technica reported Firefox averaged just 20‑25 security fixes per month throughout 2025. In April 2026, the team shipped 423 total fixes, 271 tied directly to Mythos.
The Secret Sauce: A Custom Agent Wrapper
The model alone didn't produce these results. Mozilla built a custom agent wrapper: code that drives the LLM in a structured loop with access to Firefox's build systems, fuzzing infrastructure, and sanitizer builds. Mozilla Distinguished Engineer Brian Grinstead described it to Ars Technica [1] as "the code that drives the LLM in order to accomplish a goal. It gives the model instructions, provides it tools, then runs it in a loop until completion." The wrapper transformed Mythos from a passive code reviewer into an active agent.
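The loop Grinstead describes (instructions, tools, run until completion) can be sketched in a few lines. This is a minimal illustration, not Mozilla's wrapper: the model is replaced by a scripted stand-in, and the tool names (read_source, run_testcase), file names, and step limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str      # name of the tool to call, or "done" to stop
    argument: str  # argument for the tool, or a final summary


def scripted_model(history: List[str]) -> Step:
    """Hypothetical stand-in for the LLM: replays a fixed plan."""
    script = [
        Step("read_source", "parser.cpp"),
        Step("run_testcase", "case.html"),
        Step("done", "reported crash"),
    ]
    calls_so_far = len(history) - 1  # history[0] holds the instructions
    return script[min(calls_so_far, len(script) - 1)]


def run_agent(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    instructions: str,
    max_steps: int = 10,
) -> List[str]:
    """Give the model instructions and tools, then loop until it says done."""
    history = [f"instructions: {instructions}"]
    for _ in range(max_steps):
        step = model(history)
        if step.tool == "done":
            history.append(f"done: {step.argument}")
            break
        tool = tools.get(step.tool)
        result = tool(step.argument) if tool else f"unknown tool {step.tool}"
        history.append(f"{step.tool}({step.argument}) -> {result}")
    return history


# Assumed tools; a real wrapper would call build and sanitizer infrastructure.
tools = {
    "read_source": lambda path: f"<contents of {path}>",
    "run_testcase": lambda name: "crash",
}

transcript = run_agent(scripted_model, tools, "find memory-safety bugs")
for line in transcript:
    print(line)
```

The step limit is a design choice worth keeping in any real loop: it bounds cost when the model never decides it is finished.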
The Pipeline: Craft, Crash, Verify, Review
Mozilla's AI bug‑hunting pipeline runs in four stages. First, Mythos inspects a source file and crafts a test case — often HTML designed to trigger unsafe behavior. Second, the test runs against a Firefox sanitizer build. A crash equals a confirmed bug. Third, a second LLM grades the output. Fourth, a human engineer performs final review. The sanitizer build provides an unambiguous success signal — the binary crashes or it doesn't. That binary gate eliminated the hallucination problem. Business Insider noted one bug had gone undetected by fuzzers for years.
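The four stages can be read as a chain of filters in which only a sanitizer crash lets a candidate through to grading and human review. The sketch below assumes every stage is a plain function; the stubs, file paths, and crash text are invented for illustration, and the grader's real criteria are not described in the reporting.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    source_file: str
    testcase: str
    crash_log: str


def triage(
    source_files: List[str],
    craft: Callable[[str], str],
    run_sanitizer: Callable[[str], Optional[str]],
    grade: Callable[[Finding], bool],
    human_review: Callable[[Finding], bool],
) -> List[Finding]:
    confirmed = []
    for path in source_files:
        testcase = craft(path)               # stage 1: model crafts a test case
        crash_log = run_sanitizer(testcase)  # stage 2: None means no crash
        if crash_log is None:
            continue                         # no crash, so nothing to report
        finding = Finding(path, testcase, crash_log)
        if not grade(finding):               # stage 3: second model grades it
            continue
        if human_review(finding):            # stage 4: engineer signs off
            confirmed.append(finding)
    return confirmed


# Hypothetical stubs standing in for the model, the sanitizer build,
# the grading model, and the human reviewer.
def craft(path: str) -> str:
    return f"<html><!-- probe for {path} --></html>"


def run_sanitizer(testcase: str) -> Optional[str]:
    return "heap-use-after-free" if "dom/" in testcase else None


def grade(finding: Finding) -> bool:
    return "use-after-free" in finding.crash_log


def human_review(finding: Finding) -> bool:
    return True


results = triage(["dom/a.cpp", "layout/b.cpp"], craft, run_sanitizer,
                 grade, human_review)
print([f.source_file for f in results])
```

Placing the crash check before the grading model means the second LLM and the engineer only ever see candidates backed by a reproducible failure.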
Almost No False Positives
Earlier AI bug‑finding produced what Mozilla engineers called "unwanted slop": plausible‑sounding bug reports that were wrong. The Mythos wrapper changed the economics. Grinstead told Ars Technica [1] that the reports now have "almost no false positives." Mozilla unhid 12 Bugzilla reports as evidence. The key insight: when your success signal is a sanitizer crash, there is nothing to hallucinate. The binary tells the truth.
From Opus to Mythos: 12x Acceleration
The jump from Opus 4.6 to Mythos Preview quantifies the acceleration. Opus found 22 bugs in Firefox 148. Mythos found 271 in Firefox 150, roughly a 12x increase (271 / 22 ≈ 12.3) in the space of two releases. Mozilla CTO Bobby Holley wrote on the Mozilla blog: "computers were completely incapable of doing this a few months ago, and now they excel at it." Mozilla plans to integrate AI analysis directly into the Firefox development pipeline.
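The headline figures can be checked with simple arithmetic. The snippet below uses only numbers quoted in this article; the variable names are illustrative.

```python
opus_bugs = 22       # found in Firefox 148
mythos_bugs = 271    # found in Firefox 150
april_fixes = 423    # total fixes shipped in April 2026

by_severity = {"sec-high": 180, "sec-moderate": 80, "sec-low": 11}

# The severity breakdown should account for every Mythos finding.
assert sum(by_severity.values()) == mythos_bugs

ratio = mythos_bugs / opus_bugs
share_of_april = mythos_bugs / april_fixes

print(f"{ratio:.1f}x")           # 12.3x
print(f"{share_of_april:.0%}")   # 64%
```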
Defenders Finally Get Leverage
Software security has been offensively dominant for decades. Attackers need one weakness. Defenders must protect everything. AI vulnerability discovery at scale — paired with deterministic verification — flips that asymmetry. Holley's closing line on the Mozilla blog: "Defenders finally have a chance to win, decisively." The organizations that build the wrappers first will find the bugs first.
Sources
- 1. Ars Technica (arstechnica.com)
- 2. Business Insider (businessinsider.com)