The First AI-Built Zero-Day Is Not the Interesting Part

In the mid-90s I worked at a company called Cybersafe. Today it would get labeled an IAM/SSO vendor. What we actually built was a first-generation security platform: Kerberos, password management, PKI-based MFA, key management, host intrusion detection, and what would now be called zero trust access. The company failed for the usual startup reasons. People. Corporate politics. Timing. The technology was a decade ahead of its market.

One debate from that period has stayed with me. As we expanded into host intrusion detection, the question of automated response kept surfacing. Could a system safely act on its own to contain an intrusion in progress? Drop a connection. Kill a process. Isolate a host. Nobody on the team could imagine a credible answer. The false-positive risk was unbounded. The response itself could be weaponized. The rule sets were not trustworthy enough to delegate authority. We shipped detection and let humans make the call.

That debate has an answer now, and it is not the one we expected. Automation on the offensive side is not new. Worms, exploit kits, credential stuffing, and phishing infrastructure have been automated for decades. What is new is broad delegated judgment at machine speed, in the hands of people who do not have to worry about false positives because the blast radius is somebody else’s network.

What the report actually shows

The interesting question is not whether AI helped produce a zero-day. That was inevitable. The interesting questions are operational. What kinds of systems make bad machine judgment cheap enough to deploy at scale? What kinds of defensive systems are still pretending human review is the control boundary?

Google Threat Intelligence Group’s latest AI Threat Tracker report documents the first zero-day exploit that GTIG says it has high confidence was developed with AI assistance. The headline framing is technically correct. The specifics tell a more interesting story.

The exploit was a Python script that bypassed 2FA on an open-source web-based system administration tool. It required valid user credentials in the first place. The criminal group planned a mass exploitation campaign, and Google disrupted it through responsible disclosure to the vendor. GTIG identified the artifact as AI-developed because the code carried obvious tells. A hallucinated CVSS score. Textbook Python formatting. Detailed help menus. Educational docstrings characteristic of training data. The artifact still carried the seams of its production.

This is not the LLM failing at the hard part. The vulnerability itself is a real find. GTIG specifically notes that the 2FA flaw stems from a hardcoded trust assumption, a high-level semantic logic flaw of the kind that fuzzers and static analyzers tend to miss but that frontier LLMs can reason about by reading developer intent. The model did discovery work that previously required a competent human auditor. Where the operation broke down was in weaponization. The attacker shipped an artifact that still looked like a tutorial.
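
To make the flaw class concrete, here is a hypothetical sketch of a hardcoded trust assumption, invented for illustration and not the actual vulnerability GTIG describes. The function, the header name, and the helpers are all made up. The point is that the code is syntactically unremarkable, which is exactly why pattern-matching tools walk past it.

```python
# Hypothetical illustration of a hardcoded trust assumption. Not the flaw
# GTIG describes. The code is clean, type-correct, and crash-free, so fuzzers
# and static analyzers see nothing; the bug lives in the developer's intent.

def verify_second_factor(user, request, totp_store):
    # "Fast path": anything arriving with the internal-service header is
    # assumed to have been authenticated upstream. Nothing enforces that
    # assumption, so any caller who can set the header skips 2FA entirely.
    if request.headers.get("X-Internal-Service") == "true":
        return True  # trusted by assumption, not by verification

    submitted = request.form.get("totp_code", "")
    return totp_store.verify(user.id, submitted)
```

A fuzzer never learns what that header is supposed to mean. A model that reads the function name, the comment, and the check together can notice that the security property the function claims to enforce is not actually enforced.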

This is a familiar failure pattern showing up on the offensive side for the first time. Fluency reads as competence. The attacker trusted an artifact with hallucinated metadata and educational comments still attached because it looked like a real exploit, in the same way over-eager engineering teams hand agents production credentials because the agent sounded like it knew what it was doing. The criminals here got bitten by the same dynamic that has been producing outages and data loss in vibe-coded production systems for the last eighteen months. The substrate is doing some of the work of inviting the misconfiguration.

Hultquist’s thread on the report is hedged correctly. The importance is the trajectory, not this specific specimen. Pull the camera back and the rest of the report is more interesting than the lede.

Three things worth surfacing

APT45 sending thousands of repetitive prompts. The North Korean group has been observed using recursive prompting to analyze CVEs and validate proof-of-concept exploits at scale. That is the industrial-scale answer to LLM variance. Solve the quality problem by amortizing across volume, then have humans cherry-pick the outputs that survived validation. The same statistical strategy that makes modern fuzzing work, applied one layer up the stack. The model does not have to be reliable. The pipeline has to be cheap enough that unreliability does not matter.
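
The shape of that pipeline is worth seeing once, because it is the same shape as best-of-N sampling or fuzzing triage. A minimal generic sketch follows; generate_candidate and passes_validation are hypothetical stand-ins for whatever model call and mechanical check a pipeline like this actually runs.

```python
# A minimal sketch of amortizing model variance across volume: sample many
# candidates, validate each one mechanically, keep only the survivors.
# generate_candidate() and passes_validation() are hypothetical stand-ins;
# the pattern, not the names, is the point.

def harvest(prompt, generate_candidate, passes_validation, attempts=1000):
    survivors = []
    for _ in range(attempts):
        candidate = generate_candidate(prompt)   # individually unreliable
        if passes_validation(candidate):         # cheap mechanical check
            survivors.append(candidate)
    return survivors  # humans cherry-pick from what survived

# If each attempt succeeds with probability p, the chance that at least one
# of n attempts survives is 1 - (1 - p) ** n. At p = 0.01 and n = 1000 that
# is already about 0.99996. The model does not need to be good; the loop
# does the work.
```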

CANFAIL and LONGSTREAM using LLM-generated decoy code. A Russia-nexus intrusion cluster has been deploying malware that uses LLM-generated code to conceal malicious functionality. GTIG documented LONGSTREAM containing 32 instances of code querying the system’s daylight saving status, repetitive benign-looking activity used to camouflage the malicious core. CANFAIL carries similar filler logic with LLM-generated comments self-describing the decoy blocks. The stylistic noise of LLM output is becoming the obfuscation layer. The verbose docstrings. The textbook structure. The over-explained variable names. These used to be tells. They are now camouflage. Any heuristic built on the AI-tell will start producing false negatives.

The wooyun-legacy skill plugin. A specialized GitHub repository is being distributed as a Claude Code skill plugin that integrates a distilled knowledge base of over 85,000 real-world vulnerability cases from the Chinese bug bounty platform WooYun (2010 to 2016). This is the supply side of the same market. Skill packs are tooling. Tooling gets distributed. The economic logic for adversarial skill packs is identical to the economic logic for legitimate ones. Any platform hosting them inherits a familiar problem. App stores and package registries have been working through it for two decades. Making trust decisions at distribution scale about code from parties you cannot directly inspect.

Both sides are running on the same substrate

On the defensive side, Google is using Big Sleep to find vulnerabilities and CodeMender (Gemini-driven) to fix them automatically. The criminals are pulling from a model class indistinguishable from the one Google is running its defensive tooling on. Both sides have access to the same substrate. The differential collapses to data quality, harness sophistication, and discipline around permissions.

That last one is the part the 90s HIDS conversation did not anticipate. It is also the part that should be the least surprising. The controls discipline did not get easier because the platform got more capable. If anything, the gradient got worse. A confused regex IDS in 1999 had a bounded action space. The rule set was enumerable. You could write down what it would do wrong. A confused agent in 2026 has whatever action space its credentials grant it, which in most deployments is more than it should be. The fluency that made it easy to give the agent broad permissions in the first place is exactly the property that makes its failures look reasonable in the moment.
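
The fix is not a smarter model. It is re-imposing the enumerable action space in the harness. A minimal sketch, assuming nothing about any particular agent framework: the allowlist and the action names below are invented, and the only claim is the shape, that every action the agent proposes passes through an explicit allowlist and lands in an append-only log so the decision can be replayed later.

```python
import json, time

# A minimal sketch of re-imposing a bounded action space on an agent harness.
# The allowlist and actions are hypothetical; the point is that authority
# lives in this wrapper, not in whatever the model happens to ask for.

ALLOWED_ACTIONS = {"read_log", "open_ticket", "quarantine_host"}  # enumerable, auditable

def execute(action, args, audit_log="agent_actions.jsonl"):
    record = {"ts": time.time(), "action": action, "args": args}
    if action not in ALLOWED_ACTIONS:
        record["result"] = "refused"        # a confused agent hits a wall, not prod
    else:
        record["result"] = "dispatched"     # hand off to the real implementation here
    with open(audit_log, "a") as f:
        f.write(json.dumps(record) + "\n")  # every decision is replayable
    return record["result"]
```

It is twenty lines of unglamorous code, which is roughly the point. The 1999 property of being able to write down what the system will do wrong comes from the wrapper, not from the model.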

The race Hultquist refers to is real, and it has started. The race is not about model capability. Both sides are running models from the same vendors, often the same model. The race is about who has better-curated data feeding their harnesses. Who has stricter discipline around what their automation can touch. Who has the institutional memory of what happens when you delegate authority to a system whose judgment you cannot audit in advance.

The HIDS debate from the mid-90s got an answer. It came from the other side of the wire. Not because defenders learned how to trust autonomous judgment, but because attackers learned they did not need to. They could delegate broadly, externalize the blast radius, and let volume compensate for judgment. The defensive answer cannot be more vibes, broader credentials, and better prompts. It has to be the inverse. Narrower authority. Better harnesses. Replayable decisions. And institutional memory about what happens when fluent systems get mistaken for trustworthy ones.
