Security teams have spent two years arguing about when, exactly, attackers would stop using AI to write their tools and start letting AI operate them. On 26 May 2026 the Sysdig Threat Research Team published the clearest answer yet: a real intrusion, caught in production telemetry, in which a large-language-model agent drove the entire post-exploitation phase end to end — from a single compromised notebook to the contents of an internal database in under an hour.

The flaw that opened the door

The entry point was CVE-2026-39987 (CVSS 9.3, Critical), a pre-authentication remote-code-execution flaw in marimo, a popular open-source Python notebook. As Security Affairs reported, the bug lives in marimo's /terminal/ws WebSocket endpoint, which performed no authentication and handed any visitor a full interactive shell. It affects versions up to 0.20.4 and is fixed in 0.23.0. Sysdig observed the first real-world exploitation roughly 10 hours after the flaw was disclosed on 8 April, faster than the patch cycle most operators run.
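For teams triaging exposure, the version boundaries above can be turned into a quick check. The sketch below is a minimal illustration, assuming a standard Python install where marimo's package metadata is visible to `importlib.metadata`. The helper names (`classify`, `installed_marimo_status`) are hypothetical, and because the report names only the last affected release (0.20.4) and the fixed release (0.23.0), anything in between is reported as "unknown" rather than guessed.

```python
from importlib import metadata

# Boundaries as reported for CVE-2026-39987: affected up to 0.20.4, fixed in
# 0.23.0. Releases in between are not covered by the report, so they are
# labelled "unknown" rather than guessed.
LAST_AFFECTED = (0, 20, 4)
FIRST_FIXED = (0, 23, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; raises ValueError on suffixes like 'rc1'."""
    return tuple(int(part) for part in text.split("."))


def classify(version: str) -> str:
    try:
        v = parse_version(version)
    except ValueError:
        return "unknown"  # pre-release or local build: needs a human look
    if v <= LAST_AFFECTED:
        return "affected"
    if v >= FIRST_FIXED:
        return "fixed"
    return "unknown"


def installed_marimo_status() -> str:
    """Classify the marimo version installed in the current environment, if any."""
    try:
        return classify(metadata.version("marimo"))
    except metadata.PackageNotFoundError:
        return "not installed"
```

A production check would more likely use the `packaging` library's version parser, which understands pre-release tags; this sketch sticks to the standard library and treats anything it cannot parse as unknown.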

What the agent actually did

The intrusion Sysdig documented began at 18:23:44 UTC on 10 May 2026. Within 30 seconds of landing the shell, the operator harvested two cloud credentials from the host's environment variables. Then it went quiet for 48 minutes — and came back as something that did not behave like a human at a keyboard.
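The report does not say which two credentials sat in the environment. Because the operator went straight to AWS Secrets Manager, a reasonable assumption is the standard AWS SDK variables. The sketch below uses a hypothetical watch-list to report which of them are set on a host without ever printing their values, and it uses AWS's access-key-ID prefixes (`AKIA` for long-term IAM user keys, `ASIA` for temporary STS credentials) to flag the long-lived kind.

```python
import os
from collections.abc import Mapping

# Hypothetical watch-list. The report does not name the two credentials;
# standard AWS SDK variable names are assumed because the attacker went on
# to call AWS Secrets Manager.
CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def exposed_credential_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Names (never values) of watched variables that are set and non-empty."""
    return [name for name in CREDENTIAL_VARS if env.get(name)]


def looks_long_lived(env: Mapping[str, str] = os.environ) -> bool:
    """True if the access key ID has the 'AKIA' prefix used by long-term IAM
    user keys; temporary STS credentials use 'ASIA' instead."""
    return env.get("AWS_ACCESS_KEY_ID", "").startswith("AKIA")
```

Running it inside the notebook server's own process shows exactly what a shell spawned from that process would inherit.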

According to the Sysdig report, the harvested credentials were replayed through a fanned-out egress pool built on Cloudflare Workers, issuing 12 GetSecretValue calls across 11 distinct IP addresses inside a 22-second burst to pull an SSH private key out of AWS Secrets Manager. That key opened a downstream SSH bastion, which the operator hit with eight short sessions from six different Worker IPs between 19:30:30 and 19:32:23 UTC. In that 113-second window it found a .pgpass file, read the database password, and dumped an internal PostgreSQL database, schema and full contents included. The traffic originated from 157.66.54.26, an address in an Indonesian autonomous system, before being laundered through the Worker pool.

  1. Initial access

    marimo notebook compromised via CVE-2026-39987; interactive shell obtained.

  2. Credential harvest

    Two cloud credentials lifted from environment variables — 30 seconds in.

  3. Secrets Manager pivot

    12 GetSecretValue calls across 11 IPs in 22 seconds retrieve an SSH private key.

  4. Bastion lateral movement

    Eight SSH sessions from six Cloudflare Worker IPs reach an internal jump host.

  5. Database exfiltrated

    Internal PostgreSQL schema and contents dumped in under two minutes.
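The .pgpass file that handed over the database password is plain text in libpq's documented format: one `hostname:port:database:username:password` entry per line, with `*` as a wildcard and a backslash escaping `:` or `\`. On Unix, libpq ignores the file if group or world has any access, so working files are typically mode 0600, and any session running as the owning account can read every password in them. A minimal audit sketch follows, assuming a POSIX host. The function names are hypothetical, skipping blank and `#` lines is this sketch's own choice, and it counts entries without keeping the passwords.

```python
import stat
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PgpassEntry:
    host: str
    port: str
    database: str
    username: str
    # The password field is deliberately dropped: the audit only needs to
    # know that a stored credential exists, not what it is.


def split_pgpass_line(line: str) -> list[str]:
    """Split on unescaped ':' and undo backslash escapes (\\: and \\\\)."""
    fields: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped character taken literally
            i += 2
            continue
        if ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def parse_pgpass(text: str) -> list[PgpassEntry]:
    entries = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # this sketch skips blank and '#' lines
        fields = split_pgpass_line(raw)
        if len(fields) < 5:
            continue  # malformed line; skipped here
        entries.append(PgpassEntry(*fields[:4]))
    return entries


def audit_pgpass(path: Path) -> dict:
    """Report permissions and entry count for one .pgpass file (POSIX only)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return {
        "path": str(path),
        "mode": oct(mode),
        # libpq ignores the file when group or world has any access.
        "group_or_world_access": bool(mode & 0o077),
        "entries": len(parse_pgpass(path.read_text())),
    }
```

A non-zero entry count on a bastion or jump host is the finding worth acting on: every entry is a database password stored in the clear.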

Four tells that a model was driving

Sysdig is careful about its claim — telemetry cannot subpoena the attacker's terminal — but it lays out four behavioural signatures that, together, point away from a human and away from a static script. First, the operator improvised a full database dump against a target it had no prior knowledge of, with no on-host reconnaissance establishing the schema beforehand. Second — and most damning — a planning comment in Chinese, "看还能做什么" ("see what else we can do"), leaked directly into the command stream at 19:31:40 and reappeared across multiple egress IPs, the kind of internal monologue an agent emits but a careful human operator does not type into a live shell.

Third, every command was shaped for machine consumption rather than human reading: structured delimiters, bounded output caps, pagers disabled, error streams discarded. Fourth, values flowed straight from one step into the next with no human pause — the PostgreSQL password read from .pgpass was consumed by the very next tool call. As The Hacker News noted in its write-up, that combination is the operational fingerprint of an agent orchestrating tools, not a person improvising under time pressure.
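The tells above can be approximated as simple heuristics over command telemetry. The sketch below is a deliberately naive illustration, not Sysdig's method: the regular expressions, trait names, and the two-second threshold are all hypothetical choices, and a real detector would need to parse shell syntax rather than pattern-match strings (a `#` inside a quoted string, for example, would be misread as a comment).

```python
import re

# Hypothetical trait patterns, one per tell described in the report.
TRAIT_PATTERNS = {
    # Pagers switched off so output never blocks waiting for a keypress.
    "pager_disabled": re.compile(r"\b(?:PAGER|PSQL_PAGER|GIT_PAGER)=cat\b|--no-pager\b|\bpager=off\b"),
    # Error stream thrown away.
    "stderr_discarded": re.compile(r"(?:2|&)>\s*/dev/null|2>&-"),
    # Output bounded before it reaches the reader.
    "output_capped": re.compile(r"\|\s*(?:head|tail)\b|\bLIMIT\s+\d+", re.IGNORECASE),
    # Echoed banner lines that split output into machine-parseable sections.
    "delimiter_marker": re.compile(r"\becho\s+['\"]?(?:-{3,}|={3,})"),
    # A comment carried inside the command itself (naive: ignores quoting).
    "inline_comment": re.compile(r"(?:^|\s)#\s*\S"),
}


def command_traits(command: str) -> set[str]:
    """Return the names of every trait pattern that matches the command."""
    return {name for name, pattern in TRAIT_PATTERNS.items() if pattern.search(command)}


def fast_handoff_ratio(timestamps: list[float], threshold_s: float = 2.0) -> float:
    """Fraction of gaps between consecutive commands shorter than threshold_s seconds."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        return 0.0
    return sum(gap < threshold_s for gap in gaps) / len(gaps)
```

No single trait is suspicious on its own, since scripts and careful administrators disable pagers too. The signal the write-ups describe is the combination, sustained across a session.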

Not new capability, but new autonomy

Michael Clark, Sr. Director of the Sysdig Threat Research Team, framed what the case does and does not mean: "We are not watching AI replace attackers. We are watching attackers replace their scripts with AI." The distinction matters. The novelty here is not capability — every step in this chain has appeared in human-run intrusions for years — but autonomy and speed: the agent compressed reconnaissance, decision-making, and execution into a single continuous run, and it adapted to what it found rather than following a fixed playbook.

4 pivots: notebook → creds → Secrets Manager → bastion → DB
8 SSH sessions to the internal bastion
<2 min to dump the PostgreSQL database
<1 hr from initial access to full exfiltration

Why this should change how you think about exposure

The uncomfortable lesson sits in the timeline, not the AI. The attacker won because an exploitable notebook was reachable from the internet, because long-lived cloud credentials sat in environment variables, and because an SSH key in Secrets Manager opened a bastion with a clear line to a production database. Each of those is a known, fixable weakness. What the agent removed was the time defenders have historically relied on — the hours of manual fumbling between a foothold and the crown jewels, during which detection and response can catch up. As Cybersecurity News observed, an agent that chains four pivots in under an hour collapses that window to almost nothing.

The detection problem

The fanned-out Cloudflare Workers egress is its own warning. Rate-limit and reputation defences that key on a single source IP are blind to 11 addresses that each make only one or two quiet calls. Detection has to move toward behaviour (improbable sequences of actions in improbable time) rather than volume from any one origin.
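A minimal sketch of that shift, assuming CloudTrail management events have already been loaded as JSON dictionaries (`eventName`, `eventTime`, `sourceIPAddress`, and `userIdentity.accessKeyId` are CloudTrail's own field names). Instead of counting calls per source IP, it groups calls per credential and measures how many distinct IPs used that credential inside a sliding window. The function names, the 30-second window, and the five-IP threshold are hypothetical tuning choices, not figures from the report.

```python
from collections import defaultdict, deque
from datetime import datetime


def parse_event_time(ts: str) -> datetime:
    # CloudTrail eventTime values use the form "2026-05-10T19:12:03Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def fan_out_alerts(records, event_name="GetSecretValue", window_s=30, min_ips=5):
    """Map each credential to its peak count of distinct source IPs within any
    window_s-second span, keeping only credentials that reach min_ips."""
    by_key = defaultdict(list)
    for record in records:
        if record.get("eventName") != event_name:
            continue
        key = record.get("userIdentity", {}).get("accessKeyId", "unknown")
        by_key[key].append((parse_event_time(record["eventTime"]), record["sourceIPAddress"]))

    alerts = {}
    for key, events in by_key.items():
        events.sort()
        window = deque()
        peak = 0
        for t, ip in events:
            window.append((t, ip))
            # Drop events that fell out of the sliding window.
            while (t - window[0][0]).total_seconds() > window_s:
                window.popleft()
            peak = max(peak, len({addr for _, addr in window}))
        if peak >= min_ips:
            alerts[key] = peak
    return alerts
```

Against a burst shaped like the one in this incident (12 calls from 11 addresses in 22 seconds), the per-credential view stands out sharply even though no single IP made more than two calls.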