Security

Breaking BOTS: Cheating at Blue Team CTFs with AI Speed-Runs

After we announced our results, CTFs like Splunk's Boss of the SOC (BOTS) started prohibiting AI agents. For science & profit, we keep doing it anyway. In BOTS, the AI agents solve most of it in under 10 minutes instead of taking the full day. Our recipe was surprisingly simple: teach AI agents to self-plan their investigation steps, adapt their plans to new information, work with the SIEM DB, and reason about log dumps. No exotic models, no massive lab budgets - just publicly available LLMs mixed with a bit of science and perseverance. We'll walk through how that works, including videos of the many ways AI trips itself up that marketers would rather hide, and how to do it at home with free and open-source tools. CTF organizers can't detect this - the arms race is probably over before it really began. But the real question isn't "can we cheat at CTFs?" It's what happens when investigations evolve from analysts-who-investigate to analysts-who-manage-AI-investigators. We'll show you what that transition already looks like today and peek into some uncomfortable questions about what comes next.
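To make the recipe concrete, here is a minimal sketch of a self-planning investigation agent of the kind the abstract describes: the model keeps an explicit task list, works one task at a time, and revises the list as findings come in. This is an illustrative reconstruction rather than the speakers' code; the model name, prompts, and the `run_task` callback (which would wrap SIEM queries) are all assumptions.

```python
# Hypothetical sketch of a self-planning investigation agent (not the speakers'
# code): the model maintains an explicit task list, executes one task at a time,
# and may add, drop, or reorder tasks as new evidence arrives.
import json
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible chat endpoint; model name is a placeholder

SYSTEM = (
    "You are a SOC investigator. Given the current task list and the latest "
    "findings, reply with JSON: {\"next_task\": str, \"updated_tasks\": [str], "
    "\"done\": bool}. Revise the task list whenever the evidence warrants it."
)

def plan_step(tasks: list[str], findings: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": json.dumps({"tasks": tasks, "findings": findings})},
        ],
    )
    return json.loads(resp.choices[0].message.content)

def investigate(question: str, run_task) -> list[tuple[str, str]]:
    """run_task(task) -> findings string, e.g. a SIEM query plus a summary."""
    tasks = [f"Scope the question: {question}"]
    findings, log = "", []
    for _ in range(20):  # hard cap so a confused agent cannot loop forever
        step = plan_step(tasks, findings)
        if step.get("done"):
            break
        findings = run_task(step["next_task"])
        log.append((step["next_task"], findings))
        tasks = step["updated_tasks"]  # the plan evolves with the evidence
    return log
```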
THE PLAN

Live demonstrations of AI agents speed-running blue team challenges, including the failure modes that break investigations. We'll show what happens when we try the trivial approaches like “just have claude do it” and static “AI workflows”, and what ultimately worked: managed self-planning, semantic SIEM layers, and log agents. Most of this can be done on the cheap with free and open tools and techniques, so we will walk through that as well.

THE DEEP DIVE

* Why normal prompts and static AI workflows fail
* Self-planning investigation agents that evolve task lists dynamically
* What we mean by semantic layers for calling databases and APIs (a rough sketch follows below)
* How to handle millions of log events without bankrupting yourself
* Why "no AI" rules are misguided, both technically and conceptually

GOING BEYOND CTFS

The same patterns that trivialize training exercises work on real SOC investigations. We're watching blue team work fundamentally transform - from humans investigating to humans managing AI investigators. Training programs teach skills AI already automates. Hiring practices can't verify who's doing the work. Certifications lose meaning. More fundamentally, when we talk about who watches the watchers, a lot is about to shift again.
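For the semantic-layer bullet above, here is one way such a layer could look, again as an illustrative sketch rather than the speakers' implementation: a small catalogue of indexes and fields with plain-language meanings is handed to the model, generated SPL is validated against that catalogue, and only then executed. The catalogue contents, host, token, and the choice of Splunk's REST export endpoint are assumptions.

```python
# Hypothetical sketch of a semantic layer over a SIEM (not the speakers' code):
# describe the schema to the model in plain language, accept only queries that
# stick to the described fields, then execute them against Splunk's REST API.
import json
import re
import requests

# Placeholder catalogue: index names, fields, and their meanings.
CATALOGUE = {
    "botsv3": {
        "src_ip": "source IP address of the event",
        "dest_ip": "destination IP address",
        "user": "account name performing the action",
        "EventCode": "Windows event ID (4624 = logon, 4688 = process creation)",
    }
}

def describe_schema() -> str:
    """Rendered into the agent's system prompt so it queries real fields only."""
    lines = []
    for index, fields in CATALOGUE.items():
        lines.append(f"index={index}")
        lines += [f"  {name}: {meaning}" for name, meaning in fields.items()]
    return "\n".join(lines)

def validate(spl: str) -> bool:
    """Crude guard: reject queries that reference fields we never described."""
    known = {f for fields in CATALOGUE.values() for f in fields} | {"index"}
    used = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*=", spl))
    return used <= known

def run_search(spl: str, host: str, token: str) -> list[dict]:
    """Execute validated SPL via the search/jobs/export endpoint (lab setup)."""
    if not validate(spl):
        raise ValueError(f"query uses fields outside the semantic layer: {spl}")
    r = requests.post(
        f"https://{host}:8089/services/search/jobs/export",
        data={"search": spl, "output_mode": "json"},
        headers={"Authorization": f"Bearer {token}"},
        verify=False,  # self-signed certs in a lab; do not do this in production
    )
    r.raise_for_status()
    return [json.loads(line) for line in r.text.splitlines() if line.strip()]
```

In this framing, the `run_task` callback from the planning sketch above would call `run_search` and summarize the results before handing them back to the planner.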

Additional information

Live Stream https://streaming.media.ccc.de/39c3/one
Type Talk
Language English

More sessions

12/27/25
Security
Jade Sheffey
Zero
The Great Firewall of China (GFW) is one of, if not the, most advanced Internet censorship systems in the world. Because repressive governments generally do not simply publish their censorship rules, the task of determining exactly what is and isn’t allowed falls upon the censorship measurement community, who run experiments over censored networks. In this talk, we’ll discuss two ways censorship measurement has evolved from passive experimentation to active attacks against the Great ...
12/27/25
Security
Fuse
Reports of GNSS interference in the Baltic Sea have become almost routine — airplanes losing GPS, ships drifting off course, and timing systems failing. But what happens when a group of engineers decides to build a navigation system that simply *doesn’t care* about the jammer? Since 2017, we’ve been developing **R-Mode**, a terrestrial navigation system that uses existing radio beacons and maritime infrastructure to provide independent positioning — no satellites needed. In this talk, ...
12/27/25
Security
Christoph Saatjohann
Zero
Two years after the first KIM talk at 37C3: the vulnerabilities shown back then have since been fixed. In addition, the current KIM 1.5+ now supports transferring large files of up to 500 MB, and signature handling has been simplified for users by no longer exposing the signature's detailed information. But is the system secure now, or are there new problems?
12/27/25
Security
tihmstar
One
While trying to apply fault injection to the AMD Platform Security Processor under unusual (self-imposed) requirements/restrictions, it was software bugs that stopped the initial glitching attempts. Once discovered, the software bug was used as an entry point to explore the target, which in turn led to uncovering (and exploiting) more and more bugs, ending up in EL3 of the most secure core on the chip. This talk is about the story of trying to glitch the AMD Platform Security Processor, then ...
12/27/25
Security
One
The Deutschlandticket was the flagship transport policy of the last government, rolled out on an impressive timescale for a political project; but this speed came at a cost - a system ripe for fraud at an industrial scale. German public transport is famously decentralised, with thousands of individual companies involved in ticketing and operations. Unifying all of these under one national, secure system has proven a challenge too far for politicians. The end result: losses in the hundreds of ...
12/27/25
Security
Ground
In August 2024, Raspberry Pi released their newest MCU: the RP2350. Alongside the chip, they also released the RP2350 Hacking Challenge: a public call to break the secure boot implementation of the RP2350. This challenge concluded in January 2025 and led to five exciting attacks discovered by different individuals. In this talk, we will provide a technical deep dive into the RP2350 security architecture and highlight the different attacks. Afterwards, we talk about two of the breaks in ...
12/27/25
Security
Fuse
FreeBSD’s jail mechanism promises strong isolation—but how strong is it really? In this talk, we explore what it takes to escape a compromised FreeBSD jail by auditing the kernel’s attack surface, identifying dozens of vulnerabilities across exposed subsystems, and developing practical proof-of-concept exploits. We’ll share our findings, demo some real escapes, and discuss what they reveal about the challenges of maintaining robust OS isolation.