Security

How to render cloud FPGAs useless

While FPGA developers usually try to minimize the power consumption of their designs, we approached the problem from the opposite perspective: what is the maximum power that can be consumed, or wasted, on an FPGA? Short answer: we found that it’s easy to implement oscillators running at 6 GHz that can theoretically dissipate around 20 kW on a large cloud FPGA when the signal is driven to all available resources. It is interesting to note that this power density is not far from that of the surface of the sun. However, such a jump in power load is usually not a problem, as it will trigger protection circuitry. This led us to the next question: would a localized hotspot with such a power density damage the chip if we remain within the typical power envelope of a cloud FPGA (~100 W)? While we could not “fry” the chip or induce permanent errors (and we tried several variants), we did observe that a few routing wires aged to become up to 70% slower after just a few days of stressing the chip. This means that such an FPGA cannot be rented out to cloud users without risking timing violations. In this talk, we will present how we optimized power wasting, how we measured wire latencies with picosecond accuracy, how we attacked 100 FPGA cloud instances, and how FPGAs can be protected against such DoS attacks.
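The comparison with the sun and the hotspot question can be sanity-checked with a rough calculation. The sketch below is only a back-of-envelope estimate: the 20 kW and 100 W figures come from the abstract, the solar values are standard physical constants, and `DIE_AREA_CM2` is a hypothetical placeholder, since the abstract does not state a die area.

```python
# Back-of-envelope power density comparison (a sketch, not measured data).

STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)
SUN_SURFACE_TEMP_K = 5772.0        # effective temperature of the solar photosphere

FULL_CHIP_POWER_W = 20_000.0  # theoretical full-chip figure quoted in the abstract
POWER_ENVELOPE_W = 100.0      # typical cloud FPGA envelope quoted in the abstract
DIE_AREA_CM2 = 8.0            # ASSUMPTION: placeholder, not a figure from the talk


def sun_surface_flux_w_per_cm2() -> float:
    """Radiated power per unit area of the sun's surface (black-body estimate)."""
    flux_w_per_m2 = STEFAN_BOLTZMANN * SUN_SURFACE_TEMP_K ** 4
    return flux_w_per_m2 / 1e4  # 1 m^2 = 10^4 cm^2


def chip_power_density_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Average power density if power_w is dissipated evenly over area_cm2."""
    return power_w / area_cm2


def hotspot_area_fraction(envelope_w: float, full_chip_w: float) -> float:
    """Fraction of the chip a hotspot may cover to reach the full-chip
    power density while staying inside the power envelope."""
    return envelope_w / full_chip_w


if __name__ == "__main__":
    print("sun     [W/cm^2]:", round(sun_surface_flux_w_per_cm2()))
    print("chip    [W/cm^2]:", chip_power_density_w_per_cm2(FULL_CHIP_POWER_W, DIE_AREA_CM2))
    print("hotspot fraction:", hotspot_area_fraction(POWER_ENVELOPE_W, FULL_CHIP_POWER_W))
```

The last ratio does not depend on the assumed die area: at the same power density, a 100 W hotspot corresponds to 0.5% of the resources that would dissipate 20 kW.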
FPGA instances are now offered by multiple cloud service providers (including Amazon EC2 F1/F2 instances, Alibaba ECS Instances, and Microsoft Azure NP-Series). The low-level programmability of FPGAs allows new attack vectors to be implemented, including DoS attacks. While some severe attacks (such as short circuits) cannot be easily deployed because users are prevented from loading their own configuration bitstreams onto the cloud FPGAs, it has been demonstrated that it is possible to leak information (like cloud instance scheduling policies or the physical topologies of the FPGA servers) or to mount DoS attacks by excessive power hammering. For instance, basically all cloud FPGAs provide logic cells that can be configured as small shift registers. This allows building toggle shift registers with 10,000 or more flip-flops, which can draw over 1 kW of power when clocked at a few hundred MHz. We will show how we created fast ring oscillators that bypass all design checks applied during bitstream deployment in the cloud, and how we achieved toggle rates of 8 GHz inside an FPGA by using glitch amplification. The latter was calibrated with the help of a time-to-digital converter (TDC). As a first attack, we used power hammering to crash AWS F1 instances by increasing power consumption to 300 W (three times the allowed power envelope). We used physical unclonable functions (PUFs) to examine the behaviour of the attacked FPGA cloud instances, and we found that most remained unavailable for several hours after the attack. As a more subtle attack, we tried to cause permanent damage to FPGAs in our lab by driving fast-toggling signals to virtually every available wire (and primitive) in a small region of the chip. With this, we created hotspot designs that draw 130 W in less than 1% of the available logic and routing resources of a datacenter FPGA. Even though the achieved power density was excessive, it was insufficient to induce permanent damage.
This is largely due to the area inefficiencies of an FPGA, which limit the power density. For instance, FPGAs use large multiplexers to implement the switchable connections, and only one active path is routed through each multiplexer, leaving most of the transistors idle. Similarly, FPGAs provide a large number of configuration memory cells (about 1 Gb on a typical datacenter device) that draw negligible power, as they do not switch during operation. All these idle elements force the power-drawing circuits to be spread out, which limits the power density. Nevertheless, when experimenting with different hotspot variants, we found thermal runaway effects and excessive device aging, with up to a 70% increase in delay on some wires. We achieved this aging in just a few days and under normal operating conditions (i.e. by staying within the available power budget and with board cooling running). Such a large increase in latency can be considered to render an FPGA useless, as it will usually no longer be fast enough to host (realistic) user designs. Beyond exploring these attack vectors, we developed countermeasures and design guidelines to prevent such attacks. These include scans of the user designs, usage restrictions on resources like I/Os and clock trees, as well as runtime monitoring and FPGA health checks. With this, we believe that FPGAs can be operated securely and reliably in a cloud setting.
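To illustrate what a scan of user designs could look like in its simplest form, the sketch below searches a netlist for combinational loops, the structure behind a plain ring oscillator. It is not the speakers' method: the netlist representation (a dictionary from cell name to driven cells), the cell names, and the function `find_combinational_loops` are all hypothetical. The abstract itself reports oscillators that bypass the checks applied at deployment, so a purely structural check like this could only ever be one layer among the countermeasures listed above.

```python
from typing import Dict, List, Set

# Hypothetical netlist model: cell name -> names of the cells it drives.
Netlist = Dict[str, List[str]]


def find_combinational_loops(netlist: Netlist, sequential: Set[str]) -> List[List[str]]:
    """Depth-first search that reports one cycle per back edge.

    Cells listed in `sequential` (e.g. flip-flops) are treated as breaking
    combinational paths, so loops through them are not reported.
    Recursive for brevity; a real netlist would need an iterative version.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {cell: WHITE for cell in netlist}
    loops: List[List[str]] = []

    def visit(cell: str, path: List[str]) -> None:
        colour[cell] = GREY
        path.append(cell)
        for succ in netlist.get(cell, []):
            if succ in sequential:
                continue  # a register cuts the combinational path
            state = colour.get(succ, WHITE)
            if state == GREY:
                # succ is already on the current path: a loop is closed here
                loops.append(path[path.index(succ):] + [succ])
            elif state == WHITE:
                visit(succ, path)
        path.pop()
        colour[cell] = BLACK

    for cell in netlist:
        if cell not in sequential and colour[cell] == WHITE:
            visit(cell, [])
    return loops


# Hypothetical example: three LUTs wired in a ring, plus a registered feedback path.
EXAMPLE_NETLIST: Netlist = {
    "lut_a": ["lut_b"],
    "lut_b": ["lut_c"],
    "lut_c": ["lut_a", "ff_0"],
    "ff_0": ["lut_d"],
    "lut_d": ["ff_0"],
}
EXAMPLE_SEQUENTIAL = {"ff_0"}
```

Calling `find_combinational_loops(EXAMPLE_NETLIST, EXAMPLE_SEQUENTIAL)` reports the ring `lut_a -> lut_b -> lut_c -> lut_a` and ignores the feedback through `ff_0`, because that path is registered.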

Further information

Live Stream https://streaming.media.ccc.de/39c3/fuse
Format Talk
Language English

Further sessions

27.12.25
Security
Jade Sheffey
Zero
The Great Firewall of China (GFW) is one of the most advanced Internet censorship systems in the world, arguably the most advanced. Because repressive governments generally do not simply publish their censorship rules, the task of determining exactly what is and isn’t allowed falls upon the censorship measurement community, who run experiments over censored networks. In this talk, we’ll discuss two ways censorship measurement has evolved from passive experimentation to active attacks against the Great ...
27.12.25
Security
Fuse
Reports of GNSS interference in the Baltic Sea have become almost routine — airplanes losing GPS, ships drifting off course, and timing systems failing. But what happens when a group of engineers decides to build a navigation system that simply *doesn’t care* about the jammer? Since 2017, we’ve been developing **R-Mode**, a terrestrial navigation system that uses existing radio beacons and maritime infrastructure to provide independent positioning — no satellites needed. In this talk, ...
27.12.25
Security
Christoph Saatjohann
Zero
Two years after the first KIM talk at 37C3: the vulnerabilities shown there have since been fixed. In addition, the current KIM 1.5+ can now transfer large files of up to 500 MB, and signature handling has been simplified for users in that the detailed signature information can no longer be viewed. But is the system secure now, or are there new problems?
27.12.25
Security
tihmstar
One
While trying to apply fault injection to the AMD Platform Security Processor under unusual (self-imposed) requirements and restrictions, it was software bugs that stopped the initial glitching attempts. Once discovered, the software bug was used as an entry point to explore the target, which in turn led to uncovering (and exploiting) more and more bugs, ending up in EL3 of the most secure core on the chip. This talk is about the story of trying to glitch the AMD Platform Security Processor, then ...
27.12.25
Security
One
The Deutschlandticket was the flagship transport policy of the last government, rolled out on an impressive timescale for a political project; but this speed came at a cost: a system ripe for fraud on an industrial scale. German public transport is famously decentralised, with thousands of individual companies involved in ticketing and operations. Unifying all of these under one national, secure system has proven a challenge too far for politicians. The end result: losses in the hundreds of ...
27.12.25
Security
Ground
In August 2024, Raspberry Pi released their newest MCU: the RP2350. Alongside the chip, they also released the RP2350 Hacking Challenge: a public call to break the secure boot implementation of the RP2350. This challenge concluded in January 2025 and led to five exciting attacks discovered by different individuals. In this talk, we will provide a technical deep dive into the RP2350 security architecture and highlight the different attacks. Afterwards, we talk about two of the breaks in ...
27.12.25
Security
Fuse
FreeBSD’s jail mechanism promises strong isolation—but how strong is it really? In this talk, we explore what it takes to escape a compromised FreeBSD jail by auditing the kernel’s attack surface, identifying dozens of vulnerabilities across exposed subsystems, and developing practical proof-of-concept exploits. We’ll share our findings, demo some real escapes, and discuss what they reveal about the challenges of maintaining robust OS isolation.