Kernel

Unlocking extra cluster capacity with enhanced Linux cgroup scheduling

UA2.114 (Baudoux)
Al Amjad Isstaif
<p>Cluster orchestrators such as Kubernetes rely on an accurate model of the resources available on each worker node in a cluster and of the resources a given job requires, using this information to place the job onto a suitable worker node. If either is inaccurate, the orchestrator will make poor job placement decisions, resulting in poor performance.</p> <p>I observe that, for workloads making heavy use of Linux's group scheduling (cgroups), which include common serverless workloads, Linux kernel scheduling overheads can become so significant as to make the orchestrator's model of worker node resources inaccurate. In practice this effect is mitigated by over-provisioning the cluster.</p> <p>I propose and evaluate an enhancement to the Linux Completely Fair Scheduler (CFS) that mitigates these effects. By prioritising task completion over strict fairness, the enhanced scheduler is able to drain contended CPU run queues more rapidly and reduce time lost to context switching. Experimental results show that this approach can deliver equivalent performance using at least 10% fewer worker nodes, significantly improving cluster efficiency.</p>
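The trade-off described in the abstract, completion versus strict fairness, can be illustrated with a toy model. The sketch below is not the proposed scheduler and not how CFS works internally. It is a minimal single-CPU simulation with a fixed per-switch cost, and the function names, workload sizes, quantum and switch cost are all made-up assumptions chosen only to show the direction of the effect.

```python
from collections import deque


def round_robin(work, quantum, switch_cost):
    """Toy 'strictly fair' policy: every task gets an equal time slice in turn.

    work        -- list of CPU demands per task (hypothetical units, e.g. microseconds)
    quantum     -- length of one time slice
    switch_cost -- time lost whenever the CPU moves to a different task
    Returns (makespan, mean completion time).
    """
    queue = deque(enumerate(work))
    clock = 0
    finish = {}
    while queue:
        task, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((task, remaining))
        else:
            finish[task] = clock
        # Charge a context switch only if a *different* task runs next.
        if queue and queue[0][0] != task:
            clock += switch_cost
    return clock, sum(finish.values()) / len(finish)


def run_to_completion(work, switch_cost):
    """Toy 'completion first' policy: each task runs until it is done."""
    clock = 0
    finish = []
    for index, demand in enumerate(work):
        clock += demand
        finish.append(clock)
        if index < len(work) - 1:
            clock += switch_cost  # one switch between consecutive tasks
    return clock, sum(finish) / len(finish)


# Made-up workload: four equal tasks contending for one CPU.
tasks = [10_000, 10_000, 10_000, 10_000]
fair = round_robin(tasks, quantum=1_000, switch_cost=100)
greedy = run_to_completion(tasks, switch_cost=100)
print("fair slicing      (makespan, mean completion):", fair)
print("run to completion (makespan, mean completion):", greedy)
```

In this toy setting the fair policy performs 39 context switches and the completion-first policy only 3, so the run queue drains sooner and tasks finish earlier on average. Real schedulers must also account for latency, priorities and cgroup hierarchy, none of which this sketch models.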

Additional information

Live Stream https://live.fosdem.org/watch/ua2114
Type devroom
Language English

More sessions

2/1/26
Kernel
UA2.114 (Baudoux)
<p>When a kernel component like a storage driver misbehaves in production, developers face a difficult choice. They either have too little information to solve the bug or they enable slow console-level debug logs that ruin performance. This talk introduces a per-component binary logging mechanism designed to support verbose logging in production with negligible run-time cost.</p> <p>We achieve this efficiency by moving the heavy lifting to build time. Using preprocessor macros, we emit parameter ...
2/1/26
Kernel
Ahmad Fatoum
UA2.114 (Baudoux)
<p>For years, Ahmad’s ideal has been simple: unpack a rootfs on a server, mount it over NFS (or usb9pfs), boot directly into it, and everything just works™.</p> <p>But as secure boot becomes the default on many embedded systems, squeezing in a network-booted kernel is getting harder and often falls outside the supported boot flow entirely.</p> <p>Fortunately, some recent improvements in the kernel build system pave the way for a far less invasive netboot setup. This talk gives a quick tour ...
2/1/26
Kernel
Bartosz Golaszewski
UA2.114 (Baudoux)
<p>The Linux kernel driver model has grown over the years and acquired several different mechanisms for passing device configuration data to platform drivers. This configuration can come from firmware (device-tree, ACPI) or from the kernel code itself (board-files, MFD, auxiliary drivers).</p> <p>For a less experienced driver developer, the different APIs that are used to access device properties can be quite confusing and lead to questions: should I use the OF routines? Maybe fwnode or the ...
2/1/26
Kernel
Fernando Fernandez Mancera
UA2.114 (Baudoux)
<p>A new RFC for Netfilter/nftables arrived recently on the netfilter-devel mailing list [1], introducing flexible math operation support for network packet fields. This could solve some problems in migrating from iptables to nftables and also enable other use cases.</p> <p>This demo will quickly show how it works with simple real-world scenarios.</p> <p>[1] https://lore.kernel.org/netfilter-devel/20250923152452.3618-1-fmancera@suse.de/</p>
2/1/26
Kernel
Felix Moessbauer
UA2.114 (Baudoux)
<p>Tracing complex systems often requires insights from both the kernel and userspace. While tools like Linux's ftrace excel at kernel-level observability and LTTng provides low-overhead userspace tracing, unifying these disparate data sources for a holistic view remains a challenge: using LTTng for kernel tracing requires an out-of-tree kernel module, which can be a barrier for many users.</p> <p>This talk introduces bt2-ftrace-to-ctf - a new open-source project designed to bridge this gap. Our ...
2/1/26
Kernel
Luca Di Maio
UA2.114 (Baudoux)
<p>Creating filesystem images typically requires mounting, copying files, and hoping your build environment doesn't introduce non-determinism. New capabilities in mkfs.xfs solve both problems. You can now populate an XFS filesystem directly from a directory tree at creation time, no mount required. I'll cover the implementation approach, discuss design, and show how to use it. Useful for distributions, embedded systems, and anyone who needs verifiable filesystem artifacts.</p> <p>Reference ...
2/1/26
Kernel
Julia Lawall
UA2.114 (Baudoux)
<p>Correctness of operating system kernel code is very important. Testing is helpful, but does not always uncover all issues. In the Whisper team at Inria, we are exploring the possibility of applying formal verification, using Frama-C, to Linux kernel code. This entails writing specifications, constructing loop invariants, and checking correctness with the support of an SMT solver. This talk will report on the opportunities and challenges encountered.</p>