Containers

Supervising and emulating syscalls

UD2.208 (Decroly)
Christian Brauner
Recently, seccomp support for SECCOMP_RET_USER_NOTIF landed in the kernel. It enables a process (the supervisee) to retrieve a file descriptor (fd) for its seccomp filter. This fd can then be handed to another, usually more privileged, process (the supervisor), which can then receive seccomp notifications about the syscalls performed by the supervisee. We have integrated this feature into userspace and currently make heavy use of it to intercept mknod(), mount(), and other syscalls in user namespaces, i.e. in containers. For example, if a mknod() syscall matches a device in a predetermined whitelist, the privileged supervisor performs the mknod() syscall in lieu of the unprivileged supervisee and reports the success or failure of the attempt back to the supervisee. If the syscall does not match a device in the whitelist, we simply report an error. This talk will show how this works, which limitations we have run into, and which improvements we plan to make in the kernel.

Additional information

Type: devroom

More sessions

2/1/20
Containers
Sascha Grunert
UD2.208 (Decroly)
Podman is the container management tool of choice when it comes to boosting day-to-day development tasks around containers. Podman started out as a drop-in replacement for Docker, but nowadays it is much more than that. For example, Podman can manage pods, run containers without root privileges, and offers fine-grained configuration options.
2/1/20
Containers
Akihiro Suda
UD2.208 (Decroly)
The biggest problem with the OCI Image Spec is that a container cannot be started until all of its tarball layers have been downloaded, even though more than 90% of the tarball contents are often not needed by the actual workload. This session will present state-of-the-art alternative image formats that allow runtime implementations to start a container without waiting for all of its image contents to be available locally. In particular, this session will focus on CRFS/stargz and its implementation status in ...
2/1/20
Containers
Daniel Borkmann
UD2.208 (Decroly)
BPF, a foundational technology in the Linux kernel, gives systems developers and users a powerful tool to dynamically reprogram and customize the kernel to meet their needs and solve real-world problems, without having to be a kernel expert. Thanks to BPF, we have reached the point where we can leave behind the legacy accumulated over decades of development grounded in a more traditional networking environment, one that is typically far more static than your average Kubernetes ...
2/1/20
Containers
Ralf Haferkamp
UD2.208 (Decroly)
Kata Containers provide a secure container runtime that offers an experience close to that of native containers, while delivering stronger workload isolation and host infrastructure security through hardware virtualization. This is particularly useful when containers are used to host and run third-party applications. In this presentation, after a short introduction to Kata, we will demonstrate how easy it is to install and use on openSUSE. We will show it in action both as part of a podman ...
2/1/20
Containers
Laurent Bernaille
UD2.208 (Decroly)
Kube-proxy enables access to Kubernetes services (virtual IPs backed by pods) by configuring client-side load balancing on nodes. The first implementation relied on a userspace proxy that did not perform well. The second implementation used iptables and is still the one used in most Kubernetes clusters. Recently, the community introduced an alternative based on IPVS. This talk will start with a description of the different modes and how they work. It will then focus on the IPVS ...
2/1/20
Containers
Adrian Reber
UD2.208 (Decroly)
Checkpointing and restoring a process is a difficult task that many container runtimes rely on to implement container live migration. This talk will explain how CRIU is able to checkpoint and restore processes, how it is integrated into different container runtimes, and which optimizations CRIU offers to decrease downtime during container migration. In this talk I want to provide details on how CRIU checkpoints and restores a process. Starting from ptrace() to pause the process, how parasite code ...
2/1/20
Containers
Thierry Carrez
UD2.208 (Decroly)
Today, running containers involves many technologies and levels of abstraction, and it can be difficult to understand them, or just to keep up. How do CRI-O and containerd overlap? Do Kata Containers compete with Firecracker? Is there any relationship between OCI and CRI? How many different meanings can "container runtime" have? In this talk, we will navigate this treacherous sea of overlapping technologies and acronyms that take care of running container workloads, below ...