Session
Schedule FOSDEM 2022
Monitoring and Observability

Unikraft Performance Monitoring with Prometheus

D.monitoring
Cezar Craciunoiu
<p>Unikraft and similar unikernels offer isolation by running a single application inside a separate virtual machine. As a result, extracting information from the machine can be difficult. Moreover, because Unikraft supports running only a single process at a time, alternative solutions had to be found for exporting data. Prometheus is a widely used tool for collecting and visualizing data that is decoupled from the observed system, which made it a prime candidate for exporting information.</p> <p>Our solution was to port a Prometheus exporter to Unikraft as a separate library and run it on its own thread. Prometheus extracts information from the unikernel through an intermediary library named ukstore, which behaves like a simplified ProcFS and offers an easy way to access information and metrics from the system. Using Prometheus with Unikraft, we can thus extract performance metrics from highly specialized virtual machines, store them in a time series database, and display them as plots.</p>
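As a rough illustration of what an exporter hands to Prometheus on each scrape, the sketch below renders a few metrics in the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name value`). It is not the exporter that was ported to Unikraft: `struct metric`, `render_metrics`, and the metric names are hypothetical and exist only for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical metric record; not part of Unikraft or any exporter. */
struct metric {
    const char *name;          /* metric name, e.g. "uk_mem_free_bytes" */
    const char *help;          /* one-line description */
    const char *type;          /* "gauge" or "counter" */
    unsigned long long value;  /* current sample */
};

/*
 * Render n metrics into buf using the Prometheus text exposition format.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if buf is too small to hold the full output.
 */
int render_metrics(const struct metric *m, size_t n, char *buf, size_t len)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, len - off,
                         "# HELP %s %s\n# TYPE %s %s\n%s %llu\n",
                         m[i].name, m[i].help,
                         m[i].name, m[i].type,
                         m[i].name, m[i].value);
        /* snprintf reports the length it wanted; detect truncation. */
        if (w < 0 || (size_t)w >= len - off)
            return -1;
        off += (size_t)w;
    }
    return (int)off;
}
```

An HTTP handler for the scrape endpoint would typically call something like `render_metrics` and send the buffer back as the response body with a plain-text content type.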
ukstore is the main interface through which applications access configuration and statistics data from Unikraft. It is modeled on ProcFS and offers easy “folder/data” access, which removes the need to search through headers for the functions that expose a given piece of information. The main advantage of ukstore is that all of its setup is done at build time, so it has no impact on boot time. It is also flexible: entries can be static (created at build time) or dynamic (created at run time).

Metrics inside Unikraft are provided by the libraries that generate them, so every library can contain functions that report usage data. So far, work has been done on metrics related to memory, networking, and thread locking, and metrics related to file systems and the CPU are underway. The end goal is to cover as many as possible of the metrics that Linux offers through ProcFS and to expose them through ukstore.

Porting a Prometheus exporter came with the challenge of finding one with the fewest dependencies that still retains the usual format. We chose an exporter, adapted it through patches so that it interacts correctly with Unikraft, and exposed the available performance metrics through it. With all of these pieces put together, Unikraft instances can be monitored reliably with minimal impact on the performance of the instances themselves.
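The “folder/data” idea with static and dynamic entries can be sketched roughly as follows. This is not the ukstore API: every name here (`store_entry`, `store_register`, `store_read`, the paths, and the placeholder getters) is hypothetical, and the sketch only assumes that an entry pairs a path with a getter function. The static table is laid out by the compiler, so it needs no work at boot, while dynamic entries are added at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: a "folder/name" path plus a getter callback. */
typedef long (*store_getter)(void);

struct store_entry {
    const char *path;   /* e.g. "mem/free_bytes" */
    store_getter get;   /* returns the current value */
};

/* Placeholder getters; a real library would report live data. */
static long mem_free_bytes(void) { return 4096; }
static long net_rx_packets(void) { return 0; }
long sample_thread_count(void) { return 3; }

/* Static entries: fixed at build time, no registration at boot. */
static const struct store_entry static_entries[] = {
    { "mem/free_bytes", mem_free_bytes },
    { "net/rx_packets", net_rx_packets },
};

/* Dynamic entries: registered at run time into a fixed-size table. */
#define MAX_DYNAMIC 16
static struct store_entry dynamic_entries[MAX_DYNAMIC];
static size_t dynamic_count;

/* Add a run-time entry; returns 0 on success, -1 on bad input or full table. */
int store_register(const char *path, store_getter get)
{
    if (!path || !get || dynamic_count == MAX_DYNAMIC)
        return -1;
    dynamic_entries[dynamic_count].path = path;
    dynamic_entries[dynamic_count].get = get;
    dynamic_count++;
    return 0;
}

/* Look up path in both tables; on success store the value in *out, return 0. */
int store_read(const char *path, long *out)
{
    size_t i;

    for (i = 0; i < sizeof(static_entries) / sizeof(static_entries[0]); i++) {
        if (strcmp(static_entries[i].path, path) == 0) {
            *out = static_entries[i].get();
            return 0;
        }
    }
    for (i = 0; i < dynamic_count; i++) {
        if (strcmp(dynamic_entries[i].path, path) == 0) {
            *out = dynamic_entries[i].get();
            return 0;
        }
    }
    return -1;
}
```

With this layout, a caller reads `"mem/free_bytes"` without knowing which header declares the underlying function, and a library can call `store_register("sched/threads", sample_thread_count)` later to publish a value that was not known at build time.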

Additional information

Type devroom

More sessions

2/6/22
Monitoring and Observability
Richard Hartmann
D.monitoring
<p>Opening!</p>
2/6/22
Monitoring and Observability
Vincent Behar
D.monitoring
<p>In this session, we’ll see why we adopted OpenTelemetry &amp; its collector for an internal platform at Ubisoft - to collect/process/export all our logs, metrics, and traces. We’ll explain how we handled the required mindset change: why people should instrument their code more, and how to onboard them. And of course, we’ll talk about the benefits of fully adopting OpenTelemetry.</p> <p>The intended audience is people who want to adopt OpenTelemetry, or who are already using part of it - ...
2/6/22
Monitoring and Observability
Bram Vogelaar
D.monitoring
<p>A gentle introduction to Observability and how to set up a highly available monitoring platform across multiple datacenters.</p> <p>During this talk we will investigate how to set up and monitor a monitoring setup across 2 DCs using Prometheus, Loki, Tempo, Alertmanager and Grafana. We will monitor some services and share some lessons learned along the way.</p>
2/6/22
Monitoring and Observability
Ryan Perry
D.monitoring
<p>Profiling is an effective way of understanding which parts of your application are consuming the most resources. Traditionally, logs, metrics and traces have been considered the three pillars of observability, but more recently profiling has emerged as a fourth pillar to be used alongside these other observability tools.</p> <p>Continuous Profiling, in particular, adds a dimension of time that allows you to understand your system’s resource usage (e.g. CPU, memory) over time and gives ...
2/6/22
Monitoring and Observability
D.monitoring
<p>In this session, we’ll see eBPF monitoring in action applied to the Kafka world as an example of a complex Java application: identify Kafka consumers, producers, and brokers, see how they interact with each other and how many resources they consume. We'll even show how to measure consumer lag without external components. If you want to know what’s next in Java and Kafka observability in Kubernetes, this session is for you.</p>
2/6/22
Monitoring and Observability
D.monitoring
<p>This talk is aimed at engineers operating in distributed environments (or microservices) who are interested in monitoring exceptions at scale. We introduce the open source project "Periskop", a pull-based exception monitoring service built at SoundCloud and inspired by Prometheus.</p>
2/6/22
Monitoring and Observability
Matthias Loibl
D.monitoring
<p>Continuous profiling is a widely used practice at Google but has only recently started gaining popularity in the Observability space. Resources on this topic are still rare compared to other observability signals, especially for open source projects. This talk intends to educate the wider community about the possibilities of continuous profiling and give a glimpse into open-source tooling that allows everyone to join in on the practice and build better software.</p>