Schedule FOSDEM 2022
Monitoring and Observability

Periskop: Exception Monitoring at Scale

A pull-based exception monitoring service inspired by Prometheus
<p>This talk is aimed at engineers operating in distributed environments (such as microservice architectures) who are interested in monitoring exceptions at scale. We introduce the open source project "Periskop", a pull-based exception monitoring service built at SoundCloud and inspired by Prometheus.</p>
The talk covers:
- The problems we encountered with the traditional push-based model for exception monitoring:
  - Thundering herd issues with bad deployments
  - Difficulty navigating large volumes of logs to identify exceptions
- An alternative pull-based model that scales well with the number of exceptions and instances:
  - Aggregation + sampling for concrete occurrences
  - Limitations and trade-offs (short-lived processes and fork-based application servers)
- An implementation of this model in the open source project "Periskop":
  - Initial development
  - Server and client libraries
  - Newly added features and roadmap (push-gateway, federation, time series visualization, integrations)
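The pull-based model with aggregation and sampling can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Periskop's actual client or server API: the names `ExceptionCollector`, `report`, `snapshot` and `merge_scrapes` are hypothetical, as are the choices to key exceptions by type and message and to "sample" by keeping only the most recent N stack traces per key. Each instance aggregates in memory, so an error storm after a bad deployment only increments counters instead of sending one request per exception; a server then pulls snapshots on its own schedule and sums counts across instances.

```python
from __future__ import annotations

import threading
import traceback
from collections import deque
from dataclasses import dataclass


@dataclass
class ExceptionAggregate:
    """Running count plus a bounded sample of concrete occurrences for one error key."""
    key: str
    samples: deque  # bounded deque: oldest stack traces are dropped automatically
    total_count: int = 0


class ExceptionCollector:
    """Hypothetical in-process collector: aggregates exceptions instead of pushing each one."""

    def __init__(self, max_samples: int = 10):
        self._max_samples = max_samples
        self._aggregates: dict[str, ExceptionAggregate] = {}
        self._lock = threading.Lock()

    def report(self, exc: BaseException) -> None:
        # Assumed aggregation key: exception type name plus message.
        key = f"{type(exc).__name__}@{exc}"
        stacktrace = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        with self._lock:
            agg = self._aggregates.get(key)
            if agg is None:
                agg = ExceptionAggregate(key, deque(maxlen=self._max_samples))
                self._aggregates[key] = agg
            agg.total_count += 1          # every occurrence is counted
            agg.samples.append(stacktrace)  # only the last N are kept

    def snapshot(self) -> list[dict]:
        # What a scraping server would read, e.g. serialized as JSON over HTTP.
        with self._lock:
            return [
                {"key": a.key, "total_count": a.total_count, "samples": list(a.samples)}
                for a in self._aggregates.values()
            ]


def merge_scrapes(scrapes: list[list[dict]]) -> dict[str, int]:
    """Server side (hypothetical): combine snapshots pulled from many instances."""
    totals: dict[str, int] = {}
    for snapshot in scrapes:
        for entry in snapshot:
            totals[entry["key"]] = totals.get(entry["key"], 0) + entry["total_count"]
    return totals


if __name__ == "__main__":
    # Two simulated instances raising the same error 3 and 2 times.
    a, b = ExceptionCollector(max_samples=2), ExceptionCollector(max_samples=2)
    for collector, n in ((a, 3), (b, 2)):
        for _ in range(n):
            try:
                raise ValueError("bad input")
            except ValueError as e:
                collector.report(e)
    print(merge_scrapes([a.snapshot(), b.snapshot()]))  # {'ValueError@bad input': 5}
    print(len(a.snapshot()[0]["samples"]))  # 2: only the most recent occurrences kept
```

Memory per instance is bounded by the number of distinct keys times `max_samples`, regardless of how many times an exception fires, which is the property that lets this style of collection scale with exception volume. The sketch also shows a trade-off the talk lists: anything held only in process memory is lost if a short-lived process exits before the server scrapes it.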

Additional information

Type: devroom

More sessions

2/6/22
Monitoring and Observability
Richard Hartmann
D.monitoring
<p>Opening!</p>
2/6/22
Monitoring and Observability
Vincent Behar
D.monitoring
<p>In this session, we’ll see why we adopted OpenTelemetry &amp; its collector for an internal platform at Ubisoft to collect, process, and export all our logs, metrics, and traces. We’ll explain how we handled the required mindset change: why people should instrument their code more, and how to onboard them. And of course, we’ll talk about the benefits of fully adopting OpenTelemetry.</p> <p>The intended audience is people who want to adopt OpenTelemetry, or who are already using part of it - ...
2/6/22
Monitoring and Observability
Bram Vogelaar
D.monitoring
<p>A gentle introduction to observability and how to set up a highly available monitoring platform across multiple datacenters.</p> <p>During this talk we will investigate how to set up and monitor a monitoring setup across two DCs using Prometheus, Loki, Tempo, Alertmanager and Grafana, monitoring some services and sharing some lessons learned along the way.</p>
2/6/22
Monitoring and Observability
Ryan Perry
D.monitoring
<p>Profiling is an effective way of understanding which parts of your application are consuming the most resources. Traditionally, logs, metrics and traces have been considered the three pillars of observability, but more recently profiling has emerged as a fourth pillar to be used alongside these other observability tools.</p> <p>Continuous profiling, in particular, adds a dimension of time that allows you to understand your system’s resource usage (e.g. CPU, memory) over time and gives ...
2/6/22
Monitoring and Observability
D.monitoring
<p>In this session, we’ll see eBPF monitoring in action applied to the Kafka world as an example of a complex Java application: identify Kafka consumers, producers, and brokers, see how they interact with each other and how many resources they consume. We'll even show how to measure consumer lag without external components. If you want to know what’s next in Java and Kafka observability in Kubernetes, this session is for you.</p>
2/6/22
Monitoring and Observability
Matthias Loibl
D.monitoring
<p>Continuous profiling is a widely used practice at Google, but it has only recently started gaining popularity in the observability space. However, resources on this topic are still rare compared to other observability signals, especially for open source projects. This talk intends to educate the wider community about the possibilities of continuous profiling and to give a glimpse into open source tooling that allows everyone to join in on the practice and build better software.</p>
2/6/22
Monitoring and Observability
Cezar Craciunoiu
D.monitoring
<p>Unikraft and similar unikernels offer isolation by running a single application inside a separate virtual machine. As such, extracting information from the machine can prove difficult. Moreover, because Unikraft supports running only a single process at a time, alternative solutions had to be found for exporting data. Prometheus is a common tool for collecting and visualizing data that offers decoupling from the observed system; as such, we saw it as a prime candidate for exporting ...