Monitoring and Observability

Are You Testing Your Observability? Patterns for Instrumenting Your Services

UD2.120 (Chavanne)
Observability is the key to understanding how your application runs and behaves in action. This is especially true for distributed environments like Kubernetes, where users run cloud-native microservices. Alongside other observability signals such as logs and traces, metrics play a substantial role. Sampled measurements observed throughout the system are crucial for monitoring the health of applications, and they enable real-time, actionable alerting. While there are many robust open-source libraries, in various languages, that allow us to easily instrument services for backends like Prometheus, there are still numerous ways to make a mistake or misuse those tools.

During this talk, two engineers from Red Hat, Kemal and Bartek (Prometheus and Thanos project maintainer), will discuss valuable patterns and best practices for instrumenting your application. The speakers will go through common pitfalls and failure cases while sharing insights and methods to avoid those mistakes. In addition, this talk will demonstrate how to leverage unit testing to verify the correctness of your observability signals, how it helps, and why it is important. Last but not least, the talk will include a demo of an example instrumented application, based on the speakers' experience and the projects they maintain.

The audience will leave knowing how to answer the following important questions: What are the essential metrics that services should have? Should you test your observability? What are the ways to test it at the unit-test level? What are the common mistakes when instrumenting services, and how can you avoid them? And more!
The end goal of this talk is to show the audience how to harness the power of metric-based instrumentation in their applications. We would like to share some pragmatic best practices and common patterns that we learned while maintaining several open-source projects.

During this talk, we will discuss valuable patterns and best practices for instrumenting libraries and applications. We will go through a set of common pitfalls, failure cases, and methods to avoid those mistakes. Some of the topics we plan to mention: common cardinality issues, summaries vs. histograms, choosing histogram buckets, testing, instrumenting libraries vs. applications, common middlewares, etc. We will demonstrate why, when, and how to leverage unit testing to verify your observability signals. We plan to present a demo of an example instrumented application. We plan to use Go as the example language for this application, but the talk should be mostly language-agnostic.

Additional information

Type: devroom

More sessions

2/2/20
Monitoring and Observability
Richard Hartmann
UD2.120 (Chavanne)
Introduction and welcome to the monitoring and observability devroom
2/2/20
Monitoring and Observability
Juraci Paixão Kröhling
UD2.120 (Chavanne)
Distributed tracing is a tool that belongs to every developer's tool belt, but what it actually can do remains a mystery to most developers. In this slideless talk, we will introduce you to the world of distributed tracing by developing a cloud native application from scratch and applying all important distributed tracing concepts in practice, at first by hand and then by using existing libraries to automate our work. You will learn not only what distributed tracing is, but how it works, what it ...
2/2/20
Monitoring and Observability
Andrej Ocenas
UD2.120 (Chavanne)
This talk presents the current capabilities of Grafana for integrating metrics, logs and traces, and shows how to set up both Grafana and application code so that all three can be correlated in Grafana. It assumes some familiarity with Grafana to follow the how-to steps, but should be suitable for beginner users. Afterwards, it shows the future direction of Grafana in the context of "Experiences", for an even more seamless experience when correlating data from multiple data sources.
2/2/20
Monitoring and Observability
Deepika Upadhyay
UD2.120 (Chavanne)
Jaeger and OpenTracing provide ready-to-use tracing services for distributed systems and are becoming a widely used de facto standard because of their ease of use. By making use of these libraries, Ceph can reach a much-improved monitoring state, with visibility into its background distributed processes. This would, in turn, improve the way Ceph is debugged, “making Ceph more transparent” in identifying abnormalities. In this session, the audience will get to learn about using ...
2/2/20
Monitoring and Observability
Richard Hartmann
UD2.120 (Chavanne)
Society would end if all ModBus stopped working overnight. Good thing it has zero security built in. Still, it's useful to get data out of industrial systems, be they a datacenter or a power plant.
2/2/20
Monitoring and Observability
Jean-Marc Davril
UD2.120 (Chavanne)
According to the United Nations, 2.5 billion more people will be living in cities by 2050. This trend has caused indoor farming to draw a lot of attention and effort in recent years, in an attempt to scale the production of highly nutritious, healthy food inside cities. Over the past 3 years, Agricool has recycled 20 industrial containers into farms that grow strawberries, herbs and salads, in the very heart of cities, and without any pesticide. These urban farms are currently operated in Paris ...
2/2/20
Monitoring and Observability
Rob Skillington
UD2.120 (Chavanne)
The cardinality of the monitoring data we collect today continues to rise, in no small part due to the ephemeral nature of containers and compute platforms like Kubernetes. Querying a flat dataset comprised of an increasing number of metrics requires searching through millions, and in some cases billions, of metrics to select a subset to display or alert on. The ability to use wildcards or regex within the tag names and values of these metrics and traces is becoming less of a nice-to-have ...