Monitoring and Observability

Querying millions to billions of metrics with M3DB's inverted index

UD2.120 (Chavanne)
Rob Skillington
The cardinality of the monitoring data we collect today continues to rise, in no small part due to the ephemeral nature of containers and compute platforms like Kubernetes. Querying a flat dataset composed of an ever-increasing number of metrics requires searching through millions, and in some cases billions, of metrics to select a subset to display or alert on. The ability to use wildcards or regex within the tag names and values of these metrics and traces is becoming less of a nice-to-have feature and more of a requirement, given the growing popularity of ad-hoc exploratory queries. In this talk we will look at how Prometheus introduced the concept of a reverse index existing side-by-side with a traditional column-based TSDB in a single process. We will then walk through the evolution of M3’s metric index, starting with Elasticsearch and evolving over the years to the current M3DB reverse index. We will give an in-depth overview of the alternate designs and dive deep into the architecture of the current distributed index and the optimizations we’ve made in order to fulfill wildcard and regex queries across billions of metrics.
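
To make the idea of a reverse (inverted) index over metric tags concrete, here is a minimal sketch in Python. It is not M3DB's or Prometheus's implementation; the class `InvertedIndex`, its methods, and the tag names `service`, `pod` and `region` are all hypothetical, chosen only for illustration. It shows the basic principle: each tag name/value pair maps to the set of series that carry it, so a regex matcher only has to scan the distinct values of one tag rather than every series.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: tag name -> tag value -> set of series IDs.

    Illustrative only; production indexes use far more compact structures.
    """

    def __init__(self):
        # postings[name][value] is the set of series IDs tagged name=value.
        self.postings = defaultdict(lambda: defaultdict(set))

    def insert(self, series_id, tags):
        # Register the series under every tag pair it carries.
        for name, value in tags.items():
            self.postings[name][value].add(series_id)

    def match_exact(self, name, value):
        # Direct lookup; returns a copy so callers cannot mutate the index.
        return set(self.postings.get(name, {}).get(value, set()))

    def match_regex(self, name, pattern):
        # Scan only the distinct values of one tag, not every series.
        compiled = re.compile(pattern)
        result = set()
        for value, ids in self.postings.get(name, {}).items():
            if compiled.fullmatch(value):
                result |= ids
        return result


index = InvertedIndex()
index.insert(1, {"service": "checkout", "pod": "api-7f9c", "region": "eu"})
index.insert(2, {"service": "checkout", "pod": "api-b21d", "region": "us"})
index.insert(3, {"service": "checkout", "pod": "worker-01", "region": "eu"})

# A query with several matchers is the intersection of their posting sets.
matched = index.match_regex("pod", r"api-.*") & index.match_exact("region", "eu")
print(sorted(matched))
```

In this sketch the cost of a regex matcher grows with the number of distinct values of the tag being matched, which is why high-cardinality tags are the hard case for this kind of index.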

Additional information

Type devroom

More sessions

2/2/20
Monitoring and Observability
Richard Hartmann
UD2.120 (Chavanne)
Introduction and welcome to the monitoring and observability devroom
2/2/20
Monitoring and Observability
Juraci Paixão Kröhling
UD2.120 (Chavanne)
Distributed tracing is a tool that belongs in every developer's tool belt, but what it can actually do remains a mystery to most developers. In this slideless talk, we will introduce you to the world of distributed tracing by developing a cloud native application from scratch and applying all the important distributed tracing concepts in practice, at first by hand and then by using existing libraries to automate our work. You will learn not only what distributed tracing is, but how it works, what it ...
2/2/20
Monitoring and Observability
Andrej Ocenas
UD2.120 (Chavanne)
This talk presents the current capabilities of Grafana to integrate metrics, logs and traces, and shows how to set up both Grafana and application code to be able to correlate all three in Grafana. It assumes some familiarity with Grafana to follow the how-to steps but should be suitable for beginner users. Afterwards it shows the future direction of Grafana in the context of "Experiences", for an even more seamless experience when correlating data from multiple data sources.
2/2/20
Monitoring and Observability
Deepika Upadhyay
UD2.120 (Chavanne)
Jaeger and OpenTracing provide ready-to-use tracing services for distributed systems and are becoming a widely used de facto standard because of their ease of use. By making use of these libraries, Ceph can reach a much-improved monitoring state, with visibility into its background distributed processes. This would, in turn, improve the way Ceph is debugged, “making Ceph more transparent” in identifying abnormalities. In this session, the audience will get to learn about using ...
2/2/20
Monitoring and Observability
Richard Hartmann
UD2.120 (Chavanne)
Society would end if all ModBus stopped working overnight. Good thing it has zero security built in. Still, it's useful to get data out of industrial systems, be they a datacenter or a power plant.
2/2/20
Monitoring and Observability
Jean-Marc Davril
UD2.120 (Chavanne)
According to the United Nations, 2.5 billion more people will be living in cities by 2050. This trend has caused indoor farming to draw a lot of attention and effort in recent years, in an attempt to scale the production of highly nutritious, healthy food inside cities. Over the past 3 years, Agricool has recycled 20 industrial containers into farms that grow strawberries, herbs and salads in the very heart of cities, without any pesticides. These urban farms are currently operated in Paris ...
2/2/20
Monitoring and Observability
Björn Rabenstein (Beorn)
UD2.120 (Chavanne)
Representing distributions in a metrics-based monitoring system is both important and hard. Doing it right unlocks many powerful use cases that would otherwise require expensive event processing. Prometheus offers the somewhat weirdly named Histogram and Summary metric types for distributions. How have they become what they are today with all their weal and woe? To help understand the present, let's shed light on the past. Studying this piece of Prometheus's history will also allow a glimpse of ...