Lightning Talks

DuckDB

An Embeddable Analytical Database

February 2, 2020
12:40 PM – 12:55 PM

H.2215 (Ferrer)

Hannes Mühleisen

We present DuckDB, our new, Open Source embedded analytical data management system.

Data management systems have evolved into large monolithic database servers running as stand-alone processes. This is partly a result of the need to serve requests from many clients simultaneously and partly due to data integrity requirements. While powerful, stand-alone systems require considerable effort to set up properly, and data access is constricted by their client protocols.

There exists a completely separate use case for data management systems: those that are embedded into other processes, where the database system is a linked library that runs completely within a "host" process. The best-known representative of this group is SQLite, the most widely deployed SQL database engine, with more than a trillion databases in active use. SQLite strongly focuses on transactional (OLTP) workloads and contains a row-major execution engine operating on a B-Tree storage format. As a consequence, SQLite's performance on analytical (OLAP) workloads is very poor.

There is a clear need for embeddable analytical data management. This need stems from two main sources: interactive data analysis and edge computing. Interactive data analysis is performed using tools such as R or Python. The basic data management operators available in these environments through extensions (dplyr, Pandas, etc.) closely resemble stacked relational operators, much like in SQL queries, but lack full-query optimization and transactional storage. Embedded analytical data management is also desirable for edge computing scenarios. For example, connected power meters currently forward data to a central location for analysis. This is problematic due to bandwidth limitations, especially on radio interfaces, and also raises privacy concerns. An embeddable analytical database is well equipped to support this use case, with data analyzed on the edge node. The two use cases of interactive analysis and edge computing appear orthogonal, but surprisingly they yield similar requirements.

In this talk, we present our new system, DuckDB. DuckDB is a new purpose-built embeddable relational database management system created at the Database Architectures group of the CWI. DuckDB is available as open-source software under the permissive MIT license. To the best of our knowledge, there currently exists no purpose-built embeddable analytical database despite the clear need outlined above. DuckDB is not a research prototype but is built to be widely used, with millions of test queries run on each commit to ensure correct operation and completeness of the SQL interface.

DuckDB is built from the ground up with analytical query processing in mind. As storage, DuckDB uses a single-file format with tables partitioned into columnar segments. Data is loaded into memory using a traditional buffer manager; however, the blocks that are loaded are significantly larger than those of a traditional OLTP system to allow for efficient random seeks of blocks. Queries are processed using a vectorized query processing engine to allow for high-performance batch processing and SIMD optimizations.
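
To make the idea of columnar storage combined with vectorized processing more concrete, the following is a minimal sketch in plain Python. It is not DuckDB code and does not reflect DuckDB's internals: the table layout, the helper names (`make_columns`, `scan`, `sum_readings_for_meter`) and the batch size `VECTOR_SIZE` are all assumptions made for illustration. It shows a filter-and-aggregate query that reads only the columns it needs and processes them one fixed-size batch at a time, rather than one row at a time.

```python
from array import array

VECTOR_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def make_columns(n):
    """Build a tiny column-major table: one typed array per column."""
    meter_id = array("i", (i % 10 for i in range(n)))
    reading = array("d", (float(i % 100) for i in range(n)))
    return {"meter_id": meter_id, "reading": reading}


def scan(columns, names, vector_size=VECTOR_SIZE):
    """Yield batches ("vectors") that contain only the requested columns."""
    n = len(columns[names[0]])
    for start in range(0, n, vector_size):
        stop = min(start + vector_size, n)
        yield {name: columns[name][start:stop] for name in names}


def sum_readings_for_meter(columns, wanted_meter):
    """Roughly: SELECT sum(reading) FROM t WHERE meter_id = wanted_meter."""
    total = 0.0
    for batch in scan(columns, ["meter_id", "reading"]):
        ids, values = batch["meter_id"], batch["reading"]
        # Filter step: collect the matching positions for the whole batch.
        selection = [i for i, m in enumerate(ids) if m == wanted_meter]
        # Aggregate step: consume the selected positions in one tight loop.
        total += sum(values[i] for i in selection)
    return total


table = make_columns(10_000)
result = sum_readings_for_meter(table, 3)
print(result)
```

In a real engine the per-batch loops would be compiled code over contiguous memory, which is where the batch-processing and SIMD benefits mentioned above come from; the Python version only mirrors the control flow.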

Additional information

Type	lightningtalk


FOSDEM 2020

2/1/20 – 2/2/20
