Session
Schedule FOSDEM 2022
Python

SPyQL - SQL with Python in the middle

Making command-line data processing more intuitive, readable and powerful
D.python
Daniel Moura
<p><a href="https://github.com/dcmoura/spyql">SPyQL</a> is SQL with Python in the middle, an open-source project fully written in Python for making command-line data processing more intuitive, readable and powerful. Try mixing in the same pot: a SQL SELECT for providing the structure, Python expressions for defining transformations and conditions, the essence of awk as a data-processing language, and the JSON handling capabilities of jq.</p> <p>In this talk I will describe the SPyQL language, highlighting its unique features. By the end of this presentation you will know how to write SPyQL queries (you probably already do :-) ), and you will be looking forward to using it! I will solve the task of calculating aggregations in awk (for a CSV), in jq (for a JSON) and in SPyQL (for both). I will then show you a couple more examples where we will use SPyQL 1) to automate a scaling operation of k8s pods, and 2) to continuously calculate statistics from a Kafka data stream.</p>
What does a SPyQL query look like?

IMPORT pendulum AS p
SELECT (p.now() - p.from_timestamp(purchase_ts)).in_days() AS days_ago, sum_agg(price * quantity) AS total
FROM csv
WHERE department.upper() == 'IT' and purchase_ts is not Null
GROUP BY 1
ORDER BY 1
TO json

Simple, readable and, like all SPyQL programs, it's a 1-liner. In a single statement we are 1) reading a CSV (of purchases) with automatic header detection, dialect detection, type inference and casting, 2) filtering out records that do not belong to the IT department or do not have a purchase timestamp, 3) summing the total purchases and grouping by how many days ago they happened, 4) sorting from the most to the least recent day and 5) writing the result in JSON format. All this without loading the dataset into memory.

SPyQL will change data processing in the terminal, making it accessible to anyone who knows a little bit of Python and understands the basics of a SQL SELECT. On the other hand, it will give super-powers to experienced users. The possibilities are endless, as you can import any Python library and pipe data from/to any command-line tool. From querying APIs and Kafka to writing to files or databases, SPyQL will be the tool of choice for processing data on the command line!
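For comparison, the five steps of that query can be sketched in plain Python using only the standard library. This is not how SPyQL is implemented; it is a minimal illustration under stated assumptions: the function name `total_by_days_ago`, the sample rows and the fixed `NOW` value are hypothetical, `purchase_ts` is assumed to be a Unix timestamp in seconds, and printing one JSON object per line is a choice of this sketch rather than a statement about SPyQL's output.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, timezone


def total_by_days_ago(rows, now):
    """Sum price * quantity per whole day elapsed since purchase, IT department only."""
    totals = defaultdict(float)
    for row in rows:  # rows are consumed one at a time; only per-day totals are kept
        ts = row.get("purchase_ts")
        department = row.get("department") or ""
        if not ts or department.upper() != "IT":
            continue  # skip non-IT records and records without a timestamp
        purchased = datetime.fromtimestamp(float(ts), tz=timezone.utc)
        days_ago = (now - purchased).days
        totals[days_ago] += float(row["price"]) * float(row["quantity"])
    # Ascending days_ago, i.e. from the most to the least recent day
    return [{"days_ago": d, "total": t} for d, t in sorted(totals.items())]


# Hypothetical sample data standing in for the purchases CSV
SAMPLE_CSV = """purchase_ts,department,price,quantity
1643932800,it,10.0,2
1643932800,IT,5.0,1
1644019200,hr,99.0,1
,IT,1.0,1
1644019200,IT,2.5,4
"""

# Fixed reference time (hypothetical) so the sketch is reproducible
NOW = datetime(2022, 2, 6, tzinfo=timezone.utc)

result = total_by_days_ago(csv.DictReader(io.StringIO(SAMPLE_CSV)), NOW)
for record in result:
    print(json.dumps(record))
```

The sketch needs explicit parsing, casting and bookkeeping for what the query expresses in one SELECT, which is the gap the talk describes SPyQL as closing.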

Additional information

Type devroom

More sessions

2/6/22
Python
Francesco Tisiot
D.python
<p>Apache Kafka is recognised as the best data streaming platform around, but it can be difficult to observe what is happening when you are just getting started with this excellent technology platform. In this session, you will get a tour of key Kafka features using the delightful web UI of Jupyter notebooks.</p> <p>Use the notebooks to see Kafka and Python in action, producing and consuming records. We’ll also cover how to get the best from your application by making good use of topic ...
2/6/22
Python
Sebastiaan Zeeff
D.python
<p>Driven by the immense popularity of asynchronous frameworks, such as FastAPI, asynchronous database support suddenly became a hot topic in the Python community. As talking to your database often forms a significant portion of the input and output of your application, it's important to do that asynchronously as well. With the release of version 1.4, SQLAlchemy added support for Asynchronous I/O for both its core and ORM features. This means that you can now use the popular SQL toolkit for ...
2/6/22
Python
Haki Benita
D.python
<p>Concurrency in web applications is so easy to get wrong, and so hard to identify and debug when it comes back to bite you. In this talk I'm going to present common concurrency issues in even the simplest applications, and suggest ways to identify and prevent them!</p>
2/6/22
Python
Jerry Pussinen
D.python
<p>Type hints are an essential part of modern Python. By combining type hints with a static type checker and libraries which enable runtime type checking, it is possible to achieve runtime type-safe Python applications.</p> <p>This talk discusses the motivation for extensive usage of type hints, how to gradually add types to existing projects, how to deal with untyped dependencies, and finally, how to achieve runtime type-safety without sacrificing performance.</p>
2/6/22
Python
Julin Shaji
D.python
<p>Let's look at a few 'tricks' with Unicode that can make a program look like it's doing (or not doing, for that matter) something it doesn't. Based on the findings in a recent publication, these are well worth being aware of, both from a security point of view and for simply being on your guard against friends who may be trying to pull a prank on you :-D.</p> <p>These tricks are well suited for trojan attacks as they can be difficult to detect even with a manual code review thanks to aspects of ...
2/6/22
Python
Maarten De Paepe
D.python
<p>mimics (https://github.com/maarten-dp/mimics) is a tool for deferring actions performed on objects or classes. These actions can then be executed later, when the subject to which they should be applied is available. This is mostly a joke project with no real-world applications, but it has some neat implementations showcasing the power of Python.</p>
2/6/22
Python
Mehdi Raddadi
D.python
<p>When developing a platform with a large code base, with multiple Django applications in a monolith, feature flags are a must-have to keep your release cycle short. They allow teams to develop a feature across multiple releases without users being aware that the feature is under development. Shorter release cycles are still possible without hindering quality teams or users.</p> <p>At GitGuardian, we use feature flags for multiple purposes: distinguishing between code deployment and feature ...