Session
Schedule FOSDEM 2020
Python

How to write a scikit-learn compatible estimator/transformer

Tips and tricks, testing your estimator, and must-watch related current developments
UB2.252A (Lameere)
Adrin Jalali
This is a short hands-on tutorial on how to write your own estimator or transformer that can be used in a scikit-learn pipeline and works seamlessly with the library's meta-estimators. It also shows how to test such estimators conveniently with a simple set of tests.

In many data science tasks, use-case-specific requirements force us to slightly change the behavior of some of the estimators or transformers that ship with scikit-learn. Some of the relevant tips and requirements are not well documented by the library, and finding those details can be cumbersome.

In this short tutorial, we go through an example of writing our own estimator, test it against scikit-learn's common tests, and see how it behaves inside a pipeline and a grid search.
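As a rough illustration of the workflow outlined above (not the material from the talk itself), here is a minimal sketch of a custom transformer used inside a pipeline and a grid search. The `ClipOutliers` class, its `n_std` parameter, and the synthetic dataset are assumptions made up for this example; the conventions it follows (hyperparameters stored untouched in `__init__`, learned attributes ending in an underscore, `fit` returning `self`) are the ones scikit-learn documents for estimator developers.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_array, check_is_fitted


class ClipOutliers(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: clip each feature to mean +/- n_std * std."""

    def __init__(self, n_std=3.0):
        # Only store hyperparameters here, unmodified, so that
        # get_params / set_params / clone keep working.
        self.n_std = n_std

    def fit(self, X, y=None):
        X = check_array(X)  # validate input, convert to a 2D ndarray
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        # Attributes learned from data end with an underscore.
        self.lower_ = mean - self.n_std * std
        self.upper_ = mean + self.n_std * std
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "lower_")  # raises NotFittedError before fit
        X = check_array(X)
        return np.clip(X, self.lower_, self.upper_)


# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The transformer as a pipeline step, tuned through the usual
# "<step>__<param>" naming in a grid search.
pipe = Pipeline([("clip", ClipOutliers()), ("clf", LogisticRegression())])
search = GridSearchCV(pipe, param_grid={"clip__n_std": [1.0, 2.0, 3.0]}, cv=3)
search.fit(X, y)

clipper = ClipOutliers(n_std=1.0).fit(X)
Xt = clipper.transform(X)
print(search.best_params_)
```

For the testing step, scikit-learn provides `sklearn.utils.estimator_checks.check_estimator`, which runs the library's common tests against an estimator instance. Whether a sketch like this passes every check depends on the installed scikit-learn version, so it is left out of the code above.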

There have also been recent developments in the general estimator API that require slight modifications from third-party developers. I will cover these changes and point you to the activities to watch, as well as some of the private utilities you can use to make developing an estimator easier.

The materials for the talk will be available on GitHub as a Jupyter notebook.

Additional information

Type devroom

More sessions

2/1/20
Python
Peter Czanik
UB2.252A (Lameere)
From my talk you will learn about some lesser-known features of sudo, and how you can make your security more flexible by extending sudo using Python.
2/1/20
Python
Raphaël Gomès
UB2.252A (Lameere)
While working on the Mercurial version control system, we hit our heads against the limits of Python's performance. In this talk we will see how Python and Rust can cohabit and play off each other's strengths to improve a big open-source project, and what advances have been made in bridging the two languages.
2/1/20
Python
Rémy Hubscher
UB2.252A (Lameere)
For almost 20 years, we have relied on a CGI-based protocol called WSGI to handle HTTP requests and responses in Python software. Because Python is single-threaded, we relied on a couple of hacks such as Gunicorn or uWSGI to share a socket across multiple processes. However, running all these processes was somewhat heavy and error-prone. Through Django Channels, Andrew Godwin paved the way for a better way of creating web services with Python. This work landed in Django 3.0. Let's ...
2/1/20
Python
Stephen Finucane
UB2.252A (Lameere)
How does one manage and document change in Python projects, be that new features or deprecation or removal of a feature? Let's explore some of the tools a Python developer can keep in their toolbox for just this purpose.
2/1/20
Python
Lionel Lonkap Tsamba
UB2.252A (Lameere)
We, as developers, aim to provide code that closely matches our team's code style, looks better, and behaves correctly. Static code analysis (SCA) tools are one way to achieve that. But with multi-language projects and all kinds of code-related needs, it is difficult to address all those use cases without dealing with a large number of SCA tools. Coala is a language-agnostic static code analysis framework that provides a common command-line interface for linting and fixing ...
2/1/20
Python
Miguel-Ángel Fernández
UB2.252A (Lameere)
SortingHat is an open source Python tool that helps manage the different contributor identities within an open source project. Under the hood, SortingHat relies on a relational database, which can be queried via SQL, the command line, or directly via its Python interface. However, these ways of interacting with SortingHat hinder its integration with external tools, web interfaces and new web technologies (e.g., Django, REST services). To overcome these obstacles, we have evolved SortingHat's ...
2/1/20
Python
Nicolas Crocfer
UB2.252A (Lameere)
All Python developers who want to run asynchronous tasks should know Celery. If you have already used it, you know how great it is! But you have also discovered how complicated it can be to follow the state of a complex workflow. Celery Director is a tool we created at OVH to fix this problem: using some concepts of Event Sourcing, Celery Director helps us follow the whole lifecycle of our workflows. It allows us to check when a problem occurred and relaunch the whole DAG (or just a subpart if ...