Open Media

Spleeter by Deezer

Open-Sourcing a Machine-Learning Music Source Separation Software
Source separation, stem separation and de-mixing all refer to the same problem: recovering the single-instrument tracks that were mixed together to produce a music file. The research team at Deezer recently released free and open source software, together with trained models, that performs multi-source separation of music with state-of-the-art accuracy. In this presentation we look back on our journey to open-sourcing the Spleeter library, from the groundwork research and model training to the release. We emphasize the technological challenges that had to be solved, as well as the practical and legal considerations that came into play.
Released on October 29th, 2019, the Spleeter (https://github.com/deezer/spleeter) GitHub repository received more than 5000 stars in its first week online, along with a lot of positive feedback and press coverage. This talk will explain how we went from research code to this fairly easy-to-use open Python library, which integrates pre-trained models for inference and re-training.
While not a broadly known topic, the problem of source separation has interested a large community of music signal researchers for a couple of decades now. It starts from a simple observation: music recordings are usually a mix of several individual instrument tracks (lead vocal, drums, bass, piano, etc.). The task of music source separation is: given a mix, can we recover these separate tracks (sometimes called stems)? This has many potential applications: think remixes, upmixing, active listening and educational purposes, but also pre-processing for other tasks such as transcription.
Current state-of-the-art systems are starting to give convincing results on very wide catalogs of tracks, but training such models remains largely limited by the availability of training data. In the case of copyrighted material like music, getting access to enough data is a pain point and a source of inequality between research teams. Besides, an essential feature of good scientific research is that it must be reproducible by others. For these reasons, and to level the playing field, we decided to release not only the code but also our models, pre-trained on a carefully crafted in-house dataset.
Our presentation will dwell on the following topics:
- technical aspects of the models' architecture and training
- software design, and how to leverage TensorFlow's API in a user-facing Python library
- how to package and version code that leverages pre-trained models and that can be run on different architectures: CPU and GPU
- licensing and legal concerns
- what we learned along the way
- legacy
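
The separation task described above can be made concrete with a toy sketch. It is not Spleeter code: it only assumes that a mix is the sample-by-sample sum of its stems, and it scores an estimated stem with a basic signal-to-distortion ratio, a metric chosen here for illustration. The names `mix_stems` and `sdr_db` and the sample values are hypothetical.

```python
import math


def mix_stems(stems):
    """Sum equal-length stems sample by sample to form the mix."""
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same length")
    return [sum(samples) for samples in zip(*stems.values())]


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB (higher is better)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0:
        return math.inf   # estimate matches the reference exactly
    if signal == 0:
        return -math.inf  # silent reference, non-silent error
    return 10 * math.log10(signal / noise)


# Hypothetical toy stems (four samples each), not real audio.
stems = {
    "vocals": [0.5, -0.25, 0.0, 0.25],
    "drums": [0.25, 0.25, -0.5, 0.0],
    "bass": [-0.25, 0.5, 0.25, -0.25],
}
mix = mix_stems(stems)

# Naive baseline: pretend the whole mix is the vocal track.
naive_sdr = sdr_db(stems["vocals"], mix)

# Oracle: if every other stem were known, subtraction recovers the vocals.
accompaniment = [d + b for d, b in zip(stems["drums"], stems["bass"])]
oracle_vocals = [m - a for m, a in zip(mix, accompaniment)]
oracle_sdr = sdr_db(stems["vocals"], oracle_vocals)
```

A real separation system only sees `mix` and has to estimate each stem without access to the others; the oracle case is included only to show what a perfect result would look like under this mixing assumption.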

Additional information

Type devroom

More sessions

2/2/20
Open Media
Arnaud Pichon
UB2.147
Tesselle is an open source image viewer allowing anyone to open, annotate and share big images on the web. It is part of the "Quinoa" project family, a suite of digital storytelling tools tailored for the FORCCAST teaching program and the scientific activities of Sciences Po's médialab.
2/2/20
Open Media
Aaron Boxer
UB2.147
JPEG 2000 was developed to replace the very successful JPEG standard, but it has instead remained a niche codec. With recent updates to the standard speeding up decoding by 10x, is world domination around the corner? This talk will describe many of the sophisticated features that JPEG 2000 offers, and discuss why a 20-year-old standard may be the codec of the future.
2/2/20
Open Media
Akhil Gangadharan Kurungadathil
UB2.147
This talk shows how QML, a language prominently used for designing UIs, can be used to create title clips containing text and/or images, which can then be rendered and composited over videos during the video editing process. Kdenlive's Google Summer of Code 2019 project set out to achieve this and is still under active development.
2/2/20
Open Media
Xavier Claessens
UB2.147
Magic Leap One is a pair of augmented reality glasses. Let's run an open source browser (Mozilla Servo) on it, using the GStreamer multimedia framework.
2/2/20
Open Media
Jean Le Feuvre
UB2.147
In this talk, we present the next release of GPAC, the complete rearchitecture of its streaming core, and the many new features and possibilities of the multimedia framework. Get ready for a lot of OTT/IP streaming and broadcast, encryption, packaging and media composition!
2/2/20
Open Media
Andreas Tai
UB2.147
IMSC is the Internet Media Subtitle and Caption Profile of the W3C Timed Text Markup Language (TTML). The presentation will show how to combine different open source tools to create, render and validate IMSC subtitles. The focus will be on an open source editor for IMSC.
2/2/20
Open Media
Olivier Crête
UB2.147
Open source stacks such as GStreamer, FFmpeg and UPipe now implement a large number of different ways to stream audio & video over a network. To name just a few: RTSP, SRT, RIST, WebRTC, HLS, DASH, AES67, SmoothStreaming and RTMP. Some are for local networks and some target the Internet; depending on the use case, these protocols have different upsides and downsides. To create a successful project, one needs to select the best-suited technology. I'll go over the various protocols and ...