Bioinformatics & Computational Biology

Gen: Git for genomes

K.4.601
Bob Van Hove
Advances in DNA sequencing and synthesis have made reading and writing genetic code faster and cheaper than ever. Yet most labs run experiments at the same scale they did a decade ago, not because the biology is limiting, but because the software hasn't caught up.

The conventional digital representation of a genome is a string of nucleotides. This works well enough for simple projects, but the model breaks down as complexity grows. Sequences aren't constant: they evolve, mutate, and are iterated on. Unlike software, there's no instant feedback loop to tell you whether an edit worked; wet-lab experiments take time. You win some of that time back by working on multiple sequences in parallel, but keeping track of thousands of sequences and coordinate frames is tricky at best for a researcher working solo, and far harder when collaborating with other people or agents on the same genetic codebase.

Gen is a version control system built specifically for biological sequences (http://github.com/genhub-bio/gen). It models genomic data as a graph rather than flat text, preserving the full structure of variation, editing history, and experimental lineage. On top of this, projects are organized into repositories with branching, diffing, and merging, just like git. Git was first released 20 years ago and transformed how software teams collaborate on shared codebases. Gen brings that same workflow to biology.

This talk will introduce Gen's design philosophy and walk through a real-world use case. Gen is open source under the Apache 2.0 license, implemented in Rust with a terminal interface and Python bindings, and designed to integrate with existing bioinformatics pipelines.
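To make the graph-versus-flat-string idea concrete, here is a minimal sketch of a toy sequence graph with named branches, written in plain Python. It is a conceptual illustration only: the class and method names are hypothetical and do not reflect Gen's actual data model, CLI, or Python API. The point is simply that an edit can be recorded as a new node plus a repointed path, so the original sequence and the edited variant coexist in one structure.

```python
# Conceptual illustration only: a toy sequence graph with named branches.
# None of these names come from Gen; its real data model and API differ.
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    # node id -> nucleotide block
    nodes: dict[str, str] = field(default_factory=dict)
    # branch name -> ordered path of node ids
    branches: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, node_id: str, seq: str) -> None:
        self.nodes[node_id] = seq

    def branch(self, name: str, from_branch: str) -> None:
        # A branch is just a copy of an existing path, like a lightweight ref.
        self.branches[name] = list(self.branches[from_branch])

    def replace(self, branch: str, old: str, new_id: str, seq: str) -> None:
        # Record an edit as a new node and repoint the branch's path to it;
        # the original node stays in the graph, so history is preserved.
        self.add_node(new_id, seq)
        path = self.branches[branch]
        path[path.index(old)] = new_id

    def sequence(self, branch: str) -> str:
        # Linearize one branch back into a flat nucleotide string.
        return "".join(self.nodes[n] for n in self.branches[branch])


g = SequenceGraph()
g.add_node("promoter", "TTGACA")
g.add_node("cds", "ATGAAACGC")
g.branches["main"] = ["promoter", "cds"]

g.branch("strong-promoter", from_branch="main")
g.replace("strong-promoter", "promoter", "promoter_v2", "TTTACA")

print(g.sequence("main"))              # TTGACAATGAAACGC
print(g.sequence("strong-promoter"))   # TTTACAATGAAACGC
```

In this toy model, diffing two branches amounts to comparing their paths node by node; a real system like Gen additionally has to track coordinate frames, editing history, and merge semantics across thousands of sequences, which is the workflow the talk describes.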

Additional information

Live Stream https://live.fosdem.org/watch/k4601
Type devroom
Language English

More sessions

1/31/26
Bioinformatics & Computational Biology
K.4.601
Nextflow is a workflow manager that enables scalable and reproducible workflows. Nextflow is complemented by the nf-core community effort that aims at developing and supporting a curated collection of Nextflow pipelines, developed according to a well-defined standard, and their components. Since its inception, nf-core has set rigorous standards for documentation, testing, versioning and packaging of workflows, ensuring that pipelines can be "run anywhere" with confidence. In order to ...
1/31/26
Bioinformatics & Computational Biology
K.4.601
Modern research workflows are often fragmented, requiring scientists to navigate a complex path from the lab bench to computational analysis. The journey typically involves documenting experiments in an electronic lab notebook and then manually transferring data to a separate computational platform for analysis. This process creates inefficiencies, introduces errors, and complicates provenance tracking. To address this challenge, we have developed a tight, two-way integration between two ...
1/31/26
Bioinformatics & Computational Biology
László Kupcsik
K.4.601
I will share how adopting Nix (https://nixos.org/) transformed my bioinformatics practice, turning fragile, environment-dependent pipelines into reliable, reproducible workflows. I will walk the audience through the practical challenges of traditional Docker-centric setups, introduce the core concepts of Nix and its package collection (nixpkgs), and explain how tools such as rix (https://docs.ropensci.org/rix/) and ...
1/31/26
Bioinformatics & Computational Biology
Jose Espinosa-Carrasco
K.4.601
The release of AlphaFold2 paved the way for a new generation of prediction tools for studying unknown proteomes. These tools enable highly accurate protein structure predictions by leveraging advances in deep learning. However, their implementation can pose technical challenges for users, who must navigate a complex landscape of dependencies and large reference databases. Providing the community with a standardized workflow framework to run these tools could ease adoption. Thanks to ...
1/31/26
Bioinformatics & Computational Biology
Aurélien Luciani
K.4.601
ProtVista is an open-source protein feature visualisation tool used by UniProt, the high-quality, comprehensive, and freely accessible resource of protein sequence and functional information. It is built upon the suite of modular, standard and reusable web components called Nightingale, a collaborative open-source library. It enables integration of protein sequence features, variants, and structural data in a unified viewer. These components ...
1/31/26
Bioinformatics & Computational Biology
Ben Busby
K.4.601
As our tools evolve from scripts and pipelines to intelligent, context-aware systems, the interfaces we use to interact with data are being reimagined. This talk will explore how accelerated and integrated compute is reshaping the landscape of biobank-scale datasets, weaving together genomics, imaging, and phenotypic data and feeding validatable models. Expect a whirlwind tour through: · Ultra-fast sequence alignment and real-time discretization · Estimating cis/trans effects on ...
1/31/26
Bioinformatics & Computational Biology
Vissarion Fisikopoulos
K.4.601
dingo is a Python package that brings advanced scientific-computing techniques into the hands of developers and researchers. It focuses on modelling metabolic networks — complex systems describing how cells process nutrients and energy — by simulating the full range of possible biochemical flux states. Historically, exploring these possibilities in large-scale networks has been computationally prohibitive. dingo introduces state-of-the-art Monte Carlo sampling algorithms that dramatically ...