Bioinformatics & Computational Biology

nf-core proteinfold: a community-driven open source pipeline for deep learning based protein structure prediction methods

K.4.601
Jose Espinosa-Carrasco
<p>The release of AlphaFold2 paved the way for a new generation of prediction tools for studying unknown proteomes. These tools leverage advances in deep learning to produce highly accurate protein structure predictions. However, they can be technically challenging to deploy, because users must navigate a complex landscape of software dependencies and large reference databases. Providing the community with a standardized workflow framework for running these tools could ease their adoption.</p> <p>Because it adheres to nf-core guidelines, the nf-core/proteinfold pipeline simplifies the application of state-of-the-art protein structure modeling techniques by taking advantage of Nextflow’s optimized execution capabilities on both cloud providers and HPC infrastructures. The pipeline integrates several popular methods: AlphaFold 2 and 3, Boltz 1 and 2, ColabFold, ESMFold, HelixFold, RosettaFoldAA, and RosettaFold2NA. After structure prediction, nf-core/proteinfold generates an interactive report that lets users explore and compare the predicted models alongside standardized confidence metrics, harmonized across methods for consistent interpretation. The workflow also integrates Foldseek-based structural search, which identifies known protein structures similar to the predicted models.</p> <p>The pipeline is developed through an international collaboration that includes Australian BioCommons, the Centre for Genomic Regulation, Pompeu Fabra University, and the European Bioinformatics Institute, and it already serves as a central resource for structure prediction at several of these organisations and others.
This broad adoption demonstrates how nf-core/proteinfold, through its open-source and community-driven development model, is lowering the barrier to using deep-learning-based approaches for protein structure prediction in everyday research.</p> <p>Notably, nf-core/proteinfold represents a new generation of Nextflow workflows designed to bring multiple alternative methods for the same task together within one coherent framework. This design makes it possible to compare the different methods directly, providing a basis for developing combined approaches that may mature into meta-methods.</p> <h3>More info</h3> <p><a href="https://nf-co.re/">nf-core project</a></p> <p><a href="https://nf-co.re/proteinfold">nf-core/proteinfold pipeline</a></p> <p><a href="https://github.com/nf-core/proteinfold">nf-core/proteinfold GitHub repository</a></p> <p><a href="https://nf-co.re/join">Join nf-core</a></p> <p><a href="https://bsky.app/profile/josesca.bsky.social">My Bluesky</a></p>
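The abstract does not say how the report harmonizes confidence metrics across methods. The Python below is a minimal sketch of one way such harmonization could work, not the pipeline's implementation. It assumes each predicted model is a PDB file whose B-factor column stores per-residue pLDDT, with some tools writing values on a 0-1 scale and others on 0-100. The function names and the helper `ca_line`, which builds toy ATOM records, are hypothetical. The confidence bands (very high above 90, confident above 70, low above 50, very low otherwise) follow the convention of the AlphaFold Protein Structure Database.

```python
"""Sketch: put per-residue pLDDT from different predictors on one scale.

Assumptions (not taken from nf-core/proteinfold): each model is a PDB file
whose B-factor column holds pLDDT, on a 0-1 scale for some tools and 0-100
for others. All function names here are hypothetical.
"""
from statistics import mean

# Lower bounds of the pLDDT bands used by the AlphaFold Protein Structure Database.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]


def ca_plddt(pdb_text: str) -> list[float]:
    """Return the B-factor (assumed pLDDT) of every C-alpha atom, in file order."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB ATOM record: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def to_percent(values: list[float]) -> list[float]:
    """Rescale to 0-100 when every value lies in [0, 1] (a heuristic, not a rule)."""
    if values and max(values) <= 1.0:
        return [v * 100.0 for v in values]
    return list(values)


def band(score: float) -> str:
    """Map a 0-100 pLDDT score to its confidence band."""
    for lower, label in BANDS:
        if score > lower:
            return label
    return "very low"


def summarize(pdb_text: str) -> dict:
    """Mean pLDDT, fraction of residues above 70, and the band of the mean."""
    scores = to_percent(ca_plddt(pdb_text))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    avg = mean(scores)
    return {
        "mean_plddt": round(avg, 2),
        "frac_confident": sum(s > 70.0 for s in scores) / len(scores),
        "band": band(avg),
    }


def ca_line(serial: int, bfactor: float) -> str:
    """Hypothetical minimal C-alpha ATOM record carrying the given B-factor."""
    return (f"ATOM  {serial:5d}  CA  ALA A{serial:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


if __name__ == "__main__":
    # The same toy model written on the two scales should yield the same summary.
    on_100 = "\n".join(ca_line(i, v) for i, v in enumerate([95.0, 80.0, 40.0], 1))
    on_1 = "\n".join(ca_line(i, v) for i, v in enumerate([0.95, 0.80, 0.40], 1))
    print(summarize(on_100))
    print(summarize(on_1))
```

Run as a script, both calls print the same summary, which is the point of the rescaling step. Detecting the scale from the maximum value is only a heuristic: a model reported on 0-100 with every residue at or below 1 would be misread, so a production report would more plausibly record each tool's scale explicitly.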

Additional information

Live Stream https://live.fosdem.org/watch/k4601
Type devroom
Language English