Session
Schedule FOSDEM 2020
Open Research Tools and Technologies

AMENDMENT Transforming scattered analyses into a documented, reproducible and shareable workflow

AW1.126
Sébastien Rochette
This presentation is a feedback from experience on helping a researcher transforming a series of scattered analyses into a documented, reproducible and shareable workflow. Time allocated by researchers to program / code the analyses required to answer their scientific questions is usually low compared to other tasks. As a result, multiple small experiments are developed and outputs are gathered as best as possible to be presented in a scientific paper. However, science is not only about sharing results but also sharing methods. How can we make our results reproducible when we developed multiple, usually undocumented analyses? What do we do if the program is only applicable to our computer directory architecture? This is always possible to take time to rewrite, re-arrange and document analyses at the time we want/have to share them. Here, I will take the exemple of a "collaboration fest" where we dissected R scripts of a researcher in ecology. We started a reproducible, documented and open-source R-package along with its website, automatically built using continuous integration: https://cesco-lab.github.io/Vigie-Chiro_scripts/. However, can we think, earlier in the process, a better way to use our small programming time slots by adopting a method that will save time in our future? In this aim, I will present a documentation-first method using little time while writing analyses, but saving a lot when the time has come to share your work.

Session type (Lecture or Lightning Talk)

Lecture

Session length (20-40 min, 10 min for a lightning talk)

30 min

Expected prior knowledge / intended audience

No prior knowledge expected. Example will be about building documentation for R software but any developper, using any programming language may be interested in the method adopted.

Speaker bio

Sébastien Rochette has a PhD in marine ecology. After a few years has a researcher in ecology, he joined ThinkR (https://rtask.thinkr.fr), a company giving courses and consultancy around the R-software. Along with commercial activities, he is highly involved in the development of open-source R packages. He also shares his experience with the R-community through free tutorials, blog posts, online help and other conferences. https://statnmap.com/

Links to code / slides / material for the talk (optional)

I wrote a blog post in French about what I am planning to present: https://thinkr.fr/transformer-plusieurs-scripts-eparpilles-en-beau-package-r/

This topic is also related to another blog post: https://rtask.thinkr.fr/when-development-starts-with-documentation/

Links to previous talks by the speaker

Talks about R are in my Github repository: https://github.com/statnmap/prez/. The "README" lists talks that have a live recorded video.

As a researcher, I also gave multiple talks about marine science, modelling and other topics related to my research.

Please note that this talk was originally scheduled to be at 17h. The talk originally in this slot was "Developing from the field." by Audrey Baneyx and Robin de Mourat which will now take place at 17h.

Additional information

Type devroom

More sessions

2/1/20
Open Research Tools and Technologies
Jan Grewe
AW1.126
The reproducibility crisis has shocked the scientific community. Different papers describe this issue and the scientific community has taken steps to improve on it. For example, several initiatives have been founded to foster openness and standardisation in different scientific communities (e.g. the INCF[1] for the neurosciences). Journals encourage sharing of the data underlying the presented results, some even make it a requirement. What is the role of open source solutions in this respect? ...
2/1/20
Open Research Tools and Technologies
Julia Sprenger
AW1.126
The approaches used in software development in an industry setting and a scientific environment are exhibit a number of fundamental differences. In the former industry setting modern team development tools and methods are used (version control, continuous integration, Scrum, ...) to develop software in teams with a focus on the final software product. In contrast, in the latter scientific environment a large fraction of scientific code is produced by individual scientists lacking thorough ...
2/1/20
Open Research Tools and Technologies
Aniket Pradhan
AW1.126
NeuroFedora is an initiative to provide a ready to use Fedora-based Free/Open source software platform for neuroscience. We believe that similar to Free software; science should be free for all to use, share, modify, and study. The use of Free software also aids reproducibility, data sharing, and collaboration in the research community. By making the tools used in the scientific process more comfortable to use, NeuroFedora aims to take a step to enable this ideal.
2/1/20
Open Research Tools and Technologies
AW1.126
Health Data is traditionally held and processed in large and complex mazes of hospital information systems. The market is dominated by vendors offering monolithic and proprietary software due to the critical nature of the supported processes and - in some cases - due to legal requirements. The “digital transformation”, “big data” and “artificial intelligence” are some of the hypes that demand for improved exchange of health care data in routine health care and medical research alike. ...
2/1/20
Open Research Tools and Technologies
Michael Hanke
AW1.126
Contemporary sciences are heavily data-driven, but today's data management technologies and sharing practices fall at least a decade behind software ecosystem counterparts. Merely providing file access is insufficient for a simple reason: data are not static. Data often (and should!) continue to evolve; file formats can change, bugs will be fixed, new data are added, and derived data needs to be integrated. While (distributed) version control systems are a de-facto standard for open source ...
2/1/20
Open Research Tools and Technologies
Lilly Winfree
AW1.126
Generating insight and conclusions from research data is often not a straightforward process. Data can be hard to find, archived in difficult to use formats, poorly structured and/or incomplete. These issues create “friction” and make it difficult to use, publish and share data. The Frictionless Data initiative (https://frictionlessdata.io/) at Open Knowledge Foundation (http://okfn.org) aims to reduce friction in working with data, with a goal to make it effortless to transport data among ...
2/1/20
Open Research Tools and Technologies
Mateusz Kuzak
AW1.126
ELIXIR is an intergovernmental organization that brings together life science resources across Europe. These resources include databases, software tools, training materials, cloud storage, and supercomputers.