Open Research Tools and Technologies

Journalists are researchers like any others

We are not journalists, but we are developers working for journalists. When we receive leaks, we are flooded by a huge amount of documents and by the many questions journalists have while trying to dig into the leak. Among others:

* Where to begin?
* How many documents mention "tax avoidance"?
* How many languages are in this leak?
* How many documents are in CSV?

Journalists have more or less the same questions as researchers! So to help them answer all these questions, we developed Datashare. In a nutshell, Datashare is a tool to answer all your questions about a corpus of documents: just like Google, but without Google and without sending information to Google. It extracts content and metadata from all types of documents and indexes them. Then it detects people, locations, organizations and email addresses. The web interface exposes all of that to give you a complete overview of your corpus and let you search through it. Plus, Datashare lets you star and tag documents. We didn't want to reinvent the wheel, so we used assets that have proven to work well.

How did we end up with Datashare from a heterogeneous environment? Initially we had:

- a command-line tool to extract text from huge document corpora
- a proof of concept of NLP pipelines in Java
- a shared index based on Blacklight / RoR and Solr
- open source tools and frameworks

Issues we had to fix:

- UX
- scalability of Solr with millions of documents
- integration of all the tools in one
- maintainability and robustness of a growing code base
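The pipeline sketched above (extract text, detect entities such as email addresses, index everything, then search) can be illustrated with a minimal toy. This is not Datashare's actual implementation, which builds on dedicated extraction, NLP, and search engines; the mini-corpus, the regex-based email detection, and the inverted index here are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Naive email pattern for illustration only; real entity detection
# in Datashare relies on proper NLP pipelines.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def build_index(docs):
    """Toy inverted index: term -> sorted-able set of doc ids,
    plus per-document detected email 'entities'."""
    index = defaultdict(set)
    entities = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
        for email in EMAIL_RE.findall(text):
            entities[doc_id].add(email)
    return index, entities

def search(index, term):
    """Return doc ids mentioning the term (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical mini-corpus standing in for a leak's documents
docs = {
    "doc1": "Offshore accounts and tax avoidance; contact shell@example.com",
    "doc2": "Quarterly report, nothing on tax matters",
}
index, entities = build_index(docs)
print(search(index, "tax"))      # doc ids mentioning "tax"
print(sorted(entities["doc1"]))  # email addresses found in doc1
```

A real system would swap the tokenizer for a content-extraction library, the regex for named-entity recognition, and the dictionary for a search index that scales to millions of documents.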

Additional information

Type devroom

More sessions

2/1/20
Open Research Tools and Technologies
Jan Grewe
AW1.126
The reproducibility crisis has shocked the scientific community. Different papers describe this issue and the scientific community has taken steps to improve on it. For example, several initiatives have been founded to foster openness and standardisation in different scientific communities (e.g. the INCF[1] for the neurosciences). Journals encourage sharing of the data underlying the presented results, some even make it a requirement. What is the role of open source solutions in this respect? ...
2/1/20
Open Research Tools and Technologies
Julia Sprenger
AW1.126
The approaches used in software development in an industry setting and in a scientific environment exhibit a number of fundamental differences. In industry, modern team development tools and methods (version control, continuous integration, Scrum, ...) are used to develop software in teams with a focus on the final software product. In a scientific environment, by contrast, a large fraction of scientific code is produced by individual scientists lacking thorough ...
2/1/20
Open Research Tools and Technologies
Aniket Pradhan
AW1.126
NeuroFedora is an initiative to provide a ready-to-use, Fedora-based Free/Open Source software platform for neuroscience. We believe that, like Free software, science should be free for all to use, share, modify, and study. The use of Free software also aids reproducibility, data sharing, and collaboration in the research community. By making the tools used in the scientific process easier to use, NeuroFedora aims to take a step towards this ideal.
2/1/20
Open Research Tools and Technologies
AW1.126
Health data is traditionally held and processed in large and complex mazes of hospital information systems. The market is dominated by vendors offering monolithic and proprietary software due to the critical nature of the supported processes and, in some cases, due to legal requirements. "Digital transformation", "big data" and "artificial intelligence" are some of the hypes that demand improved exchange of health care data in routine health care and medical research alike. ...
2/1/20
Open Research Tools and Technologies
Michael Hanke
AW1.126
Contemporary sciences are heavily data-driven, but today's data management technologies and sharing practices fall at least a decade behind their software ecosystem counterparts. Merely providing file access is insufficient for a simple reason: data are not static. Data often do (and should!) continue to evolve; file formats can change, bugs will be fixed, new data are added, and derived data need to be integrated. While (distributed) version control systems are a de-facto standard for open source ...
2/1/20
Open Research Tools and Technologies
Lilly Winfree
AW1.126
Generating insight and conclusions from research data is often not a straightforward process. Data can be hard to find, archived in difficult-to-use formats, poorly structured and/or incomplete. These issues create "friction" and make it difficult to use, publish and share data. The Frictionless Data initiative (https://frictionlessdata.io/) at Open Knowledge Foundation (http://okfn.org) aims to reduce friction in working with data, with a goal to make it effortless to transport data among ...
2/1/20
Open Research Tools and Technologies
Mateusz Kuzak
AW1.126
ELIXIR is an intergovernmental organization that brings together life science resources across Europe. These resources include databases, software tools, training materials, cloud storage, and supercomputers.