Open Research Tools and Technologies

Empowering social scientists with web mining tools

Why and how to enable researchers to perform complex web mining tasks

February 1, 2020
2:30 PM – 3:00 PM

AW1.126

Guillaume Plique

Web mining, mostly practiced as scraping and crawling, is not a straightforward task and requires a variety of skills related to web technologies. Yet web mining can be incredibly useful to the social sciences, since it enables researchers to tap into a formidable source of information about society. Researchers may not be able to invest copious amounts of time in learning web technologies inside out, so they usually rely on engineers to collect data from the web. This talk explains how Sciences Po's médialab designed and developed tools to empower researchers and enable them to perform web mining tasks to answer their research questions.

Here are examples of the issues we will tackle during this talk:

- How the life of a social sciences laboratory can be a very fruitful context for tool R&D regarding web mining
- How to create performant and effective web mining tools that anyone can use (multithreading, parallelism, JS execution, complex spiders etc.)
- How to relocalize data collection: researchers should be able to conduct their own collections without depending on external servers or resources
- How to teach researchers the necessary skills: HTML, the DOM, CSS selection etc.

Examples will be taken mainly from the minet CLI tool and the artoo.js bookmarklet.

Speaker

Guillaume Plique is a research engineer working for Sciences Po's médialab. He assists social sciences researchers daily with their methods and maintains a variety of FOSS tools geared toward the social sciences community and also developers.

Additional information

Type	devroom

More sessions

2/1/20	The good and the bad sides of developing open source tools for neuroscience Open Research Tools and Technologies Jan Grewe AW1.126 The reproducibility crisis has shocked the scientific community. Different papers describe this issue and the scientific community has taken steps to improve on it. For example, several initiatives have been founded to foster openness and standardisation in different scientific communities (e.g. the INCF[1] for the neurosciences). Journals encourage sharing of the data underlying the presented results, some even make it a requirement. What is the role of open source solutions in this respect? ...
2/1/20	Challenges and opportunities in scientific software development Open Research Tools and Technologies Julia Sprenger AW1.126 The approaches used in software development in an industry setting and in a scientific environment exhibit a number of fundamental differences. In the former, industry setting, modern team development tools and methods are used (version control, continuous integration, Scrum, ...) to develop software in teams with a focus on the final software product. In contrast, in the latter, scientific environment, a large fraction of scientific code is produced by individual scientists lacking thorough ...
2/1/20	NeuroFedora: Enabling Free/Open Neuroscience Open Research Tools and Technologies Aniket Pradhan AW1.126 NeuroFedora is an initiative to provide a ready-to-use, Fedora-based Free/Open source software platform for neuroscience. We believe that, like Free software, science should be free for all to use, share, modify, and study. The use of Free software also aids reproducibility, data sharing, and collaboration in the research community. By making the tools used in the scientific process more comfortable to use, NeuroFedora aims to take a step to enable this ideal.
2/1/20	Spotlight on Free Software Building Blocks for a Secure Health Data Infrastructure Open Research Tools and Technologies AW1.126 Health data is traditionally held and processed in large and complex mazes of hospital information systems. The market is dominated by vendors offering monolithic and proprietary software due to the critical nature of the supported processes and, in some cases, due to legal requirements. The “digital transformation”, “big data” and “artificial intelligence” are some of the hypes that demand improved exchange of health care data in routine health care and medical research alike. ...
2/1/20	DataLad Open Research Tools and Technologies Michael Hanke AW1.126 Contemporary sciences are heavily data-driven, but today's data management technologies and sharing practices fall at least a decade behind software ecosystem counterparts. Merely providing file access is insufficient for a simple reason: data are not static. Data often (and should!) continue to evolve; file formats can change, bugs will be fixed, new data are added, and derived data needs to be integrated. While (distributed) version control systems are a de-facto standard for open source ...
2/1/20	Frictionless Data for Reproducible Research Open Research Tools and Technologies Lilly Winfree AW1.126 Generating insight and conclusions from research data is often not a straightforward process. Data can be hard to find, archived in difficult to use formats, poorly structured and/or incomplete. These issues create “friction” and make it difficult to use, publish and share data. The Frictionless Data initiative (https://frictionlessdata.io/) at Open Knowledge Foundation (http://okfn.org) aims to reduce friction in working with data, with a goal to make it effortless to transport data among ...
2/1/20	On the road to sustainable research software. Open Research Tools and Technologies Mateusz Kuzak AW1.126 ELIXIR is an intergovernmental organization that brings together life science resources across Europe. These resources include databases, software tools, training materials, cloud storage, and supercomputers.

FOSDEM 2020

2/1/20 – 2/2/20
