Session
FOSDEM Schedule 2021
Open Research Tools and Technologies

Metrics in Context: A Data Specification For Scholarly Metrics

D.research
Asura Enkhbayar
Google Scholar, Web of Science, Scopus, Dimensions, Crossref, Scite.ai, ... What used to be the home turf of for-profit publishers has become a buzzing field of technological innovation. Scholarly metrics, not only limited to citations and altmetrics, come from a host of data providers using an even wider range of technologies to capture and disseminate their data. Citations come as closed or open data, using traditional text processing or AI methods by private corporations, research projects or NGOs. What is missing is a language and standard to talk about the provenance of scholarly metrics. In this lightning talk, I will present an argument why we need to pay more attention to the processes of tracing and patterning that go into the creation of the precious data that determine our academic profiles, influence hiring and promotion decitions, and even national funding strategies. Furthmermore, I present the an early prototype of Metrics in Context, a data specification for scholarly metrics implemented in Frictionless Data. Additionally, the benefits and application of Metrics in Context is presented using both traditional citation data and a selection of common altmetrics such as the number of Tweets or FB shares.

In this lightning talk, I want to present Metrics in Context, a data specification implemented using Frictionless Data. It addresses a common theme within the critique of modern technology in our data-driven world: the lack of context for data and, often related, biases in databases. Algorithmic and database biases have moved into the spotlight of critical thought on how technology exacerbates systemic inequalities. Following these insights, I want to address the need for different (rather than simply more) context and metadata for scholarly metrics in the face of racial, gender, and geographic biases which plague modern academia.

It isn’t controversial to say that scholarly metrics have become an integral part of scholarship and probably they are here to stay. Controversy usually comes into play once we discuss how and for which purposes metrics are used. This typically refers to the (mis)use of citation counts and citation-based indicators1 for research assessment and governance, which also led to a considerable number of initiatives and movements calling for a responsible use of metrics. However, I would like to take a step back and redirect the attention to the origin of the data underlying citation counts.

These conversations about the inherent biases of citation databases are not entirely new and scholars across disciplines have been highlighting the consequential systemic issues. However, in this project I am not proposing a solution to overcome or abolish these biases per se, but rather I want to shine light on the opaque mechanism of capturing metrics which lead to the aforementioned inequalities. In other words, I propose to develop an open data standard3 for scholarly metrics which documents the context in which the data was captured. This metadata describes the properties of the capturing apparatus of a scholarly event (e.g., a citation, news mention, or tweet of an article) such as the limitations of document coverage (what kind of articles are indexed?), the kind of events captured (tweets, retweets, or the both maybe?) or other technicalities (is Facebook considered as a whole or only a subset of public pages?).

While metrics in context don’t remove systemic inequality, they make the usually hidden and inaccessible biases visible and explicit. In doing so, they facilitate conversations about structural issues in academia and eventually contribute to the development of better infrastructures for the future.

Additional information

Type devroom

More sessions

2/6/21
Open Research Tools and Technologies
Albert Yumol
D.research
As technology advances, so as our maps. In this talk, we will explore the ever growing open map data that can help us understand, validate, and explore socio-economic indicators with the aid of network theory and machine learning techniques.
2/6/21
Open Research Tools and Technologies
Olivier Aubert
D.research
We will describe in this talk how to combine crowdsourcing approaches with scientific expertise in Digital Humanities projects, and some of the issues that are at stake. The talk will focus on Recital, a Digital Humanities project aiming at gaining insights on 18th-century theater through the analysis of its accounting books. It combines crowdsourcing, using the ScribeAPI free software, producing results that need to be evaluated and validated by scientific expertise, which requires appropriate ...
2/6/21
Open Research Tools and Technologies
D.research
This talk will focus on our experiences with making open source tools for the study of social media platforms (amongst others, DMI-TCAT for Twitter, the YouTube Data Tools, and 4CAT for forum-like platforms such as Reddit and 4chan) in the context of social science and humanities research. We will discuss questions of reliability and reproducibility, but also how tools are taking part in shaping which questions are being asked and how research is done in practice - making open source ...
2/6/21
Open Research Tools and Technologies
Benjamin Ooghe-Tabanou
D.research
The World Wide Web’s original design as a vast open documentary space built around the concept of hypertext made it a fantastic research field to study networks of actors of a specific field or controversy and analyse their connectivity. Navicrawler, IssueCrawler, Hyphe... Over the past 15 years, a variety of web crawling tools, most often free and open source, have been developped by or for social sciences research labs across the world. They provide means to engage with the web as a research ...
2/6/21
Open Research Tools and Technologies
Béatrice Mazoyer
D.research
Many open-source libraries provide an interface for the Twitter API. However, most people use these tools in temporary scripts for a one-time tweets collection. Moving to a robust application for collecting and indexing tweets over long periods of time requires some programming knowledge that most social science researchers do not master. In order to meet this need, the medialab has developed gazouilloire, a tool that makes it possible to easily configure the collection parameters (keywords ...
2/6/21
Open Research Tools and Technologies
Guillaume Levrier
D.research
PANDORÆ : Retrieving, curating and exploring enhanced corpi through time and space Mapping the state of research in a particular field has been made easier through commercial services providing API-based bibliometric-enhanced corpuses retrieval. Common assertions such as “the use of CRISPR technologies has skyrocketed in laboratories all around the world since 2012” can now be easily verified in both quantitative and qualitative perspectives using those platforms. Such services as ...
2/6/21
Open Research Tools and Technologies
D.research
This is a live panel session which gathers speakers from three lightning talks about web mining tools and technologies.