9 October 2023

Redefining Research Impact: a chat with César Parra, SIRIS Academic

How can open data be used to measure research impact? We recently sat down with César Parra and asked him to share some insights on how SIRIS Academic and SciLake plan to redefine research impact.

César Parra, a data scientist with a background in Physics, is uniquely positioned to shed light on this topic. For the past three years, César has been working at SIRIS Academic, a research-intensive consulting firm specializing in higher education, science, technology, and innovation policy. There, he coordinates SIRIS' technical and scientific contribution to SciLake, bringing his team's expertise in impact analysis using natural language processing (NLP).

Here are the main takeaways from that encounter.

“At SIRIS Academic, we are dedicated to helping people make sense of the vast amount of research data out there. We develop solutions for analysing scientific impact through research mapping based on natural language processing techniques such as topic modelling.”

César Parra, SIRIS Academic

César, which methods and technologies are you developing to characterize research impact?

SIRIS Academic is working on a two-pronged approach. On the one hand, we work directly with stakeholders to gather information on their specific domain, along with known ontologies and taxonomies, to design a “controlled vocabulary” of keywords.

Our in-house library, VocTagger, performs NLP tasks (for those familiar with them, these include lemmatization, word permutation, reversed word order, etc.) to identify text documents that belong to the domain. It captures variants of a concept and allows the words of a multi-word keyword from the vocabulary to appear a certain distance apart.
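To make the idea concrete, here is a minimal Python sketch of controlled-vocabulary tagging. It is not VocTagger's code or API, which this interview does not describe: the function names, the `max_gap` parameter, and the crude plural-stripping rule standing in for real lemmatization are all assumptions made for illustration.

```python
import re
from collections import Counter


def normalize(token: str) -> str:
    """Crude stand-in for lemmatization (assumption): lowercase, strip a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def tokenize(text: str) -> list[str]:
    """Split text into normalized alphanumeric tokens."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_matches(keyword: str, tokens: list[str], max_gap: int = 2) -> bool:
    """True if every word of the keyword occurs, in any order, inside one
    window of len(keyword words) + max_gap consecutive tokens."""
    words = Counter(tokenize(keyword))
    size = sum(words.values()) + max_gap
    for start in range(len(tokens)):
        window = Counter(tokens[start:start + size])
        if all(window[w] >= n for w, n in words.items()):
            return True
    return False


def tag(text: str, vocabulary: list[str], max_gap: int = 2) -> list[str]:
    """Return the vocabulary keywords found in the text."""
    tokens = tokenize(text)
    return [kw for kw in vocabulary if keyword_matches(kw, tokens, max_gap)]


# Hypothetical vocabulary and sentence, for illustration only.
vocabulary = ["lung cancer", "topic model"]
found = tag("Cancers of the lung were studied with topic models.", vocabulary)
print(found)
```

In this sketch, "Cancers of the lung" matches the keyword "lung cancer" because the plural is normalized, word order is ignored, and two intervening words are tolerated. With `max_gap=0` the same sentence would no longer match that keyword.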

On the other hand, we map research using more advanced techniques. These include textual classifiers with predefined domains, where we train a classification model on a set of documents whose topic is known and then use the model to classify new documents. We can also work in a bottom-up fashion, via topic modelling.
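The classifier route can be sketched in a few lines of Python. The interview does not say which model SIRIS uses, so this example swaps in a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. The training sentences, topic labels, and function names are invented for illustration, and the bottom-up topic modelling route is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Count words per topic from (text, topic) pairs whose topic is known."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, topic in labelled_docs:
        doc_counts[topic] += 1
        word_counts[topic].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text: str, model) -> str:
    """Return the topic with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in doc_counts:
        score = math.log(doc_counts[topic] / total_docs)  # topic prior
        total_words = sum(word_counts[topic].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[topic][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical training set: documents for which the topic is known.
training = [
    ("tumour growth in lung cancer patients", "cancer"),
    ("chemotherapy response of breast tumour cells", "cancer"),
    ("wind turbine output and grid storage", "energy"),
    ("solar panel efficiency and battery storage", "energy"),
]
model = train(training)
print(classify("battery storage for solar farms", model))
print(classify("tumour cells in the lung", model))
```

A real system would train on far more documents and would likely use a stronger model, but the workflow is the same: fit on labelled documents, then predict the domain of unseen ones.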

What is your role in SciLake?

We are working on the SciLake impact assessment service. One of the greatest advantages of the project is that it involves pilot communities in neuroscience, transportation, energy, and cancer research, each with its own understanding of what impact means in its field.

SIRIS is also leading the development of the SciLake reproducibility assistance service, which will help researchers improve the reproducibility of their work. A major challenge on this front is identifying the research artifacts (software, datasets, methods, etc.) that are important to the pilots' use cases. Currently, for example, concepts can be identified fairly well in certain domain-specific tasks (e.g. the type of cancer in the cancer research domain), whereas recognizing research methodologies (e.g. RNA-sequencing analysis) is still a challenge. The involvement of the pilots in the design process is therefore crucial.

Who will benefit from these services and how?

Our goal is to enable research funders, universities, and governmental agencies to have a better understanding of research impact through the use of open data. We believe this will help create a more informed and accurate picture of the research landscape.