
Open Science Knowledge Graphs

 ∙ Stefania Amodeo

Scientific Knowledge Graphs (SKGs) help the research community convert data into knowledge. At a recent workshop at the Open Science Fair in Madrid, experts from various disciplines came together to discuss the potential and challenges of SKGs.

This blog post highlights the key insights from the workshop, including the presentations, discussion highlights, and the next steps in advancing SKGs.

Presentations

The workshop featured five speakers who presented compelling cases of SKGs and their applications in different domains. Thanasis Vergoulis from Athena RC discussed the status of the OpenAIRE Graph and its enrichment through the EU SciLake project. Ingrid Reiten from the University of Oslo highlighted the synergies between the EBRAINS data and knowledge service and SciLake, specifically in the neuroscience research domain. Leily Rabbani from the Karolinska Institute shared the roadmap for building a cancer knowledge graph through SciLake. Joaquín López Lérida from LifeWatch ERIC introduced the LifeBlock tool for the construction of SKGs for the biodiversity research domain. Finally, Max Novelli from the European Spallation Source presented the PaNOSC data portal for the photon and neutron community.

All the presentations are accessible on Zenodo: https://zenodo.org/record/8402580

Discussion Highlights

The round table discussion provided valuable insights into the challenges and potential of SKGs. Participants actively engaged with the speakers. An online survey was conducted to gather participants' roles, main uses of SKGs, and suggestions for improvement. Some notable highlights from the discussion include:

  • SKGs enhance research productivity and enable quicker translation of hypotheses into results. They serve as a foundation for powerful tools that aid researchers and stakeholders in making informed decisions based on factual information.

  • SKGs catalyze the development of services for advanced knowledge extraction and exploration. By leveraging the interconnectedness of data, SKGs enable researchers to uncover hidden relationships and patterns, leading to new discoveries and insights.

  • Interoperability between graphs is a significant area of progress. Efforts are being made to ensure that SKGs from specific domains can seamlessly integrate and exchange information with cross-domain graphs, like the OpenAIRE Graph, fostering interdisciplinary research.

  • Incorporating sensitive data into SKGs presents a challenge. However, blockchain technology offers a promising solution by providing a secure and transparent framework for managing sensitive information while maintaining data integrity and privacy.

 


Next Steps

Building on the momentum of the workshop, the participants identified key next steps to further advance SKGs. These steps include:

  • Exploiting the synergies between the different initiatives presented during the workshop to create domain-specific, interlinked SKGs.

  • Addressing various challenges in delivering high-quality SKGs:
    • ensuring broad coverage of scientific knowledge,
    • promoting interoperability between domain-specific and cross-domain SKGs,
    • ensuring long-term sustainability,
    • improving the accuracy and reliability of data sources,
    • incorporating multilingual content,
    • enabling computational reproducibility,
    • adopting good curation practices for domain-specific SKGs.

The scientific community can harness the full potential of SKGs by pursuing these next steps, transforming the way we discover and assess scientific knowledge.

Redefining Research Impact

9 October 2023

Redefining Research Impact: a chat with César Parra, SIRIS Academic

How can open data be used to measure research impact? We recently sat down with César Parra and asked him to share some insights on how SIRIS Academic and SciLake plan to redefine research impact.

César Parra, a data scientist with a background in Physics, is uniquely positioned to shed light on this topic. For the past three years, César has been working at SIRIS Academic, a research-intensive consulting firm specializing in higher education, science, technology, and innovation policy. There, he coordinates SIRIS' technical and scientific contribution to SciLake, bringing his team's expertise in impact analysis using natural language processing (NLP).

Here are the main takeaways from that encounter.


“At SIRIS Academic, we are dedicated to helping people make sense of the vast amount of research data out there. We develop solutions for analysing scientific impact through research mapping based on natural language processing techniques such as topic modelling.”

César Parra, SIRIS Academic


César, which methods and technologies are you developing to characterize research impact?

SIRIS Academic is working on a double approach. On one hand, we work directly with stakeholders to get information on their specific domain, along with known ontologies and taxonomies, to design a “controlled vocabulary” of keywords.

Our in-house library, VocTagger, applies NLP techniques (for those who are familiar, these include lemmatization, permutation of words, reverse order, etc.) to identify textual documents belonging to the domain. It captures variants of a concept and allows the words of a multi-word keyword from the vocabulary to appear a certain distance apart.
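To make the idea of vocabulary matching with word variants and gaps more concrete, here is a minimal sketch. It is not VocTagger and does not reflect its API, which is not described here: the function names (`normalize`, `tokenize`, `matches`, `tag`), the `max_gap` parameter, and the sample vocabulary are all hypothetical, and a crude plural-stripping rule stands in for real lemmatization.

```python
import re


def normalize(token):
    """Crude stand-in for lemmatization (assumption): lowercase, strip a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def tokenize(text):
    """Split text into normalized alphanumeric tokens."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


def matches(keyword, tokens, max_gap=2):
    """Return True if every word of the keyword occurs, in any order,
    inside a window of (number of keyword words + max_gap) tokens."""
    words = set(tokenize(keyword))
    window = len(words) + max_gap
    for start in range(len(tokens)):
        if words <= set(tokens[start:start + window]):
            return True
    return False


def tag(text, vocabulary, max_gap=2):
    """Return the vocabulary keywords found in the text."""
    tokens = tokenize(text)
    return [kw for kw in vocabulary if matches(kw, tokens, max_gap)]


# Hypothetical controlled vocabulary and document, for illustration only.
vocabulary = ["knowledge graph", "topic model"]
text = "We build graphs of scientific knowledge and compare topic models."
print(tag(text, vocabulary))             # both keywords match
print(tag(text, vocabulary, max_gap=0))  # only the adjacent pair matches
```

Comparing sets of words inside a sliding window is what lets "graphs of scientific knowledge" match the keyword "knowledge graph" despite the reversed order and the words in between.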

On the other hand, we map research using more advanced techniques. One is textual classification with predefined domains, whereby we train a classification model on a set of documents for which the topic is known and then use the model to classify new documents. The other works in a bottom-up fashion, via topic modelling.
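The train-then-classify workflow described above can be sketched as follows. The interview does not say which model SIRIS uses, so this example assumes a simple multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The function names and the tiny training set are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Count words per topic from (text, topic) pairs whose topic is known."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, topic in labelled_docs:
        tokens = tokenize(text)
        word_counts[topic].update(tokens)
        doc_counts[topic] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the topic with the highest Naive Bayes log-score for a new text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in doc_counts:
        score = math.log(doc_counts[topic] / total_docs)  # topic prior
        total_words = sum(word_counts[topic].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (topic, word) pairs non-zero.
            score += math.log(
                (word_counts[topic][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical labelled documents, for illustration only.
training_docs = [
    ("tumour cells and gene expression in breast cancer", "cancer"),
    ("rna sequencing of tumour samples", "cancer"),
    ("wind turbines and solar power for the energy grid", "energy"),
    ("battery storage for renewable energy", "energy"),
]
model = train(training_docs)
print(classify("gene expression in tumour samples", model))
print(classify("solar power storage", model))
```

A real service would train on far larger corpora and likely use a stronger model; the point here is only the two-step shape: fit on documents with known topics, then predict topics for unseen ones.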

What is your role in SciLake?

We are working on the SciLake impact assessment service. One of the greatest advantages of the project is that it involves pilot communities in neuroscience, transportation, energy, and cancer research, each with its own understanding of what impact means in its field.

SIRIS is also leading the development of the SciLake reproducibility assistance service, which will help researchers improve the reproducibility of their work. A major challenge on this front is identifying the research artifacts (software, datasets, methods, etc.) that are important to the pilots' use cases. Currently, for example, it is possible to identify concepts fairly well in certain domain-specific tasks (e.g. the type of cancer for the cancer research domain) while recognizing research methodologies (e.g. RNA-sequencing analysis) is still a challenge. The involvement of the pilots in the design process is therefore crucial.

Who will benefit from these services and how?

Our goal is to enable research funders, universities, and governmental agencies to have a better understanding of research impact through the use of open data. We believe this will help create a more informed and accurate picture of the research landscape.


SciLake at OSFAIR 2023


SciLake will be at the Open Science Fair (OS FAIR) 2023.

Our members will participate in the workshop "Open Science Knowledge Graphs (SKGs): Transforming the Way we Manage, Explore, and Analyze Scientific Knowledge", presenting our mission of building a comprehensive scholarly communication graph and the technical solutions we are developing. The workshop, organised by OpenAIRE and Athena Research Center, will be an excellent opportunity to explore potential areas of cooperation and common goals with ESFRI's research communities, and to hear about their ongoing work to create and maintain domain-specific SKGs and the challenges they currently face. It aims to provide insight into the use of SKGs in Open Science activities and their impact on research outputs and collaborations. It is targeted at research infrastructures, research communities, publishers, content providers, research administrators, policy makers, funders, service providers and innovators, as well as EOSC organizations.

Our SIRIS partners will give a lightning talk entitled "Exploring trends and impact of scientific publications based on open access journals: an application in the archaeological research domain", where they will present significant advancements in our smart impact-driven discovery service.

Our ISTI-CNR partners will present a poster, entitled “The three processes for de-duplication of organisations, data sources, and results of the OpenAIRE graph”, on methodologies for disambiguating multiple entities in a graph.

OS FAIR will be held in Madrid from 25 to 27 September 2023. The event aims to bring together and empower open science communities, identify common practices related to open science, explore synergies for creating and operating services that work for many, and learn from each other's experiences from around the world.

For more information visit https://www.opensciencefair.eu/
