
The OpenAIRE Graph: What's in it for Science Communities?


SciLake technical components


By Stefania Amodeo

In a webinar for SciLake partners, Miriam Baglioni, researcher at the National Research Council of Italy (CNR) and one of the OpenAIRE Graph developers, introduced the OpenAIRE Graph and discussed its benefits for science communities. This article recaps the key points from the webinar.

In the era of Open Science, it has become crucial to track how scientists conduct their research. The concept of "discovery" has evolved, and now we aim to enable reproducibility and assess the quality of research beyond just publications. The OpenAIRE Graph was developed for this purpose. This graph is a collection of metadata describing various objects in the research life cycle, forming a network of interconnected elements.

Motivation and concept

The OpenAIRE Graph aims to be a complete and open collection of metadata describing research objects. It includes data from various big players, such as Crossref, to be as comprehensive as possible. To maintain accuracy, the graph is de-duplicated, meaning that when metadata from different sources are available for the same research result, only one entity is counted for statistical purposes. Transparency is also a key aspect, as provenance information is marked and traced within the graph. Additionally, the OpenAIRE Graph is built to be participatory, allowing anyone to contribute their data following the provided guidelines. The graph also strives to be decentralized, enriching information from repositories and pushing it back to the original sources. By including trusted providers, the graph becomes a valuable resource for researchers throughout the research life cycle.

Graph Concept: open, complete, de-duplicated, transparent, participatory, decentralized, trusted

Data Sources and Data Model

Everyone is free to share their data with the graph by registering on one of our services and sharing the metadata. We currently have more than 2,000 active data sources. These include institutional and thematic repositories, funder databases, entity registries, organizations, ORCID, and many more sources. All the metadata from these different entities are interconnected.

The OpenAIRE Graph Data Model

Building Process

The OpenAIRE Graph is built upon metadata provided voluntarily by data sources. Regular snapshots of the metadata are taken and combined with full-text mining of Open Access publications to enrich the relationships among entities. Duplicates are handled by creating a representative metadata object that points to all replicas. The graph then goes through an enrichment process, utilizing the existing information to further enhance the relationships and results. Finally, the graph is cleaned and indexed, making it accessible through the API and OpenAIRE's value-added services.

The OpenAIRE Graph supply chain
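
The de-duplication step described above can be illustrated with a small sketch. It is not the OpenAIRE pipeline itself: the record fields (`id`, `source`, `doi`, `title`), the rule of grouping by a normalized DOI, and the choice of the longest title for the representative are all assumptions made for this example. It only shows the general idea of a representative metadata object that keeps provenance and points to all its replicas.

```python
from collections import defaultdict


def deduplicate(records):
    """Group records that share a DOI and build one representative per group.

    Grouping by a normalized DOI is an assumption made for this sketch only.
    """
    groups = defaultdict(list)
    for record in records:
        key = record["doi"].strip().lower()
        groups[key].append(record)

    representatives = []
    for key, replicas in groups.items():
        # Illustrative rule: keep the longest title among the replicas.
        title = max((r["title"] for r in replicas), key=len)
        representatives.append({
            "doi": key,
            "title": title,
            # Provenance: remember every source that contributed a replica.
            "sources": sorted({r["source"] for r in replicas}),
            # The representative points to all the replicas it merges.
            "replica_ids": [r["id"] for r in replicas],
        })
    return representatives


# Hypothetical records describing two research results from three sources.
records = [
    {"id": "crossref::1", "source": "Crossref",
     "doi": "10.1234/ABC", "title": "A study"},
    {"id": "repo::7", "source": "Institutional repository",
     "doi": "10.1234/abc ", "title": "A study of graphs"},
    {"id": "pubmed::3", "source": "PubMed",
     "doi": "10.5678/xyz", "title": "Another paper"},
]

merged = deduplicate(records)
print(len(records), "records,", len(merged), "research results")
```

Counting the representatives rather than the raw records is what keeps statistics from counting the same research result twice.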

Connection to Science Communities

The OpenAIRE Graph has significant relevance and connections to various science communities. SciLake's pilots will receive the following benefits:

  • For Cancer research, the graph imports metadata from PubMed and plans to integrate citation links between PubMed articles.
  • For Energy research, there is already a gateway called enermaps.eu that provides access to relevant information, and the graph will add further linkage options.
  • For Neuroscience, interoperability options between the OpenAIRE Graph and the EBRAINS-KG will be offered.
  • For Transportation research, two paths are envisaged:
    • accessing products related to the TOPOS gateway (beopen.openaire.eu), which contains all the information relevant to transportation research included in the graph;
    • investigating interoperability options between the OpenAIRE Graph and the Knowledge Base on Connected and Automated Driving (CAD).

The OpenAIRE Graph continues to evolve and welcomes ideas and collaborations from all science communities.

Challenges and perspectives

Building and maintaining the OpenAIRE Graph comes with its own set of challenges. Combining domain-specific knowledge with domain-agnostic knowledge can be complex, especially when dealing with unstructured files and non-English texts. The format and organization of data vary across communities, making it difficult and unsustainable to include everything in the graph.

While challenges exist, the SciLake project plays a pivotal role in improving and expanding the OpenAIRE Graph to accommodate new entities, ensuring its relevance and usefulness for the scientific community.

To learn more about the OpenAIRE Graph, visit the website graph.openaire.eu and explore the documentation on data sources and the graph construction pipeline.


Insights into the Potential of Scientific Knowledge Graphs

Survey Results: Insights into the Potential of Scientific Knowledge Graphs

Scientific Knowledge Graphs (SKGs) have been gaining attention in the research community for their ability to convert data into knowledge. To understand the perspectives and expectations surrounding SKGs, an online survey was conducted during the Open Science Knowledge Graph workshop held at OSFAIR 2023.

In this blog post, we will delve into the survey results and highlight the key insights regarding the participants' roles, main uses of SKGs, and the features that should be improved or added to enhance SKG effectiveness.

Roles of survey participants within the research community

Participants' roles

The 61 survey participants held various roles within the research community: service providers (31%), researchers (28%), research administrators (28%), policy makers (6%), publishers (5%), and funders (2%).

Main Uses of SKGs

When it comes to the main use or interest in SKGs, the survey revealed a wide range of applications and benefits. These included:

  • Providing an alternative to proprietary graphs
  • Implementing FAIR (Findable, Accessible, Interoperable, and Reusable) principles
  • Enhancing decision-making processes
  • Mapping data stewardship services in a Knowledge Graph
  • Helping researchers track the impact of their research and maximize its use
  • Improving research discovery and dissemination
  • Leveraging NLP (Natural Language Processing) techniques to enhance search capabilities
  • Facilitating research assessment, monitoring, and reporting
  • Exploring the science of science: understanding the scientific ecosystem and generating new knowledge about research evolution
  • Analyzing and feeding internal reports on institutional behavior and its context
  • Obtaining a complete picture of one's research area of interest at any given time
  • Gaining insights into the interests and working areas of researchers within a country to provide better research opportunities and effectively connect them with funders
  • Retrieving high-quality metadata for semantic analysis
  • Harnessing the capabilities of SKGs to visualize the scientific workflow and unlock new possibilities for information discovery and correlation creation

Additionally, participants noted that SKGs underlie a number of services that research communities use, highlighting the importance of understanding how SKGs operate for those involved in research support.

Improvements and Additional Features

To fully harness the potential of SKGs, participants identified certain improvements and additional features that they believed would enhance their effectiveness. These suggestions included:

  • Implementing persistent identifiers (PIDs) for organizations to account for historical changes such as institution name changes or mergers
  • Supporting multilingualism to facilitate the accessibility of SKGs across different language communities
  • Ensuring reliability, curation, and monitoring of metadata quality to maintain the integrity and usefulness of SKGs
  • Streamlining the querying process to make it easier and more user-friendly
  • Empowering business intelligence (BI) with multiple options to enable comprehensive analysis and decision-making capabilities
  • Providing information about retractions to ensure the accuracy and reliability of research findings
  • Addressing the challenges associated with scholarly publication workflows, particularly the unregulated and uncontrolled manner in which researchers publish papers, data, and software in open science
  • Involving the community in data curation to leverage collective expertise and ensure the accuracy and relevance of SKGs
  • Offering a simple data model that does not compromise information, semantics, and provenance while making it easier to navigate and understand the SKGs

Participants also emphasized the importance of a user-friendly interface to enhance the accessibility and usability of SKGs.

Conclusions

The survey results shed light on the roles, main uses, and expectations surrounding Scientific Knowledge Graphs (SKGs). From enabling better research discovery to supporting decision-making processes, SKGs have the potential to transform the way we manage, explore, and analyze scientific knowledge. By addressing the suggested improvements and adding the desired features, the scientific community can fully leverage the power of SKGs and unlock new possibilities for research and knowledge discovery.


Open Science Knowledge Graphs

By Stefania Amodeo

Scientific Knowledge Graphs (SKGs) are of great value to the research community in converting data into knowledge. In a recent workshop at the Open Science Fair in Madrid, experts from various disciplines came together to discuss the potential and challenges of SKGs.

This blog post highlights the key insights from the workshop, including the presentations, discussion highlights, and the next steps in advancing SKGs.

Presentations

The workshop featured five speakers who presented compelling cases of SKGs and their applications in different domains. Thanasis Vergoulis from Athena RC discussed the status of the OpenAIRE Graph and its enrichment through the EU SciLake project. Ingrid Reiten from the University of Oslo highlighted the synergies between the EBRAINS data and knowledge service and SciLake, specifically in the neuroscience research domain. Leily Rabbani from the Karolinska Institute shared the roadmap for building a cancer knowledge graph through SciLake. Joaquín López Lérida from LifeWatch ERIC introduced the LifeBlock tool for the construction of SKGs for the biodiversity research domain. Finally, Max Novelli from the European Spallation Source presented the PaNOSC data portal for the photon and neutron community.

All the presentations are accessible on Zenodo: https://zenodo.org/record/8402580

Discussion Highlights

The round table discussion provided valuable insights into the challenges and potential of SKGs. Participants actively engaged with the speakers. An online survey was conducted to gather participants' roles, main uses of SKGs, and suggestions for improvement. Some notable highlights from the discussion include:

  • SKGs enhance research productivity and enable quicker translation of hypotheses into results. They serve as a foundation for powerful tools that aid researchers and stakeholders in making informed decisions based on factual information.

  • SKGs catalyze the development of services for advanced knowledge extraction and exploration. By leveraging the interconnectedness of data, SKGs enable researchers to uncover hidden relationships and patterns, leading to new discoveries and insights.

  • Interoperability between graphs is a significant area of progress. Efforts are being made to ensure that SKGs from specific domains can seamlessly integrate and exchange information with cross-domain graphs, like the OpenAIRE Graph, fostering interdisciplinary research.

  • Incorporating sensitive data into SKGs presents a challenge. However, blockchain technology offers a promising solution by providing a secure and transparent framework for managing sensitive information while maintaining data integrity and privacy.


Next Steps

Building on the momentum of the workshop, the participants identified key next steps to further advance SKGs. These steps include:

  • Exploiting the synergies between the different initiatives presented during the workshop to create domain-specific, interlinked SKGs.

  • Addressing various challenges in delivering high-quality SKGs:
    • ensuring broad coverage of scientific knowledge,
    • promoting interoperability between domain-specific and cross-domain SKGs,
    • ensuring long-term sustainability,
    • improving the accuracy and reliability of data sources,
    • incorporating multilingual content,
    • enabling computational reproducibility,
    • adopting good curation practices for domain-specific SKGs.

The scientific community can harness the full potential of SKGs by pursuing these next steps, transforming the way we discover and assess scientific knowledge.

Redefining Research Impact

9 October 2023

Redefining Research Impact: a chat with César Parra, SIRIS Academic

How can open data be used to measure research impact? We recently sat down with César Parra and asked him to share some insights on how SIRIS Academic and SciLake plan to redefine research impact.

César Parra, a data scientist with a background in Physics, is uniquely positioned to shed light on this topic. For the past three years, César has been working at SIRIS Academic, a research-intensive consulting firm specializing in higher education, science, technology, and innovation policy. There, he coordinates SIRIS' technical and scientific contribution to SciLake, bringing his team's expertise in impact analysis using natural language processing (NLP).

Here are the main takeaways from that encounter.


“At SIRIS Academic, we are dedicated to helping people make sense of the vast amount of research data out there. We develop solutions for analysing scientific impact through research mapping based on natural language processing techniques such as topic modelling.”

César Parra, SIRIS Academic


César, which methods and technologies are you developing to characterize research impact?

SIRIS Academic is working on a double approach. On one hand, we work directly with stakeholders to get information on their specific domain, along with known ontologies and taxonomies, to design a “controlled vocabulary” of keywords.

Our in-house library, VocTagger, performs NLP tasks (for those who are familiar, these include lemmatization, permutation of words, reverse order, etc.) to identify textual documents belonging to the domain, while capturing variants of a concept and allowing multi-word keywords from the vocabulary to have a certain distance between words.

On the other hand, we map research using more advanced techniques. These include textual classifiers with predefined domains, where we train a classification model on a set of documents whose topic is known and then use the model to classify new documents, and bottom-up approaches based on topic modelling.
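
As a rough illustration of the controlled-vocabulary approach described in this answer, the sketch below tags a text with keywords from a small vocabulary. It is not VocTagger, whose interface is not described here: the function names, the `max_gap` parameter, and the example vocabulary are hypothetical, and the crude plural stripping merely stands in for real lemmatization. A multi-word keyword matches when all its words occur, in any order, within a short window of tokens.

```python
import re


def normalize(token):
    """Crude stand-in for lemmatization: lowercase and strip a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def tokenize(text):
    """Split text into normalized alphanumeric tokens."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


def tag(text, vocabulary, max_gap=2):
    """Return the vocabulary keywords found in the text.

    A keyword matches when all of its words occur, in any order,
    within a window of len(keyword words) + max_gap tokens.
    """
    tokens = tokenize(text)
    found = []
    for keyword in vocabulary:
        words = tokenize(keyword)
        if not words:
            continue  # ignore empty vocabulary entries
        window = len(words) + max_gap
        for start in range(len(tokens)):
            if set(words) <= set(tokens[start:start + window]):
                found.append(keyword)
                break
    return found


# Hypothetical vocabulary and text for an energy-related domain.
vocabulary = ["solar cell", "wind turbine", "heat pump"]
text = "Efficiency of cells for solar energy and of offshore turbines."
tags = tag(text, vocabulary)
print(tags)
```

Here "solar cell" is found even though the text says "cells for solar", while "wind turbine" is not, because "wind" never appears. Widening `max_gap` trades precision for recall.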

What is your role in SciLake?

We are working on the SciLake impact assessment service. One of the greatest advantages of the project is that it involves pilot communities in neuroscience, transportation, energy, and cancer research, each with its own understanding of what impact means in its field.

SIRIS is also leading the development of the SciLake reproducibility assistance service, which will help researchers improve the reproducibility of their work. A major challenge on this front is identifying the research artifacts (software, datasets, methods, etc.) that are important to the pilots' use cases. Currently, for example, it is possible to identify concepts fairly well in certain domain-specific tasks (e.g. the type of cancer for the cancer research domain) while recognizing research methodologies (e.g. RNA-sequencing analysis) is still a challenge. The involvement of the pilots in the design process is therefore crucial.

Who will benefit from these services and how?

Our goal is to enable research funders, universities, and governmental agencies to have a better understanding of research impact through the use of open data. We believe this will help create a more informed and accurate picture of the research landscape.

Meet our partners, impact articles


SciLake at OSFAIR 2023


SciLake will be at the Open Science Fair (OS FAIR) 2023.

Our members will participate in the workshop "Open Science Knowledge Graphs (SKGs): Transforming the Way we Manage, Explore, and Analyze Scientific Knowledge", presenting our mission of building a comprehensive scholarly communication graph and our technical solutions under development. The workshop, organised by OpenAIRE and Athena Research Center, will be an excellent opportunity to explore potential areas of cooperation and common goals with ESFRI's research communities and hear about their ongoing work to create and maintain domain-specific SKGs and their current challenges. The workshop aims to provide insight into SKG use in Open Science activities and their impact on research outputs and collaborations. It is targeted at research infrastructures, research communities, publishers, content providers, research administrators, policy makers, funders, service providers and innovators, as well as EOSC organizations.

Our SIRIS partners will give a lightning talk entitled "Exploring trends and impact of scientific publications based on open access journals: an application in the archaeological research domain", where they will present significant advancements in our smart impact-driven discovery service.

Our ISTI-CNR partners will present a poster entitled “The three processes for de-duplication of organisations, data sources, and results of the OpenAIRE graph”, covering methodologies for disambiguating multiple entities in a graph.

OS FAIR will be held in Madrid from 25 to 27 September 2023. The event aims to bring together and empower open science communities, identify common practices related to open science, explore synergies for creating and operating services that work for many, and learn from each other's experiences from around the world.

For more information visit https://www.opensciencefair.eu/

Event
