Scientific Lake service bundle

The service bundle is designed to collect, manage, and query heterogeneous scholarly content. It provides functionalities, components and open APIs to support research activities.

The bundle has the following components:

Enhancing metadata through text and data mining

Information Inference Service
The Information Inference Service (IIS) is a flexible big data processing system based on Apache Hadoop technologies. It is a subsystem of the OpenAIRE system and uses algorithms to extract new entities and relations from full texts to enrich scientific knowledge graphs (SKGs).

Unlocking knowledge through PDF acquisition

PDFfetcher is a tool designed to acquire the full text of publications by collecting PDFs from URL links. With a coverage of over 20 million PDF articles, it provides a comprehensive resource for researchers.

Domain-Specific Machine Translation

Machine Translation System
The Machine Translation system ensures accurate and contextually appropriate translations by fine-tuning general-purpose machine translation models with domain-specific scientific data.

Data Science Tool for Heterogeneous Network Mining

SciNeM is a data science tool for metapath-based querying and analysis of Heterogeneous Information Networks. It enables entity ranking, similarity searches, and community detection.
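
To make the idea of a metapath concrete, here is a minimal sketch in plain Python. It does not use SciNeM or its API; the toy network, the node names, and the helper functions build_adjacency and metapath_counts are all hypothetical. A metapath is treated as a sequence of node types (here Author, Paper, Author), and authors are ranked by how many path instances connect them to a starting author.

```python
from collections import defaultdict

# Toy heterogeneous information network with typed nodes.
# All names are illustrative assumptions, not taken from SciNeM.
node_type = {
    "alice": "Author", "bob": "Author", "carol": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper",
}
edges = [
    ("alice", "p1"), ("bob", "p1"),
    ("alice", "p2"), ("bob", "p2"),
    ("carol", "p3"), ("alice", "p3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_counts(start, metapath, adj, node_type):
    """Count path instances from `start` that follow the node-type sequence."""
    if node_type[start] != metapath[0]:
        return {}
    frontier = {start: 1}
    for wanted in metapath[1:]:
        reached = defaultdict(int)
        for node, count in frontier.items():
            for neighbour in adj[node]:
                if node_type[neighbour] == wanted:
                    reached[neighbour] += count
        frontier = dict(reached)
    return frontier


adj = build_adjacency(edges)
counts = metapath_counts("alice", ["Author", "Paper", "Author"], adj, node_type)
counts.pop("alice", None)  # ignore paths that return to the start author
# Rank co-authors by number of shared papers, ties broken by name.
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
```

The same counting step can serve as a simple similarity signal between entities of the same type; a production tool would work on far larger graphs and with more refined ranking measures.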

Simplifying access to knowledge

Open API
Open API is an initiative that aims to hide technical complexities and provide a user-friendly interface for accessing information.

Discovering Dependencies, Enriching Knowledge

KG creation assistant & Interlinking
The Knowledge Graph creation assistant & Interlinking tool is designed to extract knowledge graphs from unstructured or semi-structured data sources and enrich their content.

Enriching research through comprehensive resource description

Data Catalogue
A data catalogue is an organized inventory of resources. It uses rich metadata descriptions to support data discovery and governance, and it serves as a single point of access for all relevant resources, regardless of where they are stored or running.
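
The "single point of access" idea can be sketched as a small in-memory registry. This is an illustration only, not the bundle's Data Catalogue: the Resource and Catalogue classes, their fields, and the example.org locations are hypothetical. Each record carries descriptive metadata plus a pointer to wherever the resource actually lives, so a search works the same way regardless of storage location.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Metadata record for one resource (hypothetical schema)."""
    identifier: str
    title: str
    location: str  # where the resource is stored or running
    keywords: set = field(default_factory=set)


class Catalogue:
    """Minimal in-memory inventory searchable by title or keyword."""

    def __init__(self):
        self._records = {}

    def register(self, resource):
        self._records[resource.identifier] = resource

    def search(self, term):
        """Return identifiers whose title or keywords match `term`."""
        term = term.lower()
        return sorted(
            r.identifier
            for r in self._records.values()
            if term in r.title.lower() or term in {k.lower() for k in r.keywords}
        )

    def locate(self, identifier):
        """Resolve an identifier to the resource's actual location."""
        return self._records[identifier].location


catalogue = Catalogue()
catalogue.register(Resource("ds-1", "Citation graph dump",
                            "https://example.org/store-a/ds-1", {"graph", "citations"}))
catalogue.register(Resource("svc-1", "Translation service",
                            "https://example.org/cluster-b/svc-1", {"translation"}))
catalogue.register(Resource("ds-2", "Full-text corpus",
                            "https://example.org/store-c/ds-2", {"PDF", "citations"}))

print(catalogue.search("citations"))
print(catalogue.locate("svc-1"))
```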

High-performance graph analytics

AvantGraph is a high-performance graph processing engine for scientific data lakes. Services built on top of it can run graph analytics and a wide range of other data processing tasks.