Technical component

DoStRe

Scientific text structure revealed

DoStRe (Document Structure Recognition) is a tool that recognizes the structure of scientific documents.

The tool identifies the sections of a scientific document and their titles, then classifies each section into one of a set of predefined types. For instance, a concluding section may be labeled "Conclusions", "Discussion", or "Conclusions and Future Work".

Code repository: https://gitlab.com/dfki-scilake/dsr

Functionalities

Scientific Article Structure Detection

Using GROBID, the tool processes PDF documents to identify elements such as headings, body text, tables, figures, and references. This breaks each paper down into its structural parts, so users can navigate to the information they need.
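
As an illustration of what this structure looks like downstream, the sketch below reads section headings and paragraphs from TEI XML, the format GROBID emits for a processed PDF. It is a minimal example under stated assumptions, not DoStRe's own code: the sample fragment is hand-written and the function name `extract_sections` is hypothetical. In practice, the TEI would come from a GROBID service rather than from a string literal.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment in the style of GROBID's TEI output (an assumption
# for illustration; real output contains many more elements and attributes).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>Structure recognition matters.</p></div>
      <div><head n="2">Methods</head><p>We parse PDFs.</p><p>We classify sections.</p></div>
    </body>
  </text>
</TEI>"""


def extract_sections(tei_xml):
    """Return a list of (heading, text) pairs for each top-level body section."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # A section may lack a heading; fall back to an empty string.
        heading = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append((heading, " ".join(paragraphs)))
    return sections


if __name__ == "__main__":
    for heading, text in extract_sections(SAMPLE_TEI):
        print(f"{heading}: {text}")
```

Each returned pair holds a section title and its joined paragraph text, which is the kind of input a section classifier would consume.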

Scientific Article Section Classification

The tool also classifies text into predefined categories, making it easier to understand the purpose of each section. Using data from the PubMed dataset, we trained a classifier that assigns each section to one of eight classes: introduction, background (i.e., background, review and related work), case (i.e., case reports), method, result, discussion, conclusion, and additional information (such as conflicts of interest, financial support and acknowledgements).
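
To make the eight-class scheme concrete, the sketch below maps section headings to those classes with a simple keyword lookup. This is a stand-in for illustration only: DoStRe uses a system trained on PubMed data, whereas the keyword lists and the function name `classify_heading` here are hypothetical and look only at the heading text.

```python
# The eight section classes described above.
SECTION_CLASSES = [
    "introduction", "background", "case", "method",
    "result", "discussion", "conclusion", "additional information",
]

# Hypothetical keyword lists (not taken from DoStRe). Entries are checked
# in order and the first class with a matching keyword wins.
KEYWORDS = [
    ("introduction", ["introduction"]),
    ("background", ["background", "related work", "review"]),
    ("case", ["case report", "case presentation"]),
    ("method", ["method", "materials"]),
    ("result", ["result", "findings"]),
    ("discussion", ["discussion"]),
    ("conclusion", ["conclusion", "future work"]),
    ("additional information",
     ["conflict", "funding", "financial support", "acknowledg"]),
]


def classify_heading(heading):
    """Return the first class whose keyword occurs in the heading, else None."""
    text = heading.lower()
    for label, words in KEYWORDS:
        if any(word in text for word in words):
            return label
    return None


if __name__ == "__main__":
    for title in ["Conclusions and Future Work", "Related Work", "Appendix"]:
        print(title, "->", classify_heading(title))
```

A heuristic like this fails on headings that carry no telling keyword, which is one reason a classifier trained on section content is the more robust choice.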

For

Service Providers
Research Communities

Provided by

Contacts

Julian Moreno Schneider