Technical component
DoStRe
Scientific text structure revealed

DoStRe (Document Structure Recognition) is a tool that recognizes the structure of scientific documents.
It identifies sections and their titles, and classifies each section into one of a set of predefined types. For instance, a concluding section may be headed "Conclusions", "Discussion", or "Conclusions and Future Work", and the tool assigns it a type independent of the exact wording.
Code repository: https://gitlab.com/dfki-scilake/dsr
Functionalities
Scientific Article Structure Detection
The tool uses GROBID to process PDF documents and identify elements such as headings, body text, tables, figures, and references. This breaks each paper down into its structural parts, which helps users navigate long or complex documents.
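As an illustration of working with this kind of output, the sketch below reads section titles and paragraphs from a TEI XML document, which is the format GROBID produces for full-text extraction. The sample document is hand-written for the example, and the function name extract_sections is hypothetical; this is not DoStRe's own code, and real GROBID output contains many more elements.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written, heavily simplified sample (an assumption for this sketch,
# not actual DoStRe or GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>Opening paragraph.</p></div>
      <div><head n="2">Methods</head><p>First step.</p><p>Second step.</p></div>
    </body>
  </text>
</TEI>"""


def extract_sections(tei_xml):
    """Return a list of {"title", "paragraphs"} dicts, one per body <div>."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # A <div> may lack a <head>; fall back to an empty title.
        title = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip()
            for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append({"title": title, "paragraphs": paragraphs})
    return sections


if __name__ == "__main__":
    for section in extract_sections(SAMPLE_TEI):
        print(section["title"], len(section["paragraphs"]))
```

Only top-level sections of the body are read here; nested sections, tables, figures, and references would need their own handling.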
Scientific Article Section Classification
The tool also classifies each section into a predefined category, which makes the purpose of the section easier to understand. Using data from the PubMed dataset, we trained a classifier that groups sections into eight classes: introduction, background (background, review, and related work), case (case reports), method, result, discussion, conclusion, and additional information (such as conflicts of interest, financial support, and acknowledgements).
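To make the eight-class label set concrete, here is a minimal keyword-based sketch that maps a section heading to one of the classes. It is only a naive baseline for illustration: the actual tool uses a system trained on PubMed data, and the keyword lists and the function name classify_heading are assumptions made for this example.

```python
from typing import Optional

# The eight class names come from the description above. The keyword lists
# are illustrative assumptions, not the behaviour of the trained model.
KEYWORD_RULES = [
    ("introduction", ("introduction",)),
    ("background", ("background", "related work", "review")),
    ("case", ("case report", "case presentation")),
    ("method", ("method", "materials")),
    ("result", ("result", "findings")),
    ("discussion", ("discussion",)),
    ("conclusion", ("conclusion", "concluding")),
    ("additional information",
     ("acknowledg", "conflict", "funding", "financial support")),
]


def classify_heading(heading: str) -> Optional[str]:
    """Return the first class whose keyword occurs in the heading, else None."""
    normalized = heading.strip().lower()
    for label, keywords in KEYWORD_RULES:
        if any(keyword in normalized for keyword in keywords):
            return label
    return None


if __name__ == "__main__":
    for title in ("Related Work", "Conclusions and Future Work", "Appendix"):
        print(title, "->", classify_heading(title))
```

Rules are checked in order and the first match wins, so a combined heading such as "Results and Discussion" falls into the earlier class. A trained classifier can also use the section text itself, which a heading-only rule set cannot.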