Cancer

- Leily Rabbani
- Georgios Gavriilidis
- Daniel Hägerstrand
- Konstantinos Kardamiliotis
CS Organisations:
- Karolinska Institute, https://ki.se/en
- CERTH, https://www.certh.gr/root.en.aspx

The SciLake Cancer Pilot has been developed to study how to create accessible, interconnected scientific resources within the cancer research community. Led by teams from the Karolinska Institutet (KI) and the Centre for Research and Technology HELLAS (CERTH), this pilot focuses on enhancing the understanding of cancer biology and treatment, specifically targeting Chronic Lymphocytic Leukemia (CLL), the most common form of adult leukaemia.

Innovation in Personalized Medicine

The Cancer Pilot addresses critical challenges in personalised medicine, particularly the identification of key biomarkers for tailored treatment approaches. It aims to build a targeted scientific knowledge graph by combining data from existing biomedical knowledge graphs with new insights extracted through text and graph mining algorithms. The resulting graph is intended to provide a deeper understanding of the heterogeneous nature of CLL and of other cancers, ultimately improving treatment strategies and patient outcomes.

The CLL Knowledge Graph (CLL-KG)

Our CLL Knowledge Graph brings together vital data from multiple sources:

Genetic information
Protein interactions
Metabolic pathways
Drug development data
Clinical trial results

This semantic structure of knowledge graphs is anticipated to be pivotal in enabling researchers and clinicians to gain deeper insights into patient subtypes, treatment responses, and emerging therapies.

Advanced Technology Integration

We're leveraging cutting-edge tools in:

Text mining and analysis
Entity recognition
Graph mining

These tools are intended to accelerate knowledge discovery and advance precision medicine in cancer research.

What we've achieved so far

Our team has made significant progress on several key fronts:

Developed a comprehensive definition of the Knowledge Space:
- Identified key domain-specific entities: proteins, genes, pathways, diseases, and scientific publications.
- Investigated third-party Knowledge Graphs for integration into our CLL-KG.
Built and customized a Cancer Research OpenAIRE Gateway to optimise the identification of cancer-specific research outputs.
Built an extensive article database for testing components and services, and provided feedback on AvantGraph and BIP! Spaces prototypes.
Developed the first version of the CLL-KG that maps key domain entities, their relationships, and valuable metadata attributes, using a subset of the data.
- CLL-KG Demo

Related News

Presentation: Roadmap for a Cancer Knowledge Graph, presented at the workshop “Open Science Knowledge Graphs: Transforming the Way we Manage, Explore, and Analyze Scientific Knowledge” at Open Science FAIR 2023.
Workshop: Defining the Roadmap for a European Cancer Data Space, organised by the EOSC4Cancer project in Brussels, October 2023.
Presentation: Unlocking insights in Cancer Research through Knowledge Graphs at the Cancer Landscape Partnering meeting, February 2024.
Press release: Connecting the Dots between Cancer Data Networks: The SciLake Cancer Knowledge Graph, April 2024.
Presentation: SciLake Cancer Knowledge Graph for data-driven precision Oncology, poster at the International Conference on Intelligent Systems for Molecular Biology, July 2025.

Discover SciLake Cancer Pilot

SciLake Pilots

The SciLake Cancer Knowledge Graph

SciLake is in full swing with its pilot programs in the fields of neuroscience, cancer research, transportation, and energy. These initiatives aim to create or enrich domain-specific Scientific Knowledge Graphs that capture valuable knowledge from each scientific field.

June 11, 2024

The SciLake Cancer Pilot is developing a first-of-its-kind cancer knowledge graph, with the aim of making public resources in biology and cancer more accessible to the research community.

The case study in focus is Chronic Lymphocytic Leukemia (CLL), the most prevalent adult leukemia. The cancer knowledge graph will assist in discovering essential biomarkers for personalised treatment and care, a critical step towards achieving precision medicine.

Leading the pilot are researchers from the Centre for Research and Technology in Greece and the Karolinska Institutet in Sweden.

Read the press release

Unlocking insights in Cancer Research through Knowledge Graphs

Case Study

By Stefania Amodeo

In a recent meeting with the EOSC4Cancer Cancer Landscape Partnering (CLP), SciLake took center stage as it introduced its vision and roadmap for unlocking insights in cancer research. Project coordinator Thanasis Vergoulis (Athena RC) and Leily Rabbani, a bioinformatician at the Department of Molecular Medicine and Surgery at Karolinska Institutet, discussed the ongoing work towards creating a Cancer Research Knowledge Graph. This tool will provide context and connections for what is known about specific research questions, helping researchers as they design new experiments.

Mar 04, 2024

The SciLake Cancer Research pilot involves the Institute of Applied Bioscience (INAB-CERTH) in Greece and Karolinska Institutet in Sweden. Focused on meeting the needs of researchers and clinicians, the project aims to harness the wealth of information available in public resources to address ongoing research questions.

The ultimate goal? To deepen our understanding of the molecular biology and immunopathology of Chronic Lymphocytic Leukemia (CLL) and study the potential effects of different mutations.

With the assistance of SciLake technical partners, members of the pilot project are using advanced algorithms to discover new insights from the knowledge graph. For example, one question they are exploring is: "How might a specific genetic mutation forecast a patient's overall health status, and what insights might related literature offer in this regard?"

Chronic Lymphocytic Leukemia (CLL)

Characterized by the accumulation of neoplastic B cells in the bone marrow, blood, and secondary lymphoid tissues such as the lymph nodes.

Patients can have a very diverse genetic landscape leading to heterogeneous clinical outcomes. This means progression rates and responses to drugs can vary greatly among patients.

The most common type of leukemia in adults.

Currently incurable.

Knowledge Graph: Benefits

The use of a scientific knowledge graph offers several benefits. It empowers research in precision medicine and diagnostics by facilitating the discovery of potential associations between identified biomarkers and other elements, such as genes, biological or functional pathways, and drugs. Furthermore, it is easily deployable and flexible, capable of integrating data from various sources, thereby offering a comprehensive view of the research landscape.

Challenges

Developing a knowledge graph comes with its own set of challenges. The objective is to provide tools for creating and enriching the graph, and the primary concern is extracting latent knowledge to create the graph. Another significant challenge is establishing a common language among people of different expertise, such as clinicians and technical developers. This is crucial to facilitate effective communication and collaboration in the development and application of the knowledge graph. Finally, an important step to validate the graph involves manual curation to assess hidden associations and existing connections and ensure they are relevant to the specific biology experiment.

Where we are

The development of the knowledge graph is progressing by leveraging several pre-existing state-of-the-art knowledge graphs. One is PrimeKG, which is used to query networks of genes or proteins connected to a specific disease. For example, the graph shows connections between CLL and TP53, a gene that, when altered, is known to significantly increase the risk of various forms of cancer. Other, larger state-of-the-art knowledge graphs, based on various biomedical databases, are also being integrated along with PrimeKG. This strategy aims to capitalise on a broader set of databases and underlying connections, potentially uncovering previously missing links. An example of this is the revealed relation between CLL and the gene SOD1, known for being overexpressed in many human cancers.

A variety of knowledge graphs exist, each drawing from a different biomedical source, and we can collect more information through their combination. In fact, many details are unique to a particular graph and there is minimal overlap between them.
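To make the idea of combining graphs concrete, here is a minimal Python sketch, not the pilot's actual pipeline. It assumes each source graph can be reduced to a list of undirected (entity, entity) associations. The graph names `kg_a` and `kg_b`, the placeholder `GENE_X`, and the assignment of edges to graphs are all hypothetical. Only the CLL–TP53 and CLL–SOD1 associations are taken from the text above.

```python
from itertools import combinations


def normalise(edge):
    """Treat an association as undirected, so (a, b) equals (b, a)."""
    a, b = edge
    return tuple(sorted((a, b)))


def merge_graphs(graphs):
    """Union the edges of all graphs, recording which graphs assert each edge."""
    provenance = {}
    for name, edges in graphs.items():
        for edge in edges:
            provenance.setdefault(normalise(edge), set()).add(name)
    return provenance


def jaccard_overlap(graphs):
    """Pairwise Jaccard overlap (shared edges / all edges) between graphs."""
    edge_sets = {
        name: {normalise(e) for e in edges} for name, edges in graphs.items()
    }
    result = {}
    for a, b in combinations(sorted(edge_sets), 2):
        union = edge_sets[a] | edge_sets[b]
        shared = edge_sets[a] & edge_sets[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result


# Toy input: hypothetical graphs; GENE_X is a placeholder, not a real finding.
graphs = {
    "kg_a": [("CLL", "TP53"), ("CLL", "GENE_X")],
    "kg_b": [("SOD1", "CLL"), ("TP53", "CLL")],
}

merged = merge_graphs(graphs)
overlap = jaccard_overlap(graphs)

for edge, sources in sorted(merged.items()):
    print(edge, "asserted by", sorted(sources))
print("overlap:", overlap)
```

Keeping the provenance of every edge is one possible design choice: it lets a curator see at a glance which associations are corroborated by several graphs and which appear in only one.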

Our data flow involves using a variety of knowledge graphs, including those previously mentioned, along with different ontologies and other data sources. We utilize tools provided by SciLake to establish connections among them and generate a comprehensive cancer knowledge graph.

Dataflow towards a CLL KG

Goals

The ultimate vision is to create a comprehensive network of interconnected nodes and relationships. These nodes can represent various entities such as institutions, grants, patents, publications, software, anatomical structures, diseases, drugs, compounds, gene targets, and many more. These relationships can take on different forms and can signify different types of connections such as mentions, associations, or other types of relationships. By creating this extensive web of connections, the network can be navigated and queried in a semi-automated manner to answer specific research questions. Moreover, the impact and reproducibility analysis services offered by SciLake can be utilized to prioritize findings.
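As an illustration only, the following Python sketch shows how a network of typed nodes and relationships could be navigated to answer a simple question such as "which publications mention genes associated with CLL?". The class names, the relation names `associated_with` and `mentioned_in`, and the publication identifier `PUB_1` are hypothetical and are not part of the SciLake tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # e.g. "disease", "gene", "publication"


class KnowledgeGraph:
    """A tiny in-memory graph of typed nodes and labelled, directed edges."""

    def __init__(self):
        # source node -> list of (relation, target node)
        self._out = defaultdict(list)

    def add(self, source, relation, target):
        self._out[source].append((relation, target))

    def neighbours(self, node, relation=None, kind=None):
        """Targets reachable in one hop, optionally filtered by relation or kind."""
        return [
            target
            for rel, target in self._out.get(node, [])
            if (relation is None or rel == relation)
            and (kind is None or target.kind == kind)
        ]

    def follow(self, start, path):
        """Follow a sequence of relation names from start; return nodes reached."""
        frontier = {start}
        for relation in path:
            frontier = {
                target
                for node in frontier
                for target in self.neighbours(node, relation)
            }
        return frontier


# Toy data: the CLL-TP53 and CLL-SOD1 links come from the text;
# PUB_1 is a hypothetical placeholder publication.
cll = Node("CLL", "disease")
tp53 = Node("TP53", "gene")
sod1 = Node("SOD1", "gene")
pub = Node("PUB_1", "publication")

kg = KnowledgeGraph()
kg.add(cll, "associated_with", tp53)
kg.add(cll, "associated_with", sod1)
kg.add(tp53, "mentioned_in", pub)

genes = kg.neighbours(cll, kind="gene")
papers = kg.follow(cll, ["associated_with", "mentioned_in"])
print([g.id for g in genes], [p.id for p in papers])
```

A production knowledge graph would normally sit in a graph database with its own query language; the sketch only shows the shape of a multi-hop question over typed entities.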

This approach will enhance the understanding of the molecular biology and immunopathology of Chronic Lymphocytic Leukemia. It will also assist in studying the potential effects of different mutations. The advantage of this method lies in its ability to incorporate information from multiple sources simultaneously, offering a comprehensive and insightful analysis.