First version of the CLL-KG
Chronic Lymphocytic Leukemia - Knowledge Graph
Mapping key domain entities, their relationships, and valuable metadata attributes, using a subset of the data.
The CLL-KG is a curated subgraph of the Clinical Knowledge Graph (CKG), an open-source platform encompassing over 16 million nodes and 220 million relationships that integrate experimental data, public databases, and scientific literature. The graph is hosted on Neo4j, a widely used graph database platform.
Node labels include categories such as Disease, Drug, Gene and Protein, among others. Each node is enriched with specific properties that provide detailed information about its entity. The Disease node for angiosarcoma – a rare cancer originating in the inner lining of blood and lymphatic vessels – serves as an example of the detailed information represented in the graph.
Nodes in the graph are interconnected through edges, which represent various types of relationships. The example shown highlights a new relationship type, RELATED_TO, which we have introduced to represent Gene-Gene associations. In this case, the graph indicates that the gene IL15 is related to MYCN. Additional details about this connection can be explored through the relationship properties. According to the provenance property, this relationship comes from the BCMO dataset. Statistical measures for this connection, such as the p-value and the Pearson correlation, are also included.
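As a rough illustration of how such a relationship could be inspected, the Python sketch below builds a parameterized Cypher lookup for a single Gene-Gene edge. The labels Gene and RELATED_TO come from the description above; the property key `id` for the gene symbol and the helper name `gene_relation_query` are assumptions made for this example, not the confirmed CLL-KG schema. Returning `properties(r)` avoids guessing the exact names of the statistical properties.

```python
# Hypothetical sketch: the property key `id` for gene symbols is an assumption
# about the schema; adjust it to whatever key the graph actually uses.
GENE_RELATION_QUERY = """
MATCH (a:Gene {id: $source})-[r:RELATED_TO]-(b:Gene {id: $target})
RETURN a.id AS source, b.id AS target, properties(r) AS properties
"""


def gene_relation_query(source, target):
    """Return the Cypher text and its parameters for one Gene-Gene lookup."""
    return GENE_RELATION_QUERY.strip(), {"source": source, "target": target}


query, params = gene_relation_query("IL15", "MYCN")
print(query)
print(params)
```

Passing the gene symbols as parameters rather than pasting them into the query text keeps the same query reusable for any pair of genes.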
Users can query the graph using Cypher, the query language for Neo4j. An example Cypher query is shown here, designed to extract a subnetwork centered around the disease chronic lymphocytic leukemia. The query retrieves Drug, Gene, and Protein nodes that are either directly or indirectly connected to this disease. Such a query yields a large and complex network with numerous nodes and relationships. For the purpose of this video, we’ve chosen to limit the number of nodes and connections displayed to enhance clarity.
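The query described here might look roughly like the one built by the Python sketch below. This is not the exact query shown in the video: the property key `name` on Disease nodes, the two-hop bound used to capture "indirectly connected" nodes, the default limit of 50, and the helper name `cll_subnetwork_query` are all assumptions for illustration.

```python
def cll_subnetwork_query(disease="chronic lymphocytic leukemia", max_hops=2, limit=50):
    """Build a Cypher query for the Drug/Gene/Protein neighborhood of a disease.

    Assumptions (not confirmed by the source): Disease nodes are matched on a
    `name` property, and "indirectly connected" means within `max_hops` hops.
    """
    # The hop bound of a variable-length pattern is written into the query
    # text here, so it is validated as a positive integer first.
    if not isinstance(max_hops, int) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    query = (
        "MATCH path = (d:Disease {name: $disease})-[*1.." + str(max_hops) + "]-(n)\n"
        "WHERE n:Drug OR n:Gene OR n:Protein\n"
        "RETURN path\n"
        "LIMIT $limit"
    )
    return query, {"disease": disease, "limit": limit}


query, params = cll_subnetwork_query()
print(query)
print(params)
```

With the official `neo4j` Python driver, the resulting pair could be passed to `session.run(query, params)`; the connection URI and credentials would depend on the deployment and are not given here. The LIMIT clause plays the same role as the restriction mentioned above: it keeps the returned network small enough to display clearly.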