RDF export & SPARQL queries
SPARQL is a query language used to retrieve and manipulate data stored in Resource Description Framework (RDF) format. In this tutorial, we demonstrate how lamindb registries can be queried with SPARQL.
import warnings
warnings.filterwarnings("ignore")
# pip install 'lamindb[bionty]' rdflib
!lamin connect laminlabs/lamindata
→ connected lamindb: laminlabs/lamindata
import bionty as bt
from rdflib import Graph, Literal, RDF, URIRef
To query data with SPARQL, we first need to build a directed RDF graph composed of triple statements. Each statement is represented by:
a node for the subject
an arc that goes from a subject to an object for the predicate
a node for the object.
Each of the three parts can be identified by a URI; the object may alternatively be a literal value, such as a string.
We can use the DataFrame representation of lamindb registries to build an RDF graph.
Building an RDF graph
diseases = bt.Disease.df()
diseases.head()
id | uid | name | ontology_id | abbr | synonyms | description | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
172 | 4c6NK4On | acute disease | MONDO:0020683 | None | disease, acute\|acute disease\|acute diseases | Disease Having A Short And Relatively Severe C... | 1 | 76 | NaN | 2025-01-08 13:33:31.718016+00:00 | 2 | None | 1 |
171 | IfbzfDzV | non-Hodgkin lymphoma | MONDO:0018908 | None | non-Hodgkin lymphoma\|non-Hodgkin's lymphoma\|no... | Distinct From Hodgkin Lymphoma Both Morphologi... | 1 | 76 | NaN | 2025-01-08 13:33:31.717997+00:00 | 2 | None | 1 |
170 | 2AhKtWA4 | lymphoid hemopathy | MONDO:0015757 | None | None | None | 1 | 76 | NaN | 2025-01-08 13:33:31.717978+00:00 | 2 | None | 1 |
169 | 7EIZsogb | acute leukemia | MONDO:0010643 | None | acute leukaemia (disease)\|acute leukemia\|acute... | A Clonal (Malignant) Hematopoietic Disorder Wi... | 1 | 76 | NaN | 2025-01-08 13:33:31.717959+00:00 | 2 | None | 1 |
168 | 5fuD5lYR | T-cell leukemia | MONDO:0005525 | None | leukaemia (disease) of T cell\|T cell leukemia ... | A Malignant Disease Of The T-Lymphocytes In Th... | 1 | 76 | NaN | 2025-01-08 13:33:31.717940+00:00 | 2 | None | 1 |
We convert the DataFrame to RDF by generating triples.
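rdf_graph = Graph()
# Base URI under which all subjects, predicates, and classes are minted
namespace = URIRef("http://sparql-example.org/")
for _, row in diseases.iterrows():
    # One subject per disease, identified by its ontology ID
    subject = URIRef(namespace + str(row["ontology_id"]))
    # Type the subject as a Disease and attach its name and description
    rdf_graph.add((subject, RDF.type, URIRef(namespace + "Disease")))
    rdf_graph.add((subject, URIRef(namespace + "name"), Literal(row["name"])))
    rdf_graph.add(
        (subject, URIRef(namespace + "description"), Literal(row["description"]))
    )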
rdf_graph
<Graph identifier=Nde160a2346d74052afc693948898f0bd (<class 'rdflib.graph.Graph'>)>
Now we can query the RDF graph using SPARQL for the name and associated description:
query = """
SELECT ?name ?description
WHERE {
    ?disease a <http://sparql-example.org/Disease> .
    ?disease <http://sparql-example.org/name> ?name .
    ?disease <http://sparql-example.org/description> ?description .
}
LIMIT 5
"""
for row in rdf_graph.query(query):
    print(f"Name: {row.name}, Description: {row.description}")
Name: acute disease, Description: Disease Having A Short And Relatively Severe Course.
Name: non-Hodgkin lymphoma, Description: Distinct From Hodgkin Lymphoma Both Morphologically And Biologically, Non-Hodgkin Lymphoma (Nhl) Is Characterized By The Absence Of Reed-Sternberg Cells, Can Occur At Any Age, And Usually Presents As A Localized Or Generalized Lymphadenopathy Associated With Fever And Weight Loss. The Clinical Course Varies According To The Morphologic Type. Nhl Is Clinically Classified As Indolent, Aggressive, Or Having A Variable Clinical Course. Nhl Can Be Of B-Or T-/Nk-Cell Lineage.
Name: lymphoid hemopathy, Description: None
Name: acute leukemia, Description: A Clonal (Malignant) Hematopoietic Disorder With An Acute Onset, Affecting The Bone Marrow And The Peripheral Blood. The Malignant Cells Show Minimal Differentiation And Are Called Blasts, Either Myeloid Blasts (Myeloblasts) Or Lymphoid Blasts (Lymphoblasts).
Name: T-cell leukemia, Description: A Malignant Disease Of The T-Lymphocytes In The Bone Marrow, Thymus, And/Or Blood.