Disease¶
lamindb provides access to public Disease ontologies, such as Mondo (mondo) and the Human Disease Ontology (doid), through bionty.
Here we show how to access and search Disease ontologies to standardize new data.
import bionty as bt
import pandas as pd
PublicOntology objects¶
Let us create a public ontology accessor with the .public() method, which chooses a default public ontology source from Source.
The result is a PublicOntology object, which you can think of as a public registry:
diseases = bt.Disease.public(organism="all")
diseases
→ connected lamindb: testuser1/test-public-ontologies
PublicOntology
Entity: Disease
Organism: all
Source: mondo, 2025-06-03
#terms: 30128
As for registries, you can export the ontology as a DataFrame:
df = diseases.df()
df.head()
ontology_id | name | definition | synonyms | parents |
---|---|---|---|---|
MONDO:0000001 | disease | A Disease Is A Disposition To Undergo Patholog... | disease or disorder|diseases|other disease|dis... | [] |
MONDO:0000002 | obsolete 46,XX sex reversal | None | None | [] |
MONDO:0000003 | obsolete 17-hydroxysteroid dehydrogenase defic... | None | None | [] |
MONDO:0000004 | adrenocortical insufficiency | An Endocrine Or Hormonal Disorder That Occurs ... | adrenal gland insufficiency|hypocortisolemia|a... | [MONDO:0002816] |
MONDO:0000005 | alopecia, isolated | None | None | [MONDO:0004907, MONDO:0100118] |
Unlike registries, you can also export it as a Pronto object via public.ontology.
Look up terms¶
As for registries, terms can be looked up with auto-complete:
lookup = diseases.lookup()
The . accessor provides normalized terms (lower case, containing only alphanumeric characters and underscores):
lookup.alzheimer_disease
Disease(ontology_id='MONDO:0004975', name='Alzheimer disease', definition='A Progressive, Neurodegenerative Disease Characterized By Loss Of Function And Death Of Nerve Cells In Several Areas Of The Brain Leading To Loss Of Cognitive Function Such As Memory And Language.', synonyms="Alzheimer dementia|Alzheimers disease|Alzheimer disease|Alzheimer's disease|AD|Alzheimers dementia|presenile and senile dementia|Alzheimer's dementia", parents=array(['MONDO:0001627', 'MONDO:0005574'], dtype=object))
To look up the exact original strings, convert the lookup object to a dict and use the [] accessor:
lookup_dict = lookup.dict()
lookup_dict["Alzheimer disease"]
Disease(ontology_id='MONDO:0004975', name='Alzheimer disease', definition='A Progressive, Neurodegenerative Disease Characterized By Loss Of Function And Death Of Nerve Cells In Several Areas Of The Brain Leading To Loss Of Cognitive Function Such As Memory And Language.', synonyms="Alzheimer dementia|Alzheimers disease|Alzheimer disease|Alzheimer's disease|AD|Alzheimers dementia|presenile and senile dementia|Alzheimer's dementia", parents=array(['MONDO:0001627', 'MONDO:0005574'], dtype=object))
By default, the name field is used to generate lookup keys. You can specify another field to look up:
lookup = diseases.lookup(diseases.ontology_id)
lookup.mondo_0004975
Disease(ontology_id='MONDO:0004975', name='Alzheimer disease', definition='A Progressive, Neurodegenerative Disease Characterized By Loss Of Function And Death Of Nerve Cells In Several Areas Of The Brain Leading To Loss Of Cognitive Function Such As Memory And Language.', synonyms="Alzheimer dementia|Alzheimers disease|Alzheimer disease|Alzheimer's disease|AD|Alzheimers dementia|presenile and senile dementia|Alzheimer's dementia", parents=array(['MONDO:0001627', 'MONDO:0005574'], dtype=object))
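The normalized keys used above (alzheimer_disease, mondo_0004975) can be approximated with a small helper. This is only a sketch of the stated rule (lower case, alphanumeric characters and underscores); normalize_key is a hypothetical name, not a bionty function, and the library's actual normalization may differ in edge cases.

```python
import re


def normalize_key(term: str) -> str:
    """Approximate a lookup key: lower case, alphanumerics and underscores only.

    Hypothetical helper for illustration; not part of bionty or lamindb.
    """
    # Replace every run of non-alphanumeric characters with one underscore,
    # trim underscores at the ends, then lower-case the result.
    return re.sub(r"[^0-9a-zA-Z]+", "_", term).strip("_").lower()


print(normalize_key("Alzheimer disease"))  # alzheimer_disease
print(normalize_key("MONDO:0004975"))      # mondo_0004975
```

Because different original strings can collapse to the same normalized key, the dict form of the lookup is the safer choice when the exact original string matters.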
Search terms¶
Search behaves in the same way as it does for registries:
diseases.search("parkinson disease").head(3)
ontology_id | name | definition | synonyms | parents |
---|---|---|---|---|
MONDO:0005180 | Parkinson disease | A Progressive Degenerative Disorder Of The Cen... | paralysis agitans|Parkinson disease|PD|Parkins... | [MONDO:0021095, MONDO:0100545] |
MONDO:0008199 | late-onset Parkinson disease | A Parkinson Disease That Begins After Around T... | LOPD|Parkinson disease, susceptibility to, Mul... | [MONDO:0005180] |
MONDO:0011613 | autosomal recessive early-onset Parkinson dise... | Any Parkinson Disease In Which The Cause Of Th... | PINK1 Parkinson disease|Parkinson disease caus... | [MONDO:0017279] |
By default, search also covers synonyms and all other fields containing strings:
diseases.search("paralysis agitans").head(3)
ontology_id | name | definition | synonyms | parents |
---|---|---|---|---|
MONDO:0005180 | Parkinson disease | A Progressive Degenerative Disorder Of The Cen... | paralysis agitans|Parkinson disease|PD|Parkins... | [MONDO:0021095, MONDO:0100545] |
MONDO:0008193 | paralysis agitans, juvenile, of Hunt | None | paralysis agitans, juvenile, of Hunt | [MONDO:0009830] |
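To make the idea of searching across several string fields concrete, here is a toy sketch in plain Python. It is not how lamindb implements search; the scoring rule (exact match before substring match), the RECORDS subset, and the names candidate_strings, score, and toy_search are all assumptions made for illustration. The records are hand-copied from the output above, and only the synonyms visible in the truncated output are included.

```python
# Toy records: a hand-copied subset of the output above, not the full ontology.
# Synonyms are stored as one pipe-delimited string, as in the exported table.
RECORDS = {
    "MONDO:0005180": {
        "name": "Parkinson disease",
        "synonyms": "paralysis agitans|Parkinson disease|PD",
    },
    "MONDO:0008193": {
        "name": "paralysis agitans, juvenile, of Hunt",
        "synonyms": "paralysis agitans, juvenile, of Hunt",
    },
}


def candidate_strings(record, fields):
    """Collect the strings to match against, splitting pipe-delimited values."""
    strings = []
    for field in fields:
        value = record.get(field)
        if value:
            strings.extend(part.strip() for part in value.split("|"))
    return strings


def score(record, query, fields):
    """Return 2 for an exact match, 1 for a substring match, 0 otherwise."""
    q = query.casefold()
    best = 0
    for s in candidate_strings(record, fields):
        s = s.casefold()
        if s == q:
            return 2
        if q in s:
            best = 1
    return best


def toy_search(query, fields=("name", "synonyms")):
    """Return ontology IDs of matching records, best matches first."""
    scored = [(score(rec, query, fields), oid) for oid, rec in RECORDS.items()]
    hits = [(s, oid) for s, oid in scored if s > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable sort keeps input order for ties
    return [oid for _, oid in hits]


print(toy_search("paralysis agitans"))
print(toy_search("paralysis agitans", fields=("name",)))
```

With both fields, Parkinson disease ranks first because one of its synonyms matches the query exactly; restricted to the name field, only the juvenile form matches.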
To search a specific field, pass it via the field argument (by default, all fields containing strings are searched):
diseases.search(
    "progressive degenerative disorder of the central nervous system",
    field=diseases.definition,
).head()
ontology_id | name | definition | synonyms | parents |
---|---|---|---|---|
MONDO:0005180 | Parkinson disease | A Progressive Degenerative Disorder Of The Cen... | paralysis agitans|Parkinson disease|PD|Parkins... | [MONDO:0021095, MONDO:0100545] |
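The parents column in the results above holds the ontology IDs of each term's direct parents, so following it repeatedly yields a term's ancestors. The sketch below does this on a toy mapping copied from the Parkinson disease rows shown on this page; PARENTS and ancestors are hypothetical names, and terms missing from the toy mapping are simply treated as having no known parents.

```python
# Direct parents copied from the search results above (toy subset only).
PARENTS = {
    "MONDO:0008199": ["MONDO:0005180"],                   # late-onset Parkinson disease
    "MONDO:0005180": ["MONDO:0021095", "MONDO:0100545"],  # Parkinson disease
}


def ancestors(ontology_id, parents=PARENTS):
    """Collect every ancestor reachable through the parents mapping."""
    seen = set()
    stack = list(parents.get(ontology_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("MONDO:0008199")))
# ['MONDO:0005180', 'MONDO:0021095', 'MONDO:0100545']
```

In the full ontology the traversal would continue further up; here it stops where the toy data ends.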
Standardize Disease identifiers¶
Let us generate a DataFrame that stores a number of Disease identifiers, some of which are corrupted:
df_orig = pd.DataFrame(
    index=[
        "supraglottis cancer",
        "alexia",
        "trigonitis",
        "paranasal sinus disorder",
        "This disease does not exist",
    ]
)
df_orig
supraglottis cancer |
---|
alexia |
trigonitis |
paranasal sinus disorder |
This disease does not exist |
We can check whether any of our values are validated against the ontology reference:
validated = diseases.validate(df_orig.index, diseases.name)
df_orig.index[~validated]
! 1 unique term (20.00%) is not validated: 'This disease does not exist'
Index(['This disease does not exist'], dtype='object')
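Conceptually, the validation above yields one boolean per input term, which is then used as a mask to pick out the terms that failed. The following sketch imitates that with a plain membership test. It assumes exact, case-sensitive matching, which may not be how validate behaves, and reference_names is a toy stand-in containing only the four names that validated above, not the full ontology.

```python
# Toy reference: the four names that validated in the output above.
# The real reference would be every name in the ontology.
reference_names = {
    "supraglottis cancer",
    "alexia",
    "trigonitis",
    "paranasal sinus disorder",
}

terms = [
    "supraglottis cancer",
    "alexia",
    "trigonitis",
    "paranasal sinus disorder",
    "This disease does not exist",
]

# One boolean per input term, in input order (mirrors the mask used above).
validated = [term in reference_names for term in terms]
not_validated = [term for term, ok in zip(terms, validated) if not ok]

# Share of unique terms that failed validation.
share = 100 * len(set(not_validated)) / len(set(terms))
print(f"{len(set(not_validated))} unique term ({share:.2f}%) is not validated")
print(not_validated)
```

Keeping the result as a per-term mask, rather than a single yes/no answer, makes it easy to route only the failing terms to manual review.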
Ontology source versions¶
For any given entity, we can choose from a number of versions:
bt.Source.filter(entity="bionty.Disease").df()
# only lists the sources that are currently used
bt.Source.filter(entity="bionty.Disease", currently_used=True).df()
id | uid | entity | organism | name | in_db | currently_used | description | url | md5 | source_website | space_id | dataframe_artifact_id | version | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | IGIkseWQ | bionty.Disease | all | mondo | False | True | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | None | https://mondo.monarchinitiative.org | 1 | None | 2025-06-03 | None | 2025-07-14 06:41:44.843000+00:00 | 1 | None | 1 |
19 | 4kswnHVF | bionty.Disease | human | doid | False | True | Human Disease Ontology | http://purl.obolibrary.org/obo/doid/releases/2... | None | https://disease-ontology.org | 1 | None | 2024-05-29 | None | 2025-07-14 06:41:44.843000+00:00 | 1 | None | 1 |
When instantiating a Bionty object, we can choose a source or version:
source = bt.Source.filter(name="mondo", organism="all").first()
diseases = bt.Disease.public(source=source)
diseases
PublicOntology
Entity: Disease
Organism: all
Source: mondo, 2025-06-03
#terms: 30128
The currently used ontologies can be displayed using:
bt.Source.filter(currently_used=True).df()