Manage biological registries
This guide shows how to manage metadata for basic biological entities with the bionty plugin.
# !pip install 'lamindb[bionty]'
!lamin init --storage ./test-registries --schema bionty
→ connected lamindb: testuser1/test-registries
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-registries
Seed registries with public ontologies
Let’s first populate our CellType registry with the configured public ontology (Cell Ontology):
# check configured public ontology
bt.Source.filter(entity="bionty.CellType", currently_used=True).one()
Source(uid='1Lhf', entity='bionty.CellType', organism='all', name='cl', version='2024-05-15', in_db=False, currently_used=True, description='Cell Ontology', url='http://purl.obolibrary.org/obo/cl/releases/2024-05-15/cl.owl', md5='8a8638a9e79567935793e5007704c650', source_website='https://obophenotype.github.io/cell-ontology', created_by_id=1, created_at=2024-11-21 05:37:17 UTC)
# populate the database with the public ontology
bt.CellType.import_source()
This is now your in-house CellType registry:
# all public cell types are now available in LaminDB
bt.CellType.df()
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
2931 | 6GaGU793 | subpial interlaminar astrocyte | CL:4042011 | None | None | An Interlaminar Astrocyte Type Whose Soma Is P... | 32 | None | 2024-11-21 05:37:28.629546+00:00 | 1 |
2930 | 3afgdSa3 | pial interlaminar astrocyte | CL:4042010 | None | None | An Interlaminar Astrocyte Whose Soma Is Part O... | 32 | None | 2024-11-21 05:37:28.629469+00:00 | 1 |
2929 | n7ezKRlq | interlaminar astrocyte | CL:4042009 | None | None | An Astrocyte Type That Presents Radial Protrus... | 32 | None | 2024-11-21 05:37:28.629391+00:00 | 1 |
2928 | 6OXayYqP | fibrous astrocyte | CL:4042008 | None | None | A Cell Type Located In The First Layer Of The ... | 32 | None | 2024-11-21 05:37:28.629314+00:00 | 1 |
2927 | 1mO1QVeh | protoplasmic astrocyte | CL:4042007 | None | None | An Astrocyte With Highly Branched Protrusions,... | 32 | None | 2024-11-21 05:37:28.629237+00:00 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2836 | 42PLafOv | L2/3 intratelencephalic projecting glutamaterg... | CL:4030059 | None | None | A Transcriptomically Distinct Intratelencephal... | 32 | None | 2024-11-21 05:37:28.618185+00:00 | 1 |
2835 | 16Y7GlBk | TCR-positive macrophage | CL:4030058 | None | T cell receptor positive macrophage|TCR+ macro... | A Macrophage That Expresses The T Cell Recepto... | 32 | None | 2024-11-21 05:37:28.618108+00:00 | 1 |
2834 | 4wmW6vv9 | eccentric medium spiny neuron | CL:4030057 | None | eccentric spiny projection neuron | A Medium Spiny Neuron That Exhibits Transcript... | 32 | None | 2024-11-21 05:37:28.618030+00:00 | 1 |
2833 | 7mYcRDsu | umbrella cell of urothelium | CL:4030056 | None | facet cell of urothelium|superficial cell of u... | A Urothelial Cell That Is Terminally Different... | 32 | None | 2024-11-21 05:37:28.617953+00:00 | 1 |
2832 | 1Qd9Bmkj | intermediate cell of urothelium | CL:4030055 | None | urothelial intermediate cell | A Urothelial Cell That Is Part Of The Regenera... | 32 | None | 2024-11-21 05:37:28.617875+00:00 | 1 |
100 rows × 10 columns
# similarly, let's populate the Gene registry with human and mouse genes
bt.Gene.import_source(organism="human")
bt.Gene.import_source(organism="mouse")
Access records in in-house registries
Search by keyword:
bt.CellType.search("gamma-delta T").df().head(2)
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
780 | 1HuNn2EP | gamma-delta T cell | CL:0000798 | None | gammadelta T cell|gamma-delta T-cell|gamma-del... | A T Cell That Expresses A Gamma-Delta T Cell R... | 32 | None | 2024-11-21 05:37:28.367570+00:00 | 1 |
781 | 70lHcCNw | immature gamma-delta T cell | CL:0000799 | None | immature gamma-delta T lymphocyte|immature gam... | A Gamma-Delta T Cell That Has An Immature Phen... | 32 | None | 2024-11-21 05:37:28.367647+00:00 | 1 |
Or look up with auto-complete:
cell_types = bt.CellType.lookup()
hsc_record = cell_types.hematopoietic_stem_cell
hsc_record
CellType(uid='2U8xapxu', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)
Filter by fields and relationships:
gdt_cell = bt.CellType.get(ontology_id="CL:0000798", created_by__handle="testuser1")
gdt_cell
CellType(uid='1HuNn2EP', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gammadelta T cell|gamma-delta T-cell|gamma-delta T-lymphocyte|gamma-delta T lymphocyte', description='A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)
View the ontological hierarchy:
gdt_cell.view_parents() # pass with_children=True to also view children
Or access the parents and children directly:
gdt_cell.parents.df()
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
83 | 22LvKd01 | T cell | CL:0000084 | None | T lymphocyte|T-lymphocyte|T-cell | A Type Of Lymphocyte Whose Defining Characteri... | 32 | None | 2024-11-21 05:37:28.282099+00:00 | 1 |
gdt_cell.children.df()
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
781 | 70lHcCNw | immature gamma-delta T cell | CL:0000799 | None | immature gamma-delta T lymphocyte|immature gam... | A Gamma-Delta T Cell That Has An Immature Phen... | 32 | None | 2024-11-21 05:37:28.367647+00:00 | 1 |
782 | 3W6NKGpW | mature gamma-delta T cell | CL:0000800 | None | mature gamma-delta T-cell|mature gamma-delta T... | A Gamma-Delta T Cell That Has A Mature Phenoty... | 32 | None | 2024-11-21 05:37:28.367723+00:00 | 1 |
1465 | 26icgrTr | gamma-delta thymocyte | CL:0002405 | None | gammadelta thymocyte|gd thymocyte | A Post-Natal Thymocyte Expressing Components O... | 32 | None | 2024-11-21 05:37:28.452088+00:00 | 1 |
It is also possible to recursively query parents or children, getting direct parents (children), their parents, and so forth.
gdt_cell.query_parents().df()
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
1 | 4bKGljt0 | cell | CL:0000000 | None | None | A Material Entity Of Anatomical Origin (Part O... | 32 | None | 2024-11-21 05:37:28.275470+00:00 | 1 |
83 | 22LvKd01 | T cell | CL:0000084 | None | T lymphocyte|T-lymphocyte|T-cell | A Type Of Lymphocyte Whose Defining Characteri... | 32 | None | 2024-11-21 05:37:28.282099+00:00 | 1 |
214 | 2K93w3xO | motile cell | CL:0000219 | None | None | A Cell That Moves By Its Own Activities. | 32 | None | 2024-11-21 05:37:28.300453+00:00 | 1 |
221 | 2cXC7cgF | single nucleate cell | CL:0000226 | None | None | A Cell With A Single Nucleus. | 32 | None | 2024-11-21 05:37:28.301009+00:00 | 1 |
250 | 4WnpvUTH | eukaryotic cell | CL:0000255 | None | None | Any Cell That Only Exists In Eukaryota. | 32 | None | 2024-11-21 05:37:28.303224+00:00 | 1 |
529 | X6c7osZ5 | lymphocyte | CL:0000542 | None | None | A Lymphocyte Is A Leukocyte Commonly Found In ... | 32 | None | 2024-11-21 05:37:28.336392+00:00 | 1 |
721 | 3VEAlFdi | leukocyte | CL:0000738 | None | leucocyte|white blood cell | An Achromatic Cell Of The Myeloid Or Lymphoid ... | 32 | None | 2024-11-21 05:37:28.362975+00:00 | 1 |
822 | 2Jgr5Xx4 | mononuclear cell | CL:0000842 | None | mononuclear leukocyte | A Leukocyte With A Single Non-Segmented Nucleu... | 32 | None | 2024-11-21 05:37:28.374659+00:00 | 1 |
967 | 4Ilrnj9U | hematopoietic cell | CL:0000988 | None | haemopoietic cell|haematopoietic cell|hemopoie... | A Cell Of A Hematopoietic Lineage. | 32 | None | 2024-11-21 05:37:28.389773+00:00 | 1 |
1303 | u3sr1Gdf | nucleate cell | CL:0002242 | None | None | A Cell Containing At Least One Nucleus. | 32 | None | 2024-11-21 05:37:28.431640+00:00 | 1 |
gdt_cell.query_children().df()
id | uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|
781 | 70lHcCNw | immature gamma-delta T cell | CL:0000799 | None | immature gamma-delta T lymphocyte|immature gam... | A Gamma-Delta T Cell That Has An Immature Phen... | 32 | None | 2024-11-21 05:37:28.367647+00:00 | 1 |
782 | 3W6NKGpW | mature gamma-delta T cell | CL:0000800 | None | mature gamma-delta T-cell|mature gamma-delta T... | A Gamma-Delta T Cell That Has A Mature Phenoty... | 32 | None | 2024-11-21 05:37:28.367723+00:00 | 1 |
783 | 2xXcHDQq | gamma-delta intraepithelial T cell | CL:0000801 | None | gamma-delta intraepithelial T-lymphocyte|gamma... | A Mature Gamma-Delta T Cell That Is Found In T... | 32 | None | 2024-11-21 05:37:28.367802+00:00 | 1 |
784 | 6fdlvmJ3 | CD8-alpha alpha positive, gamma-delta intraepi... | CL:0000802 | None | CD8-positive, gamma-delta intraepithelial T-ce... | A Gamma-Delta Intraepithelial T Cell That Has ... | 32 | None | 2024-11-21 05:37:28.367879+00:00 | 1 |
785 | 1mNzVotO | CD4-negative CD8-negative gamma-delta intraepi... | CL:0000803 | None | CD4-positive, gamma-delta intraepithelial T-ly... | A Gamma-Delta Intraepithelial T Cell That Has ... | 32 | None | 2024-11-21 05:37:28.367956+00:00 | 1 |
895 | 1tYOPZxH | dendritic epidermal T cell | CL:0000916 | None | DETC|dendritic epidermal T-lymphocyte|dendriti... | A Mature Gamma-Delta T Cell Located In The Epi... | 32 | None | 2024-11-21 05:37:28.380323+00:00 | 1 |
1189 | 7MDv71IV | CD27-positive gamma-delta T cell | CL:0002124 | None | gd27-positive|gammadelta27-positive | A Circulating Gamma-Delta T Cell That Is Cd27-... | 32 | None | 2024-11-21 05:37:28.418812+00:00 | 1 |
1190 | 1DEERh4L | CD27-negative gamma-delta T cell | CL:0002125 | None | gammadelta-17 cells | A Circulating Gamma-Delta T Cell That Expresse... | 32 | None | 2024-11-21 05:37:28.418890+00:00 | 1 |
1191 | 3efemme8 | CD25-positive, CD27-positive immature gamma-de... | CL:0002126 | None | None | A Cd25-Positive, Cd27-Positive Immature Gamma-... | 32 | None | 2024-11-21 05:37:28.418967+00:00 | 1 |
1465 | 26icgrTr | gamma-delta thymocyte | CL:0002405 | None | gammadelta thymocyte|gd thymocyte | A Post-Natal Thymocyte Expressing Components O... | 32 | None | 2024-11-21 05:37:28.452088+00:00 | 1 |
1466 | 4hrSce5T | immature Vgamma2-positive thymocyte | CL:0002406 | None | None | A Double Negative Post-Natal Thymocyte That Ha... | 32 | None | 2024-11-21 05:37:28.452166+00:00 | 1 |
1467 | 76CEFg3A | mature Vgamma2-positive thymocyte | CL:0002407 | None | Vgamma2-positive | A Thymocyte That Has A T Cell Receptor Consist... | 32 | None | 2024-11-21 05:37:28.452243+00:00 | 1 |
1468 | 3ABJ1l1O | immature Vgamma2-negative thymocyte | CL:0002408 | None | None | A Double Negative Post-Natal Thymocyte That Ha... | 32 | None | 2024-11-21 05:37:28.452321+00:00 | 1 |
1469 | 6RBJq86b | mature Vgamma2-negative thymocyte | CL:0002409 | None | Vgamma2-negative | A Thymocyte That Has A T Cell Receptor Consist... | 32 | None | 2024-11-21 05:37:28.452403+00:00 | 1 |
1471 | 64PCjpkJ | Vgamma1.1-positive, Vdelta6.3-negative thymocyte | CL:0002411 | None | Vg1.1-positive, Vd6.3-negative T cell | A Gamma-Delta Receptor That Expresses Vgamma1.... | 32 | None | 2024-11-21 05:37:28.452576+00:00 | 1 |
1472 | 6vYlL7zk | Vgamma1.1-positive, Vdelta6.3-positive thymocyte | CL:0002412 | None | Vg1.1+Vd6.3+ T cell | A Gamma-Delta Receptor That Expresses Vgamma1.... | 32 | None | 2024-11-21 05:37:28.452658+00:00 | 1 |
1473 | 4cYNDr25 | mature Vgamma1.1-positive, Vdelta6.3-negative ... | CL:0002413 | None | None | A Vgamma1.1-Positive, Vdelta6.3-Negative Thymo... | 32 | None | 2024-11-21 05:37:28.452736+00:00 | 1 |
1474 | 5pDjyjfF | immature Vgamma1.1-positive, Vdelta6.3-negativ... | CL:0002414 | None | None | A Vgamma1.1-Positive, Vdelta6.3-Negative Thymo... | 32 | None | 2024-11-21 05:37:28.452813+00:00 | 1 |
1475 | 2SYX59uO | immature Vgamma1.1-positive, Vdelta6.3-positiv... | CL:0002415 | None | immature Vg1.1+Vd6.3+ T cell | A Vgamma1.1-Positive, Vdelta6.3-Positive Thymo... | 32 | None | 2024-11-21 05:37:28.452891+00:00 | 1 |
1476 | 6JxpxGgM | mature Vgamma1.1-positive, Vdelta6.3-positive ... | CL:0002416 | None | mature Vg1.1+Vd6.3+ T cell | A Vgamma1.1-Positive, Vdelta6.3-Positive Thymo... | 32 | None | 2024-11-21 05:37:28.452968+00:00 | 1 |
1572 | 1jlK4jJ9 | Vgamma5-positive CD8alpha alpha positive gamma... | CL:0002513 | None | tgd.vg5+.IEL | A Cd8Alpha Alpha Positive Gamma-Delta Intraepi... | 32 | None | 2024-11-21 05:37:28.464312+00:00 | 1 |
1573 | E2koIf0l | Vgamma5-negative CD8alpha alpha positive gamma... | CL:0002514 | None | tgd.vg5-.IEL | A Cd8Alpha Alpha Positive Gamma-Delta Intraepi... | 32 | None | 2024-11-21 05:37:28.464390+00:00 | 1 |
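To make the difference between direct and recursive queries concrete, here is a minimal pure-Python sketch of a recursive parent query over a toy hierarchy. It is not how bionty implements query_parents(). The DIRECT_PARENTS map and the all_parents helper are hypothetical, and every edge except "gamma-delta T cell" to "T cell" is a simplified assumption rather than the Cell Ontology's actual structure.

```python
from collections import deque

# Toy child -> direct-parents map. Names come from the tables above;
# every edge except "gamma-delta T cell" -> "T cell" is a simplified assumption.
DIRECT_PARENTS = {
    "gamma-delta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["hematopoietic cell", "motile cell"],
    "hematopoietic cell": ["cell"],
    "motile cell": ["cell"],
}


def all_parents(name, direct_parents):
    """Collect every ancestor of `name` with a breadth-first traversal."""
    seen = set()
    queue = deque(direct_parents.get(name, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already reached through another path
        seen.add(parent)
        queue.extend(direct_parents.get(parent, []))
    return seen


ancestors = all_parents("gamma-delta T cell", DIRECT_PARENTS)
print(sorted(ancestors))
```

The `seen` set matters because an ontology is a graph rather than a tree: a term such as "cell" can be reached through several paths and should be reported once.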
You can construct custom hierarchies of records:
# register a new cell type
my_celltype = bt.CellType(name="my new T-cell subtype").save()
# specify "gamma-delta T cell" as a parent
my_celltype.parents.add(gdt_cell)
# visualize hierarchy
gdt_cell.view_parents(distance=2, with_children=True)
Create records from values
When accessing datasets, one often encounters bulk references to entities that may be corrupted or that follow different standardization schemes.
Let’s consider an example based on an AnnData object. In the cell_type annotations of this AnnData object, we find 4 references to cell types:
adata = ln.core.datasets.anndata_with_obs()
adata.obs.cell_type.value_counts()
cell_type
T cell 10
hematopoietic stem cell 10
hepatocyte 10
my new cell type 10
Name: count, dtype: int64
We’d like to load the corresponding records in our in-house registry to annotate a dataset.
To this end, you’ll typically use from_values, which will both validate & retrieve records that match the values.
cell_types = bt.CellType.from_values(adata.obs.cell_type)
cell_types
! did not create CellType record for 1 non-validated name: 'my new cell type'
[CellType(uid='22LvKd01', name='T cell', ontology_id='CL:0000084', synonyms='T lymphocyte|T-lymphocyte|T-cell', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC),
CellType(uid='2U8xapxu', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC),
CellType(uid='7hggmgo1', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)]
Logging informed us that 3 cell types were validated. Since we loaded these records at the same time, we could readily use them to annotate a dataset.
What happened under the hood?
.from_values() performs the following lookups:
1. If registry records match the values, load these records
2. If values match synonyms of registry records, load these records
3. If no record in the registry matches, attempt to load records from a public ontology
4. Same as 3., but based on synonyms
No records will be returned if all 4 lookups are unsuccessful.
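The lookup order above can be sketched in plain Python. This is an illustration under stated assumptions, not bionty's implementation: records are modelled as dicts, the helper names find and from_values_sketch are hypothetical, and synonyms are assumed to be pipe-delimited strings as they appear in the record printouts in this guide.

```python
def split_synonyms(synonyms):
    """Split a pipe-delimited synonyms string; None or '' yields no synonyms."""
    return synonyms.split("|") if synonyms else []


def find(value, records):
    """Return the record matching `value` by name, else by synonym, else None."""
    for record in records:
        if record["name"] == value:
            return record
    for record in records:
        if value in split_synonyms(record.get("synonyms")):
            return record
    return None


def from_values_sketch(values, registry, public):
    """Try the registry first (steps 1-2), then the public ontology (steps 3-4)."""
    found = []
    for value in dict.fromkeys(values):  # unique values, original order
        record = find(value, registry) or find(value, public)
        if record is not None and record not in found:
            found.append(record)
    return found


# Toy data: names and synonyms are copied from the record printouts in this guide.
registry = [{"name": "T cell", "synonyms": "T lymphocyte|T-lymphocyte|T-cell"}]
public = [{"name": "adipocyte", "synonyms": "fat cell|adipose cell"}]

result = from_values_sketch(
    ["T cell", "T-cell", "fat cell", "my new cell type"], registry, public
)
names = [record["name"] for record in result]
print(names)
```

Values that fail all four lookups, such as "my new cell type", are simply absent from the result, which mirrors the behaviour described above.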
Sometimes, it’s useful to treat validated records differently from non-validated records. Here is a way:
original_values = ["gut", "gut2"]
inspector = bt.Tissue.inspect(original_values)
records_from_validated_values = bt.Tissue.from_values(inspector.validated)
Alternatively, we can retrieve records based on ontology ids:
adata.obs.cell_type_id.unique().tolist()
['CL:0000084', 'CL:0000037', 'CL:0000182', '']
bt.CellType.from_values(adata.obs.cell_type_id, field=bt.CellType.ontology_id)
[CellType(uid='2U8xapxu', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC),
CellType(uid='22LvKd01', name='T cell', ontology_id='CL:0000084', synonyms='T lymphocyte|T-lymphocyte|T-cell', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC),
CellType(uid='7hggmgo1', name='hepatocyte', ontology_id='CL:0000182', description='The Main Structural Component Of The Liver. They Are Specialized Epithelial Cells That Are Organized Into Interconnected Plates Called Lobules. Majority Of Cell Population Of Liver, Polygonal In Shape, Arranged In Plates Or Trabeculae Between Sinusoids; May Have Single Nucleus Or Binucleated.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)]
Validate & standardize
Simple validation of an iterable of values works like so:
bt.CellType.validate(["fat cell", "blood forming stem cell"])
! 2 unique terms (100.00%) are not validated for name: 'fat cell', 'blood forming stem cell'
array([False, False])
Because these values don’t comply with the registry, they’re not validated!
You can easily convert these values to validated standardized names based on synonyms like so:
bt.CellType.standardize(["fat cell", "blood forming stem cell"])
['adipocyte', 'hematopoietic stem cell']
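The relationship between validating and standardizing can be illustrated with a small stand-alone sketch. The functions validate and standardize below are hypothetical toy versions, not the bionty methods. They assume that validation is an exact match on the name field and that values without a matching synonym pass through unchanged.

```python
def build_synonym_map(records):
    """Map each pipe-delimited synonym to its record's standardized name."""
    mapping = {}
    for record in records:
        for synonym in record["synonyms"].split("|"):
            mapping[synonym] = record["name"]
    return mapping


def validate(values, records):
    """Return one boolean per value: True if it equals a registry name."""
    names = {record["name"] for record in records}
    return [value in names for value in values]


def standardize(values, records):
    """Replace synonyms with names; unknown values pass through (sketch assumption)."""
    synonym_map = build_synonym_map(records)
    return [synonym_map.get(value, value) for value in values]


# Names and synonyms are copied from the record printouts in this guide.
records = [
    {"name": "adipocyte", "synonyms": "fat cell|adipose cell"},
    {
        "name": "hematopoietic stem cell",
        "synonyms": "blood forming stem cell|hemopoietic stem cell",
    },
]
values = ["fat cell", "blood forming stem cell"]

before = validate(values, records)
standardized = standardize(values, records)
after = validate(standardized, records)
print(before, standardized, after)
```

In this sketch, the synonyms fail validation, and the standardized names pass it.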
Alternatively, you can use .from_values(), which will only ever return validated records and automatically standardize under the hood:
bt.CellType.from_values(["fat cell", "blood forming stem cell"])
[CellType(uid='wdLgwUXo', name='adipocyte', ontology_id='CL:0000136', synonyms='fat cell|adipose cell', description='A Fat-Storing Cell Found Mostly In The Abdominal Cavity And Subcutaneous Tissue Of Mammals. Fat Is Usually Stored In The Form Of Triglycerides.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC),
CellType(uid='2U8xapxu', name='hematopoietic stem cell', ontology_id='CL:0000037', synonyms='blood forming stem cell|hemopoietic stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)]
If you are not sure what to do, use .inspect() to get instructions:
bt.CellType.inspect(["fat cell", "blood forming stem cell"]);
! 2 unique terms (100.00%) are not validated for name: 'fat cell', 'blood forming stem cell'
detected 2 unique terms with synonyms: fat cell, blood forming stem cell
→ standardize terms via .standardize()
We can also add new synonyms to a record like so:
hsc_record.add_synonym("HSC")
When we encounter this synonym as a value, it will now be standardized via synonym lookup and mapped to the correct registry record:
bt.CellType.standardize(["HSC"])
['hematopoietic stem cell']
A special synonym is .abbr (short for abbreviation), which has its own field and can be assigned via:
hsc_record.set_abbr("HSC")
You can create a lookup object from the .abbr field:
cell_types = bt.CellType.lookup("abbr")
hsc = cell_types.hsc
hsc
CellType(uid='2U8xapxu', name='hematopoietic stem cell', ontology_id='CL:0000037', abbr='HSC', synonyms='hemopoietic stem cell|HSC|blood forming stem cell', description='A Stem Cell From Which All Cells Of The Lymphoid And Myeloid Lineages Develop, Including Blood Cells And Cells Of The Immune System. Hematopoietic Stem Cells Lack Cell Markers Of Effector Cells (Lin-Negative). Lin-Negative Is Defined By Lacking One Or More Of The Following Cell Surface Markers: Cd2, Cd3 Epsilon, Cd4, Cd5 ,Cd8 Alpha Chain, Cd11B, Cd14, Cd19, Cd20, Cd56, Ly6G, Ter119.', created_by_id=1, source_id=32, created_at=2024-11-21 05:37:28 UTC)
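A lookup object of this kind can be pictured as a namespace whose attribute names are derived from a field's values. The sketch below is a hypothetical illustration, not bionty's implementation: make_lookup and its normalization rule (lowercase, non-word characters replaced by underscores) are assumptions chosen to reproduce the attribute names seen in this guide, such as hematopoietic_stem_cell and hsc.

```python
import re
from types import SimpleNamespace


def make_lookup(records, field):
    """Build a namespace keyed by a normalized version of `field`."""
    entries = {}
    for record in records:
        value = record.get(field)
        if not value:
            continue  # records without a value in this field are skipped
        # Assumed normalization: lowercase, runs of non-word characters -> "_"
        key = re.sub(r"\W+", "_", value.strip().lower()).strip("_")
        entries[key] = record
    return SimpleNamespace(**entries)


# Toy records as dicts; field values are copied from the printouts in this guide.
records = [
    {"name": "hematopoietic stem cell", "abbr": "HSC"},
    {"name": "adipocyte", "abbr": None},
]

by_name = make_lookup(records, "name")
by_abbr = make_lookup(records, "abbr")
print(by_name.hematopoietic_stem_cell["abbr"])
print(by_abbr.hsc["name"])
```

Because only records with a non-empty value in the chosen field are included, a lookup built from abbr is typically much smaller than one built from name.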
The same workflow works for all of bionty’s registries.
Manage registries across organisms
Several registries are organism-aware (they have a .organism field), for instance, Gene.
In this case, API calls that interact with multi-organism registries require an organism argument when there’s ambiguity.
For instance, when validating gene symbols:
bt.Gene.validate(["TCF7", "ABC1"], organism="human")
! 1 unique term (50.00%) is not validated for symbol: 'ABC1'
array([ True, False])
In contrast, working with Ensembl Gene IDs doesn’t require passing organism, as there’s no ambiguity:
bt.Gene.validate(["ENSG00000000419", "ENSMUSG00002076988"], field=bt.Gene.ensembl_gene_id)
array([ True, True])
When working with the same organism throughout your analysis/workflow, you can omit the organism argument by configuring it globally:
bt.settings.organism = "mouse"
bt.Gene.from_source(symbol="Ap5b1")
Gene(uid='3b8mHb0MRal4', symbol='Ap5b1', ensembl_gene_id='ENSMUSG00000049562', ncbi_gene_ids='381201', biotype='protein_coding', synonyms='Gm962', description='adaptor-related protein complex 5, beta 1 subunit ', created_by_id=1, source_id=15, organism_id=2, created_at=2024-11-21 05:39:46 UTC)
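The interplay between an explicit organism argument, a global default, and fields that need no organism can be sketched as follows. This is a hypothetical model of the behaviour described above, not bionty's code: the Settings class, the UNAMBIGUOUS_FIELDS set, and resolve_organism are all assumptions made for illustration.

```python
class Settings:
    """Hypothetical stand-in for a global settings object."""

    organism = None


settings = Settings()

# Fields assumed to identify a gene unambiguously across organisms.
UNAMBIGUOUS_FIELDS = {"ensembl_gene_id"}


def resolve_organism(field, organism=None):
    """Return the organism to use for a lookup on `field`, or None if not needed."""
    if field in UNAMBIGUOUS_FIELDS:
        return None  # the identifier alone is enough
    chosen = organism or settings.organism
    if chosen is None:
        raise ValueError(
            f"pass organism= or set a global default to look up by {field!r}"
        )
    return chosen


settings.organism = "mouse"
print(resolve_organism("symbol"))                    # falls back to the global default
print(resolve_organism("symbol", organism="human"))  # explicit argument wins
print(resolve_organism("ensembl_gene_id"))           # no organism required
```

Letting the explicit argument take precedence keeps one-off queries against another organism possible without touching the global setting.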
Track underlying ontology source versions
Under the hood, source ontology versions are automatically tracked for each registry:
bt.Source.filter(currently_used=True).df()
id | uid | entity | organism | name | version | in_db | currently_used | description | url | md5 | source_website | dataframe_artifact_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 33TU | bionty.Organism | vertebrates | ensembl | release-112 | False | True | Ensembl | https://ftp.ensembl.org/pub/release-112/specie... | 0ec37e77f4bc2d0b0b47c6c62b9f122d | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.923282+00:00 | 1 |
6 | 6bbV | bionty.Organism | bacteria | ensembl | release-57 | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... | ee28510ed5586ea7ab4495717c96efc8 | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.923737+00:00 | 1 |
7 | 6s9n | bionty.Organism | fungi | ensembl | release-57 | False | True | Ensembl | http://ftp.ensemblgenomes.org/pub/fungi/releas... | dbcde58f4396ab8b2480f7fe9f83df8a | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.923813+00:00 | 1 |
8 | 2PmT | bionty.Organism | metazoa | ensembl | release-57 | False | True | Ensembl | http://ftp.ensemblgenomes.org/pub/metazoa/rele... | 424636a574fec078a61cbdddb05f9132 | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.923888+00:00 | 1 |
9 | 7GPH | bionty.Organism | plants | ensembl | release-57 | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... | eadaa1f3e527e4c3940c90c7fa5c8bf4 | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.923964+00:00 | 1 |
10 | 4tsk | bionty.Organism | all | ncbitaxon | 2023-06-20 | False | True | NCBItaxon Ontology | s3://bionty-assets/df_all__ncbitaxon__2023-06-... | 00d97ba65627f1cd65636d2df22ea76c | https://github.com/obophenotype/ncbitaxon | None | None | 2024-11-21 05:37:17.924040+00:00 | 1 |
11 | 4UGN | bionty.Gene | human | ensembl | release-112 | False | True | Ensembl | s3://bionty-assets/df_human__ensembl__release-... | 4ccda4d88720a326737376c534e8446b | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.924116+00:00 | 1 |
15 | 4r4f | bionty.Gene | mouse | ensembl | release-112 | False | True | Ensembl | s3://bionty-assets/df_mouse__ensembl__release-... | 519cf7b8acc3c948274f66f3155a3210 | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.924418+00:00 | 1 |
19 | 4RPA | bionty.Gene | saccharomyces cerevisiae | ensembl | release-112 | False | True | Ensembl | s3://bionty-assets/df_saccharomyces cerevisiae... | 11775126b101233525a0a9e2dd64edae | https://www.ensembl.org | None | None | 2024-11-21 05:37:17.924765+00:00 | 1 |
22 | 3EYy | bionty.Protein | human | uniprot | 2024-03 | False | True | Uniprot | s3://bionty-assets/df_human__uniprot__2024-03_... | b5b9e7645065b4b3187114f07e3f402f | https://www.uniprot.org | None | None | 2024-11-21 05:37:17.924991+00:00 | 1 |
25 | 01RW | bionty.Protein | mouse | uniprot | 2024-03 | False | True | Uniprot | s3://bionty-assets/df_mouse__uniprot__2024-03_... | b1b6a196eb853088d36198d8e3749ec4 | https://www.uniprot.org | None | None | 2024-11-21 05:37:17.925219+00:00 | 1 |
28 | 3kDh | bionty.CellMarker | human | cellmarker | 2.0 | False | True | CellMarker | s3://bionty-assets/human_cellmarker_2.0_CellMa... | d565d4a542a5c7e7a06255975358e4f4 | http://bio-bigdata.hrbmu.edu.cn/CellMarker | None | None | 2024-11-21 05:37:17.925446+00:00 | 1 |
29 | 7bV5 | bionty.CellMarker | mouse | cellmarker | 2.0 | False | True | CellMarker | s3://bionty-assets/mouse_cellmarker_2.0_CellMa... | 189586732c63be949e40dfa6a3636105 | http://bio-bigdata.hrbmu.edu.cn/CellMarker | None | None | 2024-11-21 05:37:17.925521+00:00 | 1 |
30 | 6LyR | bionty.CellLine | all | clo | 2022-03-21 | False | True | Cell Line Ontology | https://data.bioontology.org/ontologies/CLO/su... | ea58a1010b7e745702a8397a526b3a33 | https://bioportal.bioontology.org/ontologies/CLO | None | None | 2024-11-21 05:37:17.925596+00:00 | 1 |
32 | 1Lhf | bionty.CellType | all | cl | 2024-05-15 | True | True | Cell Ontology | http://purl.obolibrary.org/obo/cl/releases/202... | 8a8638a9e79567935793e5007704c650 | https://obophenotype.github.io/cell-ontology | None | None | 2024-11-21 05:37:17.925749+00:00 | 1 |
40 | MUtA | bionty.Tissue | all | uberon | 2024-08-07 | False | True | Uberon multi-species anatomy ontology | http://purl.obolibrary.org/obo/uberon/releases... | | http://obophenotype.github.io/uberon | None | None | 2024-11-21 05:37:17.926351+00:00 | 1 |
49 | 2L2r | bionty.Disease | all | mondo | 2024-06-04 | False | True | Mondo Disease Ontology | http://purl.obolibrary.org/obo/mondo/releases/... | c47e8edb894c01f2511dfe0751fbc428 | https://mondo.monarchinitiative.org | None | None | 2024-11-21 05:37:17.927031+00:00 | 1 |
57 | 4ksw | bionty.Disease | human | doid | 2024-05-29 | False | True | Human Disease Ontology | http://purl.obolibrary.org/obo/doid/releases/2... | bbefd72247d638edfcd31ec699947407 | https://disease-ontology.org | None | None | 2024-11-21 05:37:17.927643+00:00 | 1 |
65 | 2a1H | bionty.ExperimentalFactor | all | efo | 3.70.0 | False | True | The Experimental Factor Ontology | http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl | | https://bioportal.bioontology.org/ontologies/EFO | None | None | 2024-11-21 05:37:17.932430+00:00 | 1 |
72 | 48fB | bionty.Phenotype | human | hp | 2024-04-26 | False | True | Human Phenotype Ontology | https://github.com/obophenotype/human-phenotyp... | e0f2e534eb2ad44a4d45573ef27b508f | https://hpo.jax.org | None | None | 2024-11-21 05:37:17.932989+00:00 | 1 |
77 | 4t7Q | bionty.Phenotype | mammalian | mp | 2024-06-18 | False | True | Mammalian Phenotype Ontology | https://github.com/mgijax/mammalian-phenotype-... | 795d8378fe48ec13b41d01a86dd1c86c | https://github.com/mgijax/mammalian-phenotype-... | None | None | 2024-11-21 05:37:17.933356+00:00 | 1 |
80 | sqPX | bionty.Phenotype | zebrafish | zp | 2024-04-18 | False | True | Zebrafish Phenotype Ontology | https://github.com/obophenotype/zebrafish-phen... | 2231ebaa95becf8ff34a33c95a8d4350 | https://github.com/obophenotype/zebrafish-phen... | None | None | 2024-11-21 05:37:17.933574+00:00 | 1 |
84 | 6S4q | bionty.Phenotype | all | pato | 2024-03-28 | False | True | Phenotype And Trait Ontology | http://purl.obolibrary.org/obo/pato/releases/2... | 6b1eaacd3d453b34375ce2e31c16328a | https://github.com/pato-ontology/pato | None | None | 2024-11-21 05:37:17.933865+00:00 | 1 |
86 | 7Ent | bionty.Pathway | all | go | 2024-06-17 | False | True | Gene Ontology | https://data.bioontology.org/ontologies/GO/sub... | 7fa7ade5e3e26eab3959a7e4bc89ad4f | http://geneontology.org | None | None | 2024-11-21 05:37:17.934010+00:00 | 1 |
91 | 3rm9 | BFXPipeline | all | lamin | 1.0.0 | False | True | Bioinformatics Pipeline | s3://bionty-assets/df_all__lamin__1.0.0__BFXpi... | | https://lamin.ai | None | None | 2024-11-21 05:37:17.934371+00:00 | 1 |
92 | ugaI | Drug | all | dron | 2024-08-05 | False | True | Drug Ontology | https://data.bioontology.org/ontologies/DRON/s... | | https://bioportal.bioontology.org/ontologies/DRON | None | None | 2024-11-21 05:37:17.934443+00:00 | 1 |
96 | 1GbF | bionty.DevelopmentalStage | human | hsapdv | 2024-05-28 | False | True | Human Developmental Stages | https://github.com/obophenotype/developmental-... | | https://github.com/obophenotype/developmental-... | None | None | 2024-11-21 05:37:17.934735+00:00 | 1 |
98 | 10va | bionty.DevelopmentalStage | mouse | mmusdv | 2024-05-28 | False | True | Mouse Developmental Stages | https://github.com/obophenotype/developmental-... | | https://github.com/obophenotype/developmental-... | None | None | 2024-11-21 05:37:17.934879+00:00 | 1 |
100 | MJRq | bionty.Ethnicity | human | hancestro | 3.0 | False | True | Human Ancestry Ontology | https://github.com/EBISPOT/hancestro/raw/3.0/h... | 76dd9efda9c2abd4bc32fc57c0b755dd | https://github.com/EBISPOT/hancestro | None | None | 2024-11-21 05:37:17.935028+00:00 | 1 |
101 | 5JnV | BioSample | all | ncbi | 2023-09 | False | True | NCBI BioSample attributes | s3://bionty-assets/df_all__ncbi__2023-09__BioS... | 918db9bd1734b97c596c67d9654a4126 | https://www.ncbi.nlm.nih.gov/biosample/docs/at... | None | None | 2024-11-21 05:37:17.935101+00:00 | 1 |
Each record is linked to a versioned public source (if it was created from a public source):
hepatocyte = bt.CellType.get(name="hepatocyte")
hepatocyte.source
Source(uid='1Lhf', entity='bionty.CellType', organism='all', name='cl', version='2024-05-15', in_db=True, currently_used=True, description='Cell Ontology', url='http://purl.obolibrary.org/obo/cl/releases/2024-05-15/cl.owl', md5='8a8638a9e79567935793e5007704c650', source_website='https://obophenotype.github.io/cell-ontology', created_by_id=1, created_at=2024-11-21 05:37:17 UTC)
Create records from specific source
By default, new records are imported or created from the "currently_used" public sources, which are configured during instance initialization, e.g.:
bt.Source.filter(entity="bionty.Phenotype", currently_used=True).df()
id | uid | entity | organism | name | version | in_db | currently_used | description | url | md5 | source_website | dataframe_artifact_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
72 | 48fB | bionty.Phenotype | human | hp | 2024-04-26 | False | True | Human Phenotype Ontology | https://github.com/obophenotype/human-phenotyp... | e0f2e534eb2ad44a4d45573ef27b508f | https://hpo.jax.org | None | None | 2024-11-21 05:37:17.932989+00:00 | 1 |
77 | 4t7Q | bionty.Phenotype | mammalian | mp | 2024-06-18 | False | True | Mammalian Phenotype Ontology | https://github.com/mgijax/mammalian-phenotype-... | 795d8378fe48ec13b41d01a86dd1c86c | https://github.com/mgijax/mammalian-phenotype-... | None | None | 2024-11-21 05:37:17.933356+00:00 | 1 |
80 | sqPX | bionty.Phenotype | zebrafish | zp | 2024-04-18 | False | True | Zebrafish Phenotype Ontology | https://github.com/obophenotype/zebrafish-phen... | 2231ebaa95becf8ff34a33c95a8d4350 | https://github.com/obophenotype/zebrafish-phen... | None | None | 2024-11-21 05:37:17.933574+00:00 | 1 |
84 | 6S4q | bionty.Phenotype | all | pato | 2024-03-28 | False | True | Phenotype And Trait Ontology | http://purl.obolibrary.org/obo/pato/releases/2... | 6b1eaacd3d453b34375ce2e31c16328a | https://github.com/pato-ontology/pato | None | None | 2024-11-21 05:37:17.933865+00:00 | 1 |
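The role of the currently_used flag can be sketched with plain dataclasses. This is a simplified model, not the bionty Source registry: SourceRow and default_sources are hypothetical, and the last row is an invented placeholder for an older release, included only to show what the filter excludes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRow:
    """Simplified stand-in for one row of the source table."""

    entity: str
    organism: str
    name: str
    version: str
    currently_used: bool


SOURCES = [
    # These four rows mirror the Phenotype sources listed in the table above.
    SourceRow("bionty.Phenotype", "human", "hp", "2024-04-26", True),
    SourceRow("bionty.Phenotype", "mammalian", "mp", "2024-06-18", True),
    SourceRow("bionty.Phenotype", "zebrafish", "zp", "2024-04-18", True),
    SourceRow("bionty.Phenotype", "all", "pato", "2024-03-28", True),
    # Hypothetical placeholder for an older release that is no longer in use.
    SourceRow("bionty.Phenotype", "human", "hp", "hypothetical-older", False),
]


def default_sources(entity, sources):
    """Return the sources of `entity` that are flagged as currently used."""
    return [s for s in sources if s.entity == entity and s.currently_used]


defaults = default_sources("bionty.Phenotype", SOURCES)
print([(s.name, s.version) for s in defaults])
```

Keeping older releases in the table with the flag switched off, rather than deleting them, is what allows existing records to stay linked to the exact version they were created from.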
Sometimes, the default source doesn’t contain the ontology term you are looking for.
You can then create a record from a non-default source. For instance, instead of using untyped labels for iris organisms as in Tutorial: Features & labels, we can use the ncbitaxon ontology:
source = bt.Source.get(entity="bionty.Organism", name="ncbitaxon")
source
Source(uid='4tsk', entity='bionty.Organism', organism='all', name='ncbitaxon', version='2023-06-20', in_db=False, currently_used=True, description='NCBItaxon Ontology', url='s3://bionty-assets/df_all__ncbitaxon__2023-06-20__Organism.parquet', md5='00d97ba65627f1cd65636d2df22ea76c', source_website='https://github.com/obophenotype/ncbitaxon', created_by_id=1, created_at=2024-11-21 05:37:17 UTC)
# validate against the NCBI Taxonomy
bt.Organism.validate(["iris setosa", "iris versicolor", "iris virginica"], source=source)
! Your Organism registry is empty, consider populating it first!
→ use `.import_source()` to import records from a source, e.g. a public ontology
array([False, False, False])
records = bt.Organism.from_values(
["iris setosa", "iris versicolor", "iris virginica"], source=source
)
# since we didn't seed the Organism registry with the NCBITaxon public ontology
# we need to save the records to the database
ln.save(records)
# now we can query an iris organism and view its parents and children
iris = bt.Organism.get(name="iris")
iris.view_parents(with_children=True)
# clean up test instance
!lamin delete --force test-registries
• deleting instance testuser1/test-registries