Transfer data
This guide shows how to transfer data from a source database instance into the current default database instance.
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --schema bionty
! using anonymous user (to identify, call: lamin login)
→ connected lamindb: anonymous/test-transfer
import lamindb as ln
ln.track("ITeOtm7bhtdq0000")
→ connected lamindb: anonymous/test-transfer
→ created Transform('ITeOtm7b'), started new Run('Ep80H0bD') at 2024-12-20 15:03:40 UTC
→ notebook imports: lamindb==0.77.3
Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.
# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)
# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
! source schema has additional modules: {'ourprojects', 'wetlab'}
consider mounting these schema modules to transfer all metadata
uid | key | description | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | version | is_latest | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||||||||||
607 | sRapK07mMtToihzFeTaf | None | View Papalexi21 in Vitessce | .vitessce.json | None | 1527 | jfAtjNNzdvetUaEo5zhf0Q | NaN | NaN | md5 | None | 1 | True | 2 | 79.0 | None | True | 141.0 | 2024-04-30 12:51:16.348884+00:00 | 2 |
726 | HXJ4DDAw8012jVKwoxgd | None | View Kuppe2022 in Vitessce | .vitessce.json | None | 5258 | JsVK8X8EGRsyTEMnD3Z-6g | NaN | NaN | md5 | None | 1 | True | 2 | 79.0 | None | True | 198.0 | 2024-06-26 10:35:31.697669+00:00 | 2 |
895 | nbX7Pk0SAPHNlsQD0000 | devdata/params_2024-09-30_11-44-22.json | None | .json | None | 38084 | s6viX7LZ6KsjWcXigAn0eg | NaN | NaN | md5 | None | 1 | True | 2 | NaN | None | True | NaN | 2024-10-02 15:25:49.609268+00:00 | 9 |
815 | XmeH4JgiJFha7Nl90000 | schmidt22_perturbseq/schmidt22_perturbseq.h5ad | schmidt22 perturbseq counts | .h5ad | None | 20659936 | MwfMo7FUjrdk5mzTHx9RMw | NaN | NaN | md5-n | AnnData | 1 | False | 2 | 220.0 | None | True | 377.0 | 2024-06-18 09:26:45.885472+00:00 | 2 |
1010 | dP0F1fEQWtorhDaI0000 | example_datasets/small_dataset2.h5ad | None | .h5ad | dataset | 21224 | 7ok_2cIe73owydEGaj7m0A | NaN | 3.0 | md5 | AnnData | 1 | True | 2 | 195.0 | None | True | 347.0 | 2024-11-25 14:59:38.945319+00:00 | 9 |
You can now further subset or search the QuerySet. Here we query by whether the description contains “Tabula Sapiens”.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
! source schema has additional modules: {'ourprojects', 'wetlab'}
consider mounting these schema modules to transfer all metadata
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = tabula_sapiens_lung.h5ad
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = Koncopd (Sergei Rybakov)
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Ingest Tabula Sapiens Lung'
└── Labels
    └── .tissues                 bionty.Tissue               lung
        .cell_types              bionty.CellType             myofibroblast cell, B cell, capillary ae…
        .experimental_factors    bionty.ExperimentalFactor   anoxya, stroke
        .ulabels                 ULabel                      TSP1, TSP2, TSP14
By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.
artifact.save()
→ mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, storage_id=2, transform_id=2, run_id=2, created_by_id=1, created_at=2024-12-20 15:03:46 UTC)
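The two log lines above are long, so it can help to tally the records per type. The following is a minimal sketch that assumes each record is logged as `TypeName(uid='...')`, as in the output above; `tally_records` is a hypothetical helper and not part of lamindb.

```python
import re
from collections import Counter


def tally_records(log_line: str) -> Counter:
    """Count records per type in a 'mapped records' or 'transferred records' line."""
    # assumption: every record appears as TypeName(uid='...')
    return Counter(re.findall(r"(\w+)\(uid='[^']*'\)", log_line))


# the "transferred records" line from the output above (leading arrow dropped)
line = (
    "transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), "
    "Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), "
    "CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), "
    "ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')"
)
counts = tally_records(line)
print(dict(counts))
```

The same helper works on the "mapped records" line, since it relies only on the shared `TypeName(uid='...')` pattern.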
How do I know if a record is saved in the default database instance or not?
Every record has an attribute ._state.db, which can take the following values:
None: the record has not yet been saved to any database
"default": the record is saved on the default database instance
"account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
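The three cases can be folded into a small helper. This is a minimal sketch: `describe_db_state` is a hypothetical name, not a lamindb function, and it only interprets a value you would read from `record._state.db`.

```python
from typing import Optional


def describe_db_state(db: Optional[str]) -> str:
    """Explain a value read from `record._state.db` (hypothetical helper)."""
    if db is None:
        return "not yet saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # any other value is assumed to be an "account/name" slug
    return f"saved on the non-default database instance {db}"


# with a real record, you would pass `artifact._state.db`
print(describe_db_state(None))
print(describe_db_state("default"))
print(describe_db_state("laminlabs/lamindata"))
```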
The artifact record and all other feature & label records have been transferred to the current database.
artifact.describe()
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = tabula_sapiens_lung.h5ad
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2024-12-20 15:03:46
│   └── .transform = 'Transfer from `laminlabs/lamindata`'
└── Labels
    └── .tissues                 bionty.Tissue               lung
        .cell_types              bionty.CellType             type I pneumocyte, adventitial cell, bas…
        .experimental_factors    bionty.ExperimentalFactor   anoxya, stroke
        .ulabels                 ULabel                      TSP1, TSP2, TSP14
The data itself remained in its original storage location, which has been added to the current instance’s storage locations as a read-only location.
ln.Storage.df()
uid | root | description | type | region | instance_uid | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|
id | |||||||||
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 2.0 | 2024-12-20 15:03:46.955319+00:00 | 1 |
1 | F4i8aJdwwXJI | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | NaN | 2024-12-20 15:03:32.654838+00:00 | 1 |
See the state of the database.
ln.view()
****************
* module: core *
****************
Artifact
uid | key | description | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | version | is_latest | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||||||||||
1 | dPraor9rU1EofcFb6Wph | tabula_sapiens_lung.h5ad | Part of Tabula Sapiens, a benchmark, first-dra... | .h5ad | None | 3899435772 | 8mB1KK2wd51F6HQdvqipcQ | None | None | sha1-fl | None | 1 | False | 2 | 2 | None | True | 2 | 2024-12-20 15:03:46.960171+00:00 | 1 |
Run
uid | started_at | finished_at | is_consecutive | reference | reference_type | transform_id | report_id | environment_id | parent_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||
1 | Ep80H0bDL4MbOeO5d6eq | 2024-12-20 15:03:40.523432+00:00 | None | True | None | None | 1 | None | None | NaN | 2024-12-20 15:03:40.523493+00:00 | 1 |
2 | JRjJxGMGDEvMRmm2YXOj | 2024-12-20 15:03:46.951056+00:00 | None | None | None | None | 2 | None | None | 1.0 | 2024-12-20 15:03:46.951109+00:00 | 1 |
Storage
uid | root | description | type | region | instance_uid | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|
id | |||||||||
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 2.0 | 2024-12-20 15:03:46.955319+00:00 | 1 |
1 | F4i8aJdwwXJI | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | NaN | 2024-12-20 15:03:32.654838+00:00 | 1 |
Transform
uid | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | version | is_latest | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||||
2 | 4XIuR0tvaiXM0000 | Transfer from `laminlabs/lamindata` | transfers/4XIuR0tvaiXM | None | function | None | None | None | None | None | None | True | 2024-12-20 15:03:46.944250+00:00 | 1 |
1 | ITeOtm7bhtdq0000 | Transfer data | transfer.ipynb | None | notebook | None | None | None | None | None | None | True | 2024-12-20 15:03:40.512792+00:00 | 1 |
ULabel
uid | name | description | reference | reference_type | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|
id | ||||||||
3 | tZCTk48f | TSP14 | None | None | None | 2 | 2024-12-20 15:03:55.621382+00:00 | 1 |
2 | gk6w8qC5 | TSP2 | None | None | None | 2 | 2024-12-20 15:03:55.550915+00:00 | 1 |
1 | vfLXaHgD | TSP1 | None | None | None | 2 | 2024-12-20 15:03:55.479369+00:00 | 1 |
User
uid | handle | name | created_at | |
---|---|---|---|---|
id | ||||
1 | 00000000 | anonymous | None | 2024-12-20 15:03:32.614861+00:00 |
******************
* module: bionty *
******************
CellType
uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||
112 | 4yqLzwwm | bronchial vessel endothelial cell | None | None | None | None | NaN | 2 | 2024-12-20 15:03:53.130808+00:00 | 1 |
111 | EWy46Sey | respiratory mucous cell | None | None | None | None | NaN | 2 | 2024-12-20 15:03:52.783938+00:00 | 1 |
110 | 5rVn0X39 | capillary aerocyte | None | None | None | None | NaN | 2 | 2024-12-20 15:03:52.710645+00:00 | 1 |
109 | 4mZaXZQg | alveolar fibroblast | None | None | None | None | NaN | 2 | 2024-12-20 15:03:51.032870+00:00 | 1 |
108 | 3hXuCKYH | perivascular cell | CL:4033054 | None | None | A Cell That Is Adjacent To A Vessel. A Perivas... | 32.0 | 1 | 2024-12-20 15:03:50.434235+00:00 | 1 |
107 | 4qrbhCCl | respiratory ciliated cell | CL:4030034 | None | ciliated cell of the respiratory tract | A Ciliated Cell Of The Respiratory System. Cil... | 32.0 | 1 | 2024-12-20 15:03:50.434161+00:00 | 1 |
106 | 2aMXs0ko | microvascular endothelial cell | CL:2000008 | None | None | Any Blood Vessel Endothelial Cell That Is Part... | 32.0 | 1 | 2024-12-20 15:03:50.434098+00:00 | 1 |
ExperimentalFactor
uid | name | ontology_id | abbr | synonyms | description | molecule | instrument | measurement | source_id | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||
8 | 1was9kRO | hypoxia | EFO:0009444 | None | None | A Decrease In The Amount Of Oxygen In The Body... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.983397+00:00 | 1 |
7 | 2lctIHmn | central nervous system disease | EFO:0009386 | None | central nervous system disorder|central nervou... | A Disease Involving The Central Nervous System. | None | None | None | 65 | 1 | 2024-12-20 15:03:54.983315+00:00 | 1 |
6 | 68LLeA7O | brain disease | EFO:0005774 | None | disorder of brain|disease or disorder of brain... | A Disease Affecting The Brain Or Part Of The B... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.983221+00:00 | 1 |
5 | 2xDSpjH7 | cerebrovascular disorder | EFO:0003763 | None | Vascular Disorder, Intracranial|Cerebrovascula... | A Disorder Resulting From Inadequate Blood Flo... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.983147+00:00 | 1 |
4 | 6ISbvepx | nervous system disease | EFO:0000618 | None | nervous system disorder|neurologic disease|neu... | A Non-Neoplastic Or Neoplastic Disorder That A... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.983068+00:00 | 1 |
3 | 20Nq3k7b | disease | EFO:0000408 | None | disease or disorder|diseases|medical condition... | A Disease Is A Disposition To Undergo Patholog... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.982964+00:00 | 1 |
2 | 7R1OhRJ7 | stroke | EFO:0000712 | None | Cerebral Strokes|Acute Stroke|CVA (Cerebrovasc... | A Sudden Loss Of Neurological Function Seconda... | None | None | None | 65 | 1 | 2024-12-20 15:03:54.476852+00:00 | 1 |
Source
uid | entity | organism | name | in_db | currently_used | description | url | md5 | source_website | dataframe_artifact_id | version | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||
65 | 2a1H | bionty.ExperimentalFactor | all | efo | False | True | The Experimental Factor Ontology | http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl | https://bioportal.bioontology.org/ontologies/EFO | None | 3.70.0 | None | 2024-12-20 15:03:32.840971+00:00 | 1 | |
32 | 1Lhf | bionty.CellType | all | cl | False | True | Cell Ontology | http://purl.obolibrary.org/obo/cl/releases/202... | 8a8638a9e79567935793e5007704c650 | https://obophenotype.github.io/cell-ontology | None | 2024-05-15 | None | 2024-12-20 15:03:32.834847+00:00 | 1 |
40 | MUtA | bionty.Tissue | all | uberon | False | True | Uberon multi-species anatomy ontology | http://purl.obolibrary.org/obo/uberon/releases... | http://obophenotype.github.io/uberon | None | 2024-08-07 | None | 2024-12-20 15:03:32.835467+00:00 | 1 | |
101 | 5JnV | BioSample | all | ncbi | False | True | NCBI BioSample attributes | s3://bionty-assets/df_all__ncbi__2023-09__BioS... | 918db9bd1734b97c596c67d9654a4126 | https://www.ncbi.nlm.nih.gov/biosample/docs/at... | None | 2023-09 | None | 2024-12-20 15:03:32.844842+00:00 | 1 |
100 | MJRq | bionty.Ethnicity | human | hancestro | False | True | Human Ancestry Ontology | https://github.com/EBISPOT/hancestro/raw/3.0/h... | 76dd9efda9c2abd4bc32fc57c0b755dd | https://github.com/EBISPOT/hancestro | None | 3.0 | None | 2024-12-20 15:03:32.844715+00:00 | 1 |
99 | 6vJm | bionty.DevelopmentalStage | mouse | mmusdv | False | False | Mouse Developmental Stages | http://aber-owl.net/media/ontologies/MMUSDV/9/... | 5bef72395d853c7f65450e6c2a1fc653 | https://github.com/obophenotype/developmental-... | None | 2020-03-10 | None | 2024-12-20 15:03:32.844588+00:00 | 1 |
98 | 10va | bionty.DevelopmentalStage | mouse | mmusdv | False | True | Mouse Developmental Stages | https://github.com/obophenotype/developmental-... | https://github.com/obophenotype/developmental-... | None | 2024-05-28 | None | 2024-12-20 15:03:32.844460+00:00 | 1 |
Tissue
uid | name | ontology_id | abbr | synonyms | description | source_id | run_id | created_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||
23 | kkib4Wcs | lateral structure | UBERON:0015212 | None | None | Any Structure That Is Placed On One Side Of Th... | 40 | 1 | 2024-12-20 15:03:48.931974+00:00 | 1 |
22 | 4QeoxdKp | body proper | UBERON:0013702 | None | None | The Region Of The Organism Associated With The... | 40 | 1 | 2024-12-20 15:03:48.931909+00:00 | 1 |
21 | 3XuRxEhw | main body axis | UBERON:0013701 | None | None | A Principle Subdivision Of An Organism That In... | 40 | 1 | 2024-12-20 15:03:48.931845+00:00 | 1 |
20 | 7ZCdHnvN | subdivision of organism along main body axis | UBERON:0011676 | None | axial subdivision of organism | A Major Subdivision Of An Organism That Divide... | 40 | 1 | 2024-12-20 15:03:48.931782+00:00 | 1 |
19 | 4o2HviGe | multicellular anatomical structure | UBERON:0010000 | None | multicellular structure | An Anatomical Structure That Has More Than One... | 40 | 1 | 2024-12-20 15:03:48.931718+00:00 | 1 |
18 | 31GPuSXP | subdivision of trunk | UBERON:0009569 | None | trunk subdivision|region of trunk | None | 40 | 1 | 2024-12-20 15:03:48.931655+00:00 | 1 |
17 | 4IV77xkH | thoracic segment organ | UBERON:0005181 | None | None | An Organ That Part Of The Thoracic Segment Reg... | 40 | 1 | 2024-12-20 15:03:48.931588+00:00 | 1 |
View lineage:
artifact.view_lineage()
The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:
artifact.transform.name
'Transfer from `laminlabs/lamindata`'
The transform key has the shape f"transfers/{source_instance.uid}":
artifact.transform.key
'transfers/4XIuR0tvaiXM'
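Because the key follows a fixed shape, the source instance uid can be recovered from it. The sketch below is illustrative only: `transfer_key` and `source_uid_from_key` are hypothetical helpers, not lamindb functions, and the uid is the one shown in the output above.

```python
TRANSFER_PREFIX = "transfers/"


def transfer_key(source_instance_uid: str) -> str:
    """Build a key in the shape f"transfers/{source_instance.uid}"."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str) -> str:
    """Recover the source instance uid from a transfer key."""
    if not key.startswith(TRANSFER_PREFIX):
        raise ValueError(f"not a transfer key: {key!r}")
    return key[len(TRANSFER_PREFIX):]


key = transfer_key("4XIuR0tvaiXM")
print(key)
print(source_uid_from_key(key))
```

With a real record, the key would come from `artifact.transform.key`.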
The current notebook run is linked as the parent of the “transfer run”:
artifact.run.parent.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, name='Transfer data', key='transfer.ipynb', type='notebook', created_by_id=1, created_at=2024-12-20 15:03:40 UTC)
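The parent link can be followed repeatedly to list every transform behind a record. The sketch below models this with plain dataclasses: `Transform` and `Run` here are simplified stand-ins, not lamindb's classes, and `walk_parents` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Transform:  # stand-in, not lamindb's Transform
    name: str


@dataclass
class Run:  # stand-in, not lamindb's Run
    transform: Transform
    parent: Optional["Run"] = None


def walk_parents(run: Optional[Run]) -> Iterator[str]:
    """Yield transform names from a run up through its chain of parents."""
    while run is not None:
        yield run.transform.name
        run = run.parent


# mirror the two runs in this guide: the notebook run is the parent of the transfer run
notebook_run = Run(Transform("Transfer data"))
transfer_run = Run(Transform("Transfer from `laminlabs/lamindata`"), parent=notebook_run)
print(list(walk_parents(transfer_run)))
```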
# test the last 3 cells here
assert artifact.transform.name == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.parent.transform.name == "Transfer data"
# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer