Transfer data

This guide shows how to transfer data from a source database instance into the current default database instance.

# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --schema bionty
! using anonymous user (to identify, call: lamin login)
 connected lamindb: anonymous/test-transfer
import lamindb as ln

ln.track("ITeOtm7bhtdq0000")
 connected lamindb: anonymous/test-transfer
 created Transform('ITeOtm7b'), started new Run('Ep80H0bD') at 2024-12-20 15:03:40 UTC
 notebook imports: lamindb==0.77.3

Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.

# query all latest artifact versions 
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)

# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
! source schema has additional modules: {'ourprojects', 'wetlab'}
consider mounting these schema modules to transfer all metadata
uid key description suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id version is_latest run_id created_at created_by_id
id
607 sRapK07mMtToihzFeTaf None View Papalexi21 in Vitessce .vitessce.json None 1527 jfAtjNNzdvetUaEo5zhf0Q NaN NaN md5 None 1 True 2 79.0 None True 141.0 2024-04-30 12:51:16.348884+00:00 2
726 HXJ4DDAw8012jVKwoxgd None View Kuppe2022 in Vitessce .vitessce.json None 5258 JsVK8X8EGRsyTEMnD3Z-6g NaN NaN md5 None 1 True 2 79.0 None True 198.0 2024-06-26 10:35:31.697669+00:00 2
895 nbX7Pk0SAPHNlsQD0000 devdata/params_2024-09-30_11-44-22.json None .json None 38084 s6viX7LZ6KsjWcXigAn0eg NaN NaN md5 None 1 True 2 NaN None True NaN 2024-10-02 15:25:49.609268+00:00 9
815 XmeH4JgiJFha7Nl90000 schmidt22_perturbseq/schmidt22_perturbseq.h5ad schmidt22 perturbseq counts .h5ad None 20659936 MwfMo7FUjrdk5mzTHx9RMw NaN NaN md5-n AnnData 1 False 2 220.0 None True 377.0 2024-06-18 09:26:45.885472+00:00 2
1010 dP0F1fEQWtorhDaI0000 example_datasets/small_dataset2.h5ad None .h5ad dataset 21224 7ok_2cIe73owydEGaj7m0A NaN 3.0 md5 AnnData 1 True 2 195.0 None True 347.0 2024-11-25 14:59:38.945319+00:00 9

You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens” and take the first match.

artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
! source schema has additional modules: {'ourprojects', 'wetlab'}
consider mounting these schema modules to transfer all metadata
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = tabula_sapiens_lung.h5ad
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = Koncopd (Sergei Rybakov)
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Ingest Tabula Sapiens Lung'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            myofibroblast cell, B cell, capillary ae…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.

artifact.save()
 mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
 transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, storage_id=2, transform_id=2, run_id=2, created_by_id=1, created_at=2024-12-20 15:03:46 UTC)
How do I know if a record is saved in the default database instance or not?

Every record has an attribute ._state.db which can take the following values:

  • None: the record has not yet been saved to any database

  • "default": the record is saved on the default database instance

  • "account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
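
The three cases above can be sketched as a small helper. This is only an illustration: `describe_db_state` is a hypothetical name, not part of lamindb, and the stand-in objects below mimic nothing but the `._state.db` attribute so that the snippet runs without a database connection.

```python
from types import SimpleNamespace


def describe_db_state(record) -> str:
    """Summarize where a record lives, based on `record._state.db` (hypothetical helper)."""
    db = record._state.db
    if db is None:
        return "not saved to any database"
    if db == "default":
        return "saved in the default database instance"
    # any other value is assumed to be an "account/name" instance slug
    return f"saved in the non-default database instance {db}"


# Stand-ins that only carry the `_state.db` attribute of a real record
unsaved = SimpleNamespace(_state=SimpleNamespace(db=None))
local = SimpleNamespace(_state=SimpleNamespace(db="default"))
remote = SimpleNamespace(_state=SimpleNamespace(db="laminlabs/lamindata"))

for rec in (unsaved, local, remote):
    print(describe_db_state(rec))
```

Applied to the artifact in this guide, such a helper should report the source instance before `artifact.save()` and the default instance afterwards.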

The artifact record and its linked feature and label records have been transferred to the current database instance.

artifact.describe()
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = tabula_sapiens_lung.h5ad
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2024-12-20 15:03:46
│   └── .transform = 'Transfer from `laminlabs/lamindata`'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            type I pneumocyte, adventitial cell, bas…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

The data itself remains in the original storage location, which has been registered in the current instance as an additional, read-only storage location.

ln.Storage.df()
id  uid           root                                               description  type   region     instance_uid  run_id  created_at                        created_by_id
2   D9BilDV2      s3://lamindata                                     None         s3     us-east-1  4XIuR0tvaiXM  2.0     2024-12-20 15:03:46.955319+00:00  1
1   F4i8aJdwwXJI  /home/runner/work/lamindb/lamindb/docs/test-tr...  None         local  None       1FHu5eE0uxm4  NaN     2024-12-20 15:03:32.654838+00:00  1

Inspect the state of the database:

ln.view()
****************
* module: core *
****************
Artifact
uid key description suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id version is_latest run_id created_at created_by_id
id
1 dPraor9rU1EofcFb6Wph tabula_sapiens_lung.h5ad Part of Tabula Sapiens, a benchmark, first-dra... .h5ad None 3899435772 8mB1KK2wd51F6HQdvqipcQ None None sha1-fl None 1 False 2 2 None True 2 2024-12-20 15:03:46.960171+00:00 1
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id parent_id created_at created_by_id
id
1 Ep80H0bDL4MbOeO5d6eq 2024-12-20 15:03:40.523432+00:00 None True None None 1 None None NaN 2024-12-20 15:03:40.523493+00:00 1
2 JRjJxGMGDEvMRmm2YXOj 2024-12-20 15:03:46.951056+00:00 None None None None 2 None None 1.0 2024-12-20 15:03:46.951109+00:00 1
Storage
uid root description type region instance_uid run_id created_at created_by_id
id
2 D9BilDV2 s3://lamindata None s3 us-east-1 4XIuR0tvaiXM 2.0 2024-12-20 15:03:46.955319+00:00 1
1 F4i8aJdwwXJI /home/runner/work/lamindb/lamindb/docs/test-tr... None local None 1FHu5eE0uxm4 NaN 2024-12-20 15:03:32.654838+00:00 1
Transform
uid name key description type source_code hash reference reference_type _source_code_artifact_id version is_latest created_at created_by_id
id
2 4XIuR0tvaiXM0000 Transfer from `laminlabs/lamindata` transfers/4XIuR0tvaiXM None function None None None None None None True 2024-12-20 15:03:46.944250+00:00 1
1 ITeOtm7bhtdq0000 Transfer data transfer.ipynb None notebook None None None None None None True 2024-12-20 15:03:40.512792+00:00 1
ULabel
uid name description reference reference_type run_id created_at created_by_id
id
3 tZCTk48f TSP14 None None None 2 2024-12-20 15:03:55.621382+00:00 1
2 gk6w8qC5 TSP2 None None None 2 2024-12-20 15:03:55.550915+00:00 1
1 vfLXaHgD TSP1 None None None 2 2024-12-20 15:03:55.479369+00:00 1
User
uid handle name created_at
id
1 00000000 anonymous None 2024-12-20 15:03:32.614861+00:00
******************
* module: bionty *
******************
CellType
uid name ontology_id abbr synonyms description source_id run_id created_at created_by_id
id
112 4yqLzwwm bronchial vessel endothelial cell None None None None NaN 2 2024-12-20 15:03:53.130808+00:00 1
111 EWy46Sey respiratory mucous cell None None None None NaN 2 2024-12-20 15:03:52.783938+00:00 1
110 5rVn0X39 capillary aerocyte None None None None NaN 2 2024-12-20 15:03:52.710645+00:00 1
109 4mZaXZQg alveolar fibroblast None None None None NaN 2 2024-12-20 15:03:51.032870+00:00 1
108 3hXuCKYH perivascular cell CL:4033054 None None A Cell That Is Adjacent To A Vessel. A Perivas... 32.0 1 2024-12-20 15:03:50.434235+00:00 1
107 4qrbhCCl respiratory ciliated cell CL:4030034 None ciliated cell of the respiratory tract A Ciliated Cell Of The Respiratory System. Cil... 32.0 1 2024-12-20 15:03:50.434161+00:00 1
106 2aMXs0ko microvascular endothelial cell CL:2000008 None None Any Blood Vessel Endothelial Cell That Is Part... 32.0 1 2024-12-20 15:03:50.434098+00:00 1
ExperimentalFactor
uid name ontology_id abbr synonyms description molecule instrument measurement source_id run_id created_at created_by_id
id
8 1was9kRO hypoxia EFO:0009444 None None A Decrease In The Amount Of Oxygen In The Body... None None None 65 1 2024-12-20 15:03:54.983397+00:00 1
7 2lctIHmn central nervous system disease EFO:0009386 None central nervous system disorder|central nervou... A Disease Involving The Central Nervous System. None None None 65 1 2024-12-20 15:03:54.983315+00:00 1
6 68LLeA7O brain disease EFO:0005774 None disorder of brain|disease or disorder of brain... A Disease Affecting The Brain Or Part Of The B... None None None 65 1 2024-12-20 15:03:54.983221+00:00 1
5 2xDSpjH7 cerebrovascular disorder EFO:0003763 None Vascular Disorder, Intracranial|Cerebrovascula... A Disorder Resulting From Inadequate Blood Flo... None None None 65 1 2024-12-20 15:03:54.983147+00:00 1
4 6ISbvepx nervous system disease EFO:0000618 None nervous system disorder|neurologic disease|neu... A Non-Neoplastic Or Neoplastic Disorder That A... None None None 65 1 2024-12-20 15:03:54.983068+00:00 1
3 20Nq3k7b disease EFO:0000408 None disease or disorder|diseases|medical condition... A Disease Is A Disposition To Undergo Patholog... None None None 65 1 2024-12-20 15:03:54.982964+00:00 1
2 7R1OhRJ7 stroke EFO:0000712 None Cerebral Strokes|Acute Stroke|CVA (Cerebrovasc... A Sudden Loss Of Neurological Function Seconda... None None None 65 1 2024-12-20 15:03:54.476852+00:00 1
Source
uid entity organism name in_db currently_used description url md5 source_website dataframe_artifact_id version run_id created_at created_by_id
id
65 2a1H bionty.ExperimentalFactor all efo False True The Experimental Factor Ontology http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl https://bioportal.bioontology.org/ontologies/EFO None 3.70.0 None 2024-12-20 15:03:32.840971+00:00 1
32 1Lhf bionty.CellType all cl False True Cell Ontology http://purl.obolibrary.org/obo/cl/releases/202... 8a8638a9e79567935793e5007704c650 https://obophenotype.github.io/cell-ontology None 2024-05-15 None 2024-12-20 15:03:32.834847+00:00 1
40 MUtA bionty.Tissue all uberon False True Uberon multi-species anatomy ontology http://purl.obolibrary.org/obo/uberon/releases... http://obophenotype.github.io/uberon None 2024-08-07 None 2024-12-20 15:03:32.835467+00:00 1
101 5JnV BioSample all ncbi False True NCBI BioSample attributes s3://bionty-assets/df_all__ncbi__2023-09__BioS... 918db9bd1734b97c596c67d9654a4126 https://www.ncbi.nlm.nih.gov/biosample/docs/at... None 2023-09 None 2024-12-20 15:03:32.844842+00:00 1
100 MJRq bionty.Ethnicity human hancestro False True Human Ancestry Ontology https://github.com/EBISPOT/hancestro/raw/3.0/h... 76dd9efda9c2abd4bc32fc57c0b755dd https://github.com/EBISPOT/hancestro None 3.0 None 2024-12-20 15:03:32.844715+00:00 1
99 6vJm bionty.DevelopmentalStage mouse mmusdv False False Mouse Developmental Stages http://aber-owl.net/media/ontologies/MMUSDV/9/... 5bef72395d853c7f65450e6c2a1fc653 https://github.com/obophenotype/developmental-... None 2020-03-10 None 2024-12-20 15:03:32.844588+00:00 1
98 10va bionty.DevelopmentalStage mouse mmusdv False True Mouse Developmental Stages https://github.com/obophenotype/developmental-... https://github.com/obophenotype/developmental-... None 2024-05-28 None 2024-12-20 15:03:32.844460+00:00 1
Tissue
uid name ontology_id abbr synonyms description source_id run_id created_at created_by_id
id
23 kkib4Wcs lateral structure UBERON:0015212 None None Any Structure That Is Placed On One Side Of Th... 40 1 2024-12-20 15:03:48.931974+00:00 1
22 4QeoxdKp body proper UBERON:0013702 None None The Region Of The Organism Associated With The... 40 1 2024-12-20 15:03:48.931909+00:00 1
21 3XuRxEhw main body axis UBERON:0013701 None None A Principle Subdivision Of An Organism That In... 40 1 2024-12-20 15:03:48.931845+00:00 1
20 7ZCdHnvN subdivision of organism along main body axis UBERON:0011676 None axial subdivision of organism A Major Subdivision Of An Organism That Divide... 40 1 2024-12-20 15:03:48.931782+00:00 1
19 4o2HviGe multicellular anatomical structure UBERON:0010000 None multicellular structure An Anatomical Structure That Has More Than One... 40 1 2024-12-20 15:03:48.931718+00:00 1
18 31GPuSXP subdivision of trunk UBERON:0009569 None trunk subdivision|region of trunk None 40 1 2024-12-20 15:03:48.931655+00:00 1
17 4IV77xkH thoracic segment organ UBERON:0005181 None None An Organ That Part Of The Thoracic Segment Reg... 40 1 2024-12-20 15:03:48.931588+00:00 1

View lineage:

artifact.view_lineage()
[Figure: lineage graph of the transferred artifact]

The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:

artifact.transform.name
'Transfer from `laminlabs/lamindata`'

The transform key has the form f"transfers/{source_instance.uid}":

artifact.transform.key
'transfers/4XIuR0tvaiXM'
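
Because the key follows this pattern, the uid of the source instance can be recovered from it with plain string handling. The sketch below assumes only the pattern stated above; `source_instance_uid_from_key` and `TRANSFER_KEY_PREFIX` are hypothetical names, not lamindb API.

```python
TRANSFER_KEY_PREFIX = "transfers/"  # prefix taken from the key pattern above


def source_instance_uid_from_key(key: str) -> str:
    """Extract the source instance uid from a transfer transform key (hypothetical helper)."""
    if not key.startswith(TRANSFER_KEY_PREFIX):
        raise ValueError(f"not a transfer transform key: {key!r}")
    uid = key[len(TRANSFER_KEY_PREFIX):]
    if not uid:
        raise ValueError("transfer key is missing the instance uid")
    return uid


print(source_instance_uid_from_key("transfers/4XIuR0tvaiXM"))  # prints 4XIuR0tvaiXM
```

In this guide the extracted value matches the `instance_uid` listed for the `s3://lamindata` storage location in the output of `ln.Storage.df()`.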

The current notebook run is linked as the parent of the “transfer run”:

artifact.run.parent.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, name='Transfer data', key='transfer.ipynb', type='notebook', created_by_id=1, created_at=2024-12-20 15:03:40 UTC)
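
The parent link can be followed repeatedly to trace how a transferred record entered the instance. The following is a minimal sketch, assuming only that a run exposes `.parent` and `.transform.name` as shown above; `run_ancestry` is a hypothetical helper, and the stand-in objects replace real `Run` records so that the snippet runs without a database.

```python
from types import SimpleNamespace


def run_ancestry(run) -> list:
    """Collect transform names from `run` up to its root parent (hypothetical helper)."""
    names = []
    while run is not None:
        names.append(run.transform.name)
        run = run.parent
    return names


# Stand-ins mirroring the two runs in this guide
notebook_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer data"), parent=None
)
transfer_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer from `laminlabs/lamindata`"),
    parent=notebook_run,
)

print(run_ancestry(transfer_run))
```

With a real record, passing `artifact.run` should yield the transfer transform first and the notebook second.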
# test the last 3 cells here
assert artifact.transform.name == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.parent.transform.name == "Transfer data"

# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
 deleting instance anonymous/test-transfer