Transfer data

This guide shows how to transfer data from a source database instance into the current default database instance.

# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --schema bionty
→ connected lamindb: anonymous/test-transfer
import lamindb as ln

ln.track("ITeOtm7bhtdq0000")
→ connected lamindb: anonymous/test-transfer
→ created Transform('ITeOtm7b'), started new Run('vUaV37sz') at 2024-11-21 05:38:17 UTC
→ notebook imports: lamindb==0.76.16

Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.

# query all latest artifact versions 
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)

# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
! source schema has additional modules: {'wetlab'}
consider mounting these schema modules to transfer all metadata
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
607 sRapK07mMtToihzFeTaf None True View Papalexi21 in Vitessce None .vitessce.json None 1527 jfAtjNNzdvetUaEo5zhf0Q NaN NaN md5 None 1 True 2 79.0 141.0 2024-04-30 12:51:16.348884+00:00 2
726 HXJ4DDAw8012jVKwoxgd None True View Kuppe2022 in Vitessce None .vitessce.json None 5258 JsVK8X8EGRsyTEMnD3Z-6g NaN NaN md5 None 1 True 2 79.0 198.0 2024-06-26 10:35:31.697669+00:00 2
981 t5N1iDqMn7GmdhCG0000 None True Seurat object for renal cell carcinoma dataset None .rds None 19874309 yJR98pJSOmPUuOYHKZ55hQ NaN NaN md5 None 1 True 2 181.0 329.0 2024-11-20 13:54:26.020077+00:00 28
895 nbX7Pk0SAPHNlsQD0000 None True None devdata/params_2024-09-30_11-44-22.json .json None 38084 s6viX7LZ6KsjWcXigAn0eg NaN NaN md5 None 1 True 2 NaN NaN 2024-10-02 15:25:49.609268+00:00 9
865 xqcaHbOU2jTohiwi0000 None True my RNA-seq None .parquet dataset 4091 PV-3u9wm_rq35o6J8d5A5w NaN NaN md5 DataFrame 1 True 2 128.0 260.0 2024-09-24 13:10:10.580062+00:00 9

You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens” and take the first match.

artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, created_at=2023-07-14 19:00:30 UTC)
  Database instance
    slug: laminlabs/lamindata
  Provenance
    .storage = 's3://lamindata'
    .transform = 'Ingest Tabula Sapiens Lung'
    .run = '2023-07-14 12:53:17 UTC'
    .created_by = 'Koncopd'
  Usage
    .input_of_runs = 2023-07-15 17:12:16 UTC
  Labels
    .tissues = 'lung'
    .cell_types = 'CD4-positive, alpha-beta T cell', 'CD8-positive, alpha-beta T cell', 'dendritic cell', 'B cell', 'fibroblast', 'non-classical monocyte', 'myofibroblast cell', 'capillary endothelial cell', 'vein endothelial cell', 'endothelial cell of lymphatic vessel', ...
    .experimental_factors = 'anoxya', 'stroke'
    .ulabels = 'TSP1', 'TSP2', 'TSP14'

By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.

artifact.save()
→ mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, storage_id=2, transform_id=2, run_id=2, created_by_id=1, created_at=2024-11-21 05:38:19 UTC)
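
The two log lines above list many records. To get a quick overview, you can count them per record type. This is a minimal sketch, assuming the log text uses the `Type(uid='...')` format shown above (the format may differ in other lamindb versions); `count_records_by_type` is a hypothetical helper, not a lamindb function.

```python
import re
from collections import Counter

# Matches entries such as CellType(uid='4mZaXZQg') and captures the type name.
RECORD_PATTERN = re.compile(r"(\w+)\(uid='[^']*'\)")


def count_records_by_type(log_line: str) -> Counter:
    """Count how many records of each type a log line mentions."""
    return Counter(RECORD_PATTERN.findall(log_line))


# Shortened sample in the format of the "transferred records" line above.
sample = (
    "transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), "
    "Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), "
    "CellType(uid='5rVn0X39'), ULabel(uid='vfLXaHgD')"
)
print(count_records_by_type(sample))
```

You can paste either full log line in place of `sample` to summarize it.
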
How do I know if a record is saved in the default database instance or not?

Every record has an attribute ._state.db which can take the following values:

  • None: the record has not yet been saved to any database

  • "default": the record is saved on the default database instance

  • "account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
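
The three cases above can be folded into a small helper. This is a minimal sketch, not part of lamindb: `describe_db_state` is a hypothetical name, and it takes the value of `record._state.db` as a plain argument so that it runs without a database connection.

```python
from typing import Optional


def describe_db_state(db: Optional[str]) -> str:
    """Translate a `record._state.db` value into a readable description.

    Hypothetical helper for illustration; it only encodes the three
    cases listed above.
    """
    if db is None:
        return "not yet saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # Any other value is treated as an "account/name" slug.
    return f"saved on the non-default database instance {db}"


print(describe_db_state(None))
print(describe_db_state("default"))
print(describe_db_state("laminlabs/lamindata"))
```

With a live record, you would call `describe_db_state(artifact._state.db)`.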

The artifact record and its linked feature and label records have been transferred to the current database.

artifact.describe()
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, created_at=2024-11-21 05:38:19 UTC)
  Provenance
    .storage = 's3://lamindata'
    .transform = 'Transfer from `laminlabs/lamindata`'
    .run = 2024-11-21 05:38:19 UTC
    .created_by = 'anonymous'
  Labels
    .tissues = 'lung'
    .cell_types = 'type I pneumocyte', 'adventitial cell', 'basal cell', 'non-classical monocyte', 'smooth muscle cell', 'CD4-positive, alpha-beta T cell', 'plasmacytoid dendritic cell', 'neutrophil', 'natural killer cell', 'myofibroblast cell', ...
    .experimental_factors = 'anoxya', 'stroke'
    .ulabels = 'TSP1', 'TSP2', 'TSP14'

The data itself remains in the original storage location, which has been added to the current instance’s storage locations as a read-only location.

ln.Storage.df()
id  uid           root                                               description  type   region     instance_uid  run_id  created_at                        created_by_id
2   D9BilDV2      s3://lamindata                                     None         s3     us-east-1  4XIuR0tvaiXM  2.0     2024-11-21 05:38:19.783809+00:00  1
1   WnzTpgU34uV8  /home/runner/work/lamindb/lamindb/docs/test-tr...  None         local  None       1FHu5eE0uxm4  NaN     2024-11-21 05:38:09.865442+00:00  1

See the state of the database.

ln.view()
****************
* module: core *
****************
Artifact
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
1 dPraor9rU1EofcFb6Wph None True Part of Tabula Sapiens, a benchmark, first-dra... tabula_sapiens_lung.h5ad .h5ad None 3899435772 8mB1KK2wd51F6HQdvqipcQ None None sha1-fl None 1 False 2 2 2 2024-11-21 05:38:19.786301+00:00 1
! No records found
! No records found
! No records found
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id parent_id created_at created_by_id
id
1 vUaV37szw5JQaKBMhvNp 2024-11-21 05:38:17.489901+00:00 None True None None 1 None None NaN 2024-11-21 05:38:17.489970+00:00 1
2 Sab31ZNsl0Nn3Tp7sHow 2024-11-21 05:38:19.771533+00:00 None None None None 2 None None 1.0 2024-11-21 05:38:19.771594+00:00 1
Storage
uid root description type region instance_uid run_id created_at created_by_id
id
2 D9BilDV2 s3://lamindata None s3 us-east-1 4XIuR0tvaiXM 2.0 2024-11-21 05:38:19.783809+00:00 1
1 WnzTpgU34uV8 /home/runner/work/lamindb/lamindb/docs/test-tr... None local None 1FHu5eE0uxm4 NaN 2024-11-21 05:38:09.865442+00:00 1
Transform
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
2 4XIuR0tvaiXM0000 None True Transfer from `laminlabs/lamindata` transfers/4XIuR0tvaiXM None function None None None None None 2024-11-21 05:38:19.767031+00:00 1
1 ITeOtm7bhtdq0000 None True Transfer data transfer.ipynb None notebook None None None None None 2024-11-21 05:38:17.483847+00:00 1
ULabel
uid name description reference reference_type run_id created_at created_by_id
id
3 tZCTk48f TSP14 None None None 2 2024-11-21 05:38:23.180128+00:00 1
2 gk6w8qC5 TSP2 None None None 2 2024-11-21 05:38:23.170044+00:00 1
1 vfLXaHgD TSP1 None None None 2 2024-11-21 05:38:23.159206+00:00 1
User
uid handle name created_at
id
1 00000000 anonymous None 2024-11-21 05:38:09.860606+00:00
******************
* module: bionty *
******************

View lineage:

artifact.view_lineage()
[Figure: lineage graph of the transferred artifact]

The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:

artifact.transform.name
'Transfer from `laminlabs/lamindata`'

The transform key has the form f"transfers/{source_instance.uid}":

artifact.transform.key
'transfers/4XIuR0tvaiXM'
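
The key format can be made concrete with two small helpers. This is a sketch rather than lamindb's own code: `transfer_key` and `source_uid_from_key` are hypothetical names, and only the `transfers/` prefix and the example uid are taken from this guide.

```python
TRANSFER_PREFIX = "transfers/"


def transfer_key(source_instance_uid: str) -> str:
    """Build the transform key for transfers from a given source instance."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str) -> str:
    """Recover the source instance uid from a transfer transform key."""
    if not key.startswith(TRANSFER_PREFIX):
        raise ValueError(f"not a transfer key: {key!r}")
    return key[len(TRANSFER_PREFIX):]


key = transfer_key("4XIuR0tvaiXM")
print(key)
print(source_uid_from_key(key))
```

Applied to the record above, `source_uid_from_key(artifact.transform.key)` would give the uid that also appears in the `instance_uid` column of the `s3://lamindata` storage row.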

The current notebook run is linked as the parent of the “transfer run”:

artifact.run.parent.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, name='Transfer data', key='transfer.ipynb', type='notebook', created_by_id=1, created_at=2024-11-21 05:38:17 UTC)
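
To follow this link further up, you can walk the `parent` attribute until it is `None`. The sketch below is duck-typed and uses stand-in objects instead of real `Run` records so that it runs without a database; `run_ancestry` is a hypothetical helper, and only the `.parent` and `.transform.name` attributes are taken from this guide.

```python
from types import SimpleNamespace


def run_ancestry(run):
    """Return the transform names from `run` up to its root ancestor."""
    names = []
    while run is not None:
        names.append(run.transform.name)
        run = run.parent
    return names


# Stand-ins that mimic the two runs in this guide.
notebook_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer data"), parent=None
)
transfer_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer from `laminlabs/lamindata`"),
    parent=notebook_run,
)

print(run_ancestry(transfer_run))
```

With the transferred record, the equivalent call would be `run_ancestry(artifact.run)`.
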
# test the last 3 cells here
assert artifact.transform.name == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.parent.transform.name == "Transfer data"

# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer