Transfer data
This guide shows how to transfer data from a source database into the currently connected database.
# pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-transfer --modules bionty
! using anonymous user (to identify, call: lamin login)
→ initialized lamindb: anonymous/test-transfer
import lamindb as ln
ln.track("ITeOtm7bhtdq")
→ connected lamindb: anonymous/test-transfer
→ created Transform('ITeOtm7bhtdq0000'), started new Run('yo80sGzI...') at 2025-07-14 06:40:07 UTC
→ notebook imports: lamindb==1.8.0
Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.
# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)
# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1282 | WQtsc0CQZKB9GEst0000 | None | Example R cars dataset | .parquet | dataset | DataFrame | 2402.0 | eIk8NXNiwMoGmhhjrMILbg | NaN | NaN | md5 | True | False | 1 | 2 | NaN | None | True | 460.0 | 2025-01-15 14:22:51.192955+00:00 | 30 | None | 1 |
1349 | 9KD0HE9lVveLpvuI0000 | data/prep_adata | None | .h5ad | None | AnnData | 124511524.0 | gnwU_GFFN_xIhtncrxu-tv | NaN | NaN | sha1-fl | True | False | 1 | 2 | NaN | None | True | NaN | 2025-03-03 23:24:56.184549+00:00 | 35 | None | 1 |
1451 | cGi8QjXNQQfZzL4n0000 | simple-lineage/figures/pca_all.pdf | None | None | None | 4707.0 | QexvSEBGMa80m0pV5KXd4w | NaN | NaN | md5 | True | False | 1 | 2 | NaN | None | True | 569.0 | 2025-04-01 11:33:47.714024+00:00 | 9 | None | 1 | |
1699 | 2qBNr2ICBnMS8JSC0000 | mini_text_files/file32.txt | None | .txt | None | None | 2.0 | Y2TT8PSVtqudz407XG4LAQ | NaN | NaN | md5 | False | False | 1 | 2 | NaN | None | True | 669.0 | 2025-05-05 14:15:55.974243+00:00 | 9 | None | 1 |
1742 | FoaS7BF8AZpt0Va80000 | mini_text_files/file64.txt | None | .txt | None | None | 2.0 | 6l0vHEYIIy4H06o9mY5RNQ | NaN | NaN | md5 | False | False | 1 | 2 | NaN | None | True | 669.0 | 2025-05-05 14:16:01.479023+00:00 | 9 | None | 1 |
You can now further subset or search the QuerySet. Here we query by whether the description contains “Tabula Sapiens”.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
Artifact .h5ad
├── General
│   ├── uid: dPraor9rU1EofcFb6Wph          hash: 8mB1KK2wd51F6HQdvqipcQ
│   ├── size: 3.6 GB                       space: all
│   ├── branch: main                       created_at: 2023-07-14 19:00:30
│   ├── created_by: Koncopd (Sergei Rybakov)
│   ├── key: tabula_sapiens_lung.h5ad
│   ├── storage location / path: s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
│   └── transform: ux-session-tb-lung
└── Labels
    └── .tissues                    bionty.Tissue                  lung
        .cell_types                 bionty.CellType                CD4-positive, alpha-beta T cell, CD8-po…
        .experimental_factors       bionty.ExperimentalFactor      anoxya, stroke
        .ulabels                    ULabel                         TSP1, TSP2, TSP14
By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.
artifact.save()
→ transferred: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
How do I know if a record is saved in the default database instance or not?
Every record has an attribute ._state.db which can take the following values:
None: the record has not yet been saved to any database
"default": the record is saved on the default database instance
"account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
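The three possible values can be told apart with a small check. The sketch below is only an illustration: describe_db_state is a hypothetical helper, not part of lamindb, and it operates on the plain value you would read from a record's ._state.db attribute.

```python
from typing import Optional


def describe_db_state(db: Optional[str]) -> str:
    """Hypothetical helper: turn a `._state.db` value into a readable status."""
    if db is None:
        return "not saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # Any other value is taken to be an "account/name" slug.
    return f"saved on the non-default database instance {db}"


# Example values, matching the three cases listed above.
print(describe_db_state(None))
print(describe_db_state("default"))
print(describe_db_state("laminlabs/lamindata"))
```

In a session like the one in this guide, you would pass artifact._state.db to such a helper before and after calling save().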
The artifact record has been transferred to the current database without feature & label annotations, but with updated data lineage.
artifact.describe()
Artifact .h5ad
└── General
    ├── uid: dPraor9rU1EofcFb6Wph          hash: 8mB1KK2wd51F6HQdvqipcQ
    ├── size: 3.6 GB                       space: all
    ├── branch: main                       created_at: 2023-07-14 19:00:30
    ├── created_by: anonymous
    ├── key: tabula_sapiens_lung.h5ad
    ├── storage location / path: s3://lamindata/tabula_sapiens_lung.h5ad
    ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
    └── transform: __lamindb_transfer__/4XIuR0tvaiXM
You can see that the data itself remained in the original storage location, which has been added to the current instance’s storage locations as a read-only location (indicated by the fact that its instance_uid doesn’t match the current instance).
ln.Storage.df()
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | XBeWU7nck6Vq | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-07-14 06:40:03.979000+00:00 | 1 | None | 1 |
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1 |
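The read-only rule described above can be sketched on plain dictionaries. This is an illustration, not lamindb code: is_read_only is a hypothetical helper, the rows copy only the uid and instance_uid columns from the table, and the current instance uid is assumed to be the one shown on the local storage row.

```python
def is_read_only(storage: dict, current_instance_uid: str) -> bool:
    """Hypothetical helper: a storage location is treated as read-only
    when its instance_uid differs from the current instance's uid."""
    return storage["instance_uid"] != current_instance_uid


# Rows reduced to the columns relevant for the check.
storages = [
    {"uid": "XBeWU7nck6Vq", "instance_uid": "1FHu5eE0uxm4"},
    {"uid": "D9BilDV2", "instance_uid": "4XIuR0tvaiXM"},
]

# Assumption: the current instance's uid equals that of its local storage row.
current_instance_uid = "1FHu5eE0uxm4"

read_only_uids = [
    s["uid"] for s in storages if is_read_only(s, current_instance_uid)
]
print(read_only_uids)
```

Under that assumption, only the s3://lamindata location (uid D9BilDV2) is flagged as read-only.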
See the state of the database.
ln.view()
****************
* module: core *
****************
Artifact
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | dPraor9rU1EofcFb6Wph | tabula_sapiens_lung.h5ad | Part of Tabula Sapiens, a benchmark, first-dra... | .h5ad | None | None | 3899435772 | 8mB1KK2wd51F6HQdvqipcQ | None | None | sha1-fl | False | False | 1 | 2 | None | None | True | 2 | 2023-07-14 19:00:30.621330+00:00 | 1 | None | 1 |
Run
id | uid | name | started_at | finished_at | reference | reference_type | _is_consecutive | _status_code | space_id | transform_id | report_id | _logfile_id | environment_id | initiated_by_run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | yo80sGzIyHPLkFkc | None | 2025-07-14 06:40:07.463886+00:00 | None | None | None | None | -1.0 | 1 | 1 | None | None | None | NaN | 2025-07-14 06:40:07.464000+00:00 | 1 | None | 1 |
2 | D3blUzckggoqMfRS | None | 2025-07-14 06:40:21.333000+00:00 | None | None | None | None | NaN | 1 | 2 | None | None | None | 1.0 | 2025-07-14 06:40:21.333000+00:00 | 1 | None | 1 |
Storage
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | XBeWU7nck6Vq | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-07-14 06:40:03.979000+00:00 | 1 | None | 1 |
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1 |
Transform
id | uid | key | description | type | source_code | hash | reference | reference_type | space_id | _template_id | version | is_latest | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 4XIuR0tvaiXM0000 | __lamindb_transfer__/4XIuR0tvaiXM | Transfer from `laminlabs/lamindata` | function | None | None | None | None | 1 | None | None | True | 2025-07-14 06:40:21.326000+00:00 | 1 | None | 1 |
1 | ITeOtm7bhtdq0000 | transfer.ipynb | Transfer data | notebook | None | None | None | None | 1 | None | None | True | 2025-07-14 06:40:07.446000+00:00 | 1 | None | 1 |
******************
* module: bionty *
******************
Source
id | uid | entity | organism | name | in_db | currently_used | description | url | md5 | source_website | space_id | dataframe_artifact_id | version | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 33TUF039 | bionty.Organism | vertebrates | ensembl | False | True | Ensembl | https://ftp.ensembl.org/pub/release-112/specie... | None | https://www.ensembl.org | 1 | None | release-112 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
2 | 6bbVUTCS | bionty.Organism | bacteria | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
3 | 6s9nV6xh | bionty.Organism | fungi | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
4 | 2PmTrc8x | bionty.Organism | metazoa | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/metaz... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
5 | 7GPHh16S | bionty.Organism | plants | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
6 | 4tsksCMX | bionty.Organism | all | ncbitaxon | False | True | NCBItaxon Ontology | http://purl.obolibrary.org/obo/ncbitaxon/2023-... | None | https://github.com/obophenotype/ncbitaxon | 1 | None | 2023-06-20 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
7 | 4UGNz3fr | bionty.Gene | human | ensembl | False | True | Ensembl | s3://bionty-assets/df_human__ensembl__release-... | None | https://www.ensembl.org | 1 | None | release-112 | None | 2025-07-14 06:40:04.086000+00:00 | 1 | None | 1 |
View lineage:
artifact.view_lineage()
! calling anonymously, will miss private instances
The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:
artifact.transform.description
'Transfer from `laminlabs/lamindata`'
The transform key has the form f"__lamindb_transfer__/{source_instance.uid}":
artifact.transform.key
'__lamindb_transfer__/4XIuR0tvaiXM'
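Because the key follows a fixed pattern, the source instance uid can be recovered from it with plain string handling. The sketch below assumes only the key format stated above; source_instance_uid is a hypothetical helper, not a lamindb function.

```python
from typing import Optional

TRANSFER_PREFIX = "__lamindb_transfer__/"


def source_instance_uid(transform_key: str) -> Optional[str]:
    """Hypothetical helper: return the source instance uid encoded in a
    transfer transform key, or None if the key is not a transfer key."""
    if not transform_key.startswith(TRANSFER_PREFIX):
        return None
    return transform_key[len(TRANSFER_PREFIX):]


print(source_instance_uid("__lamindb_transfer__/4XIuR0tvaiXM"))
print(source_instance_uid("transfer.ipynb"))
```

A check like this is one way to separate transferred artifacts from artifacts created by regular notebooks or scripts when iterating over transforms.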
The current notebook run is linked as the initiated_by_run of the “transfer run”:
artifact.run.initiated_by_run.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, key='transfer.ipynb', description='Transfer data', type='notebook', branch_id=1, space_id=1, created_by_id=1, created_at=2025-07-14 06:40:07 UTC)
When you re-transfer a record, LaminDB identifies that the record already exists in the target database and simply maps it.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.save()
→ mapped: Artifact(uid='dPraor9rU1EofcFb6Wph')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
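The difference between "transferred" and "mapped" can be pictured with a toy model. This is not how lamindb implements transfer: the in-memory dictionary stands in for the target database, transfer_or_map is a hypothetical function, and matching records by uid is an assumption suggested by the uids shown in the log lines.

```python
def transfer_or_map(record: dict, target: dict) -> str:
    """Toy model: copy a record into `target` unless a record with the
    same uid is already there, in which case only map to the existing one."""
    uid = record["uid"]
    if uid in target:
        return "mapped"
    target[uid] = dict(record)  # store a copy in the target
    return "transferred"


target_db = {}  # stands in for the target database
record = {"uid": "dPraor9rU1EofcFb6Wph", "key": "tabula_sapiens_lung.h5ad"}

first = transfer_or_map(record, target_db)   # first save copies the record
second = transfer_or_map(record, target_db)  # second save finds it by uid
print(first, second)
```

The point of the sketch is that repeating a transfer is safe: the second call leaves the target unchanged and creates no duplicate.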
If you also want to transfer annotations of the artifact, you can pass transfer="annotations" to save(). Just note that this might populate your target database with metadata that doesn’t match the conventions you want to enforce.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.save(transfer="annotations")
→ mapped: Artifact(uid='dPraor9rU1EofcFb6Wph'), Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred: CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='ZaVLDCZE'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
The artifact is now annotated.
artifact.describe()
Artifact .h5ad
├── General
│   ├── uid: dPraor9rU1EofcFb6Wph          hash: 8mB1KK2wd51F6HQdvqipcQ
│   ├── size: 3.6 GB                       space: all
│   ├── branch: main                       created_at: 2023-07-14 19:00:30
│   ├── created_by: anonymous
│   ├── key: tabula_sapiens_lung.h5ad
│   ├── storage location / path: s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
│   └── transform: __lamindb_transfer__/4XIuR0tvaiXM
└── Labels
    └── .tissues                    bionty.Tissue                  lung
        .cell_types                 bionty.CellType                pulmonary alveolar type 1 cell, adventi…
        .experimental_factors       bionty.ExperimentalFactor      anoxya, stroke
        .ulabels                    ULabel                         TSP1, TSP2, TSP14
# test the last 3 cells here
assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "__lamindb_transfer__/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.initiated_by_run.transform.description == "Transfer data"
# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer