Sync data across databases
This guide shows how to sync objects from a source database to your default database.
We need a target database:
!lamin init --storage ./test-sync --modules bionty
! using anonymous user (to identify, call: lamin login)
→ initialized lamindb: anonymous/test-sync
Import lamindb and optionally run ln.track():
import lamindb as ln
ln.track()
→ connected lamindb: anonymous/test-sync
→ created Transform('WT7rFTNM9XOl0000', key='sync.ipynb'), started new Run('p4kBzalCspryiqQg') at 2026-03-20 14:26:32 UTC
→ notebook imports: lamindb
• recommendation: to identify the notebook across renames, pass the uid: ln.track("WT7rFTNM9XOl")
Syncing works for any object type (Artifact, Record, Transform, ULabel, etc.). Let’s sync an artifact to our current default database:
db = ln.DB("laminlabs/lamindata")
# query the artifact on the source database
artifact = db.Artifact.get(key="example_datasets/mini_immuno/dataset1.h5ad")
# sync the artifact to the current database
artifact.save()
→ transferred: Artifact(uid='9K1dteZ6Qx0EXK8g0000'), Storage(uid='D9BilDV2'), Schema(uid='0000000000000002')
Artifact(uid='9K1dteZ6Qx0EXK8g0000', key='example_datasets/mini_immuno/dataset1.h5ad', description='Flow cytometry readouts on invitro cell culture', suffix='.h5ad', kind='dataset', otype='AnnData', size=31672.0, hash='FB3CeMjmg1ivN6HDy6wsSg', n_files=None, n_observations=3.0, branch_id=1, created_on_id=1, space_id=1, storage_id=2, run_id=2, schema_id=1, created_by_id=1, created_at=2025-07-29 12:27:25 UTC, is_locked=False, version_tag=None, is_latest=True)
If you also want to sync feature & label annotations, pass transfer="annotations":
# query again so that `artifact` holds the object on the source database
artifact = db.Artifact.get(key="example_datasets/mini_immuno/dataset1.h5ad")
# sync the artifact to the current database, including transfer of annotations where necessary
artifact.save(transfer="annotations")
→ mapped: Artifact(uid='9K1dteZ6Qx0EXK8g0000'), CellType(uid='ryEtgi1yGtAcX2'), CellType(uid='22LvKd01YyNA1a'), CellType(uid='6IC9NGJEv2Y4TD'), CellType(uid='ryEtgi1yGtAcX2'), ExperimentalFactor(uid='4WYv9kl0W2SroY')
→ transferred: Feature(uid='LIrjN9FbaLR1'), Feature(uid='xFdXre6ZPLlK'), Feature(uid='fJnNc4pzxe9c'), Feature(uid='7xDpJZiVLRl3'), Feature(uid='BaPfsAPgDFrT'), Feature(uid='DLeKfqUbrUsg'), Feature(uid='zvyDVbZln36o'), Feature(uid='Q8edF7CSgjG2'), Organism(uid='1dpCL6TduFJ3AP'), Source(uid='4BENqfHn'), Source(uid='404rkf5M'), Gene(uid='1j4At3x7akJU8n'), Gene(uid='6Aqvc8ckDYeNrD'), Gene(uid='3bhNYquOnA4sdo'), ULabel(uid='vmjLLqYy'), ULabel(uid='YAhFIvh5'), ULabel(uid='Yis4YLIB'), ULabel(uid='InLummy0'), Feature(uid='4ycwa8er0EB2'), Record(uid='ZRP07Y49Ni3Ne0Ae'), Record(uid='6pjoBrrz4f1EzQMO'), Record(uid='a6Zf73YeFR7o7RFU'), Record(uid='fNBzuANAusnkFv2p'), Schema(uid='JfgNiPmWNLZz4YRh'), Feature(uid='pNaJLQh8fRA6'), Project(uid='BZF49Wr2yZAC')
Artifact(uid='9K1dteZ6Qx0EXK8g0000', key='example_datasets/mini_immuno/dataset1.h5ad', description='Flow cytometry readouts on invitro cell culture', suffix='.h5ad', kind='dataset', otype='AnnData', size=31672, hash='FB3CeMjmg1ivN6HDy6wsSg', n_files=None, n_observations=3, branch_id=1, created_on_id=1, space_id=1, storage_id=2, run_id=2, schema_id=1, created_by_id=1, created_at=2025-07-29 12:27:25 UTC, is_locked=False, version_tag=None, is_latest=True)
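The log lines above can list dozens of objects. As a reading aid, here is a minimal sketch that tallies the entries of one log line by type. It only parses the text as printed in this guide; `summarize_sync_log` is a hypothetical helper and not part of lamindb.

```python
import re
from collections import Counter

# Matches entries such as "Feature(uid='LIrjN9FbaLR1')" and captures the type name.
ENTRY = re.compile(r"(\w+)\(uid='[^']*'\)")


def summarize_sync_log(line: str) -> tuple[str, Counter]:
    """Return the action ('mapped' or 'transferred') and a per-type tally.

    Hypothetical helper: it assumes the "→ action: Type(uid='...'), ..." layout
    shown in the outputs of this guide.
    """
    action, _, entries = line.lstrip("→ ").partition(":")
    return action.strip(), Counter(ENTRY.findall(entries))


line = (
    "→ transferred: Artifact(uid='9K1dteZ6Qx0EXK8g0000'), "
    "Storage(uid='D9BilDV2'), Schema(uid='0000000000000002')"
)
action, counts = summarize_sync_log(line)
print(action, dict(counts))
```

Passing the longer `→ transferred:` line from the annotation sync gives a quick overview of how many features, genes, labels, and records came along with the artifact.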
The artifact now has all feature & label annotations:
artifact.describe()
Artifact: example_datasets/mini_immuno/dataset1.h5ad (0000)
|   description: Flow cytometry readouts on invitro cell culture
├── uid: 9K1dteZ6Qx0EXK8g0000            run: wlb559u (__lamindb_transfer__/4XIuR0tvaiXM)
│   kind: dataset                        otype: AnnData
│   hash: FB3CeMjmg1ivN6HDy6wsSg         size: 30.9 KB
│   branch: main                         space: all
│   created_at: 2025-07-29 12:27:25 UTC  created_by: anonymous
│   n_observations: 3
├── storage/path: s3://lamindata/.lamindb/9K1dteZ6Qx0EXK8g0000.h5ad
├── Dataset features
│   ├── obs (8)
│   │   assay_oid              bionty.ExperimentalFactor.ontology…  EFO:0008913
│   │   cell_type_by_expert    bionty.CellType                      CD8-positive, alpha-beta T cell
│   │   cell_type_by_model     bionty.CellType                      B cell, T cell
│   │   concentration          str
│   │   donor                  str
│   │   perturbation           ULabel                               DMSO, IFNG
│   │   sample_note            str
│   │   treatment_time_h       num
│   └── var.T (3 bionty.Gene)
│       CD14                   num
│       CD4                    num
│       CD8A                   num
├── External features
│   └── experiment             Record[scRNA-seq]                    EXP-scRNA-002
│       experiment             ULabel                               Experiment 1
└── Labels
    └── .ulabels               ULabel                               DMSO, IFNG, Experiment 1
        .records               Record                               EXP-scRNA-002
        .projects              Project                              Tutorials
        .cell_types            bionty.CellType                      B cell, T cell, CD8-positive, alpha-be…
        .experimental_factors  bionty.ExperimentalFactor            single-cell RNA sequencing
The sync is zero-copy, which means that the data itself remains in its original storage location:
artifact.path
S3QueryPath('s3://lamindata/.lamindb/9K1dteZ6Qx0EXK8g0000.h5ad')
Data lineage indicates the source database of the sync:
artifact.view_lineage()
! calling anonymously, will miss private instances
The run that initiated the sync is linked via initiated_by_run:
artifact.run.initiated_by_run.transform
Transform(uid='WT7rFTNM9XOl0000', key='sync.ipynb', description='Sync data across databases [](https://github.com/laminlabs/lamindb/blob/main/docs/sync.md)', kind='notebook', hash=None, reference=None, reference_type=None, environment=None, plan=None, branch_id=1, created_on_id=1, space_id=1, created_by_id=1, created_at=2026-03-20 14:26:32 UTC, is_locked=False, version_tag=None, is_latest=True)
Upon calling .save() again, lamindb identifies that the object already exists in the target database and simply maps it:
artifact = db.Artifact.get(key="example_datasets/mini_immuno/dataset1.h5ad")
artifact.save()
→ mapped: Artifact(uid='9K1dteZ6Qx0EXK8g0000')
Artifact(uid='9K1dteZ6Qx0EXK8g0000', key='example_datasets/mini_immuno/dataset1.h5ad', description='Flow cytometry readouts on invitro cell culture', suffix='.h5ad', kind='dataset', otype='AnnData', size=31672, hash='FB3CeMjmg1ivN6HDy6wsSg', n_files=None, n_observations=3, branch_id=1, created_on_id=1, space_id=1, storage_id=2, run_id=2, schema_id=1, created_by_id=1, created_at=2025-07-29 12:27:25 UTC, is_locked=False, version_tag=None, is_latest=True)
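To make the difference between "transferred" and "mapped" concrete, here is a toy model in plain Python. It is a conceptual sketch only and not lamindb's implementation: `ToyRecord`, `ToyDatabase`, and `toy_sync` are hypothetical names, and matching on `uid` is an assumption suggested by the log output above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    uid: str
    path: str  # storage location; the toy sync never rewrites it


@dataclass
class ToyDatabase:
    records: dict[str, ToyRecord] = field(default_factory=dict)


def toy_sync(record: ToyRecord, target: ToyDatabase) -> str:
    """Copy only the metadata record; report 'mapped' if the uid already exists.

    Assumption: objects are matched by uid. No data is copied, the path is reused.
    """
    if record.uid in target.records:
        return "mapped"
    target.records[record.uid] = ToyRecord(record.uid, record.path)
    return "transferred"


target = ToyDatabase()
source_artifact = ToyRecord(
    uid="9K1dteZ6Qx0EXK8g0000",
    path="s3://lamindata/.lamindb/9K1dteZ6Qx0EXK8g0000.h5ad",
)
first = toy_sync(source_artifact, target)   # first call creates the record
second = toy_sync(source_artifact, target)  # second call finds it and maps
print(first, second)  # → transferred mapped
```

In this toy model, calling the sync twice is safe: the second call changes nothing in the target, and the stored path still points to the original location.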
How do I know if an object is in the default database or elsewhere?
Every SQLRecord object has an attribute ._state.db which can take the following values:
None: the object has not yet been saved to any database
"default": the object is saved on the default database instance
"account/name": the object is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
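The three cases above can be turned into a small check. The sketch below relies only on the `._state.db` attribute described here; `describe_location` is a hypothetical helper, and the example uses a stand-in object instead of a real record so that it runs without a database connection.

```python
from types import SimpleNamespace


def describe_location(obj) -> str:
    """Translate obj._state.db into a readable description (hypothetical helper)."""
    db = obj._state.db
    if db is None:
        return "not yet saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # Any other value is expected to be an "account/name" reference.
    return f"saved on the non-default database instance {db}"


# Stand-in for an object queried from a source database (assumption for illustration).
fake_artifact = SimpleNamespace(_state=SimpleNamespace(db="laminlabs/lamindata"))
print(describe_location(fake_artifact))
```

With a real object, the same function could be called on the result of a query, for example on `artifact` before and after `artifact.save()`.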
# test the last 3 cells here
assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "__lamindb_transfer__/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.initiated_by_run.transform.description.startswith("Sync data")