Will data & metadata stay in sync?
Here, we walk through errors that can occur while saving artifacts & metadata records, and show that the LaminDB instance is not corrupted by dangling metadata or dangling artifacts. In that sense, transactions across data & metadata behave like ACID transactions.
# !pip install 'lamindb[jupyter,aws]'
from laminci.db import setup_local_test_postgres
pgurl = setup_local_test_postgres()
• Created Postgres test instance: 'postgresql://postgres:[email protected]:5432/pgtest'
It runs in docker container 'pgtest'
!lamin init --db {pgurl} --storage ./test-acid
! using the sql database name for the instance name
→ connected lamindb: testuser1/pgtest
import pytest
import lamindb as ln
from upath import UPath
→ connected lamindb: testuser1/pgtest
Save error due to failed upload
Let’s try to save an artifact to a storage location without permission.
artifact = ln.Artifact.from_anndata(
    ln.core.datasets.anndata_mouse_sc_lymph_node(),
    description="Mouse Lymph Node scRNA-seq",
)
! no run & transform got linked, call `ln.track()` & re-run
Because the public API only allows you to set a default storage for which you have permission, we need to hack it:
ln.setup.settings.storage._root = UPath("s3://nf-core-awsmegatests")
ln.settings.storage
StorageSettings(root='/home/runner/work/lamindb/lamindb/docs/faq/test-acid', uid='2m3dmCPHL6ZO')
This raises a RuntimeError:
with pytest.raises(RuntimeError) as error:
    artifact.save()
print(error.exconly())
! could not upload artifact: Artifact(uid='uJ4rGc0vrqC0sqSy0000', is_latest=True, description='Mouse Lymph Node scRNA-seq', suffix='.h5ad', type='dataset', size=17177479, hash='P3FWm0NQ99uJNPOYNXUxpA', _hash_type='md5', _accessor='AnnData', visibility=1, _key_is_virtual=True, storage_id=1, created_by_id=1, created_at=2024-11-21 05:37:17 UTC)
RuntimeError: Access Denied
Let’s now check that no metadata records were added to the database:
assert len(ln.Artifact.filter().all()) == 0
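The behavior checked above can be illustrated with a minimal sketch. It is not LaminDB's implementation: it assumes a plain SQLite table named artifact and a hypothetical failing_upload function, and only shows the general pattern of wrapping the metadata insert and the upload in one transaction, so that a failed upload rolls the metadata row back.

```python
import sqlite3


def failing_upload(uid: str) -> None:
    # Hypothetical stand-in for an upload to storage without write permission.
    raise PermissionError("Access Denied")


def save_artifact(conn: sqlite3.Connection, uid: str, description: str, upload) -> None:
    """Insert the metadata row and upload the data as one unit of work."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "INSERT INTO artifact (uid, description) VALUES (?, ?)",
                (uid, description),
            )
            upload(uid)
    except Exception as exc:
        # Surface any failure as a RuntimeError, mirroring the error seen above.
        raise RuntimeError(str(exc)) from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (uid TEXT PRIMARY KEY, description TEXT)")

message = ""
try:
    save_artifact(conn, "example-uid", "Mouse Lymph Node scRNA-seq", failing_upload)
except RuntimeError as error:
    message = str(error)

# The failed upload left no metadata row behind.
n_rows = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
print(message, n_rows)
```

The design choice in this sketch is to run the upload inside the open transaction, so the database never commits a record whose data did not reach storage.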
Save error during bulk creation
filepath = ln.core.datasets.file_jpg_paradisi05()
artifact = ln.Artifact(filepath, description="My image")
artifacts = [artifact, "this is not a record"]
! no run & transform got linked, call `ln.track()` & re-run
This raises an exception:
with pytest.raises(Exception) as error:
    ln.save(artifacts)
print(error.exconly())
AttributeError: 'str' object has no attribute '_state'
Nothing got saved:
artifacts = ln.Artifact.filter().all()
assert len(artifacts) == 0
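The all-or-nothing outcome of the bulk save can be sketched as follows. This is an illustration under stated assumptions, not LaminDB's code: the Record dataclass, the bulk_save helper, and the SQLite table are hypothetical. It shows that when one item in the list is invalid, the rows inserted before it are rolled back as well.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical, simplified stand-in for a metadata record.
    uid: str
    description: str


def bulk_save(conn: sqlite3.Connection, records) -> None:
    """Insert all records in a single transaction: all or nothing."""
    with conn:  # rolls back every insert if any item raises
        for record in records:
            conn.execute(
                "INSERT INTO artifact (uid, description) VALUES (?, ?)",
                (record.uid, record.description),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (uid TEXT PRIMARY KEY, description TEXT)")

records = [Record("img-1", "My image"), "this is not a record"]

error_type = None
try:
    bulk_save(conn, records)
except AttributeError as error:
    # The string has no `uid` attribute, so the loop fails on the second item.
    error_type = type(error).__name__

# The valid first record was rolled back together with the failed one.
n_rows = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
print(error_type, n_rows)
```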
If a list of data objects is passed to ln.save() and the upload of one of these data objects fails, the successful uploads are maintained and a RuntimeError is raised, listing the data objects that were uploaded successfully up to that point.
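The partial-failure behavior described here can be sketched in plain Python. The names upload_all, flaky_upload, and the in-memory store are hypothetical and only illustrate the pattern: uploads that already succeeded stay in place, and the raised RuntimeError reports them.

```python
def upload_all(paths, upload):
    """Upload paths in order; on failure, report what was already uploaded."""
    uploaded = []
    for path in paths:
        try:
            upload(path)
        except Exception as exc:
            # Earlier uploads are kept; the error lists them for follow-up.
            raise RuntimeError(
                f"upload of {path!r} failed ({exc}); uploaded so far: {uploaded}"
            ) from exc
        uploaded.append(path)
    return uploaded


# Hypothetical in-memory "storage" and an upload function that fails for one file.
store = {}


def flaky_upload(path: str) -> None:
    if path == "b.jpg":
        raise PermissionError("Access Denied")
    store[path] = b"..."


message = ""
try:
    upload_all(["a.jpg", "b.jpg", "c.jpg"], flaky_upload)
except RuntimeError as error:
    message = str(error)

print(message)
print(sorted(store))
```

In this sketch, files after the failing one are never attempted, and the error message gives the caller enough information to decide whether to retry or clean up.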
!docker stop pgtest && docker rm pgtest
!lamin delete --force pgtest
pgtest
pgtest
! delete() does not yet affect your Postgres database at postgresql://postgres:[email protected]:5432/pgtest
• deleting instance testuser1/pgtest