Query & search registries

!lamin init --storage ./mydata
→ connected lamindb: testuser1/mydata
import lamindb as ln

# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
→ connected lamindb: testuser1/mydata
! no run & transform get linked, consider calling ln.context.track()
! no run & transform get linked, consider calling ln.context.track()
! no run & transform get linked, consider calling ln.context.track()
Artifact(uid='9kQngQRFgzS9Ypt10000', is_latest=True, description='My fastq', suffix='.fastq.gz', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', _hash_type='md5', visibility=1, _key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-09-06 08:58:00 UTC')

Look up metadata

For entities with no more than about 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-06 08:57:58 UTC')

You can also get a dictionary, if you prefer:

users_dict = ln.User.lookup().dict()
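
To make the idea concrete, here is a minimal plain-Python sketch of how a lookup object could be built from a list of records. It is an illustration only, not LaminDB's implementation: the `make_lookup` helper and the record dictionaries are hypothetical, and only the `uid`, `handle`, and `name` values are taken from the output above.

```python
from types import SimpleNamespace


def make_lookup(records, field):
    """Build an attribute-accessible namespace keyed by `field` (hypothetical helper)."""
    entries = {}
    for record in records:
        # normalize the value so that it is a valid Python attribute name
        key = str(record[field]).lower().replace(" ", "_").replace("-", "_")
        entries[key] = record
    return SimpleNamespace(**entries)


records = [
    {"uid": "DzTjkKse", "handle": "testuser1", "name": "Test User1"},
]
users = make_lookup(records, field="handle")
user = users.testuser1    # attribute access is what enables auto-complete
users_dict = vars(users)  # plain dictionary view of the same entries
```

Because every record becomes an attribute, this pattern only stays practical for registries of moderate size, which matches the 100k-record guideline above.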

Query exactly one record

# by uid
ln.User.get("DzTjkKse")
# by any expression involving fields
ln.User.get(handle="testuser1")
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-06 08:57:58 UTC')

Query sets of records

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 hkeIlncMhi6Xg7RD0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-06 08:57:59.994196+00:00
2 WVabCyERwTAIDT770000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-06 08:58:00.102105+00:00
3 9kQngQRFgzS9Ypt10000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-06 08:58:00.108506+00:00

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record in a row.

  • .all(): A QuerySet.

  • .one(): Exactly one record. Raises an error if there is no record or more than one. Equivalent to the .get() method shown above.

  • .one_or_none(): Either one record or None if there is no query result.
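
The difference between the last two accessors can be sketched in plain Python. This is a hedged illustration, not LaminDB's code: the helper functions below are hypothetical and raise a generic `LookupError`, and they assume that more than one match is treated as an error, as in SQLAlchemy's methods of the same name.

```python
def one(results):
    """Return the single element of `results`; raise if there are none or several."""
    results = list(results)
    if len(results) != 1:
        raise LookupError(f"expected exactly one result, got {len(results)}")
    return results[0]


def one_or_none(results):
    """Return the single element, or None if `results` is empty; raise if several."""
    results = list(results)
    if len(results) > 1:
        raise LookupError(f"expected at most one result, got {len(results)}")
    return results[0] if results else None


handles = ["testuser1"]
found = one(handle for handle in handles if handle == "testuser1")
missing = one_or_none(handle for handle in handles if handle == "nobody")
```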

Note

filter() returns a QuerySet.

The registries in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
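
As a rough illustration of the translation into SQL, the sketch below runs a hand-written select statement with Python's built-in sqlite3 module. The table and column names are simplified assumptions, and the SQL that Django actually generates will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, "
    "size INTEGER, created_by_id INTEGER)"
)
conn.executemany(
    "INSERT INTO artifact (description, suffix, size, created_by_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("My image", ".jpg", 29358, 1),
        ("The iris collection", ".parquet", 5629, 1),
        ("My fastq", ".fastq.gz", 20, 1),
    ],
)

# rough analogue of ln.Artifact.filter(created_by=user)
rows = conn.execute(
    "SELECT id, description FROM artifact WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
```

Each keyword argument of the filter corresponds to one condition in the WHERE clause; several keyword arguments are combined with AND.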

Search for metadata

ln.Artifact.search("iris").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 WVabCyERwTAIDT770000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-06 08:58:00.102105+00:00

Let us create 500 notebook objects with fake titles and save them:

transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
1 plvBYoAOmxHn0000 None True Igg1 Speed cells classify intestine. None None notebook None None None None None 1 2024-09-06 08:58:01.697821+00:00
20 mEPvglTZOYyV0000 None True Intestine research IgG1 IgG IgG IgA. None None notebook None None None None None 1 2024-09-06 08:58:01.699070+00:00
21 U4prgJp7OsSH0000 None True Intestine Olfactory ensheathing cells IgA IgG1... None None notebook None None None None None 1 2024-09-06 08:58:01.699133+00:00
42 ieHQXKmg0lQu0000 None True Intestine cluster IgD IgG visualize intestine. None None notebook None None None None None 1 2024-09-06 08:58:01.700470+00:00
46 j4cpPN7UBGQ30000 None True Igg1 Vestibule of the ear IgG IgG IgA intestine. None None notebook None None None None None 1 2024-09-06 08:58:01.700725+00:00

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()  
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id

The filter selects all artifacts based on the users who ran the generating notebook. The result is empty here because the toy artifacts above were saved without a linked run.

Under the hood, the SQL database joins the artifact table with the run and user tables.
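
The join can be sketched with Python's built-in sqlite3 module. The schema below is a simplified assumption (three small tables named `users`, `run`, and `artifact`), and the run plus the "Tracked output" artifact are hypothetical additions that show when the query returns rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE run (id INTEGER PRIMARY KEY, created_by_id INTEGER REFERENCES users(id));
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER REFERENCES run(id));
    INSERT INTO users (id, handle) VALUES (1, 'testuser1');
    INSERT INTO artifact (description, run_id) VALUES
        ('My image', NULL), ('The iris collection', NULL), ('My fastq', NULL);
    """
)

# rough analogue of filter(run__created_by__handle__startswith="testuse")
JOIN_QUERY = """
SELECT artifact.description
FROM artifact
JOIN run ON artifact.run_id = run.id
JOIN users ON run.created_by_id = users.id
WHERE users.handle LIKE 'testuse%'
ORDER BY artifact.id
"""

# artifacts without a run drop out of the inner join
without_runs = conn.execute(JOIN_QUERY).fetchall()

# hypothetical: a run by testuser1 and an artifact linked to it
conn.execute("INSERT INTO run (id, created_by_id) VALUES (1, 1)")
conn.execute("INSERT INTO artifact (description, run_id) VALUES ('Tracked output', 1)")
with_run = conn.execute(JOIN_QUERY).fetchall()
```

Each double underscore in the filter expression corresponds to one hop along a foreign key, that is, one JOIN in the query.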

Beyond __startswith, Django supports about two dozen field comparators of the form field__comparator=value. Some of them follow below.
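
To show how the field__comparator=value pattern can be read, here is a small plain-Python sketch that splits each keyword into a field and a comparator and applies it to dictionaries. It is an illustration under simplifying assumptions (no relation traversal, a handful of comparators), not how Django evaluates queries; Django compiles them to SQL instead. The `matches` helper and the `COMPARATORS` table are hypothetical.

```python
import operator

# a few of Django's comparator names mapped to Python predicates
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "icontains": lambda value, part: part.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **expressions):
    """Return True if `record` satisfies every field__comparator=value expression."""
    for key, expected in expressions.items():
        field, _, comparator = key.partition("__")
        compare = COMPARATORS[comparator or "exact"]  # a bare field means equality
        if not compare(record[field], expected):
            return False
    return True


artifacts = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]
small = [a["description"] for a in artifacts if matches(a, size__lt=1e4)]
images_or_reads = [
    a["description"] for a in artifacts if matches(a, suffix__in=[".jpg", ".fastq.gz"])
]
```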

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 hkeIlncMhi6Xg7RD0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-06 08:57:59.994196+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB. The less-than condition is expressed as a keyword argument with the __lt comparator:

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 WVabCyERwTAIDT770000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-06 08:58:00.102105+00:00
3 9kQngQRFgzS9Ypt10000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-06 08:58:00.108506+00:00

or

ln.Artifact.filter().filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 hkeIlncMhi6Xg7RD0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-06 08:57:59.994196+00:00
3 9kQngQRFgzS9Ypt10000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-06 08:58:00.108506+00:00

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 hkeIlncMhi6Xg7RD0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-06 08:57:59.994196+00:00
3 9kQngQRFgzS9Ypt10000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-06 08:58:00.108506+00:00
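
The "or" and "in" queries return the same rows. A hedged sketch with Python's built-in sqlite3 module shows the two SQL conditions they roughly correspond to; the single-table schema is a simplified assumption, not the actual LaminDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, suffix TEXT)")
conn.executemany(
    "INSERT INTO artifact (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)
suffixes = (".jpg", ".fastq.gz")

# rough analogue of ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")
either = conn.execute(
    "SELECT id FROM artifact WHERE suffix = ? OR suffix = ? ORDER BY id", suffixes
).fetchall()

# rough analogue of suffix__in=[".jpg", ".fastq.gz"]
member = conn.execute(
    "SELECT id FROM artifact WHERE suffix IN (?, ?) ORDER BY id", suffixes
).fetchall()
```

For alternatives on a single field, the `in` form is shorter; Q objects become necessary once the alternatives involve different fields.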

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 9kQngQRFgzS9Ypt10000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-06 08:58:00.108506+00:00
2 WVabCyERwTAIDT770000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-06 08:58:00.102105+00:00
1 hkeIlncMhi6Xg7RD0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-06 08:57:59.994196+00:00
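
The leading minus sign in "-updated_at" requests descending order. A minimal plain-Python sketch of that convention follows; the `order_by` function here is a hypothetical stand-in that sorts dictionaries in memory, whereas the real call becomes an ORDER BY clause in SQL.

```python
def order_by(records, key):
    """Sort `records` by a field name; a leading '-' means descending."""
    descending = key.startswith("-")
    field = key.lstrip("-")
    return sorted(records, key=lambda record: record[field], reverse=descending)


artifacts = [
    {"id": 1, "updated_at": "2024-09-06 08:57:59.994196"},
    {"id": 2, "updated_at": "2024-09-06 08:58:00.102105"},
    {"id": 3, "updated_at": "2024-09-06 08:58:00.108506"},
]
newest_first = [a["id"] for a in order_by(artifacts, "-updated_at")]
oldest_first = [a["id"] for a in order_by(artifacts, "updated_at")]
```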

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
6 zrpb30rlALx10000 None True Apocrine Sweat Gland research IgG1 rank. None None notebook None None None None None 1 2024-09-06 08:58:01.698178+00:00
11 pNlRzkhnwwwM0000 None True Apocrine Sweat Gland IgG2 Midbrain IgG2 IgG1 r... None None notebook None None None None None 1 2024-09-06 08:58:01.698499+00:00
14 P4LNM1r5iGwh0000 None True Veins result Olfactory ensheathing cells Speed... None None notebook None None None None None 1 2024-09-06 08:58:01.698689+00:00
16 HvVnIPEysmA50000 None True Peripolar Cell Midbrain research Apocrine swea... None None notebook None None None None None 1 2024-09-06 08:58:01.698816+00:00
20 mEPvglTZOYyV0000 None True Intestine research IgG1 IgG IgG IgA. None None notebook None None None None None 1 2024-09-06 08:58:01.699070+00:00
28 XouO3uBRIUrm0000 None True Research visualize IgG2 rank. None None notebook None None None None None 1 2024-09-06 08:58:01.699576+00:00
29 JTrGMTkZABpp0000 None True Igg2 Unipolar brush cells IgG3 research intest... None None notebook None None None None None 1 2024-09-06 08:58:01.699640+00:00
32 X1cAk0OinqW00000 None True Olfactory Ensheathing Cells IgA research Sperm... None None notebook None None None None None 1 2024-09-06 08:58:01.699831+00:00
36 c3VAt72ouuJQ0000 None True Igg3 research Spermatozoon visualize Melanotro... None None notebook None None None None None 1 2024-09-06 08:58:01.700085+00:00
37 HNKunYFPnyaw0000 None True Igg3 Unipolar brush cells IgA research Olfacto... None None notebook None None None None None 1 2024-09-06 08:58:01.700148+00:00

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
6 zrpb30rlALx10000 None True Apocrine Sweat Gland research IgG1 rank. None None notebook None None None None None 1 2024-09-06 08:58:01.698178+00:00
11 pNlRzkhnwwwM0000 None True Apocrine Sweat Gland IgG2 Midbrain IgG2 IgG1 r... None None notebook None None None None None 1 2024-09-06 08:58:01.698499+00:00
14 P4LNM1r5iGwh0000 None True Veins result Olfactory ensheathing cells Speed... None None notebook None None None None None 1 2024-09-06 08:58:01.698689+00:00
16 HvVnIPEysmA50000 None True Peripolar Cell Midbrain research Apocrine swea... None None notebook None None None None None 1 2024-09-06 08:58:01.698816+00:00
20 mEPvglTZOYyV0000 None True Intestine research IgG1 IgG IgG IgA. None None notebook None None None None None 1 2024-09-06 08:58:01.699070+00:00
28 XouO3uBRIUrm0000 None True Research visualize IgG2 rank. None None notebook None None None None None 1 2024-09-06 08:58:01.699576+00:00
29 JTrGMTkZABpp0000 None True Igg2 Unipolar brush cells IgG3 research intest... None None notebook None None None None None 1 2024-09-06 08:58:01.699640+00:00
32 X1cAk0OinqW00000 None True Olfactory Ensheathing Cells IgA research Sperm... None None notebook None None None None None 1 2024-09-06 08:58:01.699831+00:00
36 c3VAt72ouuJQ0000 None True Igg3 research Spermatozoon visualize Melanotro... None None notebook None None None None None 1 2024-09-06 08:58:01.700085+00:00
37 HNKunYFPnyaw0000 None True Igg3 Unipolar brush cells IgA research Olfacto... None None notebook None None None None None 1 2024-09-06 08:58:01.700148+00:00
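
Whether `contains` is actually case-sensitive depends on the database backend: Django's documentation notes that on SQLite, `contains` behaves like `icontains`, because SQLite's LIKE operator is case-insensitive for ASCII characters by default. The sketch below demonstrates that SQLite behavior with Python's built-in sqlite3 module. It assumes nothing about which backend this instance uses, and the `like` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
titles = [
    "Research visualize IgG2 rank.",
    "Intestine research IgG1 IgG IgG IgA.",
]


def like(pattern, text):
    """Evaluate `text LIKE pattern` in SQLite and return a bool."""
    return bool(conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()[0])


# SQLite's LIKE ignores ASCII case, so both patterns match both titles
lower_hits = [title for title in titles if like("%search%", title)]
upper_hits = [title for title in titles if like("%SEARCH%", title)]

# a case-sensitive comparison in Python, for contrast
python_upper_hits = [title for title in titles if "SEARCH" in title]
```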

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
28 XouO3uBRIUrm0000 None True Research visualize IgG2 rank. None None notebook None None None None None 1 2024-09-06 08:58:01.699576+00:00
65 CabDJZDSjocn0000 None True Research IgG2 result IgG Midbrain IgG2 intestine. None None notebook None None None None None 1 2024-09-06 08:58:01.701950+00:00
71 Q8NcGKt9ANDy0000 None True Research IgG1 Apocrine sweat gland IgG result. None None notebook None None None None None 1 2024-09-06 08:58:01.705192+00:00
93 5hABW8ua4lQj0000 None True Research candidate Semicircular canals IgG. None None notebook None None None None None 1 2024-09-06 08:58:01.706539+00:00
131 FbV2ajlCEewE0000 None True Research IgA intestine candidate Unipolar brus... None None notebook None None None None None 1 2024-09-06 08:58:01.708808+00:00
143 TqxLrssuauBq0000 None True Research Olfactory ensheathing cells intestina... None None notebook None None None None None 1 2024-09-06 08:58:01.711963+00:00
177 z3FRhHwBc8ql0000 None True Research IgA Peripolar cell rank Melanotropes ... None None notebook None None None None None 1 2024-09-06 08:58:01.714034+00:00
281 Ti1sYoEZqGqI0000 None True Research Oligodendrocytes Peripolar cell. None None notebook None None None None None 1 2024-09-06 08:58:01.725331+00:00
342 N6Z131OHjspO0000 None True Research IgG1 investigate classify cluster. None None notebook None None None None None 1 2024-09-06 08:58:01.731439+00:00
364 OUx78m6gHrEl0000 None True Research Oligodendrocytes IgG3 IgG1 Midbrain I... None None notebook None None None None None 1 2024-09-06 08:58:01.732753+00:00
379 hBMSpYcGAx810000 None True Research IgG IgA IgG1. None None notebook None None None None None 1 2024-09-06 08:58:01.733674+00:00
# clean up test instance
!lamin delete --force mydata
!rm -r mydata