Query & search registries

This guide walks through all the ways of finding metadata records in LaminDB registries.

# !pip install lamindb
!lamin init --storage ./test-registries
→ connected lamindb: testuser1/test-registries

We’ll need some toy data.

import lamindb as ln

# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()

# see the content of the artifact registry
ln.Artifact.df()
→ connected lamindb: testuser1/test-registries
! no run & transform got linked, call `ln.context.track()` & re-run`
! no run & transform got linked, call `ln.context.track()` & re-run`
! no run & transform got linked, call `ln.context.track()` & re-run`
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00

Look up metadata

For registries with fewer than 100k records, auto-completing a Lookup object is the most convenient way of finding a record.

For example, take the User registry:

# query the database for all users, optionally pass the field that creates the key
users = ln.User.lookup(field="handle")

# the lookup object is a NamedTuple
users
Lookup(testuser1=User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC'), dict=<bound method Lookup.dict of <lamin_utils._lookup.Lookup object at 0x7f28b8977490>>)

With auto-complete, we find a specific user record:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')

You can also get a dictionary:

users_dict = ln.User.lookup().dict()
users_dict
{'testuser1': User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')}
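
To make the NamedTuple idea concrete, here is a minimal sketch in plain Python of how a lookup object could be assembled from records. It is not LaminDB's implementation: make_lookup, the dictionary records, and the second user are hypothetical and exist only for illustration.

```python
import re
from collections import namedtuple


def make_lookup(records, field):
    """Hypothetical helper: expose records as attributes of a namedtuple."""
    # Replace characters that are not valid in identifiers with underscores.
    # Duplicate keys and keys starting with a digit are not handled here.
    keys = [re.sub(r"\W", "_", str(record[field])) for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


# Toy records; the second user is made up for this example.
users = [
    {"uid": "DzTjkKse", "handle": "testuser1", "name": "Test User1"},
    {"uid": "00000000", "handle": "test-user 2", "name": "Test User2"},
]

lookup = make_lookup(users, field="handle")
print(lookup.testuser1["name"])    # attribute access, which editors can auto-complete
print(lookup.test_user_2["uid"])   # "test-user 2" became "test_user_2"
print(lookup._asdict()["testuser1"]["handle"])  # dictionary view of the same records
```

Because every key becomes an attribute, this approach only stays convenient while the number of records is moderate, which fits the size guidance given at the start of this section.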

Query exactly one record

get raises an error if no record or more than one record matches the query.

# by the universal base62 uid
ln.User.get("DzTjkKse")

# by any expression involving fields
ln.User.get(handle="testuser1")
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')

Query sets of records

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record in a row.

  • .all(): A QuerySet.

  • .one(): Exactly one record. Raises an error if there is no record or more than one. Equivalent to the .get() method shown above.

  • .one_or_none(): Either one record or None if there is no query result.
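
The difference between .one() and .one_or_none() can be sketched in plain Python. This assumes SQLAlchemy-style semantics and uses LookupError as a stand-in; the exception classes LaminDB actually raises are not shown in this guide and may differ.

```python
def one(results):
    """Return the single element of `results`, otherwise raise."""
    if len(results) == 0:
        raise LookupError("no record found")
    if len(results) > 1:
        raise LookupError("more than one record found")
    return results[0]


def one_or_none(results):
    """Return the single element, None if empty, raise if ambiguous."""
    if len(results) == 0:
        return None
    if len(results) > 1:
        raise LookupError("more than one record found")
    return results[0]


# Toy "query results" built from the artifact descriptions used in this guide
records = ["My image", "The iris collection", "My fastq"]
matches = [r for r in records if "iris" in r]

print(one(matches))     # The iris collection
print(one_or_none([]))  # None
```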

Note

filter() returns a QuerySet.

The registry classes (ORMs) in LaminDB are Django Models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
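
As a rough illustration of the translation into a SQL select statement, the sketch below uses Python's built-in sqlite3 module. The artifact table and its columns are a hypothetical simplification, and the SQL that Django actually generates looks different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifact (description, suffix, created_by_id) VALUES (?, ?, ?)",
    [
        ("My image", ".jpg", 1),
        ("The iris collection", ".parquet", 1),
        ("My fastq", ".fastq.gz", 1),
    ],
)

# Roughly what a call like filter(suffix=".jpg", created_by=user) asks the database for:
# each keyword argument becomes one condition in the WHERE clause.
rows = con.execute(
    "SELECT id, description FROM artifact WHERE suffix = ? AND created_by_id = ?",
    (".jpg", 1),
).fetchall()
print(rows)  # [(1, 'My image')]
```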

Search for records

Search the toy data:

ln.Artifact.search("iris").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00

Let us create 500 notebook transforms with fake titles, save them, and search them:

transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)

# search
ln.Transform.search("intestine").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
3 35DTgF8ohdJu0000 None True Igg2 IgG2 study intestinal intestine. None None notebook None None None None None 1 2024-09-18 22:36:32.655770+00:00
20 hyoq5DaSwOpR0000 None True Retina Lungs Nasal cavity intestine Satellite ... None None notebook None None None None None 1 2024-09-18 22:36:32.656829+00:00
25 tSKtnxmVFxEp0000 None True Igg2 Inner phalangeal cells of organ of Corti ... None None notebook None None None None None 1 2024-09-18 22:36:32.657146+00:00
30 zdk3EYlDGUCm0000 None True Igm intestine classify IgD Iris intestine Iris... None None notebook None None None None None 1 2024-09-18 22:36:32.657540+00:00
42 JEHnSjWDIaDF0000 None True Intestine intestinal study Osteoblast classify... None None notebook None None None None None 1 2024-09-18 22:36:32.658294+00:00

Note

Currently, the LaminHub UI search is more powerful than the search of the lamindb open-source package.

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations and leverage different comparators.

ln.Artifact.filter(created_by__handle__startswith="testuse").df()  
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00

The filter selects all artifacts created by users whose handle starts with "testuse".

Under the hood, in the SQL database, it’s joining the artifact table with the user table.
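
A rough sketch of the join behind a relational filter such as created_by__handle__startswith, using Python's built-in sqlite3 module. The table names users and artifacts, the simplified columns, and the second user are hypothetical; the SQL that Django actually generates looks different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY,
        description TEXT,
        created_by_id INTEGER REFERENCES users (id)
    );
    INSERT INTO users (id, handle) VALUES (1, 'testuser1'), (2, 'someone_else');
    INSERT INTO artifacts (description, created_by_id) VALUES
        ('My image', 1), ('The iris collection', 1), ('Other data', 2);
    """
)

# Follow the created_by relation to the users table, then compare the handle.
# Note: SQLite's LIKE ignores ASCII case by default.
rows = con.execute(
    """
    SELECT artifacts.id, artifacts.description
    FROM artifacts
    JOIN users ON artifacts.created_by_id = users.id
    WHERE users.handle LIKE 'testuse%'
    ORDER BY artifacts.id
    """
).fetchall()
print(rows)  # [(1, 'My image'), (2, 'The iris collection')]
```

Each additional double-underscore step across a relation adds one more join of this kind.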

Comparators

You can qualify the type of comparison in a query by using a comparator.

Below is a list of the most important ones; Django supports about two dozen field comparators of the form field__comparator=value.
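
The field__comparator=value pattern can be mimicked in a few lines of plain Python. This is only a sketch for building intuition: matches and COMPARATORS are hypothetical names, the records are plain dictionaries, and relation traversal is not handled.

```python
import operator

# A handful of comparators, mapped to plain Python predicates (illustrative subset)
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "icontains": lambda value, part: part.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **expressions):
    """Hypothetical helper: check a dict record against field__comparator=value pairs."""
    for expression, expected in expressions.items():
        field, _, comparator = expression.partition("__")
        comparator = comparator or "exact"  # a bare field name means equality
        if not COMPARATORS[comparator](record[field], expected):
            return False
    return True  # all expressions hold (logical and)


artifacts = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]

small = [a["description"] for a in artifacts if matches(a, size__lt=1e4)]
print(small)  # ['The iris collection', 'My fastq']
```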

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB by applying the lt comparator to the size field:

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00
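
The leading minus sign in "-updated_at" requests descending order. A minimal plain-Python sketch of that convention follows; order_by here is a hypothetical helper operating on dictionaries, not the QuerySet method.

```python
def order_by(records, key):
    """Hypothetical helper: sort dict records by a field; a '-' prefix means descending."""
    descending = key.startswith("-")
    field = key[1:] if descending else key
    return sorted(records, key=lambda record: record[field], reverse=descending)


# Timestamps taken from the toy artifacts above; equal-format strings sort chronologically
artifacts = [
    {"id": 1, "updated_at": "2024-09-18 22:36:30.850389+00:00"},
    {"id": 2, "updated_at": "2024-09-18 22:36:30.992523+00:00"},
    {"id": 3, "updated_at": "2024-09-18 22:36:31.000599+00:00"},
]

print([a["id"] for a in order_by(artifacts, "-updated_at")])  # [3, 2, 1]
print([a["id"] for a in order_by(artifacts, "updated_at")])   # [1, 2, 3]
```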

contains

ln.Transform.filter(name__contains="search").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
8 hybOOXiyQJtm0000 None True Research Pancreas Osteoblast IgD Bulbourethral... None None notebook None None None None None 1 2024-09-18 22:36:32.656098+00:00
11 1uACN5od220A0000 None True Research IgG classify. None None notebook None None None None None 1 2024-09-18 22:36:32.656283+00:00
25 tSKtnxmVFxEp0000 None True Igg2 Inner phalangeal cells of organ of Corti ... None None notebook None None None None None 1 2024-09-18 22:36:32.657146+00:00
28 lQiEJpUgc6Vv0000 None True Result Place cells result investigate IgG3 res... None None notebook None None None None None 1 2024-09-18 22:36:32.657397+00:00
33 qSQcridYChGw0000 None True Research IgA Retina Retina. None None notebook None None None None None 1 2024-09-18 22:36:32.657729+00:00

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(5)
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
8 hybOOXiyQJtm0000 None True Research Pancreas Osteoblast IgD Bulbourethral... None None notebook None None None None None 1 2024-09-18 22:36:32.656098+00:00
11 1uACN5od220A0000 None True Research IgG classify. None None notebook None None None None None 1 2024-09-18 22:36:32.656283+00:00
25 tSKtnxmVFxEp0000 None True Igg2 Inner phalangeal cells of organ of Corti ... None None notebook None None None None None 1 2024-09-18 22:36:32.657146+00:00
28 lQiEJpUgc6Vv0000 None True Result Place cells result investigate IgG3 res... None None notebook None None None None None 1 2024-09-18 22:36:32.657397+00:00
33 qSQcridYChGw0000 None True Research IgA Retina Retina. None None notebook None None None None None 1 2024-09-18 22:36:32.657729+00:00

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_by_id updated_at
id
8 hybOOXiyQJtm0000 None True Research Pancreas Osteoblast IgD Bulbourethral... None None notebook None None None None None 1 2024-09-18 22:36:32.656098+00:00
11 1uACN5od220A0000 None True Research IgG classify. None None notebook None None None None None 1 2024-09-18 22:36:32.656283+00:00
33 qSQcridYChGw0000 None True Research IgA Retina Retina. None None notebook None None None None None 1 2024-09-18 22:36:32.657729+00:00
60 yMgIwcrD1u8W0000 None True Research IgD study IgD study. None None notebook None None None None None 1 2024-09-18 22:36:32.659402+00:00
64 kmRZto0kexO00000 None True Research IgM investigate IgM Pancreatic stella... None None notebook None None None None None 1 2024-09-18 22:36:32.659641+00:00
147 h7hAdTBXXoUy0000 None True Research Satellite glial cells Nuclear chain c... None None notebook None None None None None 1 2024-09-18 22:36:32.670310+00:00
179 T7lEQ8MBGg0j0000 None True Research IgG2 Pancreatic stellate cell IgG2 cl... None None notebook None None None None None 1 2024-09-18 22:36:32.672213+00:00
186 NuzrPKjv2pA40000 None True Research Osteoblast intestine result IgG3 Inne... None None notebook None None None None None 1 2024-09-18 22:36:32.672621+00:00
189 Tidr2XJtsOnR0000 None True Research IgD IgM Bulbourethral gland Retina Ig... None None notebook None None None None None 1 2024-09-18 22:36:32.672795+00:00
190 4doUhIDSYuC70000 None True Research Principal cell Nasal cavity IgG2 IgG3... None None notebook None None None None None 1 2024-09-18 22:36:32.672854+00:00
212 EegdSpnUPhSZ0000 None True Research IgG4 IgA Nasal cavity result visualize. None None notebook None None None None None 1 2024-09-18 22:36:32.676677+00:00
230 8tZ7qPYo7llG0000 None True Research IgD IgG2. None None notebook None None None None None 1 2024-09-18 22:36:32.677736+00:00
233 JK4j9onz5zPk0000 None True Research intestinal Osteoblast IgA result. None None notebook None None None None None 1 2024-09-18 22:36:32.677911+00:00
281 3qhzCOaSdx0f0000 None True Research investigate IgM efficiency. None None notebook None None None None None 1 2024-09-18 22:36:32.683521+00:00
286 1exsNfe3d6rv0000 None True Research IgG3 study Place cells IgG2 IgG2 clas... None None notebook None None None None None 1 2024-09-18 22:36:32.683811+00:00
300 IzjHNwFov0xW0000 None True Research Nasal cavity IgG2 Place cells investi... None None notebook None None None None None 1 2024-09-18 22:36:32.684626+00:00
302 D8zaEavzurs20000 None True Research IgG2 Principal cell Iris IgM IgG4 Pla... None None notebook None None None None None 1 2024-09-18 22:36:32.684740+00:00
323 QlQn1bBSHSfg0000 None True Research Satellite glial cells result Von Ebne... None None notebook None None None None None 1 2024-09-18 22:36:32.685954+00:00
354 KJz3308FnTWc0000 None True Research Pancreas Osteoblast Satellite glial c... None None notebook None None None None None 1 2024-09-18 22:36:32.690321+00:00
359 Vl3NWAENmZla0000 None True Research Nasal cavity visualize Bulbourethral ... None None notebook None None None None None 1 2024-09-18 22:36:32.690615+00:00
420 wPdX4jvMRej60000 None True Research IgG3 IgG2 visualize IgY IgG3 study cl... None None notebook None None None None None 1 2024-09-18 22:36:32.696850+00:00
499 VASCVPu1Nrqf0000 None True Research IgD Von Ebner's gland IgM IgM Retina. None None notebook None None None None None 1 2024-09-18 22:36:32.704135+00:00

or

ln.Artifact.filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 cAqWojeihNU2Keqb0000 None True My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g None None md5 None 1 True 1 None None 1 2024-09-18 22:36:30.850389+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00

negate / unequal

ln.Artifact.filter(~ln.Q(suffix=".jpg")).df()
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 1CXXf2SGj5NFJTgg0000 None True The iris collection None .parquet dataset 5629 VQGEXvcC5ZvBwMSm-3CHWg None None md5 DataFrame 1 True 1 None None 1 2024-09-18 22:36:30.992523+00:00
3 4OzS290TkVcQFbsf0000 None True My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q None None md5 None 1 True 1 None None 1 2024-09-18 22:36:31.000599+00:00
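
To see how expressions combine with | and ~, here is a toy stand-in written in plain Python. The Q class below is hypothetical and only mirrors the operator behavior used in the queries above; it is not Django's or LaminDB's Q class and works on dictionaries rather than database rows.

```python
class Q:
    """Toy stand-in for a query expression."""

    def __init__(self, predicate=None, **equals):
        # Either wrap a ready-made predicate or build one from field=value pairs
        if predicate is None:
            predicate = lambda record: all(record[f] == v for f, v in equals.items())
        self.predicate = predicate

    def __or__(self, other):
        # q1 | q2 matches if either side matches
        return Q(lambda record: self.predicate(record) or other.predicate(record))

    def __and__(self, other):
        # q1 & q2 matches only if both sides match
        return Q(lambda record: self.predicate(record) and other.predicate(record))

    def __invert__(self):
        # ~q matches exactly the records that q rejects
        return Q(lambda record: not self.predicate(record))


records = [{"suffix": ".jpg"}, {"suffix": ".parquet"}, {"suffix": ".fastq.gz"}]

either = Q(suffix=".jpg") | Q(suffix=".fastq.gz")
not_jpg = ~Q(suffix=".jpg")

print([r["suffix"] for r in records if either.predicate(r)])   # ['.jpg', '.fastq.gz']
print([r["suffix"] for r in records if not_jpg.predicate(r)])  # ['.parquet', '.fastq.gz']
```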

Clean up the test instance.

!rm -r ./test-registries
!lamin delete --force test-registries
• deleting instance testuser1/test-registries