Query & search registries¶
This guide walks through the main ways of finding metadata records in LaminDB registries.
# !pip install lamindb
!lamin init --storage ./test-registries
→ connected lamindb: testuser1/test-registries
We’ll need some toy data.
import lamindb as ln
# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
# see the content of the artifact registry
ln.Artifact.df()
→ connected lamindb: testuser1/test-registries
! no run & transform got linked, call `ln.context.track()` & re-run`
! no run & transform got linked, call `ln.context.track()` & re-run`
! no run & transform got linked, call `ln.context.track()` & re-run`
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
Look up metadata¶
For registries with fewer than 100k records, auto-completing a Lookup object is the most convenient way of finding a record.
For example, take the User registry:
# query the database for all users, optionally pass the field that creates the key
users = ln.User.lookup(field="handle")
# the lookup object is a NamedTuple
users
Lookup(testuser1=User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC'), dict=<bound method Lookup.dict of <lamin_utils._lookup.Lookup object at 0x7f28b8977490>>)
With auto-complete, we find a specific user record:
user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')
You can also get a dictionary:
users_dict = ln.User.lookup().dict()
users_dict
{'testuser1': User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')}
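The idea behind a Lookup can be sketched in plain Python. This is a minimal illustration, not LaminDB's implementation (which, per the output above, lives in lamin_utils): it assumes records are plain dicts and turns the values of one field into the attribute names of a namedtuple, which is what makes editor auto-completion possible. The helpers make_lookup and to_attribute_name are hypothetical.

```python
import keyword
import re
from collections import namedtuple


def to_attribute_name(value) -> str:
    """Turn a field value into a valid attribute name (hypothetical rule)."""
    name = re.sub(r"\W", "_", str(value))  # e.g. "test-user" -> "test_user"
    if not name or name[0].isdigit() or name[0] == "_":
        name = "k" + name  # namedtuple fields may not start with a digit or "_"
    if keyword.iskeyword(name):
        name += "_"  # e.g. "class" -> "class_"
    return name


def make_lookup(records, field):
    """Return a namedtuple whose attribute names are the records' `field` values."""
    names = [to_attribute_name(record[field]) for record in records]
    Lookup = namedtuple("Lookup", names)  # raises ValueError on duplicate names
    return Lookup(*records)


users = [{"uid": "DzTjkKse", "handle": "testuser1", "name": "Test User1"}]
lookup = make_lookup(users, field="handle")
print(lookup.testuser1["name"])  # Test User1
print(list(lookup._asdict()))    # ['testuser1']
```

A namedtuple is a natural fit here because its fields are fixed at creation time, so IDEs and Jupyter can offer them for tab completion.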
Query exactly one record¶
get raises an error if no matching record or more than one matching record is found.
# by the universal base62 uid
ln.User.get("DzTjkKse")
# by any expression involving fields
ln.User.get(handle="testuser1")
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-09-18 22:36:28 UTC')
Query sets of records¶
Filter for all artifacts created by a user:
ln.Artifact.filter(created_by=user).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
To access the results of a filter statement, call one of the following on its return value:
- .df(): A pandas DataFrame with each record in a row.
- .all(): A QuerySet.
- .one(): Exactly one record. Raises an error if there is none (or more than one). Equivalent to the .get() method shown above.
- .one_or_none(): Either one record or None if there is no query result.
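The contract of .one() and .one_or_none() can be sketched over a plain Python list. This is an illustration under assumptions, not LaminDB's code: the exception classes are hypothetical stand-ins named after their SQLAlchemy counterparts, and this one_or_none raises when several records match, which is how SQLAlchemy's method of the same name behaves.

```python
class NoResultFound(Exception):
    """Hypothetical stand-in: the query matched no record."""


class MultipleResultsFound(Exception):
    """Hypothetical stand-in: the query matched more than one record."""


def one(results):
    """Return the only result; raise if there are none or several."""
    if not results:
        raise NoResultFound("no record matches the query")
    if len(results) > 1:
        raise MultipleResultsFound(f"{len(results)} records match the query")
    return results[0]


def one_or_none(results):
    """Return the only result, or None if there is none; raise if there are several."""
    if not results:
        return None
    return one(results)


print(one(["My image"]))  # My image
print(one_or_none([]))    # None
try:
    one(["My image", "My fastq"])
except MultipleResultsFound as error:
    print(error)          # 2 records match the query
```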
Note
The ORMs in LaminDB are Django Models, and any Django query works. LaminDB extends Django’s API for data scientists.
Under the hood, any .filter() call translates into a SQL SELECT statement.
.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
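To see what translating into a SQL SELECT statement means, here is a rough equivalent of ln.Artifact.filter(created_by=user) written directly against SQLite with Python's built-in sqlite3 module. The two-table schema (users, artifacts) is a simplified, hypothetical stand-in for LaminDB's real tables, and the SQL that Django actually generates differs in table names, quoting, and column list.

```python
import sqlite3

# Simplified, hypothetical schema; LaminDB's real tables have other names and more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY,
        description TEXT,
        suffix TEXT,
        size INTEGER,
        created_by_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'testuser1');
    INSERT INTO artifacts VALUES
        (1, 'My image', '.jpg', 29358, 1),
        (2, 'The iris collection', '.parquet', 5629, 1),
        (3, 'My fastq', '.fastq.gz', 20, 1);
    """
)

# Roughly what ln.Artifact.filter(created_by=user) asks the database:
# a SELECT with a WHERE clause on the foreign-key column.
user_id = 1
rows = conn.execute(
    "SELECT id, description FROM artifacts WHERE created_by_id = ? ORDER BY id",
    (user_id,),
).fetchall()
print(rows)  # [(1, 'My image'), (2, 'The iris collection'), (3, 'My fastq')]
```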
Search for records¶
Search the toy data:
ln.Artifact.search("iris").df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
Let’s create 500 notebook transforms with fake titles, save them, and search them:
transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)
# search
ln.Transform.search("intestine").df().head(5)
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 35DTgF8ohdJu0000 | None | True | Igg2 IgG2 study intestinal intestine. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.655770+00:00 |
20 | hyoq5DaSwOpR0000 | None | True | Retina Lungs Nasal cavity intestine Satellite ... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656829+00:00 |
25 | tSKtnxmVFxEp0000 | None | True | Igg2 Inner phalangeal cells of organ of Corti ... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657146+00:00 |
30 | zdk3EYlDGUCm0000 | None | True | Igm intestine classify IgD Iris intestine Iris... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657540+00:00 |
42 | JEHnSjWDIaDF0000 | None | True | Intestine intestinal study Osteoblast classify... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.658294+00:00 |
Note
Currently, the LaminHub UI search is more powerful than the search of the lamindb open-source package.
Leverage relations¶
Django has a double-underscore syntax to filter based on related tables.
This syntax enables you to traverse several layers of relations and leverage different comparators.
ln.Artifact.filter(created_by__handle__startswith="testuse").df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
The filter selects all artifacts created by a user whose handle starts with "testuse".
Under the hood, the SQL query joins the artifact table with the user table.
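A rough SQL equivalent of this relation query, again against a simplified, hypothetical SQLite schema rather than LaminDB's real tables: the double underscore in created_by__handle becomes a JOIN on the foreign key, and __startswith becomes a LIKE pattern with a trailing wildcard. The SQL Django actually emits will differ in details.

```python
import sqlite3

# Simplified, hypothetical schema; not LaminDB's real table layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY,
        description TEXT,
        created_by_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'testuser1');
    INSERT INTO artifacts VALUES
        (1, 'My image', 1),
        (2, 'The iris collection', 1),
        (3, 'My fastq', 1);
    """
)

# Roughly the SQL behind ln.Artifact.filter(created_by__handle__startswith="testuse"):
# the relation becomes a JOIN, the comparator becomes a LIKE prefix pattern.
# Note: SQLite's LIKE ignores ASCII case by default; PostgreSQL's LIKE does not.
query = """
    SELECT a.id, a.description
    FROM artifacts AS a
    JOIN users AS u ON u.id = a.created_by_id
    WHERE u.handle LIKE ? || '%'
    ORDER BY a.id
"""
rows = conn.execute(query, ("testuse",)).fetchall()
print(rows)  # [(1, 'My image'), (2, 'The iris collection'), (3, 'My fastq')]
```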
Comparators¶
You can qualify the type of comparison in a query by using a comparator.
Below is a list of the most important ones, but Django supports about two dozen field comparators of the form field__comparator=value.
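How field__comparator=value keyword arguments select records can be mimicked with a small in-memory filter. This is a hypothetical sketch, not Django's implementation (Django builds SQL instead of evaluating Python predicates): the names COMPARATORS, matches, and filter_records are made up, only the comparators used in this guide are supported, several conditions are combined with AND as in the and example below, and relations such as created_by__handle are not traversed.

```python
import operator

# Hypothetical mapping from a few Django comparator names to Python predicates.
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, sub: sub in value,
    "icontains": lambda value, sub: sub.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record: dict, **conditions) -> bool:
    """True if `record` satisfies every `field__comparator=value` condition (AND)."""
    for key, expected in conditions.items():
        field, _, comparator = key.partition("__")  # "size__lt" -> ("size", "__", "lt")
        comparator = comparator or "exact"          # plain "suffix=..." means equality
        if not COMPARATORS[comparator](record[field], expected):
            return False
    return True


def filter_records(records, **conditions):
    """Keep the records that satisfy all conditions."""
    return [record for record in records if matches(record, **conditions)]


# The toy artifacts from above, reduced to a few fields.
artifacts = [
    {"id": 1, "description": "My image", "suffix": ".jpg", "size": 29358},
    {"id": 2, "description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"id": 3, "description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]
print([a["id"] for a in filter_records(artifacts, size__lt=1e4)])                     # [2, 3]
print([a["id"] for a in filter_records(artifacts, suffix__in=[".jpg", ".fastq.gz"])]) # [1, 3]
print([a["id"] for a in filter_records(artifacts, suffix=".jpg", size__gt=1000)])     # [1]
```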
and¶
ln.Artifact.filter(suffix=".jpg", created_by=user).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
less than / greater than¶
Or subset to artifacts smaller than 10 kB (10,000 bytes) with the __lt comparator; __gt works the same way for greater than.
ln.Artifact.filter(created_by=user, size__lt=1e4).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
in¶
ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
order by¶
ln.Artifact.filter().order_by("-updated_at").df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
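The "-" prefix in order_by("-updated_at") requests descending order. A minimal sketch of that convention over dict records; the order_by function here is hypothetical, and the timestamps are the ones shown in the table above.

```python
def order_by(records, *fields):
    """Sort dict records by fields; a leading '-' means descending (Django-style)."""
    result = list(records)
    # Apply sort keys from last to first: Python's sort is stable,
    # so the first field listed ends up as the primary key.
    for field in reversed(fields):
        descending = field.startswith("-")
        name = field.lstrip("-")
        result.sort(key=lambda record: record[name], reverse=descending)
    return result


artifacts = [
    {"id": 1, "updated_at": "2024-09-18 22:36:30.850389+00:00"},
    {"id": 2, "updated_at": "2024-09-18 22:36:30.992523+00:00"},
    {"id": 3, "updated_at": "2024-09-18 22:36:31.000599+00:00"},
]
print([a["id"] for a in order_by(artifacts, "-updated_at")])  # [3, 2, 1]
```

Sorting the ISO-formatted timestamp strings works here because they share one format and time zone, so lexicographic order equals chronological order.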
contains¶
ln.Transform.filter(name__contains="search").df().head(5)
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | hybOOXiyQJtm0000 | None | True | Research Pancreas Osteoblast IgD Bulbourethral... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656098+00:00 |
11 | 1uACN5od220A0000 | None | True | Research IgG classify. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656283+00:00 |
25 | tSKtnxmVFxEp0000 | None | True | Igg2 Inner phalangeal cells of organ of Corti ... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657146+00:00 |
28 | lQiEJpUgc6Vv0000 | None | True | Result Place cells result investigate IgG3 res... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657397+00:00 |
33 | qSQcridYChGw0000 | None | True | Research IgA Retina Retina. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657729+00:00 |
And the case-insensitive variant:
ln.Transform.filter(name__icontains="Search").df().head(5)
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | hybOOXiyQJtm0000 | None | True | Research Pancreas Osteoblast IgD Bulbourethral... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656098+00:00 |
11 | 1uACN5od220A0000 | None | True | Research IgG classify. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656283+00:00 |
25 | tSKtnxmVFxEp0000 | None | True | Igg2 Inner phalangeal cells of organ of Corti ... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657146+00:00 |
28 | lQiEJpUgc6Vv0000 | None | True | Result Place cells result investigate IgG3 res... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657397+00:00 |
33 | qSQcridYChGw0000 | None | True | Research IgA Retina Retina. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657729+00:00 |
startswith¶
ln.Transform.filter(name__startswith="Research").df()
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | hybOOXiyQJtm0000 | None | True | Research Pancreas Osteoblast IgD Bulbourethral... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656098+00:00 |
11 | 1uACN5od220A0000 | None | True | Research IgG classify. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.656283+00:00 |
33 | qSQcridYChGw0000 | None | True | Research IgA Retina Retina. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.657729+00:00 |
60 | yMgIwcrD1u8W0000 | None | True | Research IgD study IgD study. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.659402+00:00 |
64 | kmRZto0kexO00000 | None | True | Research IgM investigate IgM Pancreatic stella... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.659641+00:00 |
147 | h7hAdTBXXoUy0000 | None | True | Research Satellite glial cells Nuclear chain c... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.670310+00:00 |
179 | T7lEQ8MBGg0j0000 | None | True | Research IgG2 Pancreatic stellate cell IgG2 cl... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.672213+00:00 |
186 | NuzrPKjv2pA40000 | None | True | Research Osteoblast intestine result IgG3 Inne... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.672621+00:00 |
189 | Tidr2XJtsOnR0000 | None | True | Research IgD IgM Bulbourethral gland Retina Ig... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.672795+00:00 |
190 | 4doUhIDSYuC70000 | None | True | Research Principal cell Nasal cavity IgG2 IgG3... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.672854+00:00 |
212 | EegdSpnUPhSZ0000 | None | True | Research IgG4 IgA Nasal cavity result visualize. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.676677+00:00 |
230 | 8tZ7qPYo7llG0000 | None | True | Research IgD IgG2. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.677736+00:00 |
233 | JK4j9onz5zPk0000 | None | True | Research intestinal Osteoblast IgA result. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.677911+00:00 |
281 | 3qhzCOaSdx0f0000 | None | True | Research investigate IgM efficiency. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.683521+00:00 |
286 | 1exsNfe3d6rv0000 | None | True | Research IgG3 study Place cells IgG2 IgG2 clas... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.683811+00:00 |
300 | IzjHNwFov0xW0000 | None | True | Research Nasal cavity IgG2 Place cells investi... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.684626+00:00 |
302 | D8zaEavzurs20000 | None | True | Research IgG2 Principal cell Iris IgM IgG4 Pla... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.684740+00:00 |
323 | QlQn1bBSHSfg0000 | None | True | Research Satellite glial cells result Von Ebne... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.685954+00:00 |
354 | KJz3308FnTWc0000 | None | True | Research Pancreas Osteoblast Satellite glial c... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.690321+00:00 |
359 | Vl3NWAENmZla0000 | None | True | Research Nasal cavity visualize Bulbourethral ... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.690615+00:00 |
420 | wPdX4jvMRej60000 | None | True | Research IgG3 IgG2 visualize IgY IgG3 study cl... | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.696850+00:00 |
499 | VASCVPu1Nrqf0000 | None | True | Research IgD Von Ebner's gland IgM IgM Retina. | None | None | notebook | None | None | None | None | None | 1 | 2024-09-18 22:36:32.704135+00:00 |
or¶
ln.Artifact.filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cAqWojeihNU2Keqb0000 | None | True | My image | None | .jpg | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.850389+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
negate / unequal¶
ln.Artifact.filter(~ln.Q(suffix=".jpg")).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 1CXXf2SGj5NFJTgg0000 | None | True | The iris collection | None | .parquet | dataset | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | None | None | md5 | DataFrame | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:30.992523+00:00 |
3 | 4OzS290TkVcQFbsf0000 | None | True | My fastq | None | .fastq.gz | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | None | None | md5 | None | 1 | True | 1 | None | None | 1 | 2024-09-18 22:36:31.000599+00:00 |
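ln.Q objects can be combined with | (or) and & (and), and negated with ~. Below is a minimal, hypothetical stand-in for that composition. It is evaluated in Python over dict records instead of being compiled to SQL as Django does, and it supports only equality conditions.

```python
class Q:
    """Hypothetical, minimal stand-in for Django's Q: equality conditions only."""

    def __init__(self, **equals):
        # A record matches if every given field equals the given value.
        self._test = lambda record: all(record[k] == v for k, v in equals.items())

    @classmethod
    def _wrap(cls, test):
        """Build a Q from an arbitrary predicate (used by the operators below)."""
        q = cls()
        q._test = test
        return q

    def __or__(self, other):
        return Q._wrap(lambda r: self._test(r) or other._test(r))

    def __and__(self, other):
        return Q._wrap(lambda r: self._test(r) and other._test(r))

    def __invert__(self):
        return Q._wrap(lambda r: not self._test(r))

    def __call__(self, record):
        return self._test(record)


artifacts = [
    {"id": 1, "suffix": ".jpg"},
    {"id": 2, "suffix": ".parquet"},
    {"id": 3, "suffix": ".fastq.gz"},
]
jpg_or_fastq = Q(suffix=".jpg") | Q(suffix=".fastq.gz")
print([a["id"] for a in artifacts if jpg_or_fastq(a)])  # [1, 3]
not_jpg = ~Q(suffix=".jpg")
print([a["id"] for a in artifacts if not_jpg(a)])       # [2, 3]
```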
Clean up the test instance.
!rm -r ./test-registries
!lamin delete --force test-registries
• deleting instance testuser1/test-registries