Query & search registries¶
!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln
# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
💡 connected lamindb: testuser1/mydata
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
Artifact(uid='2bex0nsJwfE7cZuk4L35', description='My fastq', suffix='.fastq.gz', type='dataset', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-07-26 14:36:28 UTC')
Look up metadata¶
For registries that hold no more than about 100k records, a lookup object is a convenient way to select a record.
Consider the User registry:
users = ln.User.lookup(field="handle")
With auto-complete, we find a user:
user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-07-26 14:36:26 UTC')
You can also get a dictionary, if you prefer:
users_dict = ln.User.lookup().dict()
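To illustrate the idea behind a lookup object, here is a minimal sketch in plain Python. It is not LaminDB's implementation: `make_lookup` and the two records are hypothetical. The sketch turns every value of the chosen field into an attribute, which is what makes tab completion possible, and it also suggests why the approach suits small registries, since all records are held in memory at once.

```python
from collections import namedtuple

# Hypothetical records standing in for the rows of a small registry.
records = [
    {"handle": "testuser1", "name": "Test User1"},
    {"handle": "testuser2", "name": "Test User2"},
]


def make_lookup(rows, field):
    """Build an object that exposes each row under the value of `field`."""
    keys = [row[field] for row in rows]  # values must be valid Python identifiers
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*rows)


users = make_lookup(records, field="handle")
print(users.testuser1["name"])  # Test User1

# The same records as a dictionary keyed by handle.
users_dict = users._asdict()
print(sorted(users_dict))  # ['testuser1', 'testuser2']
```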
Filter by metadata¶
Filter for all artifacts created by a user:
ln.Artifact.filter(created_by=user).df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | WuxBwVjs3WB10x63sZO8 | None | My image | None | .jpg | dataset | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.200400+00:00 |
2 | wOWcnezBDnpQ25r4pw0a | None | The iris collection | None | .parquet | dataset | DataFrame | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.345288+00:00 |
3 | 2bex0nsJwfE7cZuk4L35 | None | My fastq | None | .fastq.gz | dataset | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.352415+00:00 |
To access the results encoded in a filter statement, execute its return value with one of:
.df(): A pandas DataFrame with each record in a row.
.all(): A QuerySet.
.one(): Exactly one record. Will raise an error if there is none.
.one_or_none(): Either one record or None if there is no query result.
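The difference between .one() and .one_or_none() lies in how an empty result is handled. The following sketch mimics those semantics on a plain Python list. It is an illustration only, not LaminDB's code: the function names are reused for readability, `LookupError` is a stand-in exception, and raising on more than one match is an assumption taken from SQLAlchemy's methods of the same name.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def one(results: Sequence[T]) -> T:
    """Return the single result; raise if there are zero or several."""
    if len(results) != 1:
        raise LookupError(f"expected exactly one record, got {len(results)}")
    return results[0]


def one_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the single result, or None if the query matched nothing."""
    if not results:
        return None
    return one(results)


print(one(["My image"]))  # My image
print(one_or_none([]))  # None
```

Use the `one` variant when a missing record indicates a bug, and the `one_or_none` variant when absence is an expected outcome that the calling code handles.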
Note
The ORMs in LaminDB are Django Models and any Django query works. LaminDB extends Django’s API for data scientists.
Under the hood, any .filter() call translates into a SQL select statement.
.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
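To make the translation into a select statement concrete, the sketch below runs a comparable query with Python's built-in sqlite3 module. The `artifacts` table is a hypothetical, heavily simplified stand-in for the real schema, and the third row is assigned to a second user only so that the filter has something to exclude.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical simplified table; the real artifact table has many more columns.
con.execute(
    "CREATE TABLE artifacts ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifacts (description, suffix, created_by_id) VALUES (?, ?, ?)",
    [
        ("My image", ".jpg", 1),
        ("The iris collection", ".parquet", 1),
        ("My fastq", ".fastq.gz", 2),
    ],
)

# Roughly what a call such as filter(created_by=user) asks the database to do.
rows = con.execute(
    "SELECT id, description FROM artifacts WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
print(rows)  # [(1, 'My image'), (2, 'The iris collection')]
```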
Search for metadata¶
ln.Artifact.search("iris").df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | wOWcnezBDnpQ25r4pw0a | None | The iris collection | None | .parquet | dataset | DataFrame | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.345288+00:00 |
Let us create 500 notebook objects with fake titles and save them:
transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)
We can now search for any combination of terms:
ln.Transform.search("intestine").df().head()
id | uid | version | name | key | description | type | reference | reference_type | latest_report_id | source_code_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Pm30xcotCxXCmKQs | None | Igg3 intestine Platelets Osteoblast. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.924893+00:00 |
35 | QVg4UlDYHL3BsiU4 | None | Intestine IgY IgG2 IgA IgE IgE. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.930039+00:00 |
38 | knZnxSZebPYxtVWK | None | Intestine Diencephalon investigate Natural kil... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.930499+00:00 |
59 | iDIRkJNMmn76Hsol | None | Research Liver lipocyte Natural killer T cell ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.933754+00:00 |
65 | KVc5Se8Ck31cEmSj | None | Intestine Betz cells Diencephalon visualize. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.934679+00:00 |
Leverage relations¶
Django has a double-underscore syntax to filter based on related tables.
This syntax enables you to traverse several layers of relations:
ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The filter selects all artifacts based on the users who ran the generating notebook.
Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.
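The join can be sketched with Python's built-in sqlite3 module. The three tables below are hypothetical, simplified stand-ins for the artifact, run, and user tables, and the 'Tracked result' row is invented for illustration. Note that an artifact without a run drops out of the inner join, which is presumably why the query above returns no rows: the toy artifacts were saved without ln.track(), so no run is linked to them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical simplified schema: artifact -> run -> user.
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, created_by_id INTEGER REFERENCES users(id));
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER REFERENCES runs(id)
    );
    INSERT INTO users VALUES (1, 'testuser1');
    INSERT INTO runs VALUES (1, 1);
    INSERT INTO artifacts VALUES (1, 'My image', NULL);    -- saved without a run
    INSERT INTO artifacts VALUES (2, 'Tracked result', 1); -- linked to run 1
    """
)

# Each double underscore in run__created_by__handle corresponds to one join.
query = """
SELECT artifacts.id, artifacts.description
FROM artifacts
JOIN runs ON artifacts.run_id = runs.id
JOIN users ON runs.created_by_id = users.id
WHERE users.handle LIKE 'testuse%'
ORDER BY artifacts.id
"""
rows = con.execute(query).fetchall()
print(rows)  # [(2, 'Tracked result')]
```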
Beyond __startswith, Django supports about two dozen field comparators of the form field__comparator=value. Some of them follow below.
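As a reading aid for the field__comparator=value syntax, the following sketch splits a filter keyword into its relation path and its comparator. It is a simplification for illustration, not Django's parser: `split_lookup` is a hypothetical helper and `COMPARATORS` lists only a handful of Django's lookups. When no comparator is given, Django applies `exact`, which the sketch mirrors.

```python
# A small, non-exhaustive subset of Django's field lookups.
COMPARATORS = {
    "exact", "lt", "lte", "gt", "gte", "in",
    "contains", "icontains", "startswith",
}


def split_lookup(expression):
    """Split 'a__b__comparator' into (['a', 'b'], 'comparator')."""
    *path, last = expression.split("__")
    if path and last in COMPARATORS:
        return path, last
    # No trailing comparator: the whole expression is a field path.
    return path + [last], "exact"


print(split_lookup("size__lt"))  # (['size'], 'lt')
print(split_lookup("created_by"))  # (['created_by'], 'exact')
print(split_lookup("run__created_by__handle__startswith"))
# (['run', 'created_by', 'handle'], 'startswith')
```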
and¶
ln.Artifact.filter(suffix=".jpg", created_by=user).df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | WuxBwVjs3WB10x63sZO8 | None | My image | None | .jpg | dataset | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.200400+00:00 |
less than/ greater than¶
Or subset to artifacts smaller than 10 kB. A plain field=value keyword argument only expresses equality, so we append the __lt comparator to the field name:
ln.Artifact.filter(created_by=user, size__lt=1e4).df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | wOWcnezBDnpQ25r4pw0a | None | The iris collection | None | .parquet | dataset | DataFrame | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.345288+00:00 |
3 | 2bex0nsJwfE7cZuk4L35 | None | My fastq | None | .fastq.gz | dataset | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.352415+00:00 |
or¶
ln.Artifact.filter().filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | WuxBwVjs3WB10x63sZO8 | None | My image | None | .jpg | dataset | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.200400+00:00 |
3 | 2bex0nsJwfE7cZuk4L35 | None | My fastq | None | .fastq.gz | dataset | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.352415+00:00 |
in¶
ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | WuxBwVjs3WB10x63sZO8 | None | My image | None | .jpg | dataset | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.200400+00:00 |
3 | 2bex0nsJwfE7cZuk4L35 | None | My fastq | None | .fastq.gz | dataset | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.352415+00:00 |
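The `or` and `in` queries return the same two artifacts. When every alternative tests the same field for equality, the two forms are interchangeable, and `in` is usually the shorter one to write. A sketch of the equivalence at the SQL level, using sqlite3 and a hypothetical single-column stand-in for the artifact table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical simplified table holding only the suffix of each artifact.
con.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, suffix TEXT)")
con.executemany(
    "INSERT INTO artifacts (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)

wanted = (".jpg", ".fastq.gz")

# Counterpart of Q(suffix=".jpg") | Q(suffix=".fastq.gz").
with_or = con.execute(
    "SELECT id FROM artifacts WHERE suffix = ? OR suffix = ? ORDER BY id", wanted
).fetchall()

# Counterpart of suffix__in=[".jpg", ".fastq.gz"].
with_in = con.execute(
    "SELECT id FROM artifacts WHERE suffix IN (?, ?) ORDER BY id", wanted
).fetchall()

print(with_or)  # [(1,), (3,)]
print(with_or == with_in)  # True
```

Q objects remain necessary when the alternatives involve different fields or different comparators, which a single `in` cannot express.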
order by¶
ln.Artifact.filter().order_by("-updated_at").df()
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 2bex0nsJwfE7cZuk4L35 | None | My fastq | None | .fastq.gz | dataset | None | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.352415+00:00 |
2 | wOWcnezBDnpQ25r4pw0a | None | The iris collection | None | .parquet | dataset | DataFrame | 5629 | VQGEXvcC5ZvBwMSm-3CHWg | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.345288+00:00 |
1 | WuxBwVjs3WB10x63sZO8 | None | My image | None | .jpg | dataset | None | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | 1 | True | 1 | None | None | 1 | 2024-07-26 14:36:28.200400+00:00 |
contains¶
ln.Transform.filter(name__contains="search").df().head(10)
id | uid | version | name | key | description | type | reference | reference_type | latest_report_id | source_code_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | wBs4xhDVLBWLz3zS | None | Research Inner hair cells Natural killer T cel... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.927401+00:00 |
19 | jHj5dFxvmyHTpNSY | None | Liver Lipocyte research Type I Pneumocyte Betz... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.927554+00:00 |
30 | l5XWwvzSZhZl6wbX | None | Igd Inner hair cells cluster research IgG Cili... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.929265+00:00 |
31 | GKPG74uqzwitZkYY | None | Igg2 visualize IgY Inner hair cells research. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.929423+00:00 |
36 | UTQaxsbdsdze1Yjj | None | Heart IgE IgE result research IgA. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.930192+00:00 |
59 | iDIRkJNMmn76Hsol | None | Research Liver lipocyte Natural killer T cell ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.933754+00:00 |
73 | weQFFJx8ZDfRPgtX | None | Intestinal research investigate research Type ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.935908+00:00 |
83 | i4wDmckovdO2h3PN | None | Ige study IgD research result IgG1. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.021650+00:00 |
86 | Mehg72TcMRmnA8IB | None | Igg IgG1 IgG research. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.022092+00:00 |
107 | zrz3PbzGH8WboXB3 | None | Iga Natural killer T cell Inner hair cells res... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.025196+00:00 |
And case-insensitive:
ln.Transform.filter(name__icontains="Search").df().head(10)
id | uid | version | name | key | description | type | reference | reference_type | latest_report_id | source_code_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | wBs4xhDVLBWLz3zS | None | Research Inner hair cells Natural killer T cel... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.927401+00:00 |
19 | jHj5dFxvmyHTpNSY | None | Liver Lipocyte research Type I Pneumocyte Betz... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.927554+00:00 |
30 | l5XWwvzSZhZl6wbX | None | Igd Inner hair cells cluster research IgG Cili... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.929265+00:00 |
31 | GKPG74uqzwitZkYY | None | Igg2 visualize IgY Inner hair cells research. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.929423+00:00 |
36 | UTQaxsbdsdze1Yjj | None | Heart IgE IgE result research IgA. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.930192+00:00 |
59 | iDIRkJNMmn76Hsol | None | Research Liver lipocyte Natural killer T cell ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.933754+00:00 |
73 | weQFFJx8ZDfRPgtX | None | Intestinal research investigate research Type ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.935908+00:00 |
83 | i4wDmckovdO2h3PN | None | Ige study IgD research result IgG1. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.021650+00:00 |
86 | Mehg72TcMRmnA8IB | None | Igg IgG1 IgG research. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.022092+00:00 |
107 | zrz3PbzGH8WboXB3 | None | Iga Natural killer T cell Inner hair cells res... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.025196+00:00 |
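In this data both queries match the same rows, because the lowercase string "search" occurs inside "Research" as well as "research". The difference only shows once the case of the pattern differs from the case of the text. A plain-Python sketch of the intended semantics, with hypothetical helper functions and made-up titles:

```python
def contains(text: str, pattern: str) -> bool:
    """Case-sensitive substring test."""
    return pattern in text


def icontains(text: str, pattern: str) -> bool:
    """Case-insensitive substring test."""
    return pattern.lower() in text.lower()


# Hypothetical titles in the style of the fake notebook names.
titles = ["Research Liver lipocyte", "Heart IgE result research", "Intestine IgY IgA"]

print([t for t in titles if contains(t, "search")])
# ['Research Liver lipocyte', 'Heart IgE result research']
print([t for t in titles if contains(t, "Search")])
# []
print([t for t in titles if icontains(t, "Search")])
# ['Research Liver lipocyte', 'Heart IgE result research']
```

One caveat from the Django documentation: on a SQLite backend, `contains` acts like `icontains`, because SQLite does not support case-sensitive LIKE statements. Whether that applies here depends on the database behind the instance.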
startswith¶
ln.Transform.filter(name__startswith="Research").df()
id | uid | version | name | key | description | type | reference | reference_type | latest_report_id | source_code_id | created_by_id | updated_at |
---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | wBs4xhDVLBWLz3zS | None | Research Inner hair cells Natural killer T cel... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.927401+00:00 |
59 | iDIRkJNMmn76Hsol | None | Research Liver lipocyte Natural killer T cell ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:29.933754+00:00 |
111 | jZ1f5dp2BfF2Rcrf | None | Research study Ligaments IgG2 IgG Betz cells. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.025787+00:00 |
170 | kAFmz2a8uLT1zE0O | None | Research cluster classify Osteoblast investiga... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.037218+00:00 |
248 | chgCie7zYLb2L465 | None | Research rank IgD Liver lipocyte Betz cells. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.051316+00:00 |
286 | vxb3m7AtJk0XITWS | None | Research rank intestinal IgG3. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.056891+00:00 |
294 | eCuX2RkZj426RQfF | None | Research Pancreatic acinar Inner hair cells in... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.058077+00:00 |
408 | uiSmt0Cl4YrMqX00 | None | Research IgE IgD Ligaments IgG2 Betz cells. | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.080211+00:00 |
428 | VNa8fF1fLwWbB0L9 | None | Research IgE Liver lipocyte IgG3 intestine Typ... | None | None | notebook | None | None | None | None | 1 | 2024-07-26 14:36:30.083205+00:00 |
# clean up test instance
!lamin delete --force mydata
!rm -r mydata
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.11.9/x64/bin/lamin", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich_click/rich_command.py", line 367, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich_click/rich_command.py", line 152, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/lamin_cli/__main__.py", line 105, in delete
return delete(instance, force=force)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/lamindb_setup/_delete.py", line 98, in delete
n_objects = check_storage_is_empty(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/lamindb_setup/core/upath.py", line 779, in check_storage_is_empty
raise InstanceNotEmpty(message)
lamindb_setup.core.upath.InstanceNotEmpty: Storage /home/runner/work/lamindb/lamindb/docs/mydata/.lamindb contains 3 objects ('_is_initialized' ignored) - delete them prior to deleting the instance
['/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/2bex0nsJwfE7cZuk4L35.fastq.gz', '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/WuxBwVjs3WB10x63sZO8.jpg', '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/_is_initialized', '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/wOWcnezBDnpQ25r4pw0a.parquet']