Query & search registries

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

# create toy data
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
💡 connected lamindb: testuser1/mydata
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
Artifact(uid='2bex0nsJwfE7cZuk4L35', description='My fastq', suffix='.fastq.gz', type='dataset', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-07-26 14:36:28 UTC')

Look up metadata

For registries with no more than 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-07-26 14:36:26 UTC')

You can also get a dictionary if you prefer:

users_dict = ln.User.lookup().dict()
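The idea behind a lookup object can be sketched in plain Python: each record becomes an attribute named after its field value, which is what makes tab completion work. This is a hypothetical illustration, not LaminDB's implementation; the `Lookup` class, the way values are sanitized into identifiers, and keying `dict()` by the raw field value are all assumptions.

```python
import re


class Lookup:
    """Hypothetical sketch of a lookup object: one attribute per record."""

    def __init__(self, records, field):
        self._by_value = {}
        for record in records:
            value = str(record[field])
            self._by_value[value] = record
            # sanitize the field value into a valid identifier for tab completion
            name = re.sub(r"\W", "_", value)
            if name[:1].isdigit():
                name = "_" + name
            setattr(self, name, record)

    def dict(self):
        # assumption: map raw field values to records
        return dict(self._by_value)


records = [
    {"handle": "testuser1", "name": "Test User1"},
    {"handle": "test-user2", "name": "Test User2"},  # hypothetical second user
]
users = Lookup(records, field="handle")
print(users.testuser1["name"])  # Test User1
print(users.test_user2["name"])  # Test User2
print(users.dict()["test-user2"]["name"])  # Test User2
```

Sanitizing values into identifiers is what lets a handle such as `test-user2` appear as an auto-completable attribute.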

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 WuxBwVjs3WB10x63sZO8 None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.200400+00:00
2 wOWcnezBDnpQ25r4pw0a None The iris collection None .parquet dataset DataFrame 5629 VQGEXvcC5ZvBwMSm-3CHWg md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.345288+00:00
3 2bex0nsJwfE7cZuk4L35 None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.352415+00:00

To retrieve the results of a filter statement, call one of these methods on its return value:

  • .df(): A pandas DataFrame with each record in a row.

  • .all(): A QuerySet.

  • .one(): Exactly one record. Raises an error if there is none or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
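The contract of the last two methods can be sketched over a plain Python list. This minimal sketch assumes SQLAlchemy-style semantics, in which `one_or_none()` still raises when there are several results; the helper functions and exception classes below are hypothetical stand-ins, not LaminDB's own.

```python
class NoResultFound(Exception):
    """Hypothetical: the query returned no records."""


class MultipleResultsFound(Exception):
    """Hypothetical: the query returned more than one record."""


def one(results):
    """Return the single record, raising if there are zero or several."""
    if len(results) == 0:
        raise NoResultFound("query returned no records")
    if len(results) > 1:
        raise MultipleResultsFound(f"query returned {len(results)} records")
    return results[0]


def one_or_none(results):
    """Return None for an empty result, otherwise behave like one()."""
    if len(results) == 0:
        return None
    return one(results)


print(one(["My image"]))  # My image
print(one_or_none([]))  # None
```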

Note

filter() returns a QuerySet.

The registries in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
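To make the SQL translation concrete, here is roughly the statement that a filter such as ln.Artifact.filter(suffix=".jpg", created_by=user) corresponds to, run against an in-memory SQLite database with Python's built-in `sqlite3` module. The table name and its reduced set of columns are a hypothetical simplification of LaminDB's schema, and the SQL that Django actually generates differs in detail (for example, quoted and fully qualified column names).

```python
import sqlite3

# hypothetical, heavily simplified stand-in for the artifact table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact (id INTEGER PRIMARY KEY, description TEXT,"
    " suffix TEXT, size INTEGER, created_by_id INTEGER)"
)
conn.executemany(
    "INSERT INTO artifact (id, description, suffix, size, created_by_id)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, "My image", ".jpg", 29358, 1),
        (2, "The iris collection", ".parquet", 5629, 1),
        (3, "My fastq", ".fastq.gz", 20, 1),
    ],
)

# keyword arguments become WHERE conditions combined with AND;
# created_by=user compares the foreign key column created_by_id
rows = conn.execute(
    "SELECT id, description FROM artifact"
    " WHERE suffix = ? AND created_by_id = ? ORDER BY id",
    (".jpg", 1),
).fetchall()
print(rows)  # [(1, 'My image')]
```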

Search for metadata

ln.Artifact.search("iris").df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 wOWcnezBDnpQ25r4pw0a None The iris collection None .parquet dataset DataFrame 5629 VQGEXvcC5ZvBwMSm-3CHWg md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.345288+00:00

Let us create 500 transforms of type notebook with fake titles and save them:

transforms = [ln.Transform(name=title, type="notebook") for title in ln.core.datasets.fake_bio_notebook_titles(n=500)]
ln.save(transforms)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
2 Pm30xcotCxXCmKQs None Igg3 intestine Platelets Osteoblast. None None notebook None None None None 1 2024-07-26 14:36:29.924893+00:00
35 QVg4UlDYHL3BsiU4 None Intestine IgY IgG2 IgA IgE IgE. None None notebook None None None None 1 2024-07-26 14:36:29.930039+00:00
38 knZnxSZebPYxtVWK None Intestine Diencephalon investigate Natural kil... None None notebook None None None None 1 2024-07-26 14:36:29.930499+00:00
59 iDIRkJNMmn76Hsol None Research Liver lipocyte Natural killer T cell ... None None notebook None None None None 1 2024-07-26 14:36:29.933754+00:00
65 KVc5Se8Ck31cEmSj None Intestine Betz cells Diencephalon visualize. None None notebook None None None None 1 2024-07-26 14:36:29.934679+00:00

Leverage relations

Django has a double-underscore syntax for filtering on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()  
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id

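The filter selects all artifacts whose generating run was created by a user with a handle starting with "testuse". Here the result is empty because the toy artifacts were saved without calling ln.track(), so no run is linked to them.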

Under the hood, the SQL query joins the artifact table with the run and user tables.
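The join can be reproduced with Python's built-in `sqlite3` module. The three tables below are a hypothetical simplification of LaminDB's schema, and artifact 4 is an invented, tracked artifact added only so that the query returns something; artifacts 1 to 3 have no run, like the toy data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, created_by_id INTEGER REFERENCES users(id));
    CREATE TABLE artifacts (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER REFERENCES runs(id));
    INSERT INTO users VALUES (1, 'testuser1');
    INSERT INTO runs VALUES (1, 1);
    INSERT INTO artifacts VALUES
        (1, 'My image', NULL),
        (2, 'The iris collection', NULL),
        (3, 'My fastq', NULL),
        (4, 'hypothetical tracked artifact', 1);
    """
)

# run__created_by__handle__startswith="testuse" follows two foreign keys,
# which becomes two joins plus a LIKE prefix match
rows = conn.execute(
    """
    SELECT artifacts.id, artifacts.description
    FROM artifacts
    INNER JOIN runs ON artifacts.run_id = runs.id
    INNER JOIN users ON runs.created_by_id = users.id
    WHERE users.handle LIKE 'testuse%'
    ORDER BY artifacts.id
    """
).fetchall()
print(rows)  # [(4, 'hypothetical tracked artifact')]
```

The inner joins drop artifacts whose run_id is NULL, which is why untracked artifacts never match a filter on run attributes.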

Beyond __startswith, Django supports about two dozen field lookups of the form field__comparator=value. Some of them follow below.
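To illustrate how a field__comparator=value keyword decomposes into a field name and a comparison, here is a hypothetical pure-Python matcher over dictionaries. It covers only a few comparators and direct fields (no relation traversal), and it is not how Django evaluates lookups, which happens in SQL.

```python
import operator

# hypothetical subset of Django-style comparators, applied to plain dicts
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, sub: sub in value,
    "icontains": lambda value, sub: sub.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **expressions):
    """Return True if record satisfies every field__comparator=value pair."""
    for key, target in expressions.items():
        # "size__lt" -> ("size", "lt"); a bare "suffix" means exact match
        field, _, comparator = key.partition("__")
        compare = COMPARATORS[comparator or "exact"]
        if not compare(record[field], target):
            return False
    return True


artifacts = [
    {"description": "My image", "suffix": ".jpg", "size": 29358},
    {"description": "The iris collection", "suffix": ".parquet", "size": 5629},
    {"description": "My fastq", "suffix": ".fastq.gz", "size": 20},
]
print([a["description"] for a in artifacts if matches(a, size__lt=1e4)])
# ['The iris collection', 'My fastq']
print([a["description"] for a in artifacts if matches(a, suffix__in=[".jpg", ".fastq.gz"])])
# ['My image', 'My fastq']
```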

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 WuxBwVjs3WB10x63sZO8 None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.200400+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB (1e4 bytes). Comparisons are also passed as keyword arguments, here with the __lt (less than) comparator; __gt, __lte, and __gte work the same way.

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 wOWcnezBDnpQ25r4pw0a None The iris collection None .parquet dataset DataFrame 5629 VQGEXvcC5ZvBwMSm-3CHWg md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.345288+00:00
3 2bex0nsJwfE7cZuk4L35 None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.352415+00:00

or

ln.Artifact.filter().filter(ln.Q(suffix=".jpg") | ln.Q(suffix=".fastq.gz")).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 WuxBwVjs3WB10x63sZO8 None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.200400+00:00
3 2bex0nsJwfE7cZuk4L35 None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.352415+00:00
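Keyword arguments passed to a single filter() call are combined with AND, so an OR needs explicit condition objects such as ln.Q. The composition can be sketched with a hypothetical `Q` class that supports `|` and `&`; it handles exact matches only and is not Django's implementation, which builds a SQL WHERE ... OR ... clause instead of evaluating Python predicates.

```python
class Q:
    """Hypothetical condition object: exact-match keywords combined with | and &."""

    def __init__(self, predicate=None, **expressions):
        self.predicate = predicate if predicate is not None else self._exact(expressions)

    @staticmethod
    def _exact(expressions):
        # default condition: every given field equals its value
        return lambda record: all(record[f] == v for f, v in expressions.items())

    def __or__(self, other):
        return Q(lambda record: self.predicate(record) or other.predicate(record))

    def __and__(self, other):
        return Q(lambda record: self.predicate(record) and other.predicate(record))

    def __call__(self, record):
        return self.predicate(record)


artifacts = [
    {"description": "My image", "suffix": ".jpg"},
    {"description": "The iris collection", "suffix": ".parquet"},
    {"description": "My fastq", "suffix": ".fastq.gz"},
]
condition = Q(suffix=".jpg") | Q(suffix=".fastq.gz")
print([a["description"] for a in artifacts if condition(a)])  # ['My image', 'My fastq']
```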

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 WuxBwVjs3WB10x63sZO8 None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.200400+00:00
3 2bex0nsJwfE7cZuk4L35 None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.352415+00:00

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 2bex0nsJwfE7cZuk4L35 None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.352415+00:00
2 wOWcnezBDnpQ25r4pw0a None The iris collection None .parquet dataset DataFrame 5629 VQGEXvcC5ZvBwMSm-3CHWg md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.345288+00:00
1 WuxBwVjs3WB10x63sZO8 None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-07-26 14:36:28.200400+00:00
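The leading minus in "-updated_at" requests descending order. A hypothetical pure-Python `order_by` helper shows the convention, including ordering by several fields; in LaminDB the database performs the actual sort with ORDER BY ... DESC.

```python
def order_by(records, *fields):
    """Hypothetical sketch of Django-style ordering: '-field' sorts descending."""
    result = list(records)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # last key first and the first key last yields a multi-key ordering
    for field in reversed(fields):
        descending = field.startswith("-")
        name = field.lstrip("-")
        result.sort(key=lambda record: record[name], reverse=descending)
    return result


artifacts = [
    {"id": 1, "updated_at": "2024-07-26 14:36:28.200400"},
    {"id": 2, "updated_at": "2024-07-26 14:36:28.345288"},
    {"id": 3, "updated_at": "2024-07-26 14:36:28.352415"},
]
print([a["id"] for a in order_by(artifacts, "-updated_at")])  # [3, 2, 1]
```

ISO-formatted timestamps sort correctly as strings, which keeps the example free of datetime parsing.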

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
18 wBs4xhDVLBWLz3zS None Research Inner hair cells Natural killer T cel... None None notebook None None None None 1 2024-07-26 14:36:29.927401+00:00
19 jHj5dFxvmyHTpNSY None Liver Lipocyte research Type I Pneumocyte Betz... None None notebook None None None None 1 2024-07-26 14:36:29.927554+00:00
30 l5XWwvzSZhZl6wbX None Igd Inner hair cells cluster research IgG Cili... None None notebook None None None None 1 2024-07-26 14:36:29.929265+00:00
31 GKPG74uqzwitZkYY None Igg2 visualize IgY Inner hair cells research. None None notebook None None None None 1 2024-07-26 14:36:29.929423+00:00
36 UTQaxsbdsdze1Yjj None Heart IgE IgE result research IgA. None None notebook None None None None 1 2024-07-26 14:36:29.930192+00:00
59 iDIRkJNMmn76Hsol None Research Liver lipocyte Natural killer T cell ... None None notebook None None None None 1 2024-07-26 14:36:29.933754+00:00
73 weQFFJx8ZDfRPgtX None Intestinal research investigate research Type ... None None notebook None None None None 1 2024-07-26 14:36:29.935908+00:00
83 i4wDmckovdO2h3PN None Ige study IgD research result IgG1. None None notebook None None None None 1 2024-07-26 14:36:30.021650+00:00
86 Mehg72TcMRmnA8IB None Igg IgG1 IgG research. None None notebook None None None None 1 2024-07-26 14:36:30.022092+00:00
107 zrz3PbzGH8WboXB3 None Iga Natural killer T cell Inner hair cells res... None None notebook None None None None 1 2024-07-26 14:36:30.025196+00:00

And case-insensitive, with __icontains:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
18 wBs4xhDVLBWLz3zS None Research Inner hair cells Natural killer T cel... None None notebook None None None None 1 2024-07-26 14:36:29.927401+00:00
19 jHj5dFxvmyHTpNSY None Liver Lipocyte research Type I Pneumocyte Betz... None None notebook None None None None 1 2024-07-26 14:36:29.927554+00:00
30 l5XWwvzSZhZl6wbX None Igd Inner hair cells cluster research IgG Cili... None None notebook None None None None 1 2024-07-26 14:36:29.929265+00:00
31 GKPG74uqzwitZkYY None Igg2 visualize IgY Inner hair cells research. None None notebook None None None None 1 2024-07-26 14:36:29.929423+00:00
36 UTQaxsbdsdze1Yjj None Heart IgE IgE result research IgA. None None notebook None None None None 1 2024-07-26 14:36:29.930192+00:00
59 iDIRkJNMmn76Hsol None Research Liver lipocyte Natural killer T cell ... None None notebook None None None None 1 2024-07-26 14:36:29.933754+00:00
73 weQFFJx8ZDfRPgtX None Intestinal research investigate research Type ... None None notebook None None None None 1 2024-07-26 14:36:29.935908+00:00
83 i4wDmckovdO2h3PN None Ige study IgD research result IgG1. None None notebook None None None None 1 2024-07-26 14:36:30.021650+00:00
86 Mehg72TcMRmnA8IB None Igg IgG1 IgG research. None None notebook None None None None 1 2024-07-26 14:36:30.022092+00:00
107 zrz3PbzGH8WboXB3 None Iga Natural killer T cell Inner hair cells res... None None notebook None None None None 1 2024-07-26 14:36:30.025196+00:00

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
18 wBs4xhDVLBWLz3zS None Research Inner hair cells Natural killer T cel... None None notebook None None None None 1 2024-07-26 14:36:29.927401+00:00
59 iDIRkJNMmn76Hsol None Research Liver lipocyte Natural killer T cell ... None None notebook None None None None 1 2024-07-26 14:36:29.933754+00:00
111 jZ1f5dp2BfF2Rcrf None Research study Ligaments IgG2 IgG Betz cells. None None notebook None None None None 1 2024-07-26 14:36:30.025787+00:00
170 kAFmz2a8uLT1zE0O None Research cluster classify Osteoblast investiga... None None notebook None None None None 1 2024-07-26 14:36:30.037218+00:00
248 chgCie7zYLb2L465 None Research rank IgD Liver lipocyte Betz cells. None None notebook None None None None 1 2024-07-26 14:36:30.051316+00:00
286 vxb3m7AtJk0XITWS None Research rank intestinal IgG3. None None notebook None None None None 1 2024-07-26 14:36:30.056891+00:00
294 eCuX2RkZj426RQfF None Research Pancreatic acinar Inner hair cells in... None None notebook None None None None 1 2024-07-26 14:36:30.058077+00:00
408 uiSmt0Cl4YrMqX00 None Research IgE IgD Ligaments IgG2 Betz cells. None None notebook None None None None 1 2024-07-26 14:36:30.080211+00:00
428 VNa8fF1fLwWbB0L9 None Research IgE Liver lipocyte IgG3 intestine Typ... None None notebook None None None None 1 2024-07-26 14:36:30.083205+00:00
# clean up test instance: empty the storage first, otherwise the delete refuses to run
!rm -r mydata
!lamin delete --force mydata