lamindb.Storage

class lamindb.Storage(root: str, *, description: str | None = None, space: Space | None = None, host: str | None = None)

Bases: SQLRecord, TracksRun, TracksUpdates

Storage locations of artifacts such as local directories or S3 buckets.

A storage location is either a directory (local or a folder in the cloud) or an entire S3/GCP bucket.

A storage location is written to by at most one LaminDB instance: the location’s writing instance. Some locations are not managed with LaminDB and, hence, do not have a writing instance.
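The single-writer rule can be pictured as a simple check. The sketch below is a hypothetical illustration, not LaminDB's implementation: `StorageLocation` and `can_write` are invented names. The only assumption taken from the text is that a location records the uid of its writing instance, or nothing if it is not managed with LaminDB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical stand-in for a storage record (not the lamindb class)."""
    root: str
    instance_uid: Optional[str] = None  # None: the location has no writing instance


def can_write(location: StorageLocation, current_instance_uid: str) -> bool:
    """Return True only if the current instance is the location's writing instance."""
    return (
        location.instance_uid is not None
        and location.instance_uid == current_instance_uid
    )


# Sample uids and roots are made up for illustration.
managed = StorageLocation("s3://my-bucket/myfolder", instance_uid="abc123")
unmanaged = StorageLocation("s3://public-bucket")

print(can_write(managed, "abc123"))    # the writing instance
print(can_write(managed, "xyz789"))    # a different instance
print(can_write(unmanaged, "abc123"))  # no writing instance at all
```

In this sketch, an unmanaged location is never writable, which is one reading of "do not have a writing instance".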

Parameters:

root (str) – The root path of the storage location, e.g., "./mydir", "s3://my-bucket", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", "/nfs/shared/datasets/genomics", "/weka/shared/models/", …
description (str | None, default: None) – An optional description.
space (Space | None, default: None) – A space to restrict access permissions to the storage location.
host (str | None, default: None) – For local storage locations, a globally unique identifier for the physical machine/server hosting the storage. This distinguishes storage locations that may have the same local path but exist on different servers, e.g. "my-institute-cluster-1", "my-server-abcd".
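To see why host matters for local locations, consider two servers that both expose the same path. The following is a minimal sketch under stated assumptions: `LocationKey` and `location_key` are hypothetical names, and treating cloud roots as unique without a host is an assumption made for illustration only.

```python
from typing import NamedTuple, Optional


class LocationKey(NamedTuple):
    """Hypothetical identity of a storage location (illustration only)."""
    root: str
    host: Optional[str]


def location_key(root: str, host: Optional[str] = None) -> LocationKey:
    # Assumption: cloud roots are already globally unique, so host is dropped.
    is_cloud = root.startswith(("s3://", "gs://"))
    return LocationKey(root, None if is_cloud else host)


a = location_key("/nfs/shared/datasets/genomics", host="my-institute-cluster-1")
b = location_key("/nfs/shared/datasets/genomics", host="my-server-abcd")
print(a == b)  # False: same path, different hosts
```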

See also

lamindb.core.Settings.storage: Current default storage location of your compute session for writing artifacts.
StorageSettings: Storage settings.
Keep artifacts local in a cloud instance: Avoid storing artifacts in the cloud, but keep them on local infrastructure.

Examples

When you create a LaminDB instance, you configure its default storage location via --storage:

lamin init --storage ./mydatadir  # or "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", ...

View the current default storage location for writing artifacts:

import lamindb as ln

ln.settings.storage

Create a new cloud storage location:

ln.Storage(root="s3://our-bucket/our-folder").save()

Create a new local storage location:

ln.Storage(root="/dir/our-shared-dir", host="our-server-123").save()

Globally switch to another storage location:

ln.settings.storage = "/dir/our-shared-dir"  # or "s3://our-bucket/our-folder", "gs://our-bucket/our-folder", ...

Or, if you’re operating in keep-artifacts-local mode (see “Keep artifacts local in a cloud instance”):

ln.settings.local_storage = "/dir/our-other-shared-dir"

View all storage locations used in your LaminDB instance:

ln.Storage.to_dataframe()

Attributes

property host: str | None

Host identifier for local storage locations.

Is None for locations with type != "local".

A globally unique user-defined host identifier (cluster, server, laptop, etc.).

property path: Path | UPath

The storage location as a path object.

Converts the .root field into a Path or UPath.

Simple fields

uid: str: Universal id, valid across DB instances.

root: str: Root path of storage (cloud or local path).

description: str | None: A description.

type: StorageType: One of “local”, “s3”, or “gs”. Auto-detected from the format of the root path.

region: str | None: Storage region for cloud storage locations. Host identifier for local storage locations.

instance_uid: str | None

The writing instance.

Only the LaminDB instance with this uid can write to this storage location. This instance also governs the access permissions of the storage location unless the location is moved into a space.

is_locked: bool: Whether the record is locked for edits.

created_at: datetime: Time of creation of record.

updated_at: datetime: Time of last update to record.
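The type field is described as auto-detected from the format of the root path. A minimal sketch of what such a detection could look like follows. `infer_storage_type` is a hypothetical helper, not a lamindb function, and the fallback to "local" for any unrecognized prefix is an assumption.

```python
def infer_storage_type(root: str) -> str:
    """Guess the storage type from the root path's prefix (hypothetical helper)."""
    if root.startswith("s3://"):
        return "s3"
    if root.startswith("gs://"):
        return "gs"
    # Assumption: anything without a recognized cloud prefix is a local path.
    return "local"


for root in ["./mydir", "s3://my-bucket", "gs://my-bucket/myfolder", "/weka/shared/models/"]:
    print(root, "->", infer_storage_type(root))
```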

Relational fields

branch: Branch

Life cycle state of record.

branch.name can be “main” (default branch), “trash” (trashed), “archive” (archived), or the name of any other user-created branch, typically one planned for merging onto main after review.

space: Space: The space in which the record lives.

created_by: User: Creator of record.

run: Run | None: Run that created record.

artifacts: Artifact: Artifacts contained in this storage location.

Class methods

classmethod filter(*queries, **expressions)

Query records.

Parameters:

queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

See also

Guide: Query & search registries
Django documentation: Queries

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
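An expression such as name__startswith="my" combines a field name and a lookup, separated by a double underscore. The sketch below mimics that convention on plain dictionaries to show how the keyword is interpreted. It is an illustration only, not how Django or LaminDB evaluate queries (they translate expressions to SQL). `matches` is a hypothetical helper, and only three lookups are covered.

```python
def matches(record: dict, **expressions) -> bool:
    """Check a record against Django-style `field__lookup=value` expressions."""
    for key, value in expressions.items():
        field, _, lookup = key.partition("__")
        lookup = lookup or "exact"  # a bare field name means an exact match
        actual = record[field]
        if lookup == "exact":
            ok = actual == value
        elif lookup == "startswith":
            ok = actual.startswith(value)
        elif lookup == "contains":
            ok = value in actual
        else:
            raise ValueError(f"lookup not covered by this sketch: {lookup}")
        if not ok:
            return False
    return True


records = [{"name": "my label"}, {"name": "other label"}]
selected = [r for r in records if matches(r, name__startswith="my")]
print(selected)  # [{'name': 'my label'}]
```

Real query expressions can also traverse relations with further double underscores, which this one-level sketch does not attempt.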

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:

idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

See also

Guide: Query & search registries
Django documentation: Queries

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
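The contract of get() (one record back, or an error when nothing matches) can be sketched on an in-memory list. Everything here is hypothetical: `get_one`, the sample records, and the `MultipleResultsFound` error are invented for illustration, and treating a uid stub as a uid prefix is an assumption.

```python
class DoesNotExist(Exception):
    """Stand-in for lamindb.errors.DoesNotExist."""


class MultipleResultsFound(Exception):
    """Hypothetical error for ambiguous matches (name invented for this sketch)."""


def get_one(records, idlike=None, **expressions):
    """Return exactly one matching record, mimicking the contract described above."""
    def is_match(rec):
        if isinstance(idlike, int) and rec["id"] != idlike:
            return False
        if isinstance(idlike, str) and not rec["uid"].startswith(idlike):
            return False  # a uid stub is treated as a uid prefix (assumption)
        return all(rec.get(k) == v for k, v in expressions.items())

    found = [rec for rec in records if is_match(rec)]
    if not found:
        raise DoesNotExist(f"no record matches {idlike!r} {expressions}")
    if len(found) > 1:
        raise MultipleResultsFound(f"{len(found)} records match")
    return found[0]


# Made-up sample data; the uids are not real.
records = [
    {"id": 1, "uid": "FvtpPJLJabcd", "name": "my-label"},
    {"id": 2, "uid": "Q8xk2mNpwxyz", "name": "other-label"},
]
print(get_one(records, "FvtpPJLJ")["name"])        # lookup by uid stub
print(get_one(records, name="other-label")["id"])  # lookup by field value
```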

classmethod to_dataframe(include=None, features=False, limit=100)

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:

include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).
features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.
limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.
order_by – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
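The interplay of limit and order_by described in the parameter list can be sketched without a database. This is a plain-Python illustration of the documented behavior (most recent first by default, None for no cap, unknown ordering field ignored). `select_rows` is a hypothetical helper, not part of lamindb.

```python
def select_rows(rows, limit=100, order_by="-id"):
    """Order rows and cap their number, following the parameter descriptions above."""
    descending = order_by.startswith("-")
    field = order_by.lstrip("-")
    # Mirror "ignored if the specified field does not exist".
    if rows and all(field in row for row in rows):
        rows = sorted(rows, key=lambda row: row[field], reverse=descending)
    return list(rows) if limit is None else list(rows[:limit])


rows = [{"id": i, "name": f"record-{i}"} for i in range(1, 6)]
print([row["id"] for row in select_rows(rows, limit=2)])  # [5, 4]
print(len(select_rows(rows, limit=None)))                 # 5
```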

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:

string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted DataFrame of search results with a score in the column score, or a QuerySet if return_queryset is True.

See also

filter()
lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
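Ranked search with a limit and a case-sensitivity switch can be sketched in a few lines. Note that this sketch swaps in Python's difflib similarity ratio as the score. How LaminDB actually scores matches is not stated here, so the ranking below is an assumption, and this standalone `search` function is hypothetical.

```python
from difflib import SequenceMatcher


def search(values, string, limit=20, case_sensitive=False):
    """Rank values by similarity to `string` and return the top `limit` with scores."""
    def norm(text):
        return text if case_sensitive else text.lower()

    scored = [(SequenceMatcher(None, norm(string), norm(v)).ratio(), v) for v in values]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [(value, round(score, 3)) for score, value in scored[:limit]]


results = search(["Label1", "Label2", "Label3"], "label2", limit=2)
print(results[0])  # ('Label2', 1.0): exact match once case is ignored
```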

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:

field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records:
- "first": return the first record.
- "last": return the last record.
- False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

Look up via auto-complete on . (dot):

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
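The examples above rely on field values being turned into attribute names (for instance "ADGB-DT" becoming adgb_dt) and on a dictionary keyed by the original values. A minimal sketch of that idea with collections.namedtuple follows. `to_attr` and `build_lookup` are hypothetical helpers and the gene records are made-up sample data. The sketch assumes every normalized value is a valid, unique Python identifier.

```python
import re
from collections import namedtuple


def to_attr(value: str) -> str:
    """Normalize a value into a lowercase attribute name ("ADGB-DT" -> "adgb_dt")."""
    return re.sub(r"\W+", "_", value).strip("_").lower()


def build_lookup(records, field="symbol", return_field=None):
    """Build an attribute-style lookup plus a dict keyed by the original values."""
    def pick(rec):
        return rec if return_field is None else rec[return_field]

    by_value = {rec[field]: pick(rec) for rec in records}
    Lookup = namedtuple("Lookup", [to_attr(value) for value in by_value])
    return Lookup(*by_value.values()), by_value


# Made-up sample data, not real gene annotations.
genes = [
    {"symbol": "GENE-A", "ensembl_gene_id": "ID0001"},
    {"symbol": "GENE-B", "ensembl_gene_id": "ID0002"},
]

lookup, lookup_dict = build_lookup(genes)
print(lookup.gene_a["ensembl_gene_id"])          # attribute access
print(lookup_dict["GENE-B"]["ensembl_gene_id"])  # dictionary access

by_id, _ = build_lookup(genes, field="ensembl_gene_id", return_field="symbol")
print(by_id.id0001)                              # returns only the symbol
```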

classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

Methods

save(*args, **kwargs): Save the storage record.

delete(permanent=None)

Delete the storage location.

This errors in case the storage location is not empty.

Unlike other SQLRecord-based registries, this does not move the storage record into the trash.

Parameters:

permanent (bool | None, default: None) – False raises an error, as soft delete is impossible.

Return type:

None
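The two failure conditions of delete() (a non-empty location and a requested soft delete) can be summarized as a small validation routine. This is a sketch of the documented rules only. `delete_storage` and `StorageNotEmpty` are hypothetical names, and the actual exception types raised by lamindb are not stated here.

```python
class StorageNotEmpty(Exception):
    """Hypothetical error raised when artifacts still live in the location."""


def delete_storage(n_artifacts: int, permanent=None) -> None:
    """Validate a delete request against the rules stated above."""
    if permanent is False:
        # Soft delete (moving to trash) is not possible for storage records.
        raise ValueError("Soft delete is impossible for storage locations.")
    if n_artifacts > 0:
        raise StorageNotEmpty(f"{n_artifacts} artifact(s) remain in this location.")
    # In a real system, the record would be removed here.


delete_storage(n_artifacts=0)                  # passes: empty location
delete_storage(n_artifacts=0, permanent=True)  # passes: explicit permanent delete
```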

restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)