lamindb.Storage

class lamindb.Storage(root: str, type: str, description: str | None = None, region: str | None = None)

Bases: SQLRecord, TracksRun, TracksUpdates

Storage locations of artifacts such as folders and S3 buckets.

A storage location is either a folder (local or in the cloud) or an entire S3/GCP bucket.

A LaminDB instance can manage and reference multiple storage locations. However, any given storage location is managed by at most one LaminDB instance.

Managed vs. referenced storage locations

A LaminDB instance can only write artifacts to its managed storage locations and merely reads artifacts from its referenced storage locations.

The instance_uid field defines the managing LaminDB instance of a storage location. Some storage locations may not be managed by any LaminDB instance, in which case the instance_uid is None. If it matches the instance_uid of the current instance, the storage location is managed by the current instance.
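The rule above reduces to comparing two uid strings. Here is a minimal pure-Python sketch; `storage_role`, `can_write`, and the uid values are hypothetical names for illustration, not part of the lamindb API.

```python
from typing import Optional


def storage_role(storage_instance_uid: Optional[str], current_instance_uid: str) -> str:
    """Classify a storage location relative to the current instance (hypothetical helper)."""
    if storage_instance_uid is None:
        # No LaminDB instance manages this location.
        return "unmanaged"
    if storage_instance_uid == current_instance_uid:
        # Managed by the current instance.
        return "managed-by-current"
    # Managed by some other LaminDB instance.
    return "managed-by-other"


def can_write(storage_instance_uid: Optional[str], current_instance_uid: str) -> bool:
    # An instance only writes to locations it manages; all others are read-only references.
    return storage_role(storage_instance_uid, current_instance_uid) == "managed-by-current"


print(storage_role("abc123", "abc123"))  # managed-by-current
print(storage_role("xyz789", "abc123"))  # managed-by-other
print(storage_role(None, "abc123"))      # unmanaged
```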

Here is an example (source):

https://lamin-site-assets.s3.amazonaws.com/.lamindb/eHDmIOAxLEoqZ2oK0000.png
Keeping track of storage locations across instances

Head over to https://lamin.ai/{account}/infrastructure and you’ll see something like this:

https://lamin-site-assets.s3.amazonaws.com/.lamindb/ze8hkgVxVptSSZEU0000.png
Parameters:
  • root (str) – The root path of the storage location, e.g., "./myfolder", "s3://my-bucket/myfolder", or "gs://my-bucket/myfolder".

  • type (StorageType) – The type of storage.

  • description (str | None, default: None) – A description.

  • region (str | None, default: None) – Cloud storage region, if applicable. Auto-populated for AWS S3.
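The example roots and the values listed for the type field ("local", "s3", "gs") suggest how a root maps to a storage type. The sketch below assumes a simple prefix rule; `infer_storage_type` is a hypothetical helper, and lamindb itself may recognize more protocols or infer the type differently.

```python
def infer_storage_type(root: str) -> str:
    """Map a storage root to "s3", "gs", or "local" (hypothetical helper)."""
    if root.startswith("s3://"):
        return "s3"
    if root.startswith("gs://"):
        return "gs"
    if "://" in root:
        # Any other protocol falls outside the three types this sketch knows about.
        raise ValueError(f"unsupported storage protocol in {root!r}")
    # No protocol prefix: treat it as a local folder.
    return "local"


for root in ["./myfolder", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder"]:
    print(root, "->", infer_storage_type(root))
```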

See also

lamindb.core.Settings.storage

Current default storage location of your compute session for writing artifacts.

StorageSettings

Storage settings.

Examples

When you create a LaminDB instance, you configure its default storage location via --storage:

lamin init --storage ./myfolder  # or "s3://my-bucket/myfolder" or "gs://my-bucket/myfolder"

View the current default storage location in your compute session for writing artifacts:

import lamindb as ln

ln.settings.storage

Switch to another default storage location for writing artifacts:

ln.settings.storage = "./myfolder2"  # or "s3://my-bucket/my-folder2" or "gs://my-bucket/my-folder2"

View all storage locations used in your LaminDB instance:

ln.Storage.df()

Create a new storage location:

ln.Storage(root="./myfolder3").save()

Notes

How do I manage access to a storage location?

You can manage access at a low level through AWS policies attached to your S3 bucket, or use LaminHub’s fine-grained access management.

The Manage access guide explains both approaches.

What is the .lamindb/ directory inside a storage location?

It stores all artifacts ingested through lamindb, indexed by artifact uid. This means you don’t have to worry about renaming or moving files, as this all happens at the database level.
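The image URLs on this page (for example `.lamindb/eHDmIOAxLEoqZ2oK0000.png`) follow the pattern `<root>/.lamindb/<uid><suffix>`. Assuming that layout, the sketch below shows why renaming is a database-only operation: the file path depends on the uid, not on the user-facing key. `managed_path` and the record dict are hypothetical stand-ins.

```python
def managed_path(root: str, uid: str, suffix: str = "") -> str:
    # Assumed layout: <root>/.lamindb/<uid><suffix>, as seen in the image URLs on this page.
    return f"{root.rstrip('/')}/.lamindb/{uid}{suffix}"


# Hypothetical stand-in for an artifact record.
record = {"uid": "eHDmIOAxLEoqZ2oK0000", "suffix": ".png", "key": "figures/overview.png"}
before = managed_path("s3://my-bucket/myfolder", record["uid"], record["suffix"])

# "Renaming" only edits the key stored in the database; the file itself stays put.
record["key"] = "figures/renamed.png"
after = managed_path("s3://my-bucket/myfolder", record["uid"], record["suffix"])

print(before)           # s3://my-bucket/myfolder/.lamindb/eHDmIOAxLEoqZ2oK0000.png
print(before == after)  # True
```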

Existing data is typically stored in hierarchical structures with semantic folder names. Instead of copying such artifacts into .lamindb/ upon calls of Artifact("legacy_path").save(), LaminDB registers them under a semantic key that represents their relative path within the storage location. These artifacts are marked with artifact._key_is_virtual = False and treated accordingly.
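A semantic key of this kind is the part of the path that follows the storage root. The sketch derives it by plain string-prefix matching; `relative_key` is a hypothetical helper, and lamindb's own path normalization may differ.

```python
def relative_key(root: str, path: str) -> str:
    """Return `path` relative to the storage `root` (hypothetical helper)."""
    prefix = root.rstrip("/") + "/"
    if not path.startswith(prefix):
        # Only files inside the storage location can be registered in place.
        raise ValueError(f"{path!r} is not inside storage root {root!r}")
    return path[len(prefix):]


print(relative_key("s3://my-bucket/myfolder", "s3://my-bucket/myfolder/project1/raw/data.csv"))
# project1/raw/data.csv
```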

There is only a single .lamindb/ directory per storage location.

What should I do if I want to bulk migrate all artifacts to another storage?

Currently, this can only be done manually, so proceed with care:

  1. Copy or move artifacts into the desired new storage location

  2. Adapt the corresponding record in the lamindb.Storage registry by setting the root field to the new location

  3. If your LaminDB storage location is managed through the hub, you also need to update the storage record on the hub – contact support
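Steps 1 and 2 can be sketched for a local folder as follows. This is an illustration under stated assumptions: `migrate_local_storage` is hypothetical, a plain dict stands in for the Storage record, and cloud buckets would need a different copy tool. The old files are left in place so they can be removed only after the new location has been verified.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_local_storage(old_root: str, new_root: str, record: dict) -> dict:
    """Copy a local storage folder and repoint a stand-in Storage record (hypothetical)."""
    old, new = Path(old_root), Path(new_root)
    # Step 1: copy everything, including the hidden .lamindb/ directory.
    shutil.copytree(old, new)  # raises FileExistsError if new_root already exists
    # Verify that every file arrived before touching the record.
    old_files = {p.relative_to(old) for p in old.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new) for p in new.rglob("*") if p.is_file()}
    missing = old_files - new_files
    if missing:
        raise RuntimeError(f"{len(missing)} files were not copied")
    # Step 2: update the root field on a copy of the record.
    return {**record, "root": str(new)}


# Demo with a throwaway folder standing in for a storage location.
base = Path(tempfile.mkdtemp())
old_root = base / "myfolder"
(old_root / ".lamindb").mkdir(parents=True)
(old_root / ".lamindb" / "eHDmIOAxLEoqZ2oK0000.png").write_bytes(b"fake image")
record = {"uid": "hypothetical-uid", "root": str(old_root)}
new_record = migrate_local_storage(str(old_root), str(base / "myfolder-new"), record)
print(new_record["root"].endswith("myfolder-new"))  # True
```

In lamindb itself, step 2 would presumably amount to assigning the new path to the record's root field and calling save(); step 3 still has to go through support.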

Attributes

property path: Path | UPath

Path.

Uses the .root field and converts it into a Path or UPath.
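The Path vs. UPath split can be sketched with a protocol check. `root_to_path` is hypothetical; cloud roots require the universal_pathlib package (imported as `upath`), and lamindb may additionally resolve local paths or configure the UPath, which this sketch does not do.

```python
from pathlib import Path


def root_to_path(root: str):
    """Convert a storage root into a path object (hypothetical helper)."""
    if "://" in root:
        # Cloud roots such as s3:// or gs:// need universal_pathlib.
        from upath import UPath
        return UPath(root)
    # Local folders map onto the standard library's Path.
    return Path(root)


p = root_to_path("./myfolder")
print(p / "data.csv")  # myfolder/data.csv on POSIX systems
```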

Simple fields

uid: str

Universal id, valid across DB instances.

root: str

Root path of storage (cloud or local path).

description: str | None

A description of what the storage location is used for (optional).

type: StorageType

Can be “local”, “s3”, or “gs”.

region: str | None

Cloud storage region, if applicable.

instance_uid: str | None

UID of the LaminDB instance that manages this storage location; None if no instance manages it.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

branch: Branch

Whether record is on a branch or in another “special state”.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

artifacts: Artifact

Artifacts contained in this storage location.

Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the include or features arguments to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.


Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
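Expressions such as name__startswith="my" combine a field name and a lookup type separated by a double underscore. The sketch below evaluates a small, hypothetical subset of such lookups over plain dicts to show the idea; it is not lamindb's or Django's implementation, and it does not follow related fields such as created_by__name.

```python
from typing import Any, Callable

# Hypothetical subset of Django-style lookups, for illustration only.
LOOKUPS: dict[str, Callable[[Any, Any], bool]] = {
    "exact": lambda field, value: field == value,
    "startswith": lambda field, value: str(field).startswith(value),
    "contains": lambda field, value: value in str(field),
    "in": lambda field, value: field in value,
}


def matches(record: dict, **expressions: Any) -> bool:
    """Return True if `record` satisfies every expression (combined with AND)."""
    for key, value in expressions.items():
        name, _, op = key.partition("__")
        op = op or "exact"  # a bare field name means an exact match
        if op not in LOOKUPS:
            raise ValueError(f"unsupported lookup: {op}")
        if not LOOKUPS[op](record.get(name), value):
            return False
    return True


storages = [
    {"root": "s3://my-bucket/myfolder", "type": "s3"},
    {"root": "./myfolder", "type": "local"},
]
print([s["root"] for s in storages if matches(s, root__startswith="s3://")])
# ['s3://my-bucket/myfolder']
```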
classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord


Examples

ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
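The idlike argument accepts an integer id, a full uid, or a uid stub. A minimal sketch of that resolution over plain dicts follows; `resolve` and the local `DoesNotExist` class are hypothetical stand-ins, and treating an ambiguous stub as an error is an assumption this page does not confirm.

```python
from typing import Union


class DoesNotExist(LookupError):
    """Stand-in for lamindb.errors.DoesNotExist in this sketch."""


def resolve(records: list[dict], idlike: Union[int, str]) -> dict:
    """Pick one record by integer id, full uid, or uid prefix (hypothetical helper)."""
    if isinstance(idlike, int):
        hits = [r for r in records if r["id"] == idlike]
    else:
        # A full uid is a prefix of itself, so one startswith check covers both cases.
        hits = [r for r in records if r["uid"].startswith(idlike)]
    if not hits:
        raise DoesNotExist(f"no record matches {idlike!r}")
    if len(hits) > 1:
        raise ValueError(f"{idlike!r} matches {len(hits)} records; use a longer stub")
    return hits[0]


records = [{"id": 1, "uid": "FvtpPJLJ"}, {"id": 2, "uid": "FvAbCdEf"}]
print(resolve(records, "Fvtp")["id"])  # 1
```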
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records:
      - "first": return the first record.
      - "last": return the last record.
      - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
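The attribute names in the example (lookup.adgb_dt for "ADGB-DT", ensg00000002745 for an Ensembl id) suggest that values are lowercased and non-alphanumeric characters are replaced by underscores. Under that assumption, here is a minimal namedtuple-based sketch with a dict converter. `to_field_name` and `make_lookup` are hypothetical; unlike lamindb, entries hold the raw string instead of a record, and duplicate names raise ValueError instead of honoring keep.

```python
import keyword
import re
from collections import namedtuple


def to_field_name(value: str) -> str:
    """Turn "ADGB-DT" into "adgb_dt" (assumed normalization rule)."""
    name = re.sub(r"\W+", "_", value.lower()).strip("_")
    # Attribute names cannot be empty, start with a digit, or be Python keywords.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = "x_" + name
    return name


def make_lookup(values: list[str]):
    """Build an auto-complete object with a dict converter (hypothetical)."""
    # namedtuple raises ValueError if two values normalize to the same name.
    base = namedtuple("Lookup", [to_field_name(v) for v in values])

    class Lookup(base):
        __slots__ = ()

        def dict(self):
            # Map each original value back to its entry.
            return dict(zip(values, self))

    return Lookup(*values)


lookup = make_lookup(["ADGB-DT", "TP53"])
print(lookup.adgb_dt)         # ADGB-DT
print(lookup.dict()["TP53"])  # TP53
```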
classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, ordered by match score.

See also

filter() lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
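The parameters above (limit, case_sensitive, a 0 to 100 score) can be illustrated with a fuzzy-ranking sketch. It swaps in the standard library's difflib.SequenceMatcher as the similarity measure; that choice is an assumption for illustration, not lamindb's search algorithm, and `search` here is a hypothetical function over plain strings.

```python
from difflib import SequenceMatcher
from typing import Optional


def search(values: list[str], string: str, limit: Optional[int] = 20,
           case_sensitive: bool = False) -> list[tuple[str, float]]:
    """Rank `values` by similarity to `string` on a 0-100 scale (illustrative only)."""
    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    query = norm(string)
    scored = [
        (value, round(100 * SequenceMatcher(None, query, norm(value)).ratio(), 1))
        for value in values
    ]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if limit is None else scored[:limit]


print(search(["ULabel1", "ULabel2", "ULabel3"], "ULabel2", limit=2))
# [('ULabel2', 100.0), ('ULabel1', 85.7)]
```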
classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

Methods

delete()

Delete the storage location.

Raises an error if the storage location is not empty.

Return type:

None

save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound= SQLRecord)