lamindb.Storage

class lamindb.Storage(root: str, *, description: str | None = None, space: Space | None = None, host: str | None = None)

Bases: SQLRecord, TracksRun, TracksUpdates

Storage locations of artifacts such as local directories or S3 buckets.

A storage location is either a directory (a local directory or a folder in the cloud) or an entire S3/GCP bucket.

A storage location is written to by at most one LaminDB instance: the location’s writing instance. Some locations are not managed with LaminDB and, hence, do not have a writing instance.

Writable vs. read-only storage locations

The instance_uid field of Storage defines its writing instance. You can write to a storage location only if its instance_uid matches your current instance’s uid (ln.settings.instance_uid). All other storage locations are read-only in your current instance.

Here is an example:

https://lamin-site-assets.s3.amazonaws.com/.lamindb/eHDmIOAxLEoqZ2oK0000.png

Some storage locations are not written to by any LaminDB instance; their instance_uid is then None.
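In code, the writability check amounts to comparing these two uids. Here is a minimal sketch, assuming a location registered under the hypothetical root "s3://my-bucket":

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://my-bucket")  # hypothetical root
# writable only if the location's writing instance is your current instance
if storage_loc.instance_uid == ln.settings.instance_uid:
    print("writable in the current instance")
elif storage_loc.instance_uid is None:
    print("not managed by any LaminDB instance")
else:
    print("read-only in the current instance")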

Managing access to storage locations across instances

You can manage access through LaminHub’s fine-grained access management or through AWS policies that you attach to your S3 bucket.

To enable access management via LaminHub, head over to https://lamin.ai/{account}/infrastructure. Once you click the green button that says “Connect S3 bucket”, your collaborators will access data based on their LaminHub permissions. Manage access has more details.

https://lamin-site-assets.s3.amazonaws.com/.lamindb/ze8hkgVxVptSSZEU0000.png

By default, a storage location inherits the access permissions of its instance. If you want to further restrict access to a storage location, you can move it into a space:

import lamindb as ln

space = ln.Space.get(name="my-space")
storage_loc = ln.Storage.get(root="s3://my-storage-location")
storage_loc.space = space
storage_loc.save()

If you don’t want to store data in the cloud, you can use local storage locations: Keep artifacts local in a cloud instance.

Parameters:
  • root (str) – The root path of the storage location, e.g., "./mydir", "s3://my-bucket", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", "/nfs/shared/datasets/genomics", "/weka/shared/models/", …

  • description (str | None, default: None) – An optional description.

  • space (Space | None, default: None) – A space to restrict access permissions to the storage location.

  • host (str | None, default: None) – For local storage locations, a globally unique identifier for the physical machine or server hosting the storage. This distinguishes storage locations that may share the same local path but live on different servers, e.g., "my-institute-cluster-1", "my-server-abcd".

See also

lamindb.core.Settings.storage

Current default storage location of your compute session for writing artifacts.

StorageSettings

Storage settings.

Keep artifacts local in a cloud instance

Avoid storing artifacts in the cloud, but keep them on local infrastructure.

Examples

When you create a LaminDB instance, you configure its default storage location via --storage:

lamin init --storage ./mydatadir  # or "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", ...

View the current default storage location for writing artifacts:

import lamindb as ln

ln.settings.storage

Create a new cloud storage location:

ln.Storage(root="s3://our-bucket/our-folder").save()

Create a new local storage location:

ln.Storage(root="/dir/our-shared-dir", host="our-server-123").save()

Globally switch to another storage location:

ln.settings.storage = "/dir/our-shared-dir"  # or "s3://our-bucket/our-folder", "gs://our-bucket/our-folder", ...

Or if you’re operating in keep-artifacts-local mode (Keep artifacts local in a cloud instance):

ln.settings.local_storage = "/dir/our-other-shared-dir"

View all storage locations used in your LaminDB instance:

ln.Storage.to_dataframe()

Notes

What is the .lamindb/ directory inside a storage location?

It stores all artifacts that are ingested through lamindb, indexed by the artifact uid. This means you don’t have to worry about renaming or moving files, as this all happens at the database level.

Existing artifacts are typically stored in hierarchical structures with semantic folder names. Instead of copying such artifacts into .lamindb/ when you call Artifact("legacy_path").save(), LaminDB registers them with a semantic key representing the relative path within the storage location. These artifacts are marked with artifact._key_is_virtual = False and treated accordingly.
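A minimal sketch of this in-place registration, assuming a hypothetical file that already exists inside the current storage location:

import lamindb as ln

# hypothetical legacy file inside the storage location
artifact = ln.Artifact(
    "s3://our-bucket/our-folder/datasets/raw/sample.csv",
    description="legacy file registered in place",
).save()
artifact._key_is_virtual  # False: the key mirrors the existing relative path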

There is only a single .lamindb/ directory per storage location.

What should I do if I want to bulk migrate all artifacts to another storage?

Currently, you can only achieve this manually, and you should be careful with it:

  1. Copy or move artifacts into the desired new storage location

  2. Adapt the corresponding record in the Storage registry by setting the root field to the new location (see the sketch after this list)

  3. If your LaminDB storage location is connected to the hub, you also need to update the storage record on the hub
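A sketch of step 2, with hypothetical old and new roots:

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://old-bucket/old-folder")  # hypothetical old root
storage_loc.root = "s3://new-bucket/new-folder"  # hypothetical new root
storage_loc.save()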

DoesNotExist = <class 'lamindb.models.storage.Storage.DoesNotExist'>
Meta = <class 'lamindb.models.sqlrecord.SQLRecord.Meta'>
artifacts: Artifact

Artifacts contained in this storage location.

branch: Branch

Life cycle state of record.

branch.name can be “main” (the default branch), “trash” (trashed records), “archive” (archived records), or the name of any other user-created branch, typically planned for merging onto main after review.

branch_id
created_by: User

Creator of record.

created_by_id
property host: str | None

Host identifier for local storage locations.

Is None for locations with type != "local".

A globally unique user-defined host identifier (cluster, server, laptop, etc.).

objects = <lamindb.models.query_manager.QueryManager object>
property path: Path | UPath

Path.

Uses the .root field and converts it into a Path or UPath.
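For example, with a hypothetical registered root:

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://our-bucket/our-folder")  # hypothetical root
storage_loc.path  # UPath('s3://our-bucket/our-folder')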

property pk
run: Run | None

Run that created record.

run_id
space: Space

The space in which the record lives.

space_id
save(*args, **kwargs)

Save the storage record.

delete(permanent=None)

Delete the storage location.

This errors in case the storage location is not empty.

Unlike other SQLRecord-based registries, this does not move the storage record into the trash.

Parameters:

permanent (bool | None, default: None) – Passing False raises an error, as a soft delete of a storage location is impossible.

Return type:

None
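A minimal usage sketch, assuming a hypothetical storage location that is already empty; permanent=True makes the deletion explicit:

import lamindb as ln

# errors if the storage location still contains artifacts
ln.Storage.get(root="s3://our-bucket/empty-folder").delete(permanent=True)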

restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is a type record (HasType with is_type = True).

Return type:

None

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

The fields argument can be used to specify which fields to reload; it should be an iterable of field attnames. If fields is None, all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

Asynchronous version of refresh_from_db().