lamindb.Storage¶
- class lamindb.Storage(root: str, *, description: str | None = None, space: Space | None = None, host: str | None = None)¶
Bases: SQLRecord, TracksRun, TracksUpdates

Storage locations of artifacts such as local directories or S3 buckets.
A storage location is either a directory (local or a folder in the cloud) or an entire S3/GCP bucket.
A storage location is written to by at most one LaminDB instance: the location’s writing instance. Some locations are not managed by any LaminDB instance and hence do not have a writing instance.
Writable vs. read-only storage locations
The instance_uid field of Storage defines its writing instance. You can write to a storage location only if its instance_uid matches your current instance’s uid (ln.settings.instance_uid); all other storage locations are read-only in your current instance.
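For illustration, here is a minimal sketch of this check; the root below is hypothetical:

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://my-storage-location")  # hypothetical root
if storage_loc.instance_uid == ln.settings.instance_uid:
    print("writable from the current instance")
elif storage_loc.instance_uid is None:
    print("not managed by any LaminDB instance")
else:
    print("read-only in the current instance")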
Some storage locations are not written to by any LaminDB instance; hence, their instance_uid is None.

Managing access to storage locations across instances
You can manage access through LaminHub’s fine-grained access management or through AWS policies that you attach to your S3 bucket.
To enable access management via LaminHub, head over to https://lamin.ai/{account}/infrastructure. After you click the green button “Connect S3 bucket”, your collaborators will access data based on their LaminHub permissions. See Manage access for more details.
By default, a storage location inherits the access permissions of its instance. If you want to further restrict access to a storage location, you can move it into a space:
space = ln.Space.get(name="my-space")
storage_loc = ln.Storage.get(root="s3://my-storage-location")
storage_loc.space = space
storage_loc.save()
If you don’t want to store data in the cloud, you can use local storage locations: Keep artifacts local in a cloud instance.
- Parameters:
  - root (str) – The root path of the storage location, e.g., "./mydir", "s3://my-bucket", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", "/nfs/shared/datasets/genomics", "/weka/shared/models/", …
  - description (str | None, default: None) – An optional description.
  - space (Space | None, default: None) – A space to restrict access permissions to the storage location.
  - host (str | None, default: None) – For local storage locations, a globally unique identifier for the physical machine/server hosting the storage. This distinguishes storage locations that may have the same local path but exist on different servers, e.g., "my-institute-cluster-1", "my-server-abcd".
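As an example, a sketch of registering a described local storage location; the root and host values below are hypothetical:

import lamindb as ln

ln.Storage(
    root="/nfs/shared/datasets/genomics",  # hypothetical path
    description="shared genomics datasets",
    host="my-institute-cluster-1",  # hypothetical host identifier
).save()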
See also

lamindb.core.Settings.storage
  Current default storage location of your compute session for writing artifacts.
StorageSettings
  Storage settings.
Keep artifacts local in a cloud instance
  Avoid storing artifacts in the cloud, but keep them on local infrastructure.
Examples
When you create a LaminDB instance, you configure its default storage location via --storage:

lamin init --storage ./mydatadir  # or "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", ...
View the current default storage location for writing artifacts:
import lamindb as ln
ln.settings.storage
Create a new cloud storage location:
ln.Storage(root="s3://our-bucket/our-folder").save()
Create a new local storage location:
ln.Storage(root="/dir/our-shared-dir", host="our-server-123").save()
Globally switch to another storage location:
ln.settings.storage = "/dir/our-shared-dir" # or "s3://our-bucket/our-folder", "gs://our-bucket/our-folder", ...
Or, if you’re operating in keep-artifacts-local mode (Keep artifacts local in a cloud instance):

ln.settings.local_storage = "/dir/our-other-shared-dir"
View all storage locations used in your LaminDB instance:
ln.Storage.to_dataframe()
Notes
What is the .lamindb/ directory inside a storage location?

It stores all artifacts that are ingested through lamindb, indexed by the artifact uid. This means you don’t have to worry about renaming or moving files, as this all happens on the database level.

Existing artifacts are typically stored in hierarchical structures with semantic folder names. Instead of copying such artifacts into .lamindb/ upon calls of Artifact("legacy_path").save(), LaminDB registers them with the semantic key representing the relative path within the storage location. These artifacts are marked with artifact._key_is_virtual = False and treated correspondingly.

There is only a single .lamindb/ directory per storage location.
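As a hedged illustration of the registration behavior described above (the path is hypothetical):

import lamindb as ln

# Register a pre-existing file in place rather than copying it into .lamindb/.
artifact = ln.Artifact("s3://our-bucket/our-folder/datasets/raw.csv").save()
print(artifact.key)              # semantic key: the relative path within the storage location
print(artifact._key_is_virtual)  # False for artifacts registered in place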
What should I do if I want to bulk migrate all artifacts to another storage?

Currently, you can only achieve this manually, and you should be careful when doing so:

1. Copy or move the artifacts into the desired new storage location.
2. Adapt the corresponding record in the Storage registry by setting the root field to the new location (see the sketch below).
3. If your LaminDB storage location is connected to the hub, also update the storage record on the hub.
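A minimal sketch of step 2, assuming the artifacts were already copied and using hypothetical roots:

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://old-bucket/old-folder")  # hypothetical old root
storage_loc.root = "s3://new-bucket/new-folder"  # hypothetical new root
storage_loc.save()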
- DoesNotExist = <class 'lamindb.models.storage.Storage.DoesNotExist'>¶
- Meta = <class 'lamindb.models.sqlrecord.SQLRecord.Meta'>¶
- artifacts: Artifact¶
Artifacts contained in this storage location.
- branch: Branch¶
Life cycle state of record.
branch.name can be "main" (the default branch), "trash" (trashed records), "archive" (archived records), or any other user-created branch, typically planned for merging onto main after review.
- branch_id¶
- created_by: User¶
Creator of record.
- created_by_id¶
- property host: str | None¶
Host identifier for local storage locations.
Is None for locations with type != "local".

A globally unique, user-defined host identifier (cluster, server, laptop, etc.).
- objects = <lamindb.models.query_manager.QueryManager object>¶
- property pk¶
- run: Run | None¶
Run that created record.
- run_id¶
- space: Space¶
The space in which the record lives.
- space_id¶
- save(*args, **kwargs)¶
Save the storage record.
- delete(permanent=None)¶
Delete the storage location.
This raises an error if the storage location is not empty.
Unlike other SQLRecord-based registries, this does not move the storage record into the trash.
- Parameters:
  permanent (bool | None, default: None) – False raises an error, as soft delete is impossible.
- Return type:
  None
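For illustration, a sketch of permanently deleting an empty storage location (hypothetical root; this raises an error if the location still contains artifacts):

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://our-bucket/empty-folder")  # hypothetical root
storage_loc.delete(permanent=True)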
- restore()¶
Restore from trash onto the main branch.
Does not restore descendant records if the record is a HasType record with is_type = True.
- Return type:
  None
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
The fields argument can be used to specify which fields to reload; it should be an iterable of field attnames. If fields is None, all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
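A minimal usage sketch, assuming a registered storage location at a hypothetical root:

import lamindb as ln

storage_loc = ln.Storage.get(root="s3://our-bucket/our-folder")  # hypothetical root
storage_loc.refresh_from_db(fields=["description"])  # reload only the description field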
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶
Asynchronous version of refresh_from_db().