lamindb.Storage¶
- class lamindb.Storage(root: str, *, description: str | None = None, space: Space | None = None, host: str | None = None)¶
Bases: SQLRecord, TracksRun, TracksUpdates
Storage locations of artifacts such as local directories or S3 buckets.
A storage location is either a directory (local or a folder in the cloud) or an entire S3/GCP bucket. A LaminDB instance can manage and read from multiple storage locations. But any storage location is managed by at most one LaminDB instance.
Managed vs. read-only storage locations
A LaminDB instance can only write artifacts to its managed storage locations.
The instance_uid field defines the managing LaminDB instance of a storage location. You can access the instance_uid of your current instance through ln.setup.settings.instance_uid.
Here is an example (source).
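The managed-versus-read-only rule can be pictured with a small sketch in plain Python. It does not use lamindb and is not how LaminDB implements the check; `StorageInfo`, `is_writable`, and the uid values are hypothetical names introduced only for illustration. It assumes that a location is writable exactly when its `instance_uid` equals that of the current instance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageInfo:
    """Hypothetical stand-in for a storage record (not the lamindb class)."""

    root: str
    instance_uid: Optional[str]  # None: not managed by any instance


def is_writable(storage: StorageInfo, current_instance_uid: str) -> bool:
    """Return True only if the current instance manages the storage location."""
    return (
        storage.instance_uid is not None
        and storage.instance_uid == current_instance_uid
    )


# Placeholder uids, chosen for illustration only
managed = StorageInfo("s3://my-bucket", instance_uid="abc123")
public = StorageInfo("s3://some-public-bucket", instance_uid=None)
```

With these placeholder values, `is_writable(managed, "abc123")` is true, while any other instance, and any location whose `instance_uid` is `None`, is treated as read-only.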
Some public storage locations are not managed by any LaminDB instance: their instance_uid is None.
Managing access to storage locations across instances
You can manage access through LaminHub’s fine-grained access management or through AWS policies that you attach to your S3 bucket.
To enable access management via LaminHub, head over to https://lamin.ai/{account}/infrastructure. Once you click the green button that says “Connect S3 bucket”, LaminDB starts connecting through federated S3 tokens so that your collaborators access data based on their permissions in LaminHub. Manage access has more details.
By default, access permissions to a storage location are governed by the access permissions of its managing instance. If you want to further restrict access to a storage location, you can move it into a space:
space = ln.Space.get(name="my-space")
storage_loc = ln.Storage.get(root="s3://my-storage-location")
storage_loc.space = space
storage_loc.save()
If you don’t want to store data in the cloud, you can use local storage locations: Keep artifacts local in a cloud instance.
- Parameters:
root – str. The root path of the storage location, e.g., "./mydir", "s3://my-bucket", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", "/nfs/shared/datasets/genomics", "/weka/shared/models/", …
description – str | None = None. An optional description.
space – Space | None = None. A space to restrict access permissions to the storage location.
host – str | None = None. For local storage locations, pass a globally unique host identifier, e.g. "my-institute-cluster-1", "my-server-abcd", …
See also
lamindb.core.Settings.storage
Current default storage location of your compute session for writing artifacts.
StorageSettings
Storage settings.
- Keep artifacts local in a cloud instance
Avoid storing artifacts in the cloud, but keep them on local infrastructure.
Examples
When you create a LaminDB instance, you configure its default storage location via --storage:
lamin init --storage ./mydatadir  # or "s3://my-bucket/myfolder", "gs://my-bucket/myfolder", ...
View the current default storage location for writing artifacts:
import lamindb as ln
ln.settings.storage
Create a new cloud storage location:
ln.Storage(root="s3://our-bucket/our-folder").save()
Create a new local storage location:
ln.Storage(root="/dir/our-shared-dir", host="our-server-123").save()
Globally switch to another storage location:
ln.settings.storage = "/dir/our-shared-dir" # or "s3://our-bucket/our-folder", "gs://our-bucket/our-folder", ...
Or if you’re operating in keep-artifacts-local mode (Keep artifacts local in a cloud instance):
ln.settings.local_storage = "/dir/our-other-shared-dir"
View all storage locations used in your LaminDB instance:
ln.Storage.to_dataframe()
Notes
What is the .lamindb/ directory inside a storage location?
It stores all artifacts that are ingested through lamindb, indexed by the artifact uid. This means you don’t have to worry about renaming or moving files, as this all happens on the database level.
Existing artifacts are typically stored in hierarchical structures with semantic folder names. Instead of copying such artifacts into .lamindb/ upon calls of Artifact("legacy_path").save(), LaminDB registers them with the semantic key representing the relative path within the storage location. These artifacts are marked with artifact._key_is_virtual = False and treated correspondingly.
There is only a single .lamindb/ directory per storage location.
What should I do if I want to bulk migrate all artifacts to another storage?
Currently, you can only achieve this manually and you should be careful with it.
- Copy or move artifacts into the desired new storage location.
- Adapt the corresponding record in the Storage registry by setting the root field to the new location.
- If your LaminDB storage location is managed through the hub, you also need to update the storage record on the hub – contact support.
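The copy step of a manual migration can be sketched as follows. This is a minimal illustration under stated assumptions, not an official migration tool: `resolve_path` and `copy_storage` are hypothetical helpers, the `.lamindb/<uid><suffix>` file name is an assumed layout, and the demo uses temporary local directories. The point is that artifacts are addressed relative to the storage root, so a tree copied as-is resolves to the same relative paths under the new root.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_path(root, key):
    """Resolve a key (a path relative to the storage root) to a full path."""
    return Path(root) / key


def copy_storage(old_root, new_root):
    """Copy the entire tree, preserving every relative path."""
    shutil.copytree(old_root, new_root, dirs_exist_ok=True)


# Demo on temporary directories; keys and file names are made up
base = Path(tempfile.mkdtemp())
old_root = base / "old-storage"
new_root = base / "new-storage"
keys = [
    ".lamindb/abcd1234efgh5678.txt",  # assumed uid-indexed layout
    "project/raw/sample.txt",  # semantic key of a pre-existing file
]
for key in keys:
    path = resolve_path(old_root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(key)

copy_storage(old_root, new_root)
migrated = [resolve_path(new_root, key).read_text() for key in keys]

# Updating the record afterwards would need a connected instance, so it is
# only indicated here as comments (untested assumption about usage):
# storage = ln.Storage.get(root=str(old_root))
# storage.root = str(new_root)
# storage.save()
```

Because only the root changes, the artifact keys themselves need no rewriting in this sketch. For hub-managed locations the record on the hub still has to be updated separately, as noted above.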
Attributes¶
- property host: str | None¶
Host identifier for local storage locations.
Is None for locations with type != "local".
A globally unique user-defined host identifier (cluster, server, laptop, etc.).
Simple fields¶
- uid: str¶
Universal id, valid across DB instances.
- root: str¶
Root path of storage (cloud or local path).
- description: str | None¶
A description of what the storage location is used for (optional).
- type: StorageType¶
One of “local”, “s3”, or “gs”. Is auto-detected from the format of the root path.
- region: str | None¶
Storage region for cloud storage locations. Host identifier for local storage locations.
- instance_uid: str | None¶
Instance that manages this storage location.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
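The auto-detection of the `type` field from the format of `root` can be pictured with a small prefix check. This is a sketch under the assumption that only the URL scheme matters; `infer_storage_type` is a hypothetical helper, not the function LaminDB uses internally.

```python
def infer_storage_type(root: str) -> str:
    """Guess the storage type from the prefix of the root path."""
    if root.startswith("s3://"):
        return "s3"
    if root.startswith("gs://"):
        return "gs"
    # Anything without a recognized cloud scheme is treated as a local path
    return "local"


detected = {
    root: infer_storage_type(root)
    for root in ["./mydir", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder"]
}
```

In this sketch, the relative path `./mydir` maps to “local” and the two cloud roots map to “s3” and “gs”.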
Relational fields¶
- branch: Branch¶
Whether record is on a branch or in another “special state”.
Class methods¶
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type: QuerySet
- Returns: A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").to_dataframe()
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
See also
Guide: Query & search registries
Django documentation: Queries
Examples
ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
- classmethod df(include=None, features=False, limit=100)¶
None
- Return type: DataFrame
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns: A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records. "first": return the first record. "last": return the last record. False: return all records.
- Return type: NamedTuple
- Returns: A NamedTuple of lookup information of the field values with a dictionary converter.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
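The auto-complete object that `lookup` returns, a NamedTuple with a dictionary converter, can be approximated in plain Python. The sketch below is not the lamindb implementation: `build_lookup` is a hypothetical helper, the records and identifiers are placeholders, and it assumes that every field value normalizes to a unique, valid Python identifier.

```python
import re
from collections import namedtuple


def build_lookup(records, field, return_field=None):
    """Build an attribute-style lookup plus a dict keyed by the raw field values."""
    keys = [record[field] for record in records]
    # Return the whole record unless a specific field is requested
    values = [
        record if return_field is None else record[return_field]
        for record in records
    ]
    # Normalize raw values into lowercase identifiers, e.g. "ADGB-DT" -> "adgb_dt"
    attr_names = [re.sub(r"\W+", "_", key).lower() for key in keys]
    Lookup = namedtuple("Lookup", attr_names)
    return Lookup(*values), dict(zip(keys, values))


# Hypothetical records; the identifiers are placeholders, not real annotations
records = [
    {"symbol": "ADGB-DT", "gene_id": "ID-0001"},
    {"symbol": "TP53", "gene_id": "ID-0002"},
]
lookup, lookup_dict = build_lookup(records, field="symbol", return_field="gene_id")
```

Attribute access (`lookup.adgb_dt`) is what makes tab completion possible in an interactive session, while the dictionary keeps the original spelling of each value (`lookup_dict["ADGB-DT"]`) available.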
Methods¶
- save(*args, **kwargs)¶
Save the storage record.
- delete()¶
Delete the storage location.
This errors in case the storage location is not empty.
Unlike other SQLRecord-based registries, this does not move the storage record into the trash.
- Return type: None