lamindb.Storage¶
- class lamindb.Storage(root: str, type: str, description: str | None = None, region: str | None = None)¶
Bases: SQLRecord, TracksRun, TracksUpdates
Storage locations of artifacts such as folders and S3 buckets.
A storage location is either a folder (local or in the cloud) or an entire S3/GCP bucket.
A LaminDB instance can manage and reference multiple storage locations. But any storage location is managed by at most one LaminDB instance.
Managed vs. referenced storage locations
A LaminDB instance can only write artifacts to its managed storage locations and merely reads artifacts from its referenced storage locations.
The instance_uid field defines the managing LaminDB instance of a storage location. Some storage locations may not be managed by any LaminDB instance, in which case the instance_uid is None. If it matches the instance_uid of the current instance, the storage location is managed by the current instance.
Here is an example based (source):
Keeping track of storage locations across instances
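The managed-versus-referenced rule described above can be sketched in a few lines of plain Python. This is an illustration only: StorageRecord, classify, and can_write are hypothetical names introduced here and are not part of the lamindb API. The sketch assumes only what the text states, namely that instance_uid is compared against the uid of the current instance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageRecord:
    """Hypothetical stand-in for a row of the Storage registry."""
    root: str
    instance_uid: Optional[str]  # uid of the managing instance, or None


def classify(storage: StorageRecord, current_instance_uid: str) -> str:
    """Describe how the current instance relates to a storage location."""
    if storage.instance_uid is None:
        return "unmanaged"  # not managed by any LaminDB instance
    if storage.instance_uid == current_instance_uid:
        return "managed"  # managed by the current instance
    return "referenced"  # managed by another instance


def can_write(storage: StorageRecord, current_instance_uid: str) -> bool:
    """An instance only writes artifacts to its managed storage locations."""
    return classify(storage, current_instance_uid) == "managed"


mine = StorageRecord(root="s3://my-bucket/myfolder", instance_uid="abc123")
theirs = StorageRecord(root="gs://my-bucket/myfolder", instance_uid="xyz789")
orphan = StorageRecord(root="./myfolder", instance_uid=None)
```

With "abc123" as the uid of the current instance, only the first location is writable; the other two can be read but not written.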
- Parameters:
root (str) – The root path of the storage location, e.g., "./myfolder", "s3://my-bucket/myfolder", or "gs://my-bucket/myfolder".
type (StorageType) – The type of storage.
description (str | None, default: None) – A description.
region (str | None, default: None) – Cloud storage region, if applicable. Auto-populated for AWS S3.
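The example roots above and the values documented for the type field ("local", "s3", "gs") suggest a simple correspondence between the URL scheme of a root and its storage type. The sketch below makes that correspondence explicit. The function infer_storage_type is hypothetical, and whether LaminDB derives the type this way internally is an assumption.

```python
def infer_storage_type(root: str) -> str:
    """Map a root path to one of the documented type values.

    Hypothetical helper for illustration; not part of the lamindb API.
    """
    if root.startswith("s3://"):
        return "s3"
    if root.startswith("gs://"):
        return "gs"
    # Anything without a recognized cloud scheme is treated as a local path.
    return "local"


roots = ["./myfolder", "s3://my-bucket/myfolder", "gs://my-bucket/myfolder"]
types = [infer_storage_type(r) for r in roots]
```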
See also
lamindb.core.Settings.storage
Current default storage location of your compute session for writing artifacts.
StorageSettings
Storage settings.
Examples
When you create a LaminDB instance, you configure its default storage location via --storage:
lamin init --storage ./myfolder  # or "s3://my-bucket/myfolder" or "gs://my-bucket/myfolder"
View the current default storage location in your compute session for writing artifacts:
import lamindb as ln
ln.settings.storage
Switch to another default storage location for writing artifacts:
ln.settings.storage = "./myfolder2" # or "s3://my-bucket/my-folder2" or "gs://my-bucket/my-folder2"
View all storage locations used in your LaminDB instance:
ln.Storage.df()
Create a new storage location:
ln.Storage(root="./myfolder3").save()
Notes
How do I manage access to a storage location?
You can manage access at a low level through AWS policies that you attach to your S3 bucket, or leverage LaminHub’s fine-grained access management.
Manage access explains both approaches.
What is the .lamindb/ directory inside a storage location?
It stores all artifacts that are ingested through lamindb, indexed by the artifact uid. This means you don’t have to worry about renaming or moving files, as this all happens on the database level.
Existing artifacts are typically stored in hierarchical structures with semantic folder names. Instead of copying such artifacts into .lamindb/ upon calls of Artifact("legacy_path").save(), LaminDB registers them with the semantic key representing the relative path within the storage location. These artifacts are marked with artifact._key_is_virtual = False and treated correspondingly.
There is only a single .lamindb/ directory per storage location.
What should I do if I want to bulk migrate all artifacts to another storage?
Currently, you can only do this manually, and you should proceed with care:
- Copy or move the artifacts into the desired new storage location.
- Adapt the corresponding record in the Storage registry by setting the root field to the new location.
- If your LaminDB storage location is managed through the hub, you also need to update the storage record on the hub; contact support.
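The two notes above fit together: because an artifact's physical path is derived from the root of its storage location, changing root re-points every artifact at once, provided the files were copied with the same relative layout. The plain-Python sketch below illustrates this. StorageRecord, ArtifactRecord, and resolve_path are hypothetical names, and the layout .lamindb/<uid><suffix> for ingested artifacts is an assumption based on the statement that they are indexed by uid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageRecord:
    """Hypothetical stand-in for a Storage record."""
    root: str


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for an Artifact record."""
    uid: str
    suffix: str
    key: Optional[str]
    key_is_virtual: bool
    storage: StorageRecord


def resolve_path(artifact: ArtifactRecord) -> str:
    """Derive the physical path of an artifact from its storage root."""
    root = artifact.storage.root.rstrip("/")
    if artifact.key_is_virtual or artifact.key is None:
        # Assumed layout: ingested artifacts live under .lamindb/, named by uid.
        return f"{root}/.lamindb/{artifact.uid}{artifact.suffix}"
    # Registered legacy artifacts keep their semantic relative path.
    return f"{root}/{artifact.key}"


storage = StorageRecord(root="s3://my-bucket/myfolder")
ingested = ArtifactRecord("aBcD1234", ".parquet", "results/table.parquet", True, storage)
legacy = ArtifactRecord("eFgH5678", ".csv", "raw/2021/data.csv", False, storage)

paths_before = [resolve_path(a) for a in (ingested, legacy)]

# After copying the files, updating root is the only change on the record side.
storage.root = "s3://my-new-bucket/myfolder"
paths_after = [resolve_path(a) for a in (ingested, legacy)]
```

Renaming the key of the ingested artifact would not change its path in this sketch, which mirrors the statement that renaming happens on the database level.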
Attributes¶
Simple fields¶
- uid: str¶
Universal id, valid across DB instances.
- root: str¶
Root path of storage (cloud or local path).
- description: str | None¶
A description of what the storage location is used for (optional).
- type: StorageType¶
One of “local”, “s3”, or “gs”.
- region: str | None¶
Cloud storage region, if applicable.
- instance_uid: str | None¶
Instance that manages this storage location.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
Relational fields¶
- branch: Branch¶
Whether record is on a branch or in another “special state”.
Class methods¶
- classmethod df(include=None, features=False, limit=100)¶
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use arguments include or features to include other data.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
Include the name of the creator in the DataFrame:
>>> ln.ULabel.df(include="created_by__name")
Include display of features for Artifact:
>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations
Only include select features:
>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
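To make the form of a Django-style query expression such as name__startswith="my" more concrete, the sketch below evaluates a few such expressions against plain dictionaries. It is neither Django's nor LaminDB's implementation: matches and filter_records are hypothetical helpers, only the exact, startswith, and contains lookups are handled, and relation traversal (as in created_by__name) is deliberately left out.

```python
def matches(record: dict, **expressions) -> bool:
    """Check one record against Django-style field__lookup=value expressions."""
    for expression, expected in expressions.items():
        # "name__startswith" -> field "name", lookup "startswith";
        # a bare "name" falls back to the "exact" lookup.
        field, _, lookup = expression.partition("__")
        lookup = lookup or "exact"
        value = record[field]
        if lookup == "exact":
            ok = value == expected
        elif lookup == "startswith":
            ok = value.startswith(expected)
        elif lookup == "contains":
            ok = expected in value
        else:
            raise ValueError(f"unsupported lookup in this sketch: {lookup}")
        if not ok:
            return False
    return True


def filter_records(records: list, **expressions) -> list:
    """Return all records matching every expression (logical AND)."""
    return [r for r in records if matches(r, **expressions)]


labels = [{"name": "my label"}, {"name": "my other label"}, {"name": "theirs"}]
starting_with_my = filter_records(labels, name__startswith="my")
exact_match = filter_records(labels, name="theirs")
```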
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
See also
Guide: Query & search registries
Django documentation: Queries
Examples
ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
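The three ways of identifying a record (integer id, uid or uid stub, keyword expressions) can be illustrated with a small stand-in that operates on dictionaries. This is a sketch under assumptions: get_record and the local DoesNotExist class are hypothetical, a uid stub is treated as a prefix of the uid, and raising ValueError on multiple matches is a choice made here, not documented behavior.

```python
class DoesNotExist(Exception):
    """Local stand-in for lamindb.errors.DoesNotExist."""


def get_record(records: list, idlike=None, **expressions) -> dict:
    """Return exactly one record identified by id, uid (stub), or field values."""
    if isinstance(idlike, int):
        candidates = [r for r in records if r["id"] == idlike]
    elif isinstance(idlike, str):
        # Assumption: a uid stub matches any uid that starts with it.
        candidates = [r for r in records if r["uid"].startswith(idlike)]
    else:
        candidates = [
            r for r in records
            if all(r.get(field) == value for field, value in expressions.items())
        ]
    if not candidates:
        raise DoesNotExist("no matching record found")
    if len(candidates) > 1:
        # Assumption of this sketch: an ambiguous query is an error.
        raise ValueError("more than one matching record found")
    return candidates[0]


ulabels = [
    {"id": 1, "uid": "FvtpPJLJ", "name": "my-label"},
    {"id": 2, "uid": "Gh7kQ2xZ", "name": "other-label"},
]
by_stub = get_record(ulabels, "Fvtp")
by_id = get_record(ulabels, 2)
by_name = get_record(ulabels, name="my-label")
```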
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records.
- "first": return the first record.
- "last": return the last record.
- False: return all records.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
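The idea behind an auto-complete object can be sketched with the standard library: field values are normalized into valid attribute names (for instance "ADGB-DT" becomes adgb_dt) and exposed on a named tuple, while a plain dictionary keeps the original values as keys. The function build_lookup below is hypothetical and only approximates the behavior shown in the example; LaminDB's actual normalization rules are not documented here and may differ.

```python
import re
from collections import namedtuple


def build_lookup(records: list, field: str):
    """Build an attribute-style lookup and a dict keyed by the raw field values."""
    values = [record[field] for record in records]
    # Replace runs of non-word characters with "_" and lowercase the result.
    names = [re.sub(r"\W+", "_", value).lower() for value in values]
    # rename=True swaps invalid or duplicate names for positional ones (_0, _1, ...).
    Lookup = namedtuple("Lookup", names, rename=True)
    return Lookup(*records), dict(zip(values, records))


genes = [
    {"symbol": "ADGB-DT", "ensembl_gene_id": "ENSG00000002745"},
    {"symbol": "TP53", "ensembl_gene_id": "ENSG00000141510"},
]
lookup, lookup_dict = build_lookup(genes, field="symbol")
record_by_attribute = lookup.adgb_dt
record_by_key = lookup_dict["ADGB-DT"]
```

Attribute access supports tab completion in interactive sessions, while the dictionary remains usable for values that do not normalize to a unique attribute name.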
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
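How the limit and case_sensitive parameters shape a ranked result can be illustrated with a toy search built on the standard library's difflib. This is not LaminDB's search algorithm: the function search below is hypothetical, and its similarity scores will generally differ from the scores LaminDB reports.

```python
from difflib import SequenceMatcher


def search(values: list, string: str, limit=20, case_sensitive=False) -> list:
    """Rank values by similarity to string; return (value, score) pairs."""

    def score(value: str) -> float:
        a, b = (value, string) if case_sensitive else (value.lower(), string.lower())
        # SequenceMatcher.ratio() is in [0, 1]; scale to a 0-100 score.
        return round(100 * SequenceMatcher(None, a, b).ratio(), 1)

    ranked = sorted(((v, score(v)) for v in values), key=lambda pair: pair[1], reverse=True)
    # limit=None returns all results.
    return ranked if limit is None else ranked[:limit]


names = ["ULabel1", "ULabel2", "ULabel3"]
results = search(names, "ULabel2")
top_only = search(names, "ulabel2", limit=1)
```

An exact match scores 100.0 in this sketch; with case_sensitive=True, the lowercase query "ulabel2" would no longer reach that score against "ULabel2".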
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
Methods¶
- delete()¶
Delete the storage location.
Raises an error if the storage location is not empty.
- Return type:
None
- save(*args, **kwargs)¶
Save.
Always saves to the default database.
- Return type:
TypeVar(T, bound=SQLRecord)