## lamindb.Artifact

class lamindb.Artifact(path: UPathStr, *, key: str | None = None, description: str | None = None, kind: ArtifactKind | str | None = None, features: dict[str, Any] | None = None, schema: Schema | None = None, revises: Artifact | None = None, overwrite_versions: bool | None = None, run: Run | False | None = None, storage: Storage | None = None, branch: Branch | None = None, space: Space | None = None, skip_hash_lookup: bool = False)
class lamindb.Artifact(*db_args)

 Bases: "SQLRecord", "IsVersioned", "TracksRun", "TracksUpdates"

 Datasets & models stored as files, folders, or arrays.

 Some artifacts are table- or array-like, e.g., when stored as
 ".parquet", ".h5ad", ".zarr", or ".tiledb".

 Parameters:
 * **path** -- "UPathStr" A path to a local or remote folder or
 file from which to create the artifact.

 * **key** -- "str | None = None" A key within the storage
 location, e.g., "myfolder/myfile.fcs". Artifacts with the
 same key form a version family.

 * **description** -- "str | None = None" A description.

 * **kind** -- "Literal["dataset", "model"] | str | None = None"
 Distinguishes models and datasets from other files & folders.

 * **features** -- "dict | None = None" External features to
 annotate the artifact with via "set_values".

 * **schema** -- "Schema | None = None" A schema to validate
 features.

 * **revises** -- "Artifact | None = None" Previous version of
 the artifact. An alternative to passing "key" when creating a
 new version.

 * **overwrite_versions** -- "bool | None = None" Whether to
 overwrite versions. Defaults to "True" for folders and "False"
 for files.

 * **run** -- "Run | bool | None = None" The run that creates the
 artifact. If "False", suppress tracking the run. If "None",
 infer the run from the global run context.

 * **branch** -- "Branch | None = None" The branch of the
 artifact. If "None", uses the current branch.

 * **space** -- "Space | None = None" The space of the artifact.
 If "None", uses the current space.

 * **storage** -- "Storage | None = None" The storage location
 for the artifact. If "None", uses the default storage
 location. You can see and set the default storage location in
 "storage".

 * **skip_hash_lookup** -- "bool = False" Skip the hash lookup so
 that a new artifact is created even if an artifact with the
 same hash already exists.

 -[ Examples ]-

 Create an artifact **from a local file or folder**:

 artifact = ln.Artifact("./my_file.parquet", key="examples/my_file.parquet").save()
 artifact = ln.Artifact("./my_folder", key="project1/my_folder").save()

 Calling ".save()" copies or uploads the file to the default storage
 location of your lamindb instance. If you create an artifact **from
 a remote file or folder**, lamindb registers the S3 "key" and
 avoids copying the data:

 artifact = ln.Artifact("s3://my_bucket/my_folder/my_file.csv").save()

 To query & access the artifact later on:

 artifact = ln.Artifact.get(key="examples/my_file.parquet")
 cached_path = artifact.cache()  # sync to local cache & get local path

 If the storage format supports it, you can load the artifact
 directly into memory or query it through a streaming interface,
 e.g., for parquet files:

 df = artifact.load() # load parquet file as DataFrame
 pyarrow_dataset = artifact.open()  # open for streaming as a pyarrow.dataset.Dataset

 If you want to **validate & annotate** a dataframe or an array
 using the feature & label registries, pass "schema" to one of the
 ".from_dataframe()", ".from_anndata()", ... constructors:

 artifact = ln.Artifact.from_dataframe(
     "./my_file.parquet",
     key="my_dataset.parquet",
     schema="valid_features",
 ).save()

 To annotate by **external features**:

 artifact = ln.Artifact("./my_file.parquet", features={"cell_type_by_model": "T cell"}).save()

 You can make a **new version** of an artifact by passing an
 existing "key":

 artifact_v2 = ln.Artifact("./my_file.parquet", key="examples/my_file.parquet").save()
 artifact_v2.versions.to_dataframe()  # see all versions
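
 The grouping rule behind versioning can be pictured as a toy, in-memory registry. This is a hypothetical sketch, not lamindb's implementation: "ToyRegistry" and "ToyArtifact" are illustrative names, and it assumes that each artifact saved under an existing key becomes the latest version of that family, mirroring the "is_latest" flag used in the delete examples.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArtifact:
    key: str
    version: int
    is_latest: bool = True


@dataclass
class ToyRegistry:
    """Hypothetical stand-in for the Artifact registry; not lamindb's API."""

    artifacts: list[ToyArtifact] = field(default_factory=list)

    def save(self, key: str) -> ToyArtifact:
        # All artifacts sharing a key form one version family.
        family = [a for a in self.artifacts if a.key == key]
        for previous in family:
            previous.is_latest = False  # only the newest version stays latest
        artifact = ToyArtifact(key=key, version=len(family) + 1)
        self.artifacts.append(artifact)
        return artifact

    def versions(self, key: str) -> list[ToyArtifact]:
        return [a for a in self.artifacts if a.key == key]


registry = ToyRegistry()
registry.save("examples/my_file.parquet")
v2 = registry.save("examples/my_file.parquet")
print([(a.version, a.is_latest) for a in registry.versions(v2.key)])
```

 Real lamindb records carry uids and their own version metadata; the integer counter here only illustrates the ordering within a family.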

 You can write artifacts to **non-default storage locations** by
 passing the "storage" argument:

 storage_loc = ln.Storage.get(root="s3://my_bucket")  # get storage location, or create via ln.Storage(root="s3://my_bucket").save()
 ln.Artifact("./my_file.parquet", key="examples/my_file.parquet", storage=storage_loc).save()  # upload to s3://my_bucket

 Sometimes you want to **avoid mapping the artifact into a path
 hierarchy** and pass only "description":

 artifact = ln.Artifact("./my_folder", description="My folder").save()
 artifact_v2 = ln.Artifact("./my_folder", revises=artifact).save()  # versioning requires `revises`; a shared description does not trigger a new version

 -[ Notes ]-

 -[ Storage formats & object types ]-

 The "Artifact" registry tracks the storage format via "suffix" and
 an abstract object type via "otype".

| description | "suffix" | "otype" | Python type examples |
| --- | --- | --- | --- |
| table | ".csv", ".tsv", ".parquet", ".ipc" | "DataFrame" | "pandas.DataFrame", "polars.DataFrame", "pyarrow.Table" |
| annotated matrix | ".h5ad", ".zarr", ".h5mu" | "AnnData" | "anndata.AnnData" |
| stacked matrix | ".zarr", ".tiledbsoma" | "MuData", "tiledbsoma" | "mudata.MuData", "tiledbsoma.Experiment" |
| spatial data | ".zarr" | "SpatialData" | "spatialdata.SpatialData" |
| generic arrays | ".h5", ".zarr", ".tiledb" | --- | "h5py.Dataset", "zarr.Array", "tiledb.Array" |
| unstructured | ".fastq", ".pdf", ".vcf", ".html" | --- | --- |
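
 Because one suffix can back several object types (".zarr" appears in four rows of the table), the suffix alone does not determine how to load an artifact, which is why "otype" is tracked separately. The hypothetical lookup below is distilled from the table and is not a lamindb function; "None" stands for artifacts without an "otype".

```python
from typing import Optional

# Hypothetical lookup distilled from the table above; not part of lamindb.
# None stands for "no otype" (generic arrays, unstructured files).
SUFFIX_TO_OTYPES: dict[str, set[Optional[str]]] = {
    ".csv": {"DataFrame"},
    ".tsv": {"DataFrame"},
    ".parquet": {"DataFrame"},
    ".ipc": {"DataFrame"},
    ".h5ad": {"AnnData"},
    ".tiledbsoma": {"tiledbsoma"},
    ".zarr": {"AnnData", "MuData", "SpatialData", None},
    ".h5": {None},
    ".tiledb": {None},
}


def candidate_otypes(suffix: str) -> set[Optional[str]]:
    """Return the otypes a suffix may correspond to; unknown suffixes map to {None}."""
    return SUFFIX_TO_OTYPES.get(suffix, {None})


print(candidate_otypes(".parquet"))  # a single candidate
print(candidate_otypes(".zarr"))     # ambiguous: otype is needed to decide
```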

 You can map storage formats onto **R types**, e.g., an "AnnData"
 might be accessed via "anndataR".

 Because "otype" accepts any "str", you can define custom object
 types that enable queries & logic that you need, e.g.,
 "SingleCellExperiment" or "MyCustomZarrDataStructure".

 LaminDB makes some default choices (e.g., serialize a "DataFrame"
 as a ".parquet" file).

 -[ Will artifacts get duplicated? ]-

 If an artifact with the exact same hash already exists,
 "Artifact()" returns the existing artifact.

 In concurrent workloads where the same artifact is created
 repeatedly at the exact same time, ".save()" detects the
 duplication and will return the existing artifact.
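
 The lookup-by-hash behavior, including the effect of "skip_hash_lookup", can be sketched with a toy registry. This is an illustration under assumptions: "ToyRegistry" is hypothetical, MD5 stands in for lamindb's own hashing scheme, and the concurrent ".save()" case is not modeled.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # Illustrative MD5 digest; lamindb's actual hashing scheme may differ.
    return hashlib.md5(data).hexdigest()


class ToyRegistry:
    """Hypothetical registry that reuses an existing entry for identical content."""

    def __init__(self) -> None:
        self._key_by_hash: dict[str, str] = {}

    def create(self, data: bytes, key: str, skip_hash_lookup: bool = False) -> str:
        """Return the key of the artifact that ends up representing `data`."""
        digest = content_hash(data)
        if not skip_hash_lookup and digest in self._key_by_hash:
            return self._key_by_hash[digest]  # same hash: return the existing artifact
        self._key_by_hash.setdefault(digest, key)  # remember the first artifact per hash
        return key  # a new artifact


registry = ToyRegistry()
first = registry.create(b"a,b\n1,2\n", key="v1.csv")
second = registry.create(b"a,b\n1,2\n", key="copy.csv")  # identical bytes
forced = registry.create(b"a,b\n1,2\n", key="forced.csv", skip_hash_lookup=True)
print(first, second, forced)  # v1.csv v1.csv forced.csv
```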

 -[ Why does the constructor look the way it looks? ]-

 It's inspired by APIs building on AWS S3.

 Both boto3 and quilt select a bucket (a storage location in
 LaminDB) and define a target path through a "key" argument.

 In boto3:

 # signature: S3.Bucket.upload_file(filepath, key)
 import boto3
 s3 = boto3.resource('s3')
 bucket = s3.Bucket('mybucket')
 bucket.upload_file('/tmp/hello.txt', 'hello.txt')

 In quilt3:

 # signature: quilt3.Bucket.put_file(key, filepath)
 import quilt3
 bucket = quilt3.Bucket('mybucket')
 bucket.put_file('hello.txt', '/tmp/hello.txt')

 See also:

 "Storage"
 Storage locations for artifacts.

 "Collection"
 Collections of artifacts.

 "from_dataframe()"
 Create an artifact from a "DataFrame".

 "from_anndata()"
 Create an artifact from an "AnnData".

 property features: FeatureManager

 Feature manager.

 Typically, you annotate a dataset with features by defining a
 "Schema" and passing it to the "Artifact" constructor.

 Here is how to annotate an artifact ad hoc:

 artifact.features.add_values({
     "species": "human",
     "scientist": ['Barbara McClintock', 'Edgar Anderson'],
     "temperature": 27.6,
     "experiment": "Experiment 1"
 })

 Query artifacts by features:

 ln.Artifact.filter(scientist="Barbara McClintock")

 Note: Features may or may not be part of the dataset, i.e., the
 artifact content in storage. For instance, the
 "DataFrameCurator" flow validates the columns of a
 "DataFrame"-like artifact and annotates it with features
 corresponding to these columns. "artifact.features.add_values",
 by contrast, does not validate the content of the artifact.

 To get all feature values:

 dictionary_of_values = artifact.features.get_values()

 The dictionary above uses identifiers, like the string "human"
 for an "Organism" object. The code below, by contrast, returns a
 Python object for categorical features:

 organism = artifact.features["species"]  # returns an Organism object, not "human"
 temperature = artifact.features["temperature"]  # returns a temperature value, a float

 You can also validate external feature annotations with a
 "schema":

 schema = ln.Schema([ln.Feature(name="species", dtype=str).save()]).save()
 artifact.features.add_values({"species": "bird"}, schema=schema)

 property labels: LabelManager

 Label manager.

 A way to access all label annotations of an artifact,
 irrespective of their type.

 To annotate with labels, use the type-specific accessor, for
 example:

 experiment = ln.Record(name="Experiment 1").save()
 artifact.records.add(experiment)
 project = ln.Project(name="Project A").save()
 artifact.projects.add(project)

 property transform: Transform | None

 Transform whose run created the artifact.

 property overwrite_versions: bool

 Indicates whether to keep or overwrite versions.

 It defaults to "False" for file-like artifacts and to "True" for
 folder-like artifacts.

 Note that keeping all versions of large folders with many
 duplicated files requires significant storage space. Currently,
 "lamindb" does *not* de-duplicate files across versions as git
 does, but keeps all files for all versions of the folder in
 storage.
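
 The documented defaults amount to a simple rule: an explicit argument wins, otherwise folders overwrite versions and files do not. A minimal sketch, where "resolve_overwrite_versions" is a hypothetical helper and not part of lamindb:

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_overwrite_versions(path: Path, overwrite_versions: Optional[bool] = None) -> bool:
    """Hypothetical helper mirroring the documented defaults."""
    if overwrite_versions is not None:
        return overwrite_versions  # an explicit argument always wins
    return path.is_dir()  # default: True for folders, False for files


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_folder"
    folder.mkdir()
    file = Path(tmp) / "my_file.parquet"
    file.write_bytes(b"")
    folder_default = resolve_overwrite_versions(folder)
    file_default = resolve_overwrite_versions(file)
    folder_explicit = resolve_overwrite_versions(folder, overwrite_versions=False)

print(folder_default, file_default, folder_explicit)  # True False False
```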

 property path: Path

 Path.

 Example:

 import lamindb as ln

 # File in cloud storage, here AWS S3:
 artifact = ln.Artifact("s3://my-bucket/my-file.csv").save()
 artifact.path
 #> S3QueryPath('s3://my-bucket/my-file.csv')

 # File in local storage:
 artifact = ln.Artifact("./myfile.csv", key="myfile.csv").save()
 artifact.path
 #> PosixPath('/home/runner/work/lamindb/lamindb/docs/guide/mydata/myfile.csv')

 uid: str

 A universal random id.

 key: str | None

 A (virtual) relative file path within the artifact's storage
 location.

 Setting a "key" is useful to automatically group artifacts into
 a version family.

 LaminDB defaults to a virtual file path to make renaming of data
 in object storage easy.

 If you register existing files in a storage location, the "key"
 equals the actual filepath on the underlying filesystem or object
 store.
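
 For files that already live in a registered storage location, the key is their path relative to the storage root. A sketch using "pathlib", where "key_from_existing_path" is a hypothetical helper and not a lamindb function:

```python
from pathlib import PurePosixPath


def key_from_existing_path(storage_root: str, file_path: str) -> str:
    """Hypothetical helper: the key of a file already inside a storage location."""
    # relative_to raises ValueError if file_path is not inside storage_root
    return PurePosixPath(file_path).relative_to(storage_root).as_posix()


print(key_from_existing_path("/data/storage", "/data/storage/myfolder/myfile.fcs"))
print(key_from_existing_path("s3://my_bucket", "s3://my_bucket/my_folder/my_file.csv"))
```

 The S3 case works here because both strings are parsed the same way by "PurePosixPath"; a production implementation would more likely use a URL-aware path type.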

 description: str | None

 A description.

 suffix: str

 The path suffix or an empty string if no suffix exists.

 This is either a file suffix (".csv", ".h5ad", etc.) or the
 empty string "".
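
 In Python terms, this resembles the "suffix" attribute of "pathlib" paths, which is likewise an empty string for paths without an extension. A minimal sketch; note that "pathlib" returns only the last extension (".gz" for "archive.tar.gz"), and the sketch assumes nothing about how lamindb handles compound suffixes.

```python
from pathlib import PurePosixPath


def suffix_of(key: str) -> str:
    # .suffix is the last dot-extension of the final path component, or "" if none
    return PurePosixPath(key).suffix


for key in ["examples/my_file.parquet", "project1/my_folder", "archive.tar.gz"]:
    print(f"{key!r} -> {suffix_of(key)!r}")
```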

 kind: ArtifactKind | str | None

 "ArtifactKind" or custom "str" value (default "None").

 otype: Literal['DataFrame', 'AnnData', 'MuData', 'SpatialData', 'tiledbsoma'] | str | None

 The object type represented as a string.

 The field is automatically set when using the
 "from_dataframe()", "from_anndata()", ... constructors.
 Unstructured artifacts have "otype=None".

 The field also accepts custom "str" values to allow for building
 logic around them in third-party packages.

 See section storage formats & object types for more background.

 size: int | None

 The size in bytes.

 Examples: 1KB is 1e3 bytes, 1MB is 1e6, 1GB is 1e9, 1TB is 1e12
 etc.
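
 These are decimal (SI) multiples, i.e., powers of 1000 rather than 1024. A small formatter following that convention ("format_size" is a hypothetical helper, not part of lamindb) could look like this:

```python
def format_size(n_bytes: int) -> str:
    """Format a byte count with decimal units (1 KB = 1000 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n_bytes)
    for unit in units:
        # stop at the first unit that keeps the value below 1000, or at the largest unit
        if value < 1000 or unit == units[-1]:
            break
        value /= 1000
    if unit == "B":
        return f"{n_bytes} B"
    return f"{value:.1f} {unit}"


for n in [999, 1_000, 2_500_000, 10**12]:
    print(n, "->", format_size(n))
```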

 hash: str | None

 The hash or pseudo-hash of the artifact content in storage.

 Useful to ascertain integrity and avoid duplication.

 Different versions of the artifact have different hashes.
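
 A content hash is what makes such integrity and duplication checks possible: identical bytes give identical digests, and any change yields a different one. The sketch below uses a generic chunked MD5 as an assumption for illustration; it is not lamindb's exact hashing algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash file content in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    v1 = Path(tmp) / "v1.csv"
    v2 = Path(tmp) / "v2.csv"
    v1.write_bytes(b"a,b\n1,2\n")
    v2.write_bytes(b"a,b\n1,3\n")  # one changed value
    h1, h2 = file_md5(v1), file_md5(v2)

print(h1 != h2)  # True: different content, different hash
```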

 n_files: int | None

 The number of files for folder-like artifacts.

 Is "None" for file-like artifacts.

 Note that some arrays are also stored as folders, e.g., ".zarr"
 or ".tiledbsoma".
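
 For a folder-like artifact, the count covers every file in the directory tree, which is why a ".zarr" store can report many files. A hypothetical sketch: "count_files" is not a lamindb function, and the toy ".zarr" layout below is not a valid Zarr store.

```python
import tempfile
from pathlib import Path
from typing import Optional


def count_files(path: Path) -> Optional[int]:
    """Return the number of files under a folder, or None for a single file."""
    if not path.is_dir():
        return None
    return sum(1 for p in path.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    zarr_dir = Path(tmp) / "dataset.zarr"
    (zarr_dir / "X").mkdir(parents=True)
    (zarr_dir / "zarr.json").write_text("{}")
    (zarr_dir / "X" / "0.0").write_bytes(b"\x00")
    single = Path(tmp) / "table.csv"
    single.write_text("a\n1\n")
    n_folder, n_single = count_files(zarr_dir), count_files(single)

print(n_folder, n_single)  # 2 None
```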

 n_observations: int | None

 The number of observations in this artifact.

 Typically, this denotes the first array dimension.
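
 For a table, the first dimension is the number of rows; for an array of shape "(n_obs, n_vars)" it is "n_obs". As an illustration only (the "from_dataframe()"-style constructors populate this field themselves), here is a hypothetical helper that counts observations in CSV text:

```python
import csv
import io


def n_observations_from_csv(text: str) -> int:
    """Count data rows (observations) in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row if present
    return sum(1 for row in reader if row)  # ignore blank lines


table = "cell_type,donor\nT cell,D1\nB cell,D2\nNK cell,D1\n"
print(n_observations_from_csv(table))  # 3
```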

 storage: Storage

 Storage location, e.g. an S3 or GCP bucket or a local directory
 ← "artifacts".

 schema: Schema | None

 The validating schema of this artifact ← "validated_artifacts".

 The validating schema is helpful to query artifacts that were
 validated by the same schema.

 input_of_runs: RelatedManager[Run]

 The runs that use this artifact as an input ← "input_artifacts".

 recreating_runs: RelatedManager[Run]

 The runs that re-created the artifact after its initial creation
 ← "recreated_artifacts".

 schemas: RelatedManager[Schema]

 The inferred schemas of this artifact ← "artifacts".

 The inferred schemas are helpful to answer the question: "Which
 features are present in the artifact?"

 The validating schema typically allows a range of valid actual
 dataset schemas. The inferred schemas link the actual schemas of
 the artifact, and are auto-generated by parsing the artifact
 content during validation.

 json_values: RelatedManager[JsonValue]

 The feature-indexed JSON values annotating this artifact ←
 "artifacts".

 artifacts: RelatedManager[Artifact]

 The annotating artifacts of this artifact ←
 "linked_by_artifacts".

 linked_in_records: RelatedManager[Record]

 The records linking this artifact as a feature value ←
 "linked_artifacts".

 users: RelatedManager[User]

 The users annotating this artifact ← "artifacts".

 runs: RelatedManager[Run]

 The runs annotating this artifact ← "artifacts".

 ulabels: RelatedManager[ULabel]

 The ulabels annotating this artifact ← "artifacts".

 linked_by_artifacts: RelatedManager[Artifact]

 The artifacts annotated by this artifact ← "artifacts".

 collections: RelatedManager[Collection]

 The collections that this artifact is part of ← "artifacts".

 records: RelatedManager[Record]

 The records annotating this artifact ← "artifacts".

 references: RelatedManager[Reference]

 The references annotating this artifact ← "artifacts".

 projects: RelatedManager[Project]

 The projects annotating this artifact ← "artifacts".

 ablocks: ArtifactBlock

 Attached blocks ← "artifact".

 get(*, key=None, path=None, is_run_input=False, **expressions)

 Get a single record.

 Parameters:
 * **idlike** (int | str | None, default: "None") -- Either a
 uid stub, uid or an integer id.

 * **expressions** -- Fields and values passed as Django query
 expressions.

 Raises:
 **lamindb.errors.ObjectDoesNotExist** -- In case no matching
 record is found.

 Return type:
 Artifact

 See also:

 * Guide: Query & search registries

 * Django documentation: Queries

 -[ Examples ]-

 artifact = ln.Artifact.get("FvtpPJLJ")
 artifact = ln.Artifact.get(key="examples/my_file.parquet")

 filter(**expressions)

 Query records.

 Parameters:
 * **queries** -- One or multiple "Q" objects.

 * **expressions** -- Fields and values passed as Django query
 expressions.

 Return type:
 "QuerySet"

 See also:

 * Guide: Query & search registries

 * Django documentation: Queries

 -[ Examples ]-

 >>> ln.Project(name="my label").save()
 >>> ln.Project.filter(name__startswith="my").to_dataframe()

 classmethod from_lazy(suffix, overwrite_versions, key=None, description=None, run=None, **kwargs)

 Create a lazy artifact for streaming to auto-generated internal
 paths.

 This is needed when it is desirable to stream to a "lamindb"
 auto-generated internal path and register the path as an
 artifact. It allows writing directly into the default cloud (or
 local) storage of the current instance and then saving as an
 "Artifact".

 The lazy artifact object (see "LazyArtifact") creates a real
 artifact on ".save()" with the provided arguments.

 Parameters:
 * **suffix** ("str") -- The suffix for the auto-generated
 internal path.

 * **overwrite_versions** ("bool") -- Whether to overwrite
 versions.

 * **key** ("str" | "None", default: "None") -- An optional
 key to reference the artifact.

 * **description** ("str" | "None", default: "None") -- A
 description.

 * **run** ("Run" | "None", default: "None") -- The run that
 creates the artifact.

 * ****kwargs** -- Other keyword arguments for the artifact to
 be created.

 Return type:
 "LazyArtifact"

 -[ Examples ]-

 Local storage: create a lazy artifact, stream to the path, then
 save:

 lazy = ln.Artifact.from_lazy(suffix=".zarr", overwrite_versions=True, key="mydata.zarr")
 zarr.open(lazy.path, mode="w")["test"] = np.array(["test"])
 artifact = lazy.save()

 Cloud storage (e.g. S3): use "zarr.storage.FsspecStore" to
 stream arrays:

 lazy = ln.Artifact.from_lazy(suffix=".zarr", overwrite_versions=True, key="mydata.zarr")
 store = zarr.storage.FsspecStore.from_url(lazy.path.as_posix())
 group = zarr.open(store, mode="w")
 group["ones"] = np.ones(3)
 artifact = lazy.save()

 classmethod from_dataframe(df, *, key=None, description=None, run=None, revises=None, schema=None, features=None, parquet_kwargs=None, csv_kwargs=None, **kwargs)

 Create from "DataFrame", optionally validate & annotate.

 Sets ".otype" to "DataFrame" and populates ".n_observations".

 Parameters:
 * **df** (pd.DataFrame | UPathStr) -- A "DataFrame" object or
 a "UPathStr" pointing to a "DataFrame" in storage, e.g. a
 ".parquet" or ".csv" file.

 * **key** (str | None, default: "None") -- A relative path
 within default storage, e.g., "myfolder/myfile.parquet".

 * **description** (str | None, default: "None") -- A
 description.

 * **revises** (Artifact | None, default: "None") -- An old
 version of the artifact.

 * **run** (Run | None, default: "None") -- The run that
 creates the artifact.

 * **schema** (Schema | Literal['valid_features'] | None,
 default: "None") -- A schema that defines how to validate &
 annotate.

 * **features** (dict[str, Any] | None, default: "None") --
 Additional external features to annotate the artifact via
 "set_values".

 * **parquet_kwargs** (dict[str, Any] | None, default: "None")
 -- Additional keyword arguments passed to the
 "pandas.DataFrame.to_parquet" method, which are passed on
 to "pyarrow.parquet.ParquetWriter".

 * **csv_kwargs** (dict[str, Any] | None, default: "None") --
 Additional keyword arguments passed to the
 "pandas.DataFrame.to_csv" method.

 Return type:
 Artifact

 -[ Examples ]-

 No validation and annotation:

 ln.Artifact.from_dataframe(df, key="examples/dataset1.parquet").save()

 With validation and annotation:

 ln.Artifact.from_dataframe(df, key="examples/dataset1.parquet", schema="valid_features").save()

 Under the hood, this uses the following built-in schema
 ("valid_features()"):

 schema = ln.Schema(name="valid_features", itype="Feature").save()

 External features:

 import lamindb as ln
 from datetime import date

 df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")

 temperature = ln.Feature(name="temperature", dtype=float).save()
 date_of_study = ln.Feature(name="date_of_study", dtype=date).save()
 external_schema = ln.Schema(features=[temperature, date_of_study]).save()

 concentration = ln.Feature(name="concentration", dtype=str).save()
 donor = ln.Feature(name="donor", dtype=str, nullable=True).save()
 schema = ln.Schema(
 features=[concentration, donor],
 slots={"__external__": external_schema},
 otype="DataFrame",
 ).save()

 artifact = ln.Artifact.from_dataframe(
 df,
 key="examples/dataset1.parquet",
 features={"temperature": 21.6, "date_of_study": date(2024, 10, 1)},
 schema=schema,
 ).save()
 artifact.describe()

 Parquet kwargs:

 import lamindb as ln
 import pandas as pd
 import pyarrow.parquet as pq

 def test_parquet_kwargs():
     df = pd.DataFrame(
         {
             "a": [3, 1, 4, 2],
             "b": ["c", "a", "d", "b"],
             "c": [3.3, 1.1, 4.4, 2.2],
         }
     )
     df_sorted = df.sort_values(by=["a", "b"])
     sorting_columns = [
         pq.SortingColumn(0, descending=False, nulls_first=False),
         pq.SortingColumn(1, descending=False, nulls_first=False),
     ]
     artifact = ln.Artifact.from_dataframe(
         df_sorted,
         key="df_sorted.parquet",
         parquet_kwargs={"sorting_columns": sorting_columns},
     ).save()
     pyarrow_dataset = artifact.open()
     fragment = next(pyarrow_dataset.get_fragments())
     assert list(fragment.metadata.row_group(0).sorting_columns) == sorting_columns

 classmethod from_anndata(adata, *, key=None, description=None, run=None, revises=None, schema=None, **kwargs)

 Create from "AnnData", optionally validate & annotate.

 Sets ".otype" to "AnnData" and populates ".n_observations".

 Parameters:
 * **adata** ("AnnData" | lamindb.core.types.UPathStr) -- An
 "AnnData" object or a path to an "AnnData"-like object in
 storage.

 * **key** ("str" | "None", default: "None") -- A relative
 path within default storage, e.g.,
 "myfolder/myfile.h5ad".

 * **description** ("str" | "None", default: "None") -- A
 description.

 * **revises** ("Artifact" | "None", default: "None") -- An
 old version of the artifact.

 * **run** ("Run" | "None", default: "None") -- The run that
 creates the artifact.

 * **schema** ("Schema" |
 "Literal"['ensembl_gene_ids_and_valid_features_in_obs'] |
 "None", default: "None") -- A schema that defines how to
 validate & annotate.

 Return type:
 "Artifact"

 See also:

 "Collection()"
 Track collections.

 "Feature"
 Track features.

 -[ Example ]-

 No validation and annotation:

 ln.Artifact.from_anndata(adata, key="examples/dataset1.h5ad").save()

 With validation and annotation:

 ln.Artifact.from_anndata(adata, key="examples/dataset1.h5ad", schema="ensembl_gene_ids_and_valid_features_in_obs").save()

 Under the hood, this uses the following built-in schema
 ("anndata_ensembl_gene_ids_and_valid_features_in_obs()"):

 import bionty as bt

 import lamindb as ln

 obs_schema = ln.examples.schemas.valid_features()
 varT_schema = ln.Schema(
 name="valid_ensembl_gene_ids", itype=bt.Gene.ensembl_gene_id
 ).save()
 schema = ln.Schema(
 name="anndata_ensembl_gene_ids_and_valid_features_in_obs",
 otype="AnnData",
 slots={"obs": obs_schema, "var.T": varT_schema},
 ).save()

 This schema transposes the "var" DataFrame during curation, so
 that one validates and annotates the columns of "var.T", i.e.,
 "[ENSG00000153563, ENSG00000010610, ENSG00000170458]". Without
 the transpose, one would annotate the columns of "var", i.e.,
 "[gene_symbol, gene_type]".


 classmethod from_mudata(mdata, *, key=None, description=None, run=None, revises=None, schema=None, **kwargs)

 Create from "MuData", optionally validate & annotate.

 Sets ".otype" to "MuData".

 Parameters:
 * **mdata** ("MuData" | lamindb.core.types.UPathStr) -- A
 "MuData" object.

 * **key** ("str" | "None", default: "None") -- A relative
 path within default storage, e.g.,
 "myfolder/myfile.h5mu".

 * **description** ("str" | "None", default: "None") -- A
 description.

 * **revises** ("Artifact" | "None", default: "None") -- An
 old version of the artifact.

 * **run** ("Run" | "None", default: "None") -- The run that
 creates the artifact.

 * **schema** ("Schema" | "None", default: "None") -- A schema
 that defines how to validate & annotate.

 Return type:
 "Artifact"

 See also:

 "Collection()"
 Track collections.

 "Feature"
 Track features.

 Example:

 import lamindb as ln

 mdata = ln.examples.datasets.mudata_papalexi21_subset()
 artifact = ln.Artifact.from_mudata(mdata, key="mudata_papalexi21_subset.h5mu").save()

 classmethod from_spatialdata(sdata, *, key=None, description=None, run=None, revises=None, schema=None, **kwargs)

 Create from "SpatialData", optionally validate & annotate.

 Sets ".otype" to "SpatialData".

 Parameters:
 * **sdata** (SpatialData | UPathStr) -- A "SpatialData"
 object.

 * **key** (str | None, default: "None") -- A relative path
 within default storage, e.g., "myfolder/myfile.zarr".

 * **description** (str | None, default: "None") -- A
 description.

 * **revises** (Artifact | None, default: "None") -- An old
 version of the artifact.

 * **run** (Run | None, default: "None") -- The run that
 creates the artifact.

 * **schema** (Schema | None, default: "None") -- A schema
 that defines how to validate & annotate.

 Return type:
 Artifact

 See also:

 "Collection()"
 Track collections.

 "Feature"
 Track features.

 -[ Example ]-

 No validation and annotation:

 import lamindb as ln

 artifact = ln.Artifact.from_spatialdata(sdata, key="my_dataset.zarr").save()

 With validation and annotation:

 import lamindb as ln
 import bionty as bt

 attrs_schema = ln.Schema(
     features=[
         ln.Feature(name="bio", dtype=dict).save(),
         ln.Feature(name="tech", dtype=dict).save(),
     ],
 ).save()

 sample_schema = ln.Schema(
     features=[
         ln.Feature(name="disease", dtype=bt.Disease, coerce=True).save(),
         ln.Feature(
             name="developmental_stage",
             dtype=bt.DevelopmentalStage,
             coerce=True,
         ).save(),
     ],
 ).save()

 tech_schema = ln.Schema(
     features=[
         ln.Feature(name="assay", dtype=bt.ExperimentalFactor, coerce=True).save(),
     ],
 ).save()

 obs_schema = ln.Schema(
     features=[
         ln.Feature(name="sample_region", dtype="str").save(),
     ],
 ).save()

 uns_schema = ln.Schema(
     features=[
         ln.Feature(name="analysis", dtype="str").save(),
     ],
 ).save()

 # Schema enforces only registered Ensembl Gene IDs are valid (maximal_set=True)
 varT_schema = ln.Schema(itype=bt.Gene.ensembl_gene_id, maximal_set=True).save()

 sdata_schema = ln.Schema(
     name="spatialdata_blobs_schema",
     otype="SpatialData",
     slots={
         "attrs:bio": sample_schema,
         "attrs:tech": tech_schema,
         "attrs": attrs_schema,
         "tables:table:obs": obs_schema,
         "tables:table:var.T": varT_schema,
     },
 ).save()

 import lamindb as ln

 spatialdata = ln.examples.datasets.spatialdata_blobs()
 sdata_schema = ln.Schema.get(name="spatialdata_blobs_schema")
 curator = ln.curators.SpatialDataCurator(spatialdata, sdata_schema)
 try:
     curator.validate()
 except ln.errors.ValidationError:
     pass  # expected: the dataset contains an unregistered gene ID

 spatialdata.tables["table"].var.drop(index="ENSG00000999999", inplace=True)

 # validate again (must pass now) and save artifact
 artifact = ln.Artifact.from_spatialdata(
 spatialdata, key="examples/spatialdata1.zarr", schema=sdata_schema
 ).save()
 artifact.describe()

 classmethod from_tiledbsoma(exp, *, key=None, description=None, run=None, revises=None, **kwargs)

 Create from a "tiledbsoma.Experiment" store.

 Sets ".otype" to "tiledbsoma" and populates ".n_observations".

 Parameters:
 * **exp** (SOMAExperiment | UPathStr) -- TileDB-SOMA
 Experiment object or path to Experiment store.

 * **key** (str | None, default: "None") -- A relative path
 within default storage, e.g.,
 "myfolder/mystore.tiledbsoma".

 * **description** (str | None, default: "None") -- A
 description.

 * **revises** (Artifact | None, default: "None") -- An old
 version of the artifact.

 * **run** (Run | None, default: "None") -- The run that
 creates the artifact.

 Return type:
 Artifact

 Example:

 import lamindb as ln

 artifact = ln.Artifact.from_tiledbsoma("s3://mybucket/store.tiledbsoma", description="a tiledbsoma store").save()

 classmethod from_dir(path, *, key=None, run=None)

 Create a list of "Artifact" objects from a directory.

 Hint:

 If you have a large number of files (several hundred thousand)
 and don't want to track them individually, create a single
 "Artifact" via "Artifact(path)" for them. See, e.g., RxRx: cell
 imaging.

 Parameters:
 * **path** (lamindb.core.types.UPathStr) -- Source path of
 folder.

 * **key** ("str" | "None", default: "None") -- Key for
 storage destination. If "None" and directory is in a
 registered location, the inferred "key" will reflect the
 relative position. If "None" and directory is outside of a
 registered storage location, the inferred key defaults to
 "path.name".

 * **run** ("Run" | "None", default: "None") -- A "Run"
 object.

 Return type:
 "SQLRecordList"

 Example:

 import lamindb as ln

 dir_path = ln.examples.datasets.dir_scrnaseq_cellranger("sample_001", ln.settings.storage)
 ln.Artifact.from_dir(dir_path).save()  # creates one artifact per file in dir_path
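
 The key-inference rule for "key=None" can be sketched with "pathlib". This is a hypothetical illustration ("infer_dir_key" and the list of storage roots are assumptions, not lamindb's API) of the directory-level rule only; it does not model the keys of the individual per-file artifacts.

```python
from pathlib import PurePosixPath
from typing import Optional


def infer_dir_key(dir_path: str, storage_roots: list[str], key: Optional[str] = None) -> str:
    """Hypothetical sketch of the documented key-inference rule for from_dir."""
    if key is not None:
        return key  # an explicit key always wins
    path = PurePosixPath(dir_path)
    for root in storage_roots:
        if path.is_relative_to(root):
            return path.relative_to(root).as_posix()  # relative position in storage
    return path.name  # outside any registered storage location


roots = ["/data/storage"]
print(infer_dir_key("/data/storage/runs/sample_001", roots))  # runs/sample_001
print(infer_dir_key("/tmp/sample_001", roots))                # sample_001
```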

 replace(data, run=None, format=None)

 Replace the artifact content in storage **without** making a new
 version.

 **Note:** If you want to create a new version, do **not** use
 the ".replace()" method but rather any "Artifact" constructor.

 Parameters:
 * **data** (lamindb.core.types.UPathStr | "DataFrame" |
 "AnnData" | "MuData") -- A file path or in-memory dataset
 object like a "DataFrame", "AnnData", "MuData", or
 "SpatialData".

 * **run** ("Run" | "bool" | "None", default: "None") -- The
 run that creates the artifact. If "False", suppress
 tracking the run. If "None", infer the run from the global
 run context.

 * **format** ("str" | "None", default: "None") -- The format
 of the data to write into storage. If "None", infer the
 format from the data.

 Return type:
 "None"

 -[ Example ]-

 Query a text file and replace its content:

 artifact = ln.Artifact.get(key="my_file.txt")
 artifact.replace("./my_new_file.txt")
 artifact.save()

 Note that you need to call ".save()" to persist the changes in
 storage.

 open(mode='r', engine='pyarrow', is_run_input=None, **kwargs)

 Open a dataset for streaming.

 Works for the following object types (storage formats):

 * "DataFrame" (".parquet", ".csv", ".ipc" files or directories
 with such files)

 * "AnnData" (".h5ad", ".zarr")

 * "SpatialData" (".zarr")

 * "tiledbsoma" (".tiledbsoma")

 * generic arrays (".h5", ".zarr")

 Parameters:
 * **mode** (str, default: "'r'") -- Can be "r" or "w"
 (write mode) for "tiledbsoma" stores, and "r" or "r+" for
 "AnnData" or "SpatialData" "zarr" stores; otherwise it must
 always be "r" (read-only mode).

 * **engine** (Literal['pyarrow', 'polars'], default:
 "'pyarrow'") -- Which module to use for lazy loading of a
 dataframe from "pyarrow" or "polars" compatible formats.
 This has no effect if the artifact is not a dataframe, i.e.
 if it is an "AnnData", "hdf5", "zarr", "tiledbsoma" object
 etc.

 * **is_run_input** (bool | None, default: "None") -- Whether
 to track this artifact as run input.

 * ****kwargs** -- Keyword arguments for the accessor, i.e.
 "h5py" or "zarr" connection, "pyarrow.dataset.dataset",
 "polars.scan_*" function.

 Return type:
 PyArrowDataset | Iterator[PolarsLazyFrame] | AnnDataAccessor |
 SpatialDataAccessor | BackedAccessor | SOMACollection |
 SOMAExperiment | SOMAMeasurement

 Returns:
 Streaming accessors, in particular, a
 "pyarrow.dataset.Dataset" object, a context manager yielding
 a polars.LazyFrame, and objects of type "AnnDataAccessor",
 "SpatialDataAccessor", "BackedAccessor",
 "tiledbsoma.Collection", "tiledbsoma.Experiment",
 "tiledbsoma.Measurement".

 -[ Examples ]-

 Open a "DataFrame"-like artifact via "pyarrow.dataset.Dataset":

 artifact = ln.Artifact.get(key="sequences/mydataset.parquet")
 artifact.open()
 #> pyarrow._dataset.FileSystemDataset

 Open a "DataFrame"-like artifact via polars.LazyFrame:

 artifact = ln.Artifact.get(key="sequences/mydataset.parquet")
 with artifact.open(engine="polars") as df:
     # use the `polars.LazyFrame` object similar to a `DataFrame` object
     print(df.head().collect())  # materialize only the first rows

 Open an "AnnData"-like artifact via "AnnDataAccessor":

 import lamindb as ln

 artifact = ln.Artifact.get(key="scrna/mydataset.h5ad")
 with artifact.open() as adata:
     # use the `AnnDataAccessor` similar to an `AnnData` object
     print(adata)

 For more examples and background, see guide: Stream datasets
 from storage.

 load(*, is_run_input=None, mute=False, **kwargs)

 Cache artifact in local cache and then load it into memory.

 See: "loaders".

 Parameters:
 * **is_run_input** ("bool" | "None", default: "None") --
 Whether to track this artifact as run input.

 * **mute** ("bool", default: "False") -- Silence logging of
 caching progress.

 * ****kwargs** -- Keyword arguments for the loader.

 Return type:
 "Any"

 -[ Examples ]-

 Load a "DataFrame"-like artifact:

 df = artifact.load()

 Load an "AnnData"-like artifact:

 adata = artifact.load()

 cache(*, is_run_input=None, mute=False, **kwargs)

 Download cloud artifact to local cache.

 Follows syncing logic: only caches an artifact if it's outdated
 in the local cache.

 Returns a path to a locally cached on-disk object (say a ".jpg"
 file).

 Parameters:
 * **mute** ("bool", default: "False") -- Silence logging of
 caching progress.

 * **is_run_input** ("bool" | "None", default: "None") --
 Whether to track this artifact as run input.

 Return type:
 "Path"

 -[ Example ]-

 Sync the artifact from the cloud and return the local path to
 the cached file:

 artifact.cache()
 #> PosixPath('/home/runner/work/Caches/lamindb/lamindata/pbmc68k.h5ad')
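
 The syncing rule can be pictured as "copy only if the cached copy is missing or older". The sketch below is a local stand-in under stated assumptions: "sync_to_cache" is hypothetical, a local file plays the role of the cloud object, and modification times decide staleness, which is not necessarily how lamindb compares cloud and cache.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_to_cache(source: Path, cache_dir: Path) -> tuple[Path, bool]:
    """Copy source into cache_dir only if the cached copy is missing or older.

    Returns the cached path and whether a copy was made.
    """
    cached = cache_dir / source.name
    outdated = not cached.exists() or cached.stat().st_mtime < source.stat().st_mtime
    if outdated:
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, cached)  # copy2 also copies the modification time
    return cached, outdated


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote" / "pbmc68k.h5ad"  # stands in for cloud storage
    remote.parent.mkdir()
    remote.write_bytes(b"v1")
    cache_dir = Path(tmp) / "cache"

    _, copied_first = sync_to_cache(remote, cache_dir)  # nothing cached yet
    _, copied_again = sync_to_cache(remote, cache_dir)  # cache is current

    remote.write_bytes(b"v2")
    newer = remote.stat().st_mtime + 10
    os.utime(remote, (newer, newer))  # force the remote copy to be strictly newer
    cached, copied_update = sync_to_cache(remote, cache_dir)
    content = cached.read_bytes()

print(copied_first, copied_again, copied_update, content)  # True False True b'v2'
```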

 delete(permanent=None, storage=None, using_key=None)

 Trash or permanently delete.

 A first call to ".delete()" puts an artifact into the trash
 (sets "branch_id" to "-1"). A second call permanently deletes
 the artifact.

 For an "artifact" that has multiple versions and for which
 "artifact.overwrite_versions is True" (the default for folders),
 deleting a non-latest version does not delete the underlying
 storage unless "storage=True" is passed. Deleting the latest
 version deletes all versions.

 Parameters:
 * **permanent** ("bool" | "None", default: "None") --
 Permanently delete the artifact (skip trash).

 * **storage** ("bool" | "None", default: "None") -- Indicate
 whether you want to delete the artifact in storage.

 Return type:
 "None"

 -[ Examples ]-

 Delete a single file artifact:

 import lamindb as ln

 artifact = ln.Artifact.get(key="some.csv")
 artifact.delete() # delete a single file artifact

 Delete an old version of a folder-like artifact:

 artifact = ln.Artifact.filter(key="folder.zarr", is_latest=False).first()
 artifact.delete() # delete an old version, the data will not be deleted

 Delete all versions of a folder-like artifact:

 artifact = ln.Artifact.get(key="folder.zarr", is_latest=True)
 artifact.delete() # delete all versions, the data will be deleted or prompted for deletion.
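
 The two-step deletion described above behaves like a small state machine: the first call moves the artifact to the trash ("branch_id" of "-1"), the second removes it, and "permanent=True" skips the trash. A toy sketch; "ToyArtifact", "toy_delete", and the default branch id of "1" are hypothetical, and the "storage" argument is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

TRASH_BRANCH_ID = -1  # per the description above, trashed artifacts have branch_id -1


@dataclass
class ToyArtifact:
    key: str
    branch_id: int = 1  # hypothetical id of the default branch
    deleted: bool = False


def toy_delete(artifact: ToyArtifact, permanent: Optional[bool] = None) -> None:
    """Hypothetical two-step delete: first move to trash, then delete permanently."""
    if permanent or artifact.branch_id == TRASH_BRANCH_ID:
        artifact.deleted = True  # permanent deletion
    else:
        artifact.branch_id = TRASH_BRANCH_ID  # first call: move to trash


a = ToyArtifact("some.csv")
toy_delete(a)
after_first = (a.branch_id, a.deleted)   # (-1, False): in the trash
toy_delete(a)
after_second = (a.branch_id, a.deleted)  # (-1, True): permanently deleted
b = ToyArtifact("other.csv")
toy_delete(b, permanent=True)            # skips the trash
print(after_first, after_second, b.deleted)
```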

 save(upload=None, transfer='record', **kwargs)

 Save to database & storage.

 Parameters:
 * **upload** ("bool" | "None", default: "None") -- Trigger
 upload to cloud storage in instances with hybrid storage
 mode.

 * **transfer** ("Literal"["'record'", "'annotations'"],
 default: "'record'") -- In case artifact was queried on a
 different instance, dictates behavior of transfer. If
 "record", only the artifact record is transferred to the
 current instance. If "annotations", also the annotations
 linked in the source instance are transferred.

 Return type:
 "Artifact"

 See also: Transfer data

 -[ Example ]-

 Save a file-like artifact after creating it with the default
 constructor "Artifact()":

 import lamindb as ln

 artifact = ln.Artifact("./myfile.csv", key="myfile.csv").save()

 view_lineage(with_children=True, return_graph=False)

 View data lineage graph.

 Return type:
 "Digraph" | "None"