lamindb.models.ArtifactSet

class lamindb.models.ArtifactSet

Bases: `Iterable`

Abstract class representing sets of artifacts returned by queries.

This class automatically extends `BasicQuerySet` and `QuerySet` when the base model is `Artifact`.

Examples

>>> artifacts = ln.Artifact.filter(otype="AnnData")
>>> artifacts  # an instance of ArtifactQuerySet inheriting from ArtifactSet

Methods

load(join='outer', is_run_input=None, **kwargs)

Cache and load to memory.

Returns an in-memory concatenated `DataFrame` or `AnnData` object.

Return type:
    `DataFrame` | `AnnData`
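The `join` argument controls how mismatched columns (or variables, for `AnnData`) are combined during concatenation. A minimal local sketch of the outer-vs-inner semantics using pandas — plain DataFrames stand in for the artifacts' contents here; this illustrates the join behavior, it is not a call into lamindb:

```python
import pandas as pd

# Two tables with partially overlapping columns, standing in for two artifacts.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# join="outer" (the load() default) keeps the union of columns, filling gaps with NaN.
outer = pd.concat([df1, df2], join="outer", ignore_index=True)
assert list(outer.columns) == ["a", "b", "c"]

# join="inner" keeps only the columns shared by all inputs.
inner = pd.concat([df1, df2], join="inner", ignore_index=True)
assert list(inner.columns) == ["a"]
```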
open(engine='pyarrow', is_run_input=None, **kwargs)

Open a dataset for streaming.

Works for `pyarrow`- and `polars`-compatible formats (`.parquet`, `.csv`, `.ipc`, etc. files, or directories with such files).

Parameters:
- engine (`Literal['pyarrow', 'polars']`, default: `'pyarrow'`) – Which module to use for lazy loading of a dataframe from `pyarrow`- or `polars`-compatible formats.
- is_run_input (`bool | None`, default: `None`) – Whether to track this artifact as run input.
- **kwargs – Keyword arguments for `pyarrow.dataset.dataset` or `polars.scan_*` functions.

Return type:
    `Dataset` | `Iterator[LazyFrame]`

Notes

For more info, see guide: Slice & stream arrays.
mapped(layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None, stream=False, is_run_input=None)

Return a map-style dataset.

Returns a PyTorch map-style dataset by virtually concatenating `AnnData` arrays.

By default (`stream=False`), `AnnData` arrays are moved into a local cache first.

`__getitem__` of the `MappedCollection` object takes a single integer index and returns a dictionary with the observation data sample for this index from the `AnnData` objects in the collection. The dictionary has keys for `layers_keys` (`.X` is in `"X"`), `obs_keys`, `obsm_keys` (under `f"obsm_{key}"`) and also `"_store_idx"` for the index of the `AnnData` object containing this observation sample.

Note

For a guide, see Train a machine learning model on a collection.

This method currently only works for collections or query sets of `AnnData` artifacts.

Parameters:
- layers_keys (`str | list[str] | None`, default: `None`) – Keys from the `.layers` slot. `layers_keys=None` or `"X"` in the list retrieves `.X`.
- obs_keys (`str | list[str] | None`, default: `None`) – Keys from the `.obs` slots.
- obsm_keys (`str | list[str] | None`, default: `None`) – Keys from the `.obsm` slots.
- obs_filter (`dict[str, str | list[str]] | None`, default: `None`) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.
- join (`Literal['inner', 'outer'] | None`, default: `'inner'`) – `"inner"` or `"outer"` virtual joins. If `None` is passed, does not join.
- encode_labels (`bool | list[str]`, default: `True`) – Encode labels into integers. Can be a list with elements from `obs_keys`.
- unknown_label (`str | dict[str, str] | None`, default: `None`) – Encode this label to -1. Can be a dictionary with keys from `obs_keys` if `encode_labels=True` or from `encode_labels` if it is a list.
- cache_categories (`bool`, default: `True`) – Enable caching categories of `obs_keys` for faster access.
- parallel (`bool`, default: `False`) – Enable sampling with multiple processes.
- dtype (`str | None`, default: `None`) – Convert numpy arrays from `.X`, `.layers` and `.obsm` to this dtype.
- stream (`bool`, default: `False`) – Whether to stream data from the array backend.
- is_run_input (`bool | None`, default: `None`) – Whether to track this collection as run input.
Return type:
    `MappedCollection`
Examples

>>> import lamindb as ln
>>> from torch.utils.data import DataLoader
>>> collection = ln.Collection.get(description="my collection")
>>> mapped = collection.mapped(obs_keys=["cell_type", "batch"])
>>> dl = DataLoader(mapped, batch_size=128, shuffle=True)
>>> # also works for query sets of artifacts; '...' represents some filtering condition
>>> # additional filtering on artifacts of the collection
>>> mapped = collection.artifacts.all().filter(...).order_by("-created_at").mapped()
>>> # or directly from a query set of artifacts
>>> mapped = ln.Artifact.filter(..., otype="AnnData").order_by("-created_at").mapped()
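As described above, `__getitem__` takes a single integer index and returns a dictionary keyed by `"X"`, the requested obs keys, and `"_store_idx"`. A minimal pure-Python stand-in for that interface (hypothetical toy class and data; no torch or lamindb required) shows why any map-style consumer, including `torch.utils.data.DataLoader`, can drive such an object:

```python
class ToyMappedCollection:
    """Toy stand-in mimicking MappedCollection's map-style interface
    over two virtual 'AnnData' stores (data below is made up)."""

    def __init__(self):
        # Each store holds (x_vector, cell_type) samples.
        self.stores = [
            [([0.1, 0.2], "B cell"), ([0.3, 0.4], "T cell")],
            [([0.5, 0.6], "T cell")],
        ]
        # Flat index -> (store index, local index): virtual concatenation.
        self.index = [
            (s, i)
            for s, store in enumerate(self.stores)
            for i in range(len(store))
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        store_idx, local_idx = self.index[idx]
        x, cell_type = self.stores[store_idx][local_idx]
        # Keys mirror the documented layout: "X", obs keys, "_store_idx".
        return {"X": x, "cell_type": cell_type, "_store_idx": store_idx}


ds = ToyMappedCollection()
assert len(ds) == 3
assert ds[2] == {"X": [0.5, 0.6], "cell_type": "T cell", "_store_idx": 1}
```

Because only `__len__` and integer `__getitem__` are required, batching, shuffling, and parallel sampling are handled entirely by the consumer (e.g. `DataLoader`), not by the dataset itself.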