lamindb.models.ArtifactSet
- class lamindb.models.ArtifactSet
Bases: Iterable
Abstract class representing sets of artifacts returned by queries.
This class automatically extends BasicQuerySet and QuerySet when the base model is Artifact.
Examples
>>> artifacts = ln.Artifact.filter(otype="AnnData")
>>> artifacts  # an instance of ArtifactQuerySet inheriting from ArtifactSet
Methods
- load(join='outer', is_run_input=None, **kwargs)
Cache and load to memory.
Returns an in-memory concatenated DataFrame or AnnData object.
- Return type: DataFrame | AnnData
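The effect of the join argument on concatenation can be illustrated without any data backend. The following is a minimal pure-Python sketch, not lamindb's implementation: the helper concat_rows and its list-of-dicts tables are hypothetical stand-ins for the DataFrame or AnnData objects that load() concatenates. It assumes the usual convention that an outer join keeps the union of columns (filling gaps with a missing value) and an inner join keeps only the columns shared by all inputs.

```python
def concat_rows(tables, join="outer"):
    """Concatenate tables row-wise; each table is a list of dict rows.

    Hypothetical helper for illustration only.
    """
    # Collect the set of column names present in each table.
    column_sets = [set().union(*(row.keys() for row in table)) for table in tables]
    if join == "outer":
        # Outer join: keep every column seen in any table.
        columns = sorted(set().union(*column_sets))
    elif join == "inner":
        # Inner join: keep only columns present in all tables.
        columns = sorted(set.intersection(*column_sets)) if column_sets else []
    else:
        raise ValueError(f"join must be 'inner' or 'outer', got {join!r}")
    # Missing values are filled with None.
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


table_a = [{"gene": "A", "count": 1}]
table_b = [{"gene": "B", "batch": "x"}]

outer = concat_rows([table_a, table_b], join="outer")
inner = concat_rows([table_a, table_b], join="inner")
```

With the default join='outer' no column is dropped, at the cost of missing values where an input lacks a column.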
- mapped(layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None, stream=False, is_run_input=None)
Return a map-style dataset.
Returns a PyTorch map-style dataset by virtually concatenating AnnData arrays.
By default (stream=False), AnnData arrays are moved into a local cache first.
__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in the collection. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.
Note
For a guide, see Train a machine learning model on a collection.
This method currently only works for collections or query sets of AnnData artifacts.
- Parameters:
layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.
join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.
encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.
cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
parallel (bool, default: False) – Enable sampling with multiple processes.
dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm.
stream (bool, default: False) – Whether to stream data from the array backend.
is_run_input (bool | None, default: None) – Whether to track this collection as run input.
- Return type: MappedCollection
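The interaction of encode_labels and unknown_label can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function encode_column is hypothetical, and assigning codes in sorted category order is an assumption made here for determinism. The only behavior taken from the parameter descriptions above is that labels become integers and the unknown label is encoded as -1.

```python
def encode_column(values, unknown_label=None):
    """Map string labels to integer codes; the unknown label maps to -1.

    Hypothetical helper; the sorted category order is an assumption.
    """
    # Known categories are all distinct labels except the unknown one.
    categories = sorted({v for v in values if v != unknown_label})
    codes = {category: i for i, category in enumerate(categories)}
    if unknown_label is not None:
        codes[unknown_label] = -1
    return [codes[v] for v in values], codes


cell_types = ["T cell", "unknown", "B cell", "T cell"]
encoded, codes = encode_column(cell_types, unknown_label="unknown")
```

Reserving -1 for the unknown label lets downstream training code mask or ignore such samples without shifting the codes of the known categories.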
Examples
>>> import lamindb as ln
>>> from torch.utils.data import DataLoader
>>> collection = ln.Collection.get(description="my collection")
>>> mapped = collection.mapped(obs_keys=["cell_type", "batch"])
>>> dl = DataLoader(mapped, batch_size=128, shuffle=True)
>>> # also works for query sets of artifacts, '...' represents some filtering condition
>>> # additional filtering on artifacts of the collection
>>> mapped = collection.artifacts.all().filter(...).order_by("-created_at").mapped()
>>> # or directly from a query set of artifacts
>>> mapped = ln.Artifact.filter(..., otype="AnnData").order_by("-created_at").mapped()
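To make the idea of virtual concatenation concrete, here is a minimal sketch of how a single integer index can be resolved to a backing store and a local row without copying data. It is an assumption-laden toy, not the MappedCollection implementation: the class ToyMappedDataset is hypothetical, and plain Python lists stand in for AnnData arrays. Only the returned keys "X" and "_store_idx" follow the description of mapped() above.

```python
from bisect import bisect_right
from itertools import accumulate


class ToyMappedDataset:
    """Map-style dataset over several stores, concatenated virtually.

    Hypothetical illustration; lists stand in for AnnData arrays.
    """

    def __init__(self, stores):
        self.stores = stores
        # Cumulative end offsets, e.g. sizes [3, 2] -> offsets [3, 5].
        self.offsets = list(accumulate(len(store) for store in stores))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # The first offset strictly greater than idx identifies the store.
        store_idx = bisect_right(self.offsets, idx)
        start = self.offsets[store_idx - 1] if store_idx > 0 else 0
        return {"X": self.stores[store_idx][idx - start], "_store_idx": store_idx}


toy = ToyMappedDataset([[10, 11, 12], [20, 21]])
sample = toy[3]
```

Because the class defines __len__ and __getitem__ with integer indices, it satisfies the map-style dataset protocol that a PyTorch DataLoader expects.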
- open(engine='pyarrow', is_run_input=None, **kwargs)
Open a dataset for streaming.
Works for pyarrow and polars compatible formats (.parquet, .csv, .ipc etc. files or directories with such files).
- Parameters:
engine (Literal['pyarrow', 'polars'], default: 'pyarrow') – Which module to use for lazy loading of a dataframe from pyarrow or polars compatible formats.
is_run_input (bool | None, default: None) – Whether to track this artifact as run input.
**kwargs – Keyword arguments for pyarrow.dataset.dataset or polars.scan_* functions.
- Return type: Dataset | Iterator[LazyFrame]
Notes
For more info, see guide: Slice arrays.
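As a rough usage sketch for open() with the default pyarrow engine: the function below is hypothetical and is only defined, not called, because running it requires a connected LaminDB instance that contains Parquet artifacts. The filter on suffix=".parquet" is an assumption about how such artifacts might be selected. Dataset.head is part of the pyarrow.dataset API and reads only the first rows instead of materializing the whole dataset.

```python
def preview_parquet_artifacts(n_rows=5):
    """Open matching artifacts as one pyarrow dataset and return the first rows.

    Hypothetical helper; needs a connected LaminDB instance to run.
    """
    import lamindb as ln  # imported lazily so defining the function has no side effects

    # Assumption: the artifacts of interest are stored as .parquet files.
    artifacts = ln.Artifact.filter(suffix=".parquet")
    # With engine="pyarrow", open() is documented to return a Dataset.
    dataset = artifacts.open(engine="pyarrow")
    # Read only the first n_rows rows into an in-memory pyarrow Table.
    return dataset.head(n_rows)
```

With engine='polars' the documented return type differs (Iterator[LazyFrame]), so the last line of this sketch would not apply unchanged.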