lamindb.FeatureSet

class lamindb.FeatureSet(features: Iterable[Record], dtype: str | None = None, name: str | None = None)

Bases: Record, TracksRun

Feature sets.

Stores references to sets of Feature and other registries that may be used to identify features (e.g., bionty.Gene or bionty.Protein).

Why does LaminDB model feature sets, not just features?
  1. Performance: Imagine you measure the same panel of 20k transcripts in 1M samples. By modeling the panel as a feature set, you can link all your artifacts against one feature set and only need to store 1M instead of 1M x 20k = 20B links.

  2. Interpretation: Model protein panels, gene panels, etc.

  3. Data integration: Feature sets provide the currency that determines whether two collections can be easily concatenated.

These reasons do not hold for label sets. Hence, LaminDB does not model label sets.
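
The performance argument in point 1 is plain arithmetic, sketched below in Python. The variable names are illustrative and not part of LaminDB; the numbers are the ones from the example above.

```python
# Link counts for the example above: 20k transcripts measured in 1M samples.
n_samples = 1_000_000
n_features = 20_000

# Without feature sets: one link per (artifact, feature) pair.
links_per_feature = n_samples * n_features

# With a feature set: one link per artifact, all pointing to the same set.
links_per_feature_set = n_samples

print(f"{links_per_feature:,} links vs. {links_per_feature_set:,} links")
```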

Parameters:
  • features (Iterable[Record]) – An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().

  • dtype (str | None, default: None) – The simple type. Defaults to None for sets of Feature records and otherwise defaults to "number" (e.g., for sets of Gene).

  • name (str | None, default: None) – A name.

Note

A feature set is identified by the hash of the feature uids in the set.
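
To make this concrete, here is a minimal sketch of an order-independent hash over a set of uids. It is an illustration only: the helper name hash_uid_set, the choice of SHA-256, and the example uids are assumptions, and LaminDB's actual hashing scheme may differ.

```python
import hashlib


def hash_uid_set(uids):
    """Hash a collection of uids so that order and duplicates do not matter.

    Hypothetical helper for illustration; not LaminDB's implementation.
    """
    # Deduplicate and sort so the same set always yields the same bytes.
    canonical = ",".join(sorted(set(uids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up uids: the same members in a different order give the same hash,
# while a different set of members gives a different hash.
a = hash_uid_set(["uidA", "uidB", "uidC"])
b = hash_uid_set(["uidC", "uidA", "uidB", "uidA"])
c = hash_uid_set(["uidA", "uidB"])
print(a == b, a == c)
```

Under this kind of scheme, identity depends only on set membership, which is consistent with the features argument being turned into a set upon instantiation.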

A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Examples

Create a feature set from a dataframe with typed columns:

>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)

Create a feature set from features:

>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)

Create a feature set from feature values (here, adata is an existing AnnData object):

>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()

Link a feature set to an artifact:

>>> artifact.features.add_feature_set(feature_set, slot="var")

Link features to an artifact (will create a feature set under the hood):

>>> artifact.features.add_values(features)

Attributes

property members: QuerySet

A queryset for the individual records of the set.

Fields

run: Run

Last run that created or updated the record.

id: int

Internal id, valid only in one DB instance.

uid: str

A universal id (hash of the set of feature values).

name: str

A name (optional).

n: int

Number of features in the set.

dtype: str

Data type, e.g., “number”, “float”, “int”. Is None for Feature.

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

registry: str

The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.

Depending on the registry, .members stores, e.g. Feature or Gene records.

hash: str

The hash of the set.

created_at: datetime

Time of creation of record.

created_by: User

Creator of record.

Methods

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, source=None)

Create feature set for validated features.

Return type:

FeatureSet | None

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, source=None, raise_validation_error=True)

Create feature set for validated features.

Parameters:
  • values (List[str] | Series | array) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.

  • type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.

  • name (str | None, default: None) – A name.

  • organism (str | Record | None, default: None) – An organism to resolve gene mapping.

  • source (Record | None, default: None) – A public ontology to resolve feature identifier mapping.

  • raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.

Raises:

ValidationError – If some values are not valid.

Return type:

FeatureSet
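
The effect of raise_validation_error can be pictured with the sketch below. It does not call LaminDB: the in-memory registry, the ValidationError class, and the function select_valid are hypothetical stand-ins, and the behavior shown for raise_validation_error=False (dropping invalid values) is an assumption rather than something this page states.

```python
class ValidationError(Exception):
    """Stand-in for the error raised when values are not valid (hypothetical)."""


def select_valid(values, registry, raise_validation_error=True):
    """Return the values found in `registry`; optionally raise on any miss.

    `registry` is a plain set standing in for a reference registry field.
    """
    invalid = [v for v in values if v not in registry]
    if invalid and raise_validation_error:
        raise ValidationError(f"Some values are not valid: {invalid}")
    # Assumption: with raise_validation_error=False, invalid values are dropped.
    return [v for v in values if v in registry]


known_names = {"feat1", "feat2"}
print(select_valid(["feat1", "feat2"], known_names))
print(select_valid(["feat1", "typo"], known_names, raise_validation_error=False))
```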

Examples

>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)

save(*args, **kwargs)

Save.

Return type:

FeatureSet