lamindb.Feature

class lamindb.Feature(name: str, type: str | list[type[Registry]], unit: str | None, description: str | None, synonyms: str | None)

Bases: Registry, CanValidate, TracksRun, TracksUpdates

Dataset dimensions.

A feature is a random variable or, equivalently, a dimension of a dataset. The Feature registry helps you:

  1. manage metadata of features

  2. annotate datasets by whether they measured a feature
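The two roles above can be pictured with a toy, in-memory stand-in for the registry. This is a hypothetical sketch, not lamindb's implementation: `FeatureRecord`, `ToyFeatureRegistry`, and `measured_features` are illustrative names only, and the real registry persists records in a database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FeatureRecord:
    """Hypothetical record mirroring the Feature constructor parameters."""
    name: str
    dtype: str
    unit: str | None = None
    description: str | None = None
    synonyms: str | None = None


class ToyFeatureRegistry:
    """Hypothetical registry that stores feature metadata keyed by name."""

    def __init__(self) -> None:
        self._records: dict[str, FeatureRecord] = {}

    def save(self, record: FeatureRecord) -> FeatureRecord:
        # Role 1: manage metadata of features
        self._records[record.name] = record
        return record

    def measured_features(self, columns: list[str]) -> list[FeatureRecord]:
        # Role 2: annotate a dataset by which registered features it measured
        return [self._records[c] for c in columns if c in self._records]


registry = ToyFeatureRegistry()
registry.save(FeatureRecord("cell_type_by_expert", dtype="cat",
                            description="Expert cell type annotation"))
registry.save(FeatureRecord("temperature", dtype="float", unit="K"))

# A dataset with three columns, only two of which are registered features
annotated = registry.measured_features(["cell_type_by_expert", "temperature", "notes"])
print([f.name for f in annotated])  # ['cell_type_by_expert', 'temperature']
```

Unregistered columns such as `notes` are simply skipped here; how lamindb treats unregistered columns is not covered by this sketch.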

Learn more: Tutorial: Features & labels.

Parameters:
  • name (str): Name of the feature, typically a column name.

  • type (str | list[type[Registry]]): Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, you can define the registry from which values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].

  • unit (str | None, default None): Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized", etc.

  • description (str | None, default None): A description.

  • synonyms (str | None, default None): Bar-separated (|) synonyms.
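The dtype strings above combine a base type with an optional registry for categorical values. The following hypothetical helper, `parse_dtype` (not part of lamindb), shows one way to split such a string into its parts. It assumes at most one registry name inside the brackets; how lamindb encodes several registries as a string is not covered here.

```python
from __future__ import annotations

import re

# Base types listed in the parameter description
BASE_DTYPES = {"number", "cat", "int", "float", "bool", "datetime"}

# "cat[...]" with a single, possibly dotted, registry name, e.g. bionty.CellType
_CAT_PATTERN = re.compile(r"^cat\[(?P<registry>[\w.]+)\]$")


def parse_dtype(dtype: str) -> tuple[str, str | None]:
    """Split a dtype string into (base type, registry name or None).

    Hypothetical helper for illustration only.
    """
    if dtype in BASE_DTYPES:
        return dtype, None
    match = _CAT_PATTERN.match(dtype)
    if match is None:
        raise ValueError(f"unrecognized dtype: {dtype!r}")
    return "cat", match.group("registry")


print(parse_dtype("float"))                 # ('float', None)
print(parse_dtype("cat[ULabel]"))           # ('cat', 'ULabel')
print(parse_dtype("cat[bionty.CellType]"))  # ('cat', 'bionty.CellType')
```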

Note

For more control, you can use bionty registries to manage basic biological entities like genes, proteins & cell markers. Alternatively, you can define custom registries to manage high-level derived features like gene sets.

See also

from_df()

Create feature records from DataFrame.

features

Feature manager of an artifact or collection.

ULabel

Universal labels.

FeatureSet

Feature sets.

Example

>>> import lamindb as ln
>>> ln.Feature("cell_type_by_expert", dtype="cat", description="Expert cell type annotation").save()

Hint

Features and labels denote two ways of using entities to organize data:

  1. A feature qualifies what is measured, i.e., a numerical or categorical random variable.

  2. A label is a measured value, i.e., a category.

Consider annotating a dataset with the fact that it measured the expression of 30k genes: here, the genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the dataset with the fact that it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.
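To make the contrast concrete, here is a toy sketch in which the same gene identifiers play both roles. `ToyDatasetAnnotation`, the gene names, and the feature name `knocked_out_gene` are hypothetical; three genes stand in for the 30k of the example, and none of this reflects lamindb's internal data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDatasetAnnotation:
    """Hypothetical annotation: the features a dataset measured, plus labels."""
    feature_set: frozenset[str] = frozenset()
    labels: dict[str, set[str]] = field(default_factory=dict)


genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical identifiers

# Expression was measured for every gene: the genes are features,
# grouped into one feature set (30k members in the real case).
expression_dataset = ToyDatasetAnnotation(feature_set=frozenset(genes))

# A knock-out of these genes was measured: the genes are values (labels)
# of a single categorical feature.
knockout_dataset = ToyDatasetAnnotation(
    feature_set=frozenset({"knocked_out_gene"}),
    labels={"knocked_out_gene": set(genes)},
)
```

In the first case each gene is a dimension of the data matrix; in the second, only one dimension exists and the genes are its observed categories.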

Reshaping data can introduce ambiguity between features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in its columns.
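A small, self-contained illustration of how reshaping blurs the distinction: melting a wide table into long format turns the gene column names into values of a `gene` column, where they look like labels. The table, column names, and the `wide_to_long` helper are hypothetical. Because each cell's expression values were measured jointly, the genes remain features, as the wide (canonical) layout shows.

```python
# Wide (canonical) format: one row per cell, jointly measured genes as columns.
wide = [
    {"cell": "c1", "GENE_A": 1.2, "GENE_B": 0.0},
    {"cell": "c2", "GENE_A": 0.5, "GENE_B": 2.2},
]


def wide_to_long(rows, id_col, var_name="gene", value_name="expression"):
    """Melt row dicts into one record per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows


long_format = wide_to_long(wide, "cell")

# Column names in the wide table reappear as categorical values in the long one.
features_in_wide = {k for k in wide[0] if k != "cell"}
values_in_long = {r["gene"] for r in long_format}
print(features_in_wide == values_in_long)  # True
```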

Fields

run ForeignKey

Last run that created or updated the record (a Run).

id AutoField

Internal id, valid only in one DB instance.

uid CharField

Universal id, valid across DB instances.

name CharField

Name of feature (required).

dtype CharField

Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”).

For categorical types, you can define the registry from which values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].

unit CharField

Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).

description TextField

A description.

synonyms TextField

Bar-separated (|) synonyms (optional).
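A bar-separated synonyms string can be split and used to map alternative spellings back to a canonical feature name. The sketch below is hypothetical: `split_synonyms` and `resolve_name` are illustrative helpers, and the case-insensitive matching is an assumption, not a description of how lamindb looks up synonyms.

```python
from __future__ import annotations


def split_synonyms(synonyms: str | None) -> list[str]:
    """Split a bar-separated synonyms string, dropping empty entries."""
    if not synonyms:
        return []
    return [s.strip() for s in synonyms.split("|") if s.strip()]


def resolve_name(query: str, features: dict[str, str | None]) -> str | None:
    """Return the feature whose name or synonyms match query (case-insensitive)."""
    q = query.casefold()
    for name, synonyms in features.items():
        candidates = [name, *split_synonyms(synonyms)]
        if any(q == c.casefold() for c in candidates):
            return name
    return None


features = {
    "cell_type_by_expert": "expert cell type|celltype_expert",
    "temperature": None,
}
print(resolve_name("CellType_Expert", features))  # cell_type_by_expert
print(resolve_name("unknown", features))          # None
```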

previous_runs ManyToManyField

Sequence of runs that created or updated the record.

feature_sets ManyToManyField

Feature sets linked to this feature.

created_at DateTimeField

Time of creation of record.

created_by ForeignKey

Creator of the record (a User).

updated_at DateTimeField

Time of last update to record.

Methods

classmethod from_df(df, field=None)

Create Feature records for the columns of a DataFrame.

Return type:

RecordsList
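from_df() produces one record per column, so each column needs a dtype. The sketch below shows one plausible way to map pandas column dtypes onto the documented dtype strings. The mapping and the helper names `infer_feature_dtype` and `sketch_from_df` are assumptions for illustration; lamindb's own inference rules may differ. It requires pandas.

```python
from __future__ import annotations

import pandas as pd
from pandas.api import types as ptypes


def infer_feature_dtype(series: pd.Series) -> str:
    """Map a pandas column onto a documented dtype string (hypothetical rules)."""
    if ptypes.is_bool_dtype(series):  # checked first so bools are never "int"
        return "bool"
    if ptypes.is_integer_dtype(series):
        return "int"
    if ptypes.is_float_dtype(series):
        return "float"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    # Remaining object, string, and categorical columns are treated as categorical
    return "cat"


def sketch_from_df(df: pd.DataFrame) -> dict[str, str]:
    """Return one hypothetical name -> dtype entry per column."""
    return {str(col): infer_feature_dtype(df[col]) for col in df.columns}


df = pd.DataFrame({
    "cell_type_by_expert": pd.Categorical(["T cell", "B cell"]),
    "n_genes": [1200, 950],
    "temperature": [310.1, 309.8],
    "is_doublet": [False, True],
    "collected_at": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})
print(sketch_from_df(df))
```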

save(*args, **kwargs)

Save.

Return type:

Feature