lamindb.Feature

class lamindb.Feature(name: str, type: str | list[type[Record]], unit: str | None, description: str | None, synonyms: str | None)

Bases: Record, CanValidate, TracksRun, TracksUpdates

Dataset dimensions.

Features denote dataset dimensions, i.e., the variables that measure labels & numbers.

The Feature registry helps to

  1. manage metadata of features

  2. annotate datasets by whether they measured a feature

Learn more: Tutorial: Features & labels.

Parameters:
  • name (str): Name of the feature, typically a column name.

  • type (str | list[Type[Record]]): Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, you can define the registry from which values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].

  • unit (str | None, default None): Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.

  • description (str | None, default None): A description.

  • synonyms (str | None, default None): Bar-separated synonyms.

Note

For more control, you can use bionty registries to manage basic biological entities like genes, proteins & cell markers. Alternatively, you can define custom registries to manage high-level derived features like gene sets.

See also

from_df()

Create feature records from DataFrame.

features

Feature manager of an artifact or collection.

ULabel

Universal labels.

FeatureSet

Feature sets.

Example

>>> import lamindb as ln
>>> ln.Feature("cell_type_by_expert", dtype="cat", description="Expert cell type annotation").save()

Hint

Features and labels denote two ways of using entities to organize data:

  1. A feature qualifies what is measured, i.e., a numerical or categorical random variable

  2. A label is a measured value, i.e., a category

Consider annotating a dataset by the fact that it measured the expression of 30k genes: the genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the dataset by the fact that it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.

Re-shaping data can introduce ambiguity between features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in the columns.
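
To make the distinction concrete, here is a minimal sketch in plain Python that does not call lamindb. The column names of a small, made-up observation table play the role of features, and the distinct values of a categorical column play the role of labels. The table contents and the helpers `features_of` and `labels_of` are illustrative assumptions, not part of the lamindb API.

```python
# Illustrative only: plain Python, no lamindb calls.
# Each row is one joint measurement; each key is a jointly measured variable.
rows = [
    {"cell_type_by_expert": "T cell", "temperature": 37.0},
    {"cell_type_by_expert": "B cell", "temperature": 36.5},
    {"cell_type_by_expert": "T cell", "temperature": 37.2},
]


def features_of(table):
    """Return the measured variables (column names) in first-seen order."""
    seen = []
    for row in table:
        for column in row:
            if column not in seen:
                seen.append(column)
    return seen


def labels_of(table, feature):
    """Return the sorted distinct values measured for one categorical feature."""
    return sorted({row[feature] for row in table})


features = features_of(rows)  # what was measured
labels = labels_of(rows, "cell_type_by_expert")  # which categories were observed
print(features)
print(labels)
```

In this toy table, "cell_type_by_expert" is a feature, while "T cell" and "B cell" are labels measured for it.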

Fields

run: Run

Last run that created or updated the record.

id: int

Internal id, valid only in one DB instance.

uid: str

Universal id, valid across DB instances.

name: str

Name of feature (required).

dtype: str

Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”).

For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
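
The bracket notation can be read as a base type plus an optional registry name. The sketch below splits such a string in plain Python. `parse_dtype` is a hypothetical helper written for illustration; it is not how lamindb itself handles the string, and the rule that only "cat" may name a registry is an assumption drawn from the sentence above.

```python
# Illustrative only: a hypothetical parser for strings such as "cat[ULabel]".
VALID_BASE_TYPES = {"number", "cat", "int", "float", "bool", "datetime"}


def parse_dtype(dtype):
    """Split a dtype string into (base_type, registry_name_or_None)."""
    base, bracket, rest = dtype.partition("[")
    registry = None
    if bracket:
        # Everything between "[" and the closing "]" is taken as the registry name.
        if not rest.endswith("]") or len(rest) == 1:
            raise ValueError(f"malformed registry in {dtype!r}")
        registry = rest[:-1]
    if base not in VALID_BASE_TYPES:
        raise ValueError(f"unknown data type {base!r}")
    # Assumption: only categorical types refer to a registry.
    if registry is not None and base != "cat":
        raise ValueError("only categorical types can name a registry")
    return base, registry


print(parse_dtype("cat[bionty.CellType]"))
print(parse_dtype("float"))
```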

unit: str

Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).

description: str

A description.

synonyms: str

Bar-separated (|) synonyms (optional).
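
As a small illustration of the bar-separated format, the sketch below converts between a synonyms string and a list of names in plain Python. The helpers `split_synonyms` and `join_synonyms` and the sample synonyms are hypothetical and are not lamindb functions.

```python
# Illustrative only: hypothetical helpers for the "a|b|c" synonyms format.
def split_synonyms(synonyms):
    """Turn a bar-separated string (or None) into a list of trimmed names."""
    if not synonyms:
        return []
    return [part.strip() for part in synonyms.split("|") if part.strip()]


def join_synonyms(names):
    """Turn a list of names into a bar-separated string."""
    return "|".join(names)


names = split_synonyms("cell type|celltype | cell_type")
print(names)
print(join_synonyms(names))
```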

previous_runs: Run

Sequence of runs that created or updated the record.

feature_sets: FeatureSet

Feature sets linked to this feature.

created_at: datetime

Time of creation of record.

created_by: User

Creator of record.

updated_at: datetime

Time of last update to record.

Methods

classmethod from_df(df, field=None)

Create Feature records for the columns of a DataFrame.

Return type:

RecordsList
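
As a rough illustration of what "one feature record per column" means, the sketch below derives a (name, data type) pair for each column of a small column-oriented table in plain Python. It does not call lamindb and does not reproduce the logic of `from_df()`; the `infer_dtype` rules and the sample table are assumptions made for this example.

```python
# Illustrative only: not the from_df() implementation.
from datetime import datetime


def infer_dtype(values):
    """Guess one of the documented data type strings from a column's values."""
    present = [v for v in values if v is not None]
    if not present:
        return "cat"  # assumption: fall back to categorical for empty columns
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        return "bool"
    if any(isinstance(v, bool) for v in present):
        return "cat"  # mixed bool and non-bool values are treated as categories
    if all(isinstance(v, int) for v in present):
        return "int"
    if all(isinstance(v, (int, float)) for v in present):
        return "float"
    if all(isinstance(v, datetime) for v in present):
        return "datetime"
    return "cat"


# A made-up table stored as column name -> list of values.
columns = {
    "cell_type_by_expert": ["T cell", "B cell"],
    "n_cells": [120, 98],
    "temperature": [37.0, 36.5],
    "is_control": [True, False],
}

# One (name, dtype) pair per column, analogous to one feature per column.
feature_specs = {name: infer_dtype(values) for name, values in columns.items()}
print(feature_specs)
```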

save(*args, **kwargs)

Save.

Return type:

Feature