lamindb.Feature
- class lamindb.Feature(name: str, dtype: str | list[type[Record]], unit: str | None, description: str | None, synonyms: str | None)
Bases: Record, CanValidate, TracksRun, TracksUpdates
Dataset dimensions.
Features denote dataset dimensions, i.e., the variables that measure labels & numbers.
The Feature registry helps to:
- manage metadata of features
- annotate datasets by whether they measured a feature
Learn more: Tutorial: Features & labels.
- Parameters:
name – str
Name of the feature, typically a column name.
dtype – str | list[Type[Record]]
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
unit – str | None = None
Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.
description – str | None = None
A description.
synonyms – str | None = None
Bar-separated synonyms.
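The data type strings described above follow a small pattern: either one of the listed simple names, or `cat` followed by a bracketed registry name. The sketch below illustrates that pattern in plain Python. It is not part of lamindb: `parse_dtype` and `SIMPLE_DTYPES` are hypothetical names introduced here, and lamindb's own validation may behave differently.

```python
from __future__ import annotations

import re

# Simple data type names listed in the parameter description above.
SIMPLE_DTYPES = {"number", "cat", "int", "float", "bool", "datetime"}

# Matches "cat[Registry]" or "cat[module.Registry]".
_CAT_PATTERN = re.compile(r"^cat\[([A-Za-z_][\w.]*)\]$")


def parse_dtype(dtype: str) -> tuple[str, str | None]:
    """Split a data type string into (base type, registry name or None).

    Hypothetical helper for illustration only; not a lamindb function.
    """
    if dtype in SIMPLE_DTYPES:
        return dtype, None
    match = _CAT_PATTERN.match(dtype)
    if match:
        return "cat", match.group(1)
    raise ValueError(f"unrecognized data type: {dtype!r}")


print(parse_dtype("float"))
print(parse_dtype("cat[ULabel]"))
print(parse_dtype("cat[bionty.CellType]"))
```

A string such as `"cat[bionty.CellType]"` splits into the base type `"cat"` and the registry name `"bionty.CellType"`, which is the registry from which categorical values would be sampled.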
Note
For more control, you can use bionty registries to manage basic biological entities like genes, proteins & cell markers. Or you define custom registries to manage high-level derived features like gene sets.
See also
from_df()
Create feature records from DataFrame.
features
Feature manager of an artifact or collection.
ULabel
Universal labels.
FeatureSet
Feature sets.
Example
>>> import lamindb as ln
>>> ln.Feature(name="cell_type_by_expert", dtype="cat", description="Expert cell type annotation").save()
Hint
Features and labels denote two ways of using entities to organize data:
- A feature qualifies what is measured, i.e., a numerical or categorical random variable
- A label is a measured value, i.e., a category
Consider annotating a dataset by the fact that it measured expression of 30k genes: genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the artifact by whether it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.
Re-shaping data can introduce ambiguity among features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in the columns.
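To make the distinction concrete, here is a minimal plain-Python sketch of a toy data matrix. It does not use lamindb, and all column names and values are made up for illustration: the column names play the role of features, and the distinct values of the categorical column play the role of labels.

```python
# A toy "canonical data matrix": each row is one observation, each key one
# jointly measured variable. Names and values are invented for illustration.
rows = [
    {"cell_type_by_expert": "T cell", "expression_score": 0.82},
    {"cell_type_by_expert": "B cell", "expression_score": 0.31},
    {"cell_type_by_expert": "T cell", "expression_score": 0.77},
]

# Features qualify what was measured: they correspond to the column names.
features = sorted({key for row in rows for key in row})

# Labels are measured values of a categorical feature.
labels = sorted({row["cell_type_by_expert"] for row in rows})

print(features)  # ['cell_type_by_expert', 'expression_score']
print(labels)    # ['B cell', 'T cell']
```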
Fields
- id: int
Internal id, valid only in one DB instance.
- uid: str
Universal id, valid across DB instances.
- name: str
Name of feature (required).
- dtype: str
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
- unit: str
Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).
- description: str
A description.
- synonyms: str
Bar-separated (|) synonyms (optional).
- feature_sets: FeatureSet
Feature sets linked to this feature.
- created_at: datetime
Time of creation of record.
- updated_at: datetime
Time of last update to record.
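The synonyms field holds several names in one bar-separated string. As a rough illustration of that convention, the plain-Python sketch below converts between such a string and a list. The helpers `split_synonyms` and `join_synonyms` are hypothetical and not part of lamindb, and the sample names are invented.

```python
from __future__ import annotations


def split_synonyms(synonyms: str | None) -> list[str]:
    """Turn a bar-separated string into a list of names, dropping blanks."""
    if not synonyms:
        return []
    return [part.strip() for part in synonyms.split("|") if part.strip()]


def join_synonyms(names: list[str]) -> str:
    """Join names into one bar-separated string."""
    return "|".join(names)


stored = "cell type|celltype|cell_type"
names = split_synonyms(stored)
print(names)                 # ['cell type', 'celltype', 'cell_type']
print(join_synonyms(names))  # cell type|celltype|cell_type
```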
Methods
- classmethod from_df(df, field=None)
Create Feature records for columns.
- Return type: