
class lamindb.Feature(name: str, dtype: Dtype | Registry | list[Registry] | FieldAttr, type: Feature | None = None, is_type: bool = False, unit: str | None = None, description: str | None = None, synonyms: str | None = None, nullable: bool = True, default_value: str | None = None, coerce_dtype: bool = False, cat_filters: dict[str, str] | None = None)

Bases: Record, CanCurate, TracksRun, TracksUpdates

Dataset dimensions.

A feature represents a dimension of a dataset, such as a column in a DataFrame. The Feature registry organizes metadata of features.

The Feature registry helps you organize and query datasets based on their features and corresponding label annotations. For instance, a “T cell” label could be measured through different features, such as "cell_type_by_expert", where an expert manually classified the cell, or "cell_type_by_model", where a computational model made the classification.

The two most important pieces of metadata of a feature are its name and its dtype. In addition to typical data types, LaminDB has a "num" dtype to concisely denote the union of all numerical types.

  • name (str) – Name of the feature, typically a column name.

  • dtype (Dtype | Registry | list[Registry] | FieldAttr) – See Dtype. For categorical types, you can restrict values to one or more registries, e.g., ULabel or [ULabel, bionty.CellType].

  • unit (str | None, default: None) – Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.

  • description (str | None, default: None) – A description.

  • synonyms (str | None, default: None) – Bar-separated synonyms.

  • nullable (bool, default: True) – Whether the feature can have null-like values (None, pd.NA, NaN, etc.), see nullable.

  • default_value (Any | None, default: None) – Default value for the feature.

  • coerce_dtype (bool, default: False) – When True, attempts to coerce values to the specified dtype during validation, see coerce_dtype.

  • cat_filters (dict[str, str] | None, default: None) – Subset a registry by additional filters to define valid categories.


For more control, you can use bionty registries to manage simple biological entities like genes, proteins & cell markers, or define custom registries to manage high-level derived features like gene sets.

See also


Create feature records from DataFrame.


Feature manager of an artifact or collection.


Universal labels.


Feature sets.


A simple "str" feature.

>>> ln.Feature(
...     name="sample_note",
...     dtype="str",
... ).save()

A dtype "cat[ULabel]" can be more easily passed as below.

>>> ln.Feature(
...     name="project",
...     dtype=ln.ULabel,
... ).save()

A dtype "cat[ULabel|bionty.CellType]" can be more easily passed as below.

>>> ln.Feature(
...     name="cell_type",
...     dtype=[ln.ULabel, bt.CellType],
... ).save()
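The list form and the string form describe the same categorical dtype. As a minimal sketch of that correspondence, the hypothetical helpers below (cat_dtype and parse_cat_dtype are not part of lamindb) assemble and split a "cat[...]" string using only the notation quoted above; they do not cover any other dtype syntax lamindb may accept.

```python
def cat_dtype(registries: list[str]) -> str:
    """Hypothetical helper: join registry names into a categorical dtype string."""
    # Multiple registries are separated by "|" inside "cat[...]".
    return "cat[" + "|".join(registries) + "]"


def parse_cat_dtype(dtype: str) -> list[str]:
    """Hypothetical inverse of cat_dtype: recover the registry names."""
    prefix = "cat["
    if not (dtype.startswith(prefix) and dtype.endswith("]")):
        raise ValueError(f"not a categorical dtype: {dtype!r}")
    return dtype[len(prefix):-1].split("|")


print(cat_dtype(["ULabel"]))                           # cat[ULabel]
print(cat_dtype(["ULabel", "bionty.CellType"]))        # cat[ULabel|bionty.CellType]
print(parse_cat_dtype("cat[ULabel|bionty.CellType]"))  # ['ULabel', 'bionty.CellType']
```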


Features and labels denote two ways of using entities to organize data:

  1. A feature qualifies what is measured, i.e., a numerical or categorical random variable

  2. A label is a measured value, i.e., a category

Consider annotating a dataset by the fact that it measured the expression of 30k genes: the genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the artifact by the fact that it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.

Re-shaping data can introduce ambiguity between features & labels. If this happens, ask yourself what the joint measurement was: a feature qualifies a variable in a joint measurement. The canonical data matrix lists jointly measured variables in its columns.
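The following toy sketch in plain Python (the expression values are made up) illustrates this ambiguity. In long format, gene symbols appear as values of a "gene" column, which makes them look like labels; pivoting to the canonical wide matrix turns them into columns, which identifies them as jointly measured features.

```python
from collections import defaultdict

# Toy long-format data (made-up values): one row per (sample, gene, expression).
long_rows = [
    ("sample1", "A1CF", 1.2),
    ("sample1", "BRCA2", 0.4),
    ("sample2", "A1CF", 0.9),
    ("sample2", "BRCA2", 0.0),
]


def to_wide(rows):
    """Pivot (sample, gene, value) triples into {sample: {gene: value}}."""
    wide = defaultdict(dict)
    for sample, gene, value in rows:
        wide[sample][gene] = value
    return dict(wide)


# Long format: gene symbols are values of a "gene" column, so they look like labels.
gene_column_values = sorted({gene for _, gene, _ in long_rows})

# Wide (canonical) format: the jointly measured genes become the columns,
# which identifies them as features.
wide = to_wide(long_rows)
columns = sorted({gene for row in wide.values() for gene in row})
print(columns)  # ['A1CF', 'BRCA2']
```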


property coerce_dtype: bool

Whether dtypes should be coerced during validation.

For example, an object-dtyped pandas column can be coerced to categorical and would pass validation if this is True.

property default_value: Any

A default value that overwrites missing values (default None).

This takes effect when you call Curator.standardize().

If default_value = None, missing values like pd.NA or np.nan are kept.
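The fill rule described above can be sketched in plain Python. This only illustrates the documented behavior and is not lamindb's implementation; fill_default is a hypothetical name, and only None and float NaN count as missing here (pd.NA is left out to avoid a pandas dependency).

```python
import math
from typing import Any, Optional


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (pd.NA is left out to stay stdlib-only)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_default(values: list, default_value: Optional[Any] = None) -> list:
    """Hypothetical sketch: replace missing entries with default_value.

    If default_value is None, missing entries are kept unchanged.
    """
    if default_value is None:
        return list(values)
    return [default_value if is_missing(v) else v for v in values]


print(fill_default(["asthma", None, float("nan")], "unknown"))  # ['asthma', 'unknown', 'unknown']
print(fill_default(["asthma", None]))                           # ['asthma', None]
```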

property nullable: bool

Indicates whether the feature can have null values (default True).


import lamindb as ln
import pandas as pd

disease = ln.Feature(name="disease", dtype=ln.ULabel, nullable=False).save()
schema = ln.Schema(features=[disease]).save()
dataset = {"disease": pd.Categorical([pd.NA, "asthma"])}
df = pd.DataFrame(dataset)
curator = ln.curators.DataFrameCurator(df, schema)
try:
    curator.validate()
except ln.errors.ValidationError as e:
    assert str(e).startswith("non-nullable series 'disease' contains null values")
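The nullable check itself can be sketched without a database. This is an assumption-laden illustration, not lamindb's code: check_nullable and the local ValidationError class are hypothetical stand-ins, null-like means None or float NaN only, and the message prefix simply mirrors the one asserted in the example above.

```python
import math
from typing import Any


class ValidationError(Exception):
    """Stand-in for a validation error; not lamindb's exception class."""


def check_nullable(name: str, values: list, nullable: bool = True) -> None:
    """Hypothetical sketch: reject null-like entries when nullable is False."""
    def is_null(v: Any) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    if not nullable and any(is_null(v) for v in values):
        raise ValidationError(f"non-nullable series '{name}' contains null values")


check_nullable("disease", [None, "asthma"], nullable=True)  # passes silently
try:
    check_nullable("disease", [None, "asthma"], nullable=False)
except ValidationError as e:
    print(e)  # non-nullable series 'disease' contains null values
```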

Simple fields

uid: str

Universal id, valid across DB instances.

name: str

Name of feature (hard unique constraint unique=True).

dtype: Dtype | None

Data type (Dtype).

is_type: bool

Distinguish types from instances of the type.

unit: str | None

Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).

description: str | None

A description.

array_rank: int

Rank of feature.

Number of indices of the array: 0 for scalar, 1 for vector, 2 for matrix.

It is called .ndim in NumPy and PyTorch but shouldn’t be confused with the dimension of the feature space.

array_size: int

Number of elements of the feature.

Total number of elements (product of shape components) of the array.

  • A number or string (a scalar): 1

  • A 50-dimensional embedding: 50

  • A 25 x 25 image: 625

array_shape: list[int] | None

Shape of the feature.

  • A number or string (a scalar): [1]

  • A 50-dimensional embedding: [50]

  • A 25 x 25 image: [25, 25]

It is stored as a list rather than a tuple because it’s serialized as JSON.
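A short worked check of the size rule and the JSON note: array_size is a hypothetical helper that takes the product of the shape components, reproducing the three cases listed for array_size and array_shape, and the round trip shows why a tuple shape comes back as a list.

```python
import json
import math


def array_size(shape: list) -> int:
    """Total number of elements: the product of the shape components."""
    return math.prod(shape)


# The three documented cases: scalar, 50-dimensional embedding, 25 x 25 image.
print(array_size([1]), array_size([50]), array_size([25, 25]))  # 1 50 625

# JSON has no tuple type, so a shape tuple round-trips as a list.
stored_shape = json.loads(json.dumps((25, 25)))
print(stored_shape)  # [25, 25]
```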

proxy_dtype: Dtype | None

Proxy data type.

If the feature is an image, it’s often stored via a path to the image file. Hence, while the dtype might be image with a certain shape, the proxy dtype would be str.

synonyms: str | None

Bar-separated (|) synonyms (optional).

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

type: Feature | None

Type of feature (e.g., ‘Readout’, ‘Metric’, ‘Metadata’, ‘ExpertAnnotation’, ‘ModelPrediction’).

Allows grouping features by type, e.g., all readouts, all metrics, etc.

schemas: Schema

Feature sets linked to this feature.

records: Feature

Records of this type.

values: FeatureValue

Values for this feature.



Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
classmethod filter(*queries, **expressions)

Query records.

  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Returns:

A QuerySet.

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
classmethod from_df(df, field=None)

Create Feature records for columns.


classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier (such as a name or an id).

  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g.,

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (Record | str | None, default: None) – A bionty.Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record to validate against to create records for.

  • mute (bool, default: False) – Whether to mute logging.

Returns:

A list of validated records. For bionty registries, also returns knowledge-coupled records.


For more info, see tutorial: Manage biological registries.


import bionty as bt

# Without create=True, non-validated values log warnings & return an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# With create=True, records are created for values that don't exist yet
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from public reference
bt.CellType.from_values(["T cell", "B cell"]).save()
classmethod get(idlike=None, **expressions)

Get a single record.

  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.


Raises:

lamindb.errors.DoesNotExist – If no matching record is found.

ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. If False, validation includes all records in the registry, ignoring the specified source. If True, validation only includes records in the registry that are linked to the specified source. Note: this parameter doesn’t affect validation against public sources.


import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
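The exact-match semantics can be sketched with a plain set lookup. inspect_exact is a hypothetical helper (it returns a tuple, not lamindb's result object); it shows why FANCD1 ends up non-validated even though it is a synonym of BRCA2: being mappable requires an exact match, and synonyms are handled by standardize() instead.

```python
def inspect_exact(values: list, registry_values: list) -> tuple:
    """Hypothetical sketch: split values into exact matches and non-matches."""
    known = set(registry_values)
    validated = [v for v in values if v in known]
    non_validated = [v for v in values if v not in known]
    return validated, non_validated


# Stored symbols and queried values mirror the example above.
stored = ["A1CF", "A1BG", "BRCA2"]
validated, non_validated = inspect_exact(["A1CF", "A1BG", "FANCD1", "FANCD20"], stored)
print(validated, non_validated)  # ['A1CF', 'A1BG'] ['FANCD1', 'FANCD20']
```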
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
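The attribute names in the example (lookup.adgb_dt for "ADGB-DT") suggest that values are normalized into Python identifiers. The sketch below assumes a simple rule (lowercase, runs of non-word characters replaced by "_"); to_attr and make_lookup are hypothetical, and plain strings stand in for the records a real lookup returns.

```python
import re
from collections import namedtuple


def to_attr(value: str) -> str:
    """Hypothetical normalization of a value into an attribute name."""
    # Lowercase, collapse runs of non-word characters into "_", trim the ends:
    # "ADGB-DT" -> "adgb_dt".
    return re.sub(r"\W+", "_", value.lower()).strip("_")


def make_lookup(values: list):
    """Build an attribute-style lookup plus a dict keyed by the original values."""
    # rename=True lets namedtuple replace invalid or duplicate field names
    # (e.g. ones starting with a digit) with positional names like "_1".
    Lookup = namedtuple("Lookup", [to_attr(v) for v in values], rename=True)
    return Lookup(*values), {v: v for v in values}


lookup, lookup_dict = make_lookup(["ADGB-DT", "ENSG00000002745"])
print(lookup.adgb_dt)          # ADGB-DT
print(lookup.ensg00000002745)  # ENSG00000002745
print(lookup_dict["ADGB-DT"])  # ADGB-DT
```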
classmethod search(string, *, field=None, limit=20, case_sensitive=False)


  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()


>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name", create=True)
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

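To illustrate what a scored, sorted result means, here is a minimal sketch that ranks candidate strings by similarity. It swaps in difflib.SequenceMatcher as the scoring function, which is an assumption for illustration only; lamindb's actual scoring may differ, and search here is a hypothetical stand-in that returns (name, score) pairs instead of a DataFrame.

```python
from difflib import SequenceMatcher
from typing import Optional


def search(string: str, candidates: list, limit: Optional[int] = 20,
           case_sensitive: bool = False) -> list:
    """Hypothetical sketch: rank candidates by string similarity (score 0-100)."""
    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    scored = [
        (c, round(100 * SequenceMatcher(None, norm(string), norm(c)).ratio(), 1))
        for c in candidates
    ]
    # Highest score first; sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if limit is None else scored[:limit]


results = search("ulabel2", ["ULabel1", "ULabel2", "ULabel3"])
print(results[0])  # ('ULabel2', 100.0)
```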
classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated. "first" returns the first mapped standardized name, "last" returns the last mapped standardized name, and False returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. If False, validation includes all records in the registry, ignoring the specified source. If True, validation only includes records in the registry that are linked to the specified source. Note: this parameter doesn’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also


Add synonyms.


Remove synonyms.


import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
standardized_names = bt.Gene.standardize(gene_synonyms, organism="human")
standardized_names
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']

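The mapping idea can be sketched with a dictionary built from bar-separated synonyms. standardize below is a hypothetical stand-in, not the bionty method; the toy registry includes only the FANCD1 to BRCA2 synonym from the example above, setdefault mimics keep="first", and unmatched values pass through unchanged, as FANCD20 does in the example output.

```python
def _norm(s: str, case_sensitive: bool) -> str:
    return s if case_sensitive else s.lower()


def standardize(values: list, synonyms_by_name: dict, case_sensitive: bool = False) -> list:
    """Hypothetical sketch: map values to standardized names via bar-separated synonyms."""
    mapper = {}
    for name, synonyms in synonyms_by_name.items():
        terms = [name] + [s for s in (synonyms or "").split("|") if s]
        for term in terms:
            # setdefault keeps the first name a term maps to, similar to keep="first".
            mapper.setdefault(_norm(term, case_sensitive), name)
    # Values without any match are returned unchanged.
    return [mapper.get(_norm(v, case_sensitive), v) for v in values]


# Toy registry: name -> bar-separated synonyms (None if there are none).
registry = {"A1CF": None, "A1BG": None, "BRCA2": "FANCD1"}
print(standardize(["A1CF", "A1BG", "FANCD1", "FANCD20"], registry))
# ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
```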
classmethod using(instance)

Use a non-default LaminDB instance.


instance (str | None) – An instance identifier of form “account_handle/instance_name”.



>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: it only asserts exact matches.

  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry. If False, validation includes all records in the registry, ignoring the specified source. If True, validation only includes records in the registry that are linked to the specified source. Note: this parameter doesn’t affect validation against public sources.

Returns:

A vector of booleans indicating whether each element is validated.

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])


add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also


Remove synonyms.


import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"



Remove synonyms from a record.


synonym (str | list[str] | Series | array) – The synonym values to remove.

See also


Add synonyms


import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"

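Since synonyms are stored as one bar-separated string, adding and removing them amounts to editing that string. The helpers below are a hypothetical sketch, not lamindb's methods: add_synonyms prepends new entries and skips duplicates, remove_synonyms filters entries out. The exact ordering lamindb produces may differ, as the example outputs above show.

```python
from typing import Iterable, Optional


def add_synonyms(synonyms: Optional[str], new: Iterable[str]) -> str:
    """Hypothetical sketch: prepend new synonyms to a bar-separated string, skipping duplicates."""
    current = [s for s in (synonyms or "").split("|") if s]
    for s in new:
        if s not in current:
            current.insert(0, s)
    return "|".join(current)


def remove_synonyms(synonyms: Optional[str], to_remove: Iterable[str]) -> str:
    """Hypothetical sketch: drop the given synonyms from a bar-separated string."""
    drop = set(to_remove)
    return "|".join(s for s in (synonyms or "").split("|") if s and s not in drop)


s = add_synonyms("T-cell|T lymphocyte|T-lymphocyte", ["T cells"])
print(s)                               # T cells|T-cell|T lymphocyte|T-lymphocyte
print(remove_synonyms(s, ["T-cell"]))  # T cells|T lymphocyte|T-lymphocyte
```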
save(*args, **kwargs)




Set value for abbr field and add to synonyms.


value (str) – A value for an abbreviation.


import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"