lamindb.Feature
- class lamindb.Feature(name: str, dtype: str | list[type[Record]], unit: str | None, description: str | None, synonyms: str | None)
Bases: Record, CanCurate, TracksRun, TracksUpdates
Dataset dimensions.
Features denote dataset dimensions, i.e., the variables that measure labels & numbers.
The Feature registry helps to:
- manage metadata of features
- annotate datasets by whether they measured a feature
Learn more: Tutorial: Features & labels.
- Parameters:
name (str) – Name of the feature, typically a column name.
dtype (str | list[Type[Record]]) – Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
unit (str | None, default: None) – Unit of measure, ideally SI ("m", "s", "kg", etc.) or "normalized" etc.
description (str | None, default: None) – A description.
synonyms (str | None, default: None) – Bar-separated synonyms.
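The dtype strings listed above are either a bare type name or cat[...] wrapping a registry name. The following is a minimal sketch of how such a string could be split into its parts. The helper parse_dtype and the constant SIMPLE_DTYPES are hypothetical names used for illustration only; they are not part of lamindb.

```python
from __future__ import annotations

import re

# Hypothetical helper for illustration; not part of the lamindb API.
SIMPLE_DTYPES = {"number", "cat", "int", "float", "bool", "datetime"}
_CAT_PATTERN = re.compile(r"^cat\[(?P<registry>[A-Za-z_][\w.]*)\]$")


def parse_dtype(dtype: str) -> tuple[str, str | None]:
    """Split a dtype string into (kind, registry name or None)."""
    if dtype in SIMPLE_DTYPES:
        return dtype, None
    match = _CAT_PATTERN.match(dtype)
    if match:
        # A categorical dtype restricted to values from one registry.
        return "cat", match.group("registry")
    raise ValueError(f"unrecognized dtype: {dtype!r}")


print(parse_dtype("float"))
print(parse_dtype("cat[ULabel]"))
print(parse_dtype("cat[bionty.CellType]"))
```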
Note
For more control, you can use bionty registries to manage simple biological entities like genes, proteins & cell markers. Or you define custom registries to manage high-level derived features like gene sets.
See also
from_df()
Create feature records from DataFrame.
features
Feature manager of an artifact or collection.
ULabel
Universal labels.
FeatureSet
Feature sets.
Example
>>> ln.Feature(
...     name="cell_type_by_expert",
...     dtype="cat[bionty.CellType]",
...     description="Expert cell type annotation"
... ).save()
Hint
Features and labels denote two ways of using entities to organize data:
- A feature qualifies what is measured, i.e., a numerical or categorical random variable.
- A label is a measured value, i.e., a category.
Consider annotating a dataset by the fact that it measured expression of 30k genes: genes relate to the dataset as feature identifiers through a feature set with 30k members. Now consider annotating the artifact by the fact that it measured the knock-out of 3 genes: here, the 3 genes act as labels of the dataset.
Re-shaping data can introduce ambiguity among features & labels. If this happened, ask yourself what the joint measurement was: a feature qualifies variables in a joint measurement. The canonical data matrix lists jointly measured variables in the columns.
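The distinction can be made concrete with a toy data matrix in plain Python. This is only an illustration with made-up values: the columns play the role of features, and the categories observed in a categorical column play the role of labels.

```python
# Toy data matrix (made-up values): each key is a jointly measured variable.
rows = [
    {"cell_type": "T cell", "CD8A": 2.1, "CD4": 0.0},
    {"cell_type": "B cell", "CD8A": 0.0, "CD4": 0.3},
    {"cell_type": "T cell", "CD8A": 1.7, "CD4": 0.1},
]

# Features qualify what was measured: the column names.
features = sorted(rows[0].keys())

# Labels are measured values of a categorical feature: its categories.
labels = sorted({row["cell_type"] for row in rows})

print(features)
print(labels)
```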
Simple fields
- uid: str
Universal id, valid across DB instances.
- name: str
Name of feature (required).
- dtype: str
Data type (“number”, “cat”, “int”, “float”, “bool”, “datetime”). For categorical types, can define from which registry values are sampled, e.g., cat[ULabel] or cat[bionty.CellType].
- unit: str
Unit of measure, ideally SI (m, s, kg, etc.) or ‘normalized’ etc. (optional).
- description: str
A description.
- synonyms: str
Bar-separated (|) synonyms (optional).
- created_at: datetime
Time of creation of record.
- updated_at: datetime
Time of last update to record.
Relational fields
- feature_sets: FeatureSet
Feature sets linked to this feature.
- values: FeatureValue
Values for this feature.
Class methods
- classmethod df(include=None, join='inner', limit=100)
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at. Use parameter include to include other fields.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "labels__name", "cell_types__name", etc. or a list of such strings.
join (str, default: 'inner') – The join parameter of pandas.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
>>> labels = [ln.ULabel(name=f"Label {i}") for i in range(3)]
>>> ln.save(labels)
>>> ln.ULabel.filter().df(include=["created_by__name"])
- classmethod filter(*queries, **expressions)
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my ulabel").save()
>>> ulabel = ln.ULabel.get(name="my ulabel")
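Django query expressions are keyword arguments of the form field=value or field__lookup=value, for example name__startswith="cell_type". The sketch below mimics that naming convention on plain dictionaries to show how such keywords read. The function matches is a hypothetical stand-in that supports only three lookups; it does not reflect how Django or lamindb evaluate queries against the database.

```python
# Illustrative sketch only: mimics Django-style keyword expressions on dicts.
def matches(record: dict, **expressions) -> bool:
    """Return True if the record satisfies every keyword expression."""
    for key, expected in expressions.items():
        # "name__startswith" -> field "name", lookup "startswith"
        field, _, lookup = key.partition("__")
        value = record[field]
        if lookup == "":
            ok = value == expected
        elif lookup == "startswith":
            ok = value.startswith(expected)
        elif lookup == "in":
            ok = value in expected
        else:
            raise ValueError(f"unsupported lookup: {lookup!r}")
        if not ok:
            return False
    return True


records = [
    {"name": "cell_type_by_expert", "dtype": "cat[bionty.CellType]"},
    {"name": "cell_type_by_model", "dtype": "cat[bionty.CellType]"},
    {"name": "temperature", "dtype": "float"},
]

selected = [r["name"] for r in records if matches(r, name__startswith="cell_type")]
print(selected)
```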
- classmethod from_df(df, field=None)
Create Feature records for columns.
- Return type:
- classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)
Bulk create validated records by parsing values for an identifier such as a name or an id.
- Parameters:
values (List[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].
field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g., bt.CellMarker.name.
create (bool, default: False) – Whether to create records if they don’t exist.
organism (Record | str | None, default: None) – A bionty.Organism name or record.
source (Record | None, default: None) – A bionty.Source record to validate against to create records for.
mute (bool, default: False) – Whether to mute logging.
- Return type:
list[Record]
- Returns:
A list of validated records. For bionty registries, also returns knowledge-coupled records.
Notes
For more info, see tutorial: Manage biological registries.
Examples
Bulk create from non-validated values logs warnings & returns an empty list:
>>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
>>> assert len(ulabels) == 0
Bulk create records from validated values returns the corresponding existing records:
>>> ln.save([ln.ULabel(name=name) for name in ["benchmark", "prediction", "test"]])
>>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
>>> assert len(ulabels) == 3
Bulk create records from public reference:
>>> import bionty as bt
>>> records = bt.CellType.from_values(["T cell", "B cell"], field="name")
>>> records
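The first two examples above can be mirrored without a database. The sketch below assumes a plain dictionary stands in for the registry; from_values_sketch is a hypothetical function, and the real method additionally handles fields, organisms and sources.

```python
# Illustrative sketch only; not lamindb's implementation.
def from_values_sketch(values, existing, create=False):
    """Return records for `values`, looked up by name in `existing`.

    `existing` maps a name to its stored record (a plain dict here).
    """
    records = []
    for value in values:
        if value in existing:
            # Validated value: reuse the stored record.
            records.append(existing[value])
        elif create:
            # Opt-in: build a new record for an unknown value.
            records.append({"name": value})
        # Otherwise the value is skipped.
    return records


names = ["benchmark", "prediction", "test"]
registry = {}

# Nothing is stored yet, so no value validates.
empty = from_values_sketch(names, registry)

# After the records exist, the same call returns them.
registry.update({name: {"name": name} for name in names})
found = from_values_sketch(names, registry)

print(len(empty), len(found))
```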
- classmethod get(idlike=None, **expressions)
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Return type:
- Returns:
A record.
- Raises:
lamindb.core.exceptions.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ulabel = ln.ULabel.get("2riu039")
>>> ulabel = ln.ULabel.get(name="my-label")
- classmethod inspect(values, field=None, *, mute=False, organism=None, source=None)
Inspect if values are mappable to a field.
Being mappable means that an exact match exists.
- Parameters:
values (List[str] | Series | array) – Values that will be checked against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (Record | str | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.
- Return type:
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
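The result object in the example exposes validated and non_validated lists. A minimal sketch of that partition, assuming exact string matching against a list of known values, could look as follows. InspectSketch and inspect_sketch are hypothetical names; the real result object may carry further information.

```python
from dataclasses import dataclass, field


# Illustrative sketch only; not lamindb's result type.
@dataclass
class InspectSketch:
    validated: list = field(default_factory=list)
    non_validated: list = field(default_factory=list)


def inspect_sketch(values, known):
    """Partition `values` by whether an exact match exists in `known`."""
    known_set = set(known)
    result = InspectSketch()
    for value in values:
        target = result.validated if value in known_set else result.non_validated
        target.append(value)
    return result


result = inspect_sketch(
    ["A1CF", "A1BG", "FANCD1", "FANCD20"],
    known=["A1CF", "A1BG", "BRCA2"],
)
print(result.validated)
print(result.non_validated)
```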
- classmethod lookup(field=None, return_field=None)
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
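The example shows "ADGB-DT" becoming the attribute adgb_dt. A rough sketch of how field values could be turned into attribute names on a named tuple is given below, using only the standard library. The helpers to_attr and make_lookup are hypothetical, the normalization rule is an assumption inferred from the example, and the sketch assumes every value yields a distinct, valid identifier.

```python
import re
from collections import namedtuple


# Illustrative sketch only; the normalization rule is an assumption.
def to_attr(value: str) -> str:
    """Lowercase and replace non-word characters with underscores."""
    return re.sub(r"\W", "_", value).lower()


def make_lookup(values):
    """Build a named tuple whose attributes map back to the original values."""
    Lookup = namedtuple("Lookup", [to_attr(v) for v in values])
    return Lookup(*values)


symbols = ["ADGB-DT", "BRCA2"]
lookup = make_lookup(symbols)
print(lookup.adgb_dt)

# Dictionary form keyed by the original values, for names that are
# awkward to type as attributes.
lookup_dict = dict(zip(symbols, lookup))
print(lookup_dict["ADGB-DT"])
```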
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
QuerySet
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None)
Maps input synonyms to standardized names.
- Parameters:
values (List[str] | Series | array) – Identifiers that will be standardized.
field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
return_field (str | None, default: None) – The field to return. Defaults to field.
return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
mute (bool, default: False) – Whether to mute logging.
public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.
keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
- "first": returns the first mapped standardized name
- "last": returns the last mapped standardized name
- False: returns all mapped standardized names.
When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
organism (Record | str | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
- Return type:
list[str] | dict[str, str]
- Returns:
If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
See also
add_synonym()
Add synonyms.
remove_synonym()
Remove synonyms.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
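The effect of keep is easiest to see on a small synonym table. The sketch below assumes records are given as (name, bar-separated synonyms) pairs and matches case-sensitively, unlike the method's default. standardize_sketch and the GENE_X/GENE_Y entries are hypothetical and serve only to produce a synonym that maps to two names.

```python
# Illustrative sketch only; not lamindb's implementation.
def standardize_sketch(values, records, keep="first"):
    """Map synonyms to names; `records` is a list of (name, "syn1|syn2") pairs."""
    names = {name for name, _ in records}
    synonym_to_names = {}
    for name, synonyms in records:
        for synonym in synonyms.split("|"):
            if synonym:
                synonym_to_names.setdefault(synonym, []).append(name)

    result = []
    for value in values:
        hits = synonym_to_names.get(value, [])
        if value in names or not hits:
            # Already a standardized name, or unknown: pass through unchanged.
            result.append(value)
        elif keep == "first":
            result.append(hits[0])
        elif keep == "last":
            result.append(hits[-1])
        else:
            # keep=False: return all mapped names, nested only for duplicates.
            result.append(hits if len(hits) > 1 else hits[0])
    return result


toy_records = [
    ("A1CF", ""),
    ("A1BG", ""),
    ("BRCA2", "FANCD1"),
    ("GENE_X", "shared"),  # hypothetical entries sharing one synonym
    ("GENE_Y", "shared"),
]

print(standardize_sketch(["A1CF", "A1BG", "FANCD1", "FANCD20"], toy_records))
print(standardize_sketch(["shared"], toy_records, keep="last"))
print(standardize_sketch(["shared"], toy_records, keep=False))
```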
- classmethod using(instance)
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
QuerySet
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
- classmethod validate(values, field=None, *, mute=False, organism=None, source=None)
Validate values against existing values of a string field.
Note this is strict validation, only asserts exact matches.
- Parameters:
values (List[str] | Series | array) – Values that will be validated against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (Record | str | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
- Return type:
ndarray
- Returns:
A vector of booleans indicating if an element is validated.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
array([ True,  True, False, False])
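Strict validation means that a value differing only in case or surrounding whitespace does not validate, and neither does a synonym. The sketch below shows this with plain string membership and returns a list of booleans in place of the array; validate_sketch is a hypothetical stand-in.

```python
# Illustrative sketch only: exact-match validation over plain strings.
def validate_sketch(values, known):
    """Return one boolean per value: True only on an exact match."""
    known_set = set(known)
    return [value in known_set for value in values]


mask = validate_sketch(
    ["A1CF", "a1cf", " A1BG", "FANCD1"],
    known=["A1CF", "A1BG", "BRCA2"],
)
print(mask)
```

Values that fail only because they are synonyms, such as FANCD1 in the example above, are candidates for standardize() before validating again.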
Methods
- add_synonym(synonym, force=False, save=None)
Add synonyms to a record.
- Parameters:
synonym (str | List[str] | Series | array) – The synonyms to add to the record.
force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
save (bool | None, default: None) – Whether to save the record to the database.
See also
remove_synonym()
Remove synonyms.
Examples
>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'
- delete()
Delete.
- Return type:
None
- remove_synonym(synonym)
Remove synonyms from a record.
- Parameters:
synonym (str | List[str] | Series | array) – The synonym values to remove.
See also
add_synonym()
Add synonyms
Examples
>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
'T lymphocyte|T-lymphocyte'
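Both add_synonym() and remove_synonym() act on the bar-separated synonyms string. The sketch below shows that string handling in isolation, assuming a plain str in place of a record. add_synonyms and remove_synonyms are hypothetical helpers, and unlike the add_synonym() example above, this sketch keeps the existing order and appends new entries at the end.

```python
# Illustrative sketch only: editing a bar-separated synonyms string.
def _split(synonyms):
    return [s for s in (synonyms or "").split("|") if s]


def add_synonyms(current, new):
    """Append new synonyms, skipping ones that are already present."""
    existing = _split(current)
    for synonym in ([new] if isinstance(new, str) else list(new)):
        if "|" in synonym:
            # The bar is the separator, so it cannot appear inside a synonym.
            raise ValueError("a synonym must not contain '|'")
        if synonym not in existing:
            existing.append(synonym)
    return "|".join(existing)


def remove_synonyms(current, old):
    """Drop the given synonyms and keep the rest in order."""
    drop = {old} if isinstance(old, str) else set(old)
    return "|".join(s for s in _split(current) if s not in drop)


synonyms = "T-cell|T lymphocyte|T-lymphocyte"
print(add_synonyms(synonyms, "T cells"))
print(remove_synonyms(synonyms, "T-cell"))
```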
- set_abbr(value)
Set value for abbr field and add to synonyms.
- Parameters:
value (str) – A value for an abbreviation.
See also
Examples
>>> import bionty as bt
>>> bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.get(name="single-cell RNA sequencing")
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()