lamindb.FeatureSet
- class lamindb.FeatureSet(features: Iterable[Record], dtype: str | None = None, name: str | None = None)
Feature sets.
Stores references to sets of Feature and other registries that may be used to identify features (e.g., Gene or Protein).
Why does LaminDB model feature sets, not just features?
Performance: Imagine you measure the same panel of 20k transcripts in 1M samples. By modeling the panel as a feature set, you can link all your artifacts against one feature set and only need to store 1M instead of 1M x 20k = 20B links.
Interpretation: Model protein panels, gene panels, etc.
Data integration: Feature sets provide the currency that determines whether two collections can be easily concatenated.
These reasons do not hold for label sets. Hence, LaminDB does not model label sets.
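The performance argument is plain arithmetic. The sketch below only restates the numbers quoted above; the variable names are illustrative, and it assumes one artifact per sample for simplicity.

```python
# Numbers taken from the example in the text; names are illustrative only.
n_samples = 1_000_000  # 1M samples (assumed: one artifact per sample)
n_features = 20_000    # panel of 20k transcripts

# Linking every artifact to every feature individually
links_without_feature_set = n_samples * n_features

# Linking every artifact to one shared feature set
links_with_feature_set = n_samples

print(f"{links_without_feature_set:,} links without a feature set")
print(f"{links_with_feature_set:,} links with a feature set")
```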
- Parameters:
features (Iterable[Record]) – An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().
dtype (str | None, default: None) – The simple type. Defaults to None for sets of Feature records. Otherwise defaults to "num" (e.g., for sets of Gene).
name (str | None, default: None) – A name.
Note
A feature set can be identified by the hash of its feature uids. It’s stored in the .hash field.
A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.
See also
from_values()
Create from values.
from_df()
Create from dataframe columns.
Examples
Create a feature set from a dataframe with types:
>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)
Create a feature set from features:
>>> features = [ln.Feature(name=feat, dtype="float").save() for feat in ["feat1", "feat2"]]
>>> feature_set = ln.FeatureSet(features)
Create a feature set from feature values:
>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()
Link a feature set to an artifact:
>>> artifact.features.add_feature_set(feature_set, slot="var")
Link features to an artifact (will create a feature set under the hood):
>>> artifact.features.add_values(features)
Attributes
Simple fields
- uid: str
A universal id (hash of the set of feature values).
- name: str | None
A name (optional).
- n
Number of features in the set.
- dtype: str | None
Data type, e.g., “num”, “float”, “int”. Is None for Feature.
For Feature, types are expected to be heterogeneous and defined on a per-feature level.
- registry: str
The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.
Depending on the registry, .members stores, e.g., Feature or Gene records.
- hash: str | None
The hash of the set.
- created_at: datetime
Time of creation of record.
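The uid and hash fields are both described as hashes over the members of the set, so two feature sets with the same members should be identified as the same set regardless of order. The sketch below illustrates that property only. LaminDB's actual hashing scheme is not specified on this page; the helper name, the use of MD5, and the comma-joined canonical form are all assumptions.

```python
import hashlib


def hash_feature_uids(uids):
    """Hypothetical helper: order-independent hash of a collection of uids."""
    # Deduplicate and sort so that neither order nor repeats affect the result
    canonical = ",".join(sorted(set(uids)))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Made-up uids, for illustration only
a = hash_feature_uids(["uid3", "uid1", "uid2"])
b = hash_feature_uids(["uid1", "uid2", "uid3", "uid1"])
c = hash_feature_uids(["uid1", "uid2"])

print(a == b)  # same members, so the hashes agree
print(a == c)  # different members, so the hashes differ
```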
Relational fields
Class methods
- classmethod df(include=None, features=False, limit=100)
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use arguments include or features to include other data.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
Include the name of the creator in the DataFrame:
>>> ln.ULabel.df(include="created_by__name")
Include display of features for Artifact:
>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations
Only include select features:
>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod filter(*queries, **expressions)
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
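Keyword expressions such as name__startswith="my" follow Django's field__lookup naming convention. The sketch below shows how such a key can be split and applied to in-memory dictionaries. It is an illustration of the convention, not Django's or LaminDB's implementation: the matches helper is hypothetical, only three lookups are covered, and related-field traversal (e.g., created_by__name) is not handled.

```python
def matches(record, **expressions):
    """Hypothetical helper: test a dict against Django-style keyword lookups."""
    lookups = {
        "exact": lambda value, target: value == target,
        "startswith": lambda value, target: str(value).startswith(target),
        "contains": lambda value, target: target in str(value),
    }
    for key, target in expressions.items():
        # "name__startswith" -> field "name", lookup "startswith";
        # a bare "name" falls back to the "exact" lookup
        field, _, lookup = key.partition("__")
        if not lookups[lookup or "exact"](record[field], target):
            return False
    return True


records = [{"name": "my label"}, {"name": "other label"}]
selected = [r for r in records if matches(r, name__startswith="my")]
print(selected)
```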
- classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, source=None)
Create feature set for validated features.
- Return type:
FeatureSet | None
- classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, source=None, raise_validation_error=True)
Create feature set for validated features.
- Parameters:
values (List[str] | Series | array) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.
type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.
name (str | None, default: None) – A name.
organism (Record | str | None, default: None) – An organism to resolve gene mapping.
source (Record | None, default: None) – A public ontology to resolve feature identifier mapping.
raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.
- Raises:
ValidationError – If some values are not valid.
- Return type:
Examples
>>> features = [ln.Feature(name=feat, dtype="str").save() for feat in ["feat11", "feat21"]]
>>> feature_set = ln.FeatureSet.from_values(["feat11", "feat21"])
>>> import bionty as bt
>>> genes = ["ENSG00000139618", "ENSG00000198786"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, "float")
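The role of raise_validation_error can be pictured as a check of the input values against the reference registry before the set is built. The sketch below is a stand-alone illustration under stated assumptions: validate_values, the local ValidationError class, and the known_names set are hypothetical, and returning only the validated values when the flag is False is a guess at reasonable behavior, not documented behavior.

```python
class ValidationError(Exception):
    """Local stand-in for the library's validation error (hypothetical here)."""


def validate_values(values, known, raise_validation_error=True):
    """Hypothetical helper: keep values found in `known`, optionally raising."""
    validated = [v for v in values if v in known]
    not_validated = [v for v in values if v not in known]
    if not_validated and raise_validation_error:
        raise ValidationError(
            f"{len(not_validated)} values are not valid: {not_validated}"
        )
    return validated


# Pretend these names already exist in the reference registry
known_names = {"feat11", "feat21"}

ok = validate_values(["feat11", "feat21"], known_names)
partial = validate_values(["feat11", "typo"], known_names, raise_validation_error=False)
print(ok, partial)
```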
- classmethod get(idlike=None, **expressions)
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Return type:
- Returns:
A record.
- Raises:
lamindb.core.exceptions.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
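The idlike argument accepts an integer id, a full uid, or a uid stub. A minimal sketch of how such an argument could be resolved against in-memory records follows. The get_record helper and the sample records are hypothetical, DoesNotExist is a local stand-in for lamindb.core.exceptions.DoesNotExist, and treating a stub as a uid prefix is an assumption.

```python
class DoesNotExist(Exception):
    """Local stand-in for lamindb.core.exceptions.DoesNotExist."""


def get_record(records, idlike):
    """Hypothetical helper: resolve an integer id, a full uid, or a uid stub."""
    if isinstance(idlike, int):
        hits = [r for r in records if r["id"] == idlike]
    else:
        # Assumption: a uid stub is a prefix of the full uid
        hits = [r for r in records if r["uid"].startswith(idlike)]
    if not hits:
        raise DoesNotExist(f"No record found for {idlike!r}")
    if len(hits) > 1:
        raise ValueError(f"{idlike!r} matches {len(hits)} records")
    return hits[0]


# Made-up records, for illustration only
records = [
    {"id": 1, "uid": "FvtpPJLJ", "name": "my-label"},
    {"id": 2, "uid": "a1B2c3D4", "name": "other-label"},
]

by_id = get_record(records, 1)
by_stub = get_record(records, "Fvtp")
print(by_id["name"], by_stub["name"])
```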
- classmethod lookup(field=None, return_field=None)
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
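The example accesses the value "ADGB-DT" as the attribute adgb_dt, which suggests that field values are normalized into valid Python identifiers. The sketch below shows one way to build such an auto-complete object with collections.namedtuple. It is not LaminDB's implementation: both helpers are hypothetical, the normalization rule is an assumption, and values that start with a digit, collide after normalization, or are Python keywords are not handled.

```python
import re
from collections import namedtuple


def to_attribute_name(value):
    """Hypothetical helper: lowercase and replace non-word characters with '_'."""
    return re.sub(r"\W", "_", value).lower()


def build_lookup(values):
    """Hypothetical helper: expose each value as an attribute of a namedtuple."""
    mapping = {to_attribute_name(v): v for v in values}
    Lookup = namedtuple("Lookup", list(mapping))
    return Lookup(**mapping)


lookup = build_lookup(["ADGB-DT", "BRCA2"])
print(lookup.adgb_dt)  # attribute access, discoverable via tab completion
print(lookup.brca2)
```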
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
QuerySet
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, returns a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
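To make the idea of a scored, ranked search concrete, the sketch below ranks strings by similarity to a query using difflib from the standard library. This is a stand-in technique, not LaminDB's scoring method, so its scores will not match those the library reports. The search function here is a hypothetical helper that only mirrors the limit and case_sensitive parameters.

```python
from difflib import SequenceMatcher


def search(values, string, limit=20, case_sensitive=False):
    """Hypothetical helper: rank values by similarity to the search string."""

    def score(value):
        if case_sensitive:
            a, b = value, string
        else:
            a, b = value.lower(), string.lower()
        # Similarity ratio in [0, 1], scaled to a 0-100 score
        return round(100 * SequenceMatcher(None, a, b).ratio(), 1)

    ranked = sorted(((score(v), v) for v in values), reverse=True)
    return [(value, s) for s, value in ranked[:limit]]


results = search(["ULabel1", "ULabel2", "ULabel3"], "ULabel2")
print(results[0])  # the exact match ranks first
```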
- classmethod using(instance)
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
QuerySet
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
Methods
- delete()
Delete.
- Return type:
None
- save(*args, **kwargs)
Save.
- Return type: