bionty.Protein
- class bionty.Protein(name: str | None, uniprotkb_id: str | None, synonyms: str | None, length: int | None, gene_symbol: str | None, ensembl_gene_ids: str | None, organism: Organism | None, source: Source | None)
Bases: BioRecord, TracksRun, TracksUpdates
Proteins - UniProt.
Notes
For more info, see the tutorials Manage biological registries and Protein.
Bulk create records via from_values().
Example:
    import bionty as bt

    record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
    record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")
Simple fields
- uid: str
A universal id (base62-encoded hash of defining fields).
- name: str | None
Unique name of a protein.
- uniprotkb_id: str | None
UniProt protein ID: 6 alphanumeric characters, possibly suffixed by 4 more.
- synonyms: str | None
Bar-separated (|) synonyms that correspond to this protein.
- description: str | None
Description of the protein.
- length: int | None
Length of the protein sequence.
- gene_symbol: str | None
The primary gene symbol that corresponds to this protein.
- ensembl_gene_ids: str | None
Bar-separated (|) Ensembl Gene IDs that correspond to this protein.
- created_at: datetime
Time of creation of record.
- updated_at: datetime
Time of last update to record.
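The uniprotkb_id format and the bar-separated fields above can be checked with plain Python. The sketch below is an illustration, not bionty's own validation: it assumes UniProt's published accession-number pattern (6 characters, or 10 for the extended format) and treats a missing or empty bar-separated field as an empty list. The helper names is_uniprot_accession and split_bar_field are hypothetical.

```python
import re
from typing import Optional

# UniProtKB accession pattern as published by UniProt: 6 characters,
# or 10 characters for the extended format.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(value: str) -> bool:
    """Return True if the whole string matches the accession pattern."""
    return UNIPROT_ACCESSION.fullmatch(value) is not None


def split_bar_field(value: Optional[str]) -> list:
    """Split a bar-separated (|) field such as synonyms or ensembl_gene_ids."""
    if not value:
        return []
    # Drop empty items produced by leading, trailing, or doubled separators
    return [item for item in value.split("|") if item]


print(is_uniprot_accession("Q8N6N3"))  # True
print(is_uniprot_accession("SYT15B"))  # False: a gene symbol, not an accession
```

A regular expression is used here only because the accession format is fixed-width and alphabet-constrained; it checks syntax, not whether the ID exists in UniProt.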
Relational fields
Class methods
- classmethod add_source(source, df=None)
Add a source of the entity.
- Parameters:
  - source (Source) – Source record to add (this can be from another entity).
  - df (DataFrame | None, default: None) – DataFrame to add to source.dataframe_artifact.
  - currently_used – Whether to set this source as currently used.
- Returns:
A Source record with this entity.
Example:
    import bionty as bt
    import pandas as pd

    internal_source = bt.Source(
        entity="bionty.Gene",
        name="internal",
        version="0.0.1",
        organism="rabbit",
        description="internal gene reference",
    ).save()
    source_df = pd.DataFrame(
        {
            "ensembl_gene_id": ["ENSOCUG00000017195"],
            "symbol": ["SEL1L3"],
            "description": ["SEL1L family member 3"],
        }
    )
    bt.Gene.add_source(internal_source, df=source_df)
- classmethod df(include=None, features=False, limit=100)
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use arguments include or features to include other data.
- Parameters:
  - include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
  - features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
  - limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type: DataFrame
Examples
Include the name of the creator in the DataFrame:
    >>> ln.ULabel.df(include="created_by__name")
Include display of features for Artifact:
    >>> df = ln.Artifact.df(features=True)
    >>> ln.view(df)  # visualize with type annotations
Only include select features:
    >>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod filter(*queries, **expressions)
Query records.
- Parameters:
  - queries – One or multiple Q objects.
  - expressions – Fields and values passed as Django query expressions.
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
    >>> ln.ULabel(name="my label").save()
    >>> ln.ULabel.filter(name__startswith="my").df()
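Keyword expressions such as name__startswith combine a field name and a lookup type, separated by a double underscore. The toy below works on plain dictionaries rather than lamindb or Django and only sketches that naming convention under stated assumptions: just four lookups are modeled, a bare field name means an exact match, all expressions must hold (AND), and relation traversal such as created_by__name is not supported. toy_filter and LOOKUPS are hypothetical names.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical subset of Django-style lookup types; Django itself supports many more.
LOOKUPS: Dict[str, Callable[[Any, Any], bool]] = {
    "exact": operator.eq,
    "startswith": lambda value, arg: str(value).startswith(arg),
    "contains": lambda value, arg: arg in str(value),
    "in": lambda value, arg: value in arg,
}


def _matches(record: dict, key: str, arg: Any) -> bool:
    # "name__startswith" -> field "name", lookup "startswith"; bare "name" -> exact
    field, _, lookup = key.partition("__")
    return LOOKUPS[lookup or "exact"](record.get(field), arg)


def toy_filter(records: List[dict], **expressions: Any) -> List[dict]:
    """Keep records for which every field__lookup=value expression holds."""
    return [r for r in records if all(_matches(r, k, a) for k, a in expressions.items())]


labels = [{"name": "my label"}, {"name": "other label"}]
print(toy_filter(labels, name__startswith="my"))  # [{'name': 'my label'}]
```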
- classmethod from_source(*, name=None, uniprotkb_id=None, gene_symbol=None, organism=None, source=None, mute=False, **kwargs)
Create a Protein record from source based on a single identifying field.
- Parameters:
  - name (str | None, default: None) – Protein name (e.g. "Synaptotagmin-15B").
  - uniprotkb_id (str | None, default: None) – UniProt protein ID (e.g. "Q8N6N3").
  - gene_symbol (str | None, default: None) – Gene symbol (e.g. "SYT15B").
  - organism (str | Organism | None, default: None) – Organism name or Organism record.
  - source – Optional Source record to use.
  - mute (bool, default: False) – Whether to suppress logging.
- Returns:
A single Protein record, a list of Protein records, or None if not found.
Example:
    import bionty as bt

    record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
    record = bt.Protein.from_source(uniprotkb_id="Q8N6N3")
    record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")
- classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)
Bulk create validated records by parsing values for an identifier (such as a name or an id).
- Parameters:
  - values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].
  - field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g., bt.CellMarker.name.
  - create (bool, default: False) – Whether to create records if they don't exist.
  - organism (Record | str | None, default: None) – A bionty.Organism name or record.
  - source (Record | None, default: None) – A bionty.Source record to validate against to create records for.
  - mute (bool, default: False) – Whether to mute logging.
- Returns:
A list of validated records. For bionty registries, also returns knowledge-coupled records.
Notes
For more info, see the tutorial: Manage biological registries.
Example:
    import bionty as bt
    import lamindb as ln

    # Bulk creating from non-validated values logs warnings and returns an empty list
    ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
    assert len(ulabels) == 0
    # With create=True, records are created for values that don't exist yet
    ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
    assert len(ulabels) == 3
    # Bulk create records from a public reference
    bt.CellType.from_values(["T cell", "B cell"]).save()
- classmethod get(idlike=None, **expressions)
Get a single record.
- Parameters:
  - idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
  - expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples:
    ulabel = ln.ULabel.get("FvtpPJLJ")
    ulabel = ln.ULabel.get(name="my-label")
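get() accepts an integer id, a full uid, or a uid stub, and raises when nothing matches. The toy below sketches one way such resolution could work over plain dictionaries; it assumes a uid stub is a leading prefix of the full uid and raises ValueError when several records match. DoesNotExist here is a hypothetical stand-in for lamindb.errors.DoesNotExist, and toy_get is not lamindb's implementation.

```python
from typing import List, Optional, Union


class DoesNotExist(Exception):
    """Hypothetical stand-in for lamindb.errors.DoesNotExist."""


def toy_get(records: List[dict], idlike: Optional[Union[int, str]] = None, **expressions) -> dict:
    """Return exactly one record matching an id, a uid (or uid prefix), and/or field values."""
    candidates = records
    if isinstance(idlike, int):
        candidates = [r for r in candidates if r["id"] == idlike]
    elif isinstance(idlike, str):
        # Assumption: a uid stub is a leading prefix of the full uid
        candidates = [r for r in candidates if r["uid"].startswith(idlike)]
    # Remaining keyword arguments are compared for exact equality
    candidates = [r for r in candidates if all(r.get(k) == v for k, v in expressions.items())]
    if not candidates:
        raise DoesNotExist(f"no record matches {idlike!r} {expressions}")
    if len(candidates) > 1:
        raise ValueError("several records match; use a longer uid stub or more fields")
    return candidates[0]


# Made-up records for illustration
records = [
    {"id": 1, "uid": "FvtpPJLJabcd1234", "name": "my-label"},
    {"id": 2, "uid": "Xy12abcdEFGH5678", "name": "other"},
]
print(toy_get(records, "FvtpPJLJ")["name"])  # my-label
```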
- classmethod import_source(source=None, update_records=False, *, organism=None, ignore_conflicts=True)
Bulk save records from a Bionty ontology.
Use this method to initialize your registry with a public ontology.
- Parameters:
  - source (Source | None, default: None) – Source record to import records from.
  - update_records (bool, default: False) – Whether to update existing records with the new source.
  - organism (str | Record | None, default: None) – Organism name or record. Required for entities with a required organism foreign key when no source is passed.
  - ignore_conflicts (bool, default: True) – Whether to ignore conflicts during bulk record creation.
Example:
    import bionty as bt

    # source1 and source2 are bionty.Source records
    # import all records from a source
    bt.CellType.import_source(source1)
    # update existing records with a new source
    bt.CellType.import_source(source2, update_records=True)
- classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)
Inspect if values are mappable to a field.
Being mappable means that an exact match exists.
- Parameters:
  - values (list[str] | Series | array) – Values that will be checked against the field.
  - field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.
  - mute (bool, default: False) – Whether to mute logging.
  - organism (str | Record | None, default: None) – An Organism name or record.
  - source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.
  - strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won't affect validation against public sources.
Example:
    import bionty as bt

    # save some gene records
    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()
    # inspect gene symbols
    gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
    assert result.validated == ["A1CF", "A1BG"]
    assert result.non_validated == ["FANCD1", "FANCD20"]
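The split into validated and non_validated values in the example above can be sketched in a few lines. This is a minimal illustration, assuming "mappable" means an exact, case-sensitive membership test against the values already in the registry; InspectResult and toy_inspect are hypothetical names, and the real method also consults public sources and synonyms for its suggestions.

```python
from typing import List, NamedTuple, Set


class InspectResult(NamedTuple):
    """Hypothetical container mirroring the validated / non_validated attributes."""
    validated: List[str]
    non_validated: List[str]


def toy_inspect(values: List[str], registry_values: Set[str]) -> InspectResult:
    """Split values by whether an exact (case-sensitive) match exists, keeping input order."""
    validated = [v for v in values if v in registry_values]
    non_validated = [v for v in values if v not in registry_values]
    return InspectResult(validated, non_validated)


result = toy_inspect(["A1CF", "A1BG", "FANCD1", "FANCD20"], {"A1CF", "A1BG", "BRCA2"})
print(result.non_validated)  # ['FANCD1', 'FANCD20']
```

FANCD1 stays non-validated under exact matching even though the standardize() example on this page maps it to BRCA2; resolving synonyms is standardize's job, not inspect's.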
- classmethod lookup(field=None, return_field=None)
Return an auto-complete object for a field.
- Parameters:
  - field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to the first string field.
  - return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
- Return type: NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
Examples
    >>> import bionty as bt
    >>> bt.settings.organism = "human"
    >>> bt.Gene.from_source(symbol="ADGB-DT").save()
    >>> lookup = bt.Gene.lookup()
    >>> lookup.adgb_dt
    >>> lookup_dict = lookup.dict()
    >>> lookup_dict['ADGB-DT']
    >>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
    >>> lookup_by_ensembl_id.ensg00000002745
    >>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
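The example turns "ADGB-DT" into the attribute lookup.adgb_dt and "ENSG00000002745" into ensg00000002745. The sketch below reproduces that key normalization under an assumed rule (lowercase, collapse runs of non-word characters to "_", prefix a leading digit with "_"); the real rule may differ. It returns a types.SimpleNamespace rather than a NamedTuple because namedtuple field names may not start with an underscore. to_attribute_name and toy_lookup are hypothetical names.

```python
import re
from types import SimpleNamespace
from typing import List


def to_attribute_name(value: str) -> str:
    """Hypothetical normalization: lowercase and collapse non-word characters to '_'."""
    key = re.sub(r"\W+", "_", value.lower()).strip("_")
    # Assumption: prefix keys starting with a digit so they remain valid identifiers
    return f"_{key}" if key[:1].isdigit() else key


def toy_lookup(values: List[str]) -> SimpleNamespace:
    """Expose each value as an attribute named after its normalized form."""
    return SimpleNamespace(**{to_attribute_name(v): v for v in values})


lookup = toy_lookup(["ADGB-DT", "ENSG00000002745"])
print(lookup.adgb_dt)  # ADGB-DT
```

Attribute access is what makes tab completion work in an interactive shell, which is the point of an auto-complete object.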
- classmethod public(organism=None, source=None)
The corresponding bionty.base.PublicOntology object.
Note that the source is auto-configured and tracked via bionty.Source.
- Return type: PublicOntology | StaticReference
Example:
    import bionty as bt

    celltype_pub = bt.CellType.public()
    celltype_pub
    #> PublicOntology
    #> Entity: CellType
    #> Organism: all
    #> Source: cl, 2023-04-20
    #> #terms: 2698
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)
Search records by matching a string against field values.
- Parameters:
  - string (str) – The input string to match against the field ontology values.
  - field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
  - limit (int | None, default: 20) – Maximum number of top results to return.
  - case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
    >>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name", create=True)
    >>> ln.save(ulabels)
    >>> ln.ULabel.search("ULabel2")
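Search results come back sorted by a score column. To illustrate ranking with a 0-100 score and a limit, the sketch below swaps in difflib.SequenceMatcher from the standard library; lamindb's actual scoring is different, so scores will not match its output. toy_search is a hypothetical name, and case-insensitive matching is done by lowercasing both strings.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def toy_search(
    string: str, candidates: List[str], limit: int = 20, case_sensitive: bool = False
) -> List[Tuple[str, float]]:
    """Rank candidates by a 0-100 similarity score, highest first."""

    def norm(text: str) -> str:
        return text if case_sensitive else text.lower()

    scored = [
        (candidate, round(100 * SequenceMatcher(None, norm(string), norm(candidate)).ratio(), 1))
        for candidate in candidates
    ]
    # list.sort is stable, so candidates with equal scores keep their input order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


print(toy_search("ULabel2", ["ULabel1", "ULabel2", "ULabel3"], limit=2))
# [('ULabel2', 100.0), ('ULabel1', 85.7)]
```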
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)
Maps input synonyms to standardized names.
- Parameters:
  - values (Iterable) – Identifiers that will be standardized.
  - field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
  - return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.
  - return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
  - case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
  - mute (bool, default: False) – Whether to mute logging.
  - source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.
  - keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.
    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
  - synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
  - organism (str | Record | None, default: None) – An Organism name or record.
  - source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
  - strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won't affect validation against public sources.
- Return type: list[str] | dict[str, str]
- Returns:
If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
See also
add_synonym()
Add synonyms.
remove_synonym()
Remove synonyms.
Example:
    import bionty as bt

    # save some gene records
    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()
    # standardize gene synonyms
    gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    bt.Gene.standardize(gene_synonyms)
    #> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
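The synonym-to-name mapping and the keep options can be sketched over a plain dict that maps each standardized name to its bar-separated synonyms string. This is an illustration under assumptions, not bionty's implementation: matching is case-insensitive (as with case_sensitive=False), unmapped values pass through unchanged (as FANCD20 does in the example), and the gene data below contains only the FANCD1 to BRCA2 relationship shown in the example. build_synonym_map and toy_standardize are hypothetical names.

```python
from typing import Dict, List, Union


def build_synonym_map(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each lowercased name or synonym to every standardized name that lists it.

    records maps a standardized name to its bar-separated synonyms string.
    """
    mapping: Dict[str, List[str]] = {}
    for name, synonyms in records.items():
        for term in [name, *synonyms.split("|")]:
            if not term:
                continue
            targets = mapping.setdefault(term.lower(), [])
            if name not in targets:
                targets.append(name)
    return mapping


def toy_standardize(
    values: List[str], records: Dict[str, str], keep: Union[str, bool] = "first"
) -> list:
    """Replace synonyms with standardized names; unmapped values pass through unchanged."""
    mapping = build_synonym_map(records)
    result: list = []
    for value in values:
        names = mapping.get(value.lower())  # case-insensitive, like case_sensitive=False
        if not names:
            result.append(value)
        elif keep == "first":
            result.append(names[0])
        elif keep == "last":
            result.append(names[-1])
        else:  # keep=False: return every match, as a nested list only when ambiguous
            result.append(names if len(names) > 1 else names[0])
    return result


genes = {"A1CF": "", "A1BG": "", "BRCA2": "FANCD1"}
print(toy_standardize(["A1CF", "A1BG", "FANCD1", "FANCD20"], genes))
# ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
```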
- classmethod using(instance)
Use a non-default LaminDB instance.
- Parameters:
  - instance (str | None) – An instance identifier of form "account_handle/instance_name".
Examples
    >>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
                  uid  score
    name
    ULabel7  g7Hk9b2v  100.0
    ULabel5  t4Jm6s0q   75.0
    ULabel6  r2Xw8p1z   75.0
- classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)
Validate values against existing values of a string field.
Note that this is strict validation: it only asserts exact matches.
- Parameters:
  - values (list[str] | Series | array) – Values that will be validated against the field.
  - field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's field names.
  - mute (bool, default: False) – Whether to mute logging.
  - organism (str | Record | None, default: None) – An Organism name or record.
  - source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
  - strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won't affect validation against public sources.
- Return type: ndarray
- Returns:
A vector of booleans indicating if an element is validated.
Example:
    import bionty as bt

    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()
    gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
    #> array([ True,  True, False, False])
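validate() returns one boolean per input value, which is convenient as a mask. The sketch below shows that shape with a plain Python list instead of a NumPy ndarray, to stay dependency-free, and assumes "exact match" means case-sensitive equality. toy_validate is a hypothetical name; itertools.compress then selects the validated values using the mask.

```python
from itertools import compress
from typing import List, Set


def toy_validate(values: List[str], registry_values: Set[str]) -> List[bool]:
    """One boolean per input value: True only for an exact, case-sensitive match."""
    return [value in registry_values for value in values]


registry = {"A1CF", "A1BG", "BRCA2"}
symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20", "a1cf"]
mask = toy_validate(symbols, registry)
print(mask)  # [True, True, False, False, False]
# compress() keeps the values whose mask entry is True
print(list(compress(symbols, mask)))  # ['A1CF', 'A1BG']
```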
Methods
- add_synonym(synonym, force=False, save=None)
Add synonyms to a record.
- Parameters:
  - synonym (str | list[str] | Series | array) – The synonyms to add to the record.
  - force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
  - save (bool | None, default: None) – Whether to save the record to the database.
See also
remove_synonym()
Remove synonyms.
Example:
    import bionty as bt

    # save "T cell" record
    record = bt.CellType.from_source(name="T cell").save()
    record.synonyms
    #> "T-cell|T lymphocyte|T-lymphocyte"
    # add a synonym
    record.add_synonym("T cells")
    record.synonyms
    #> "T cells|T-cell|T-lymphocyte|T lymphocyte"
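Because synonyms is a single bar-separated string, adding a synonym amounts to string surgery plus a conflict check. The sketch below prepends new terms, skips ones already present, and models force with a hypothetical taken set standing in for "synonyms of other records". It keeps the existing order, whereas the example above shows the library reordering the tail; rejecting terms that contain "|" is also an assumption of this sketch. toy_add_synonym is a hypothetical name.

```python
from typing import FrozenSet, Iterable, Optional, Union


def toy_add_synonym(
    synonyms: Optional[str],
    new: Union[str, Iterable[str]],
    taken: FrozenSet[str] = frozenset(),
    force: bool = False,
) -> str:
    """Prepend new synonyms to a bar-separated string, skipping ones already present."""
    current = synonyms.split("|") if synonyms else []
    additions = [new] if isinstance(new, str) else list(new)
    for term in additions:
        if "|" in term:
            raise ValueError("a synonym cannot contain the '|' separator")
        if term in taken and not force:
            raise ValueError(f"{term!r} is already a synonym of another record")
    # dict.fromkeys removes repeated additions while keeping their order
    fresh = [t for t in dict.fromkeys(additions) if t not in current]
    return "|".join(fresh + current)


print(toy_add_synonym("T-cell|T lymphocyte|T-lymphocyte", "T cells"))
# T cells|T-cell|T lymphocyte|T-lymphocyte
```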
- delete()
Delete the record.
- Return type: None
- remove_synonym(synonym)
Remove synonyms from a record.
- Parameters:
  - synonym (str | list[str] | Series | array) – The synonym values to remove.
See also
add_synonym()
Add synonyms.
Example:
    import bionty as bt

    # save "T cell" record
    record = bt.CellType.from_source(name="T cell").save()
    record.synonyms
    #> "T-cell|T lymphocyte|T-lymphocyte"
    # remove a synonym
    record.remove_synonym("T-cell")
    record.synonyms
    #> "T lymphocyte|T-lymphocyte"
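Removing synonyms is the inverse string operation on the bar-separated field. A minimal sketch, assuming the remaining synonyms keep their order and that a field emptied of all synonyms becomes None (matching the str | None field type); toy_remove_synonym is a hypothetical name, not the library's method.

```python
from typing import Iterable, Optional, Union


def toy_remove_synonym(synonyms: Optional[str], to_remove: Union[str, Iterable[str]]) -> Optional[str]:
    """Drop the given synonyms from a bar-separated string, keeping the rest in order."""
    current = synonyms.split("|") if synonyms else []
    removals = {to_remove} if isinstance(to_remove, str) else set(to_remove)
    kept = [term for term in current if term not in removals]
    # Assumption: an emptied field becomes None, matching the str | None field type
    return "|".join(kept) or None


print(toy_remove_synonym("T-cell|T lymphocyte|T-lymphocyte", "T-cell"))
# T lymphocyte|T-lymphocyte
```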
- save(*args, **kwargs)
Save the record and its parents recursively.
Example:
    import bionty as bt

    record = bt.CellType.from_source(name="T cell")
    record.save()
- set_abbr(value)
Set the value of the abbr field and add it to synonyms.
- Parameters:
  - value (str) – A value for an abbreviation.
Example:
    import bionty as bt

    # save an experimental factor record
    scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
    assert scrna.abbr is None
    assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"
    # set abbreviation
    scrna.set_abbr("scRNA")
    assert scrna.abbr == "scRNA"
    # synonyms are updated
    assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"
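- view_parents(field=None, with_children=False, distance=5)
View parents in an ontology.
- Parameters:
  - field (str | DeferredAttribute | None, default: None) – Field to display on the graph.
  - with_children (bool, default: False) – Whether to also show children.
  - distance (int, default: 5) – Maximum distance still shown.
Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).
Examples
    >>> import bionty as bt
    >>> bt.Tissue.from_source(name="subsegmental bronchus").save()
    >>> record = bt.Tissue.get(name="respiratory tube")
    >>> record.view_parents()
    >>> record.view_parents(with_children=True)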
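The distance parameter caps how far up the hierarchy view_parents goes. The sketch below shows that cap as a breadth-first walk over a plain dict mapping each term to its parent terms. This representation, the made-up term names, and the helper ancestors_within are assumptions for illustration; bionty stores ontology relations in the database and renders them as a graph.

```python
from collections import deque
from typing import Dict, List


def ancestors_within(parents: Dict[str, List[str]], start: str, distance: int = 5) -> Dict[str, int]:
    """Walk up a child -> parents map breadth-first; return {ancestor: hops from start}."""
    found: Dict[str, int] = {}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == distance:
            continue  # stop expanding once the maximum distance is reached
        for parent in parents.get(term, []):
            # Breadth-first order means the first visit is via a shortest path
            if parent not in found and parent != start:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


# Made-up hierarchy with a shared grandparent ("diamond")
hierarchy = {"leaf": ["mid_a", "mid_b"], "mid_a": ["root"], "mid_b": ["root"], "root": ["top"]}
print(ancestors_within(hierarchy, "leaf", distance=2))
# {'mid_a': 1, 'mid_b': 1, 'root': 2}
```

Breadth-first search is the natural fit here because ontologies are DAGs rather than trees: a term reachable along several paths is reported once, at its shortest distance.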
>>> import bionty as bt >>> bt.Tissue.from_source(name="subsegmental bronchus").save() >>> record = bt.Tissue.get(name="respiratory tube") >>> record.view_parents() >>> tissue.view_parents(with_children=True)