bionty.Protein

class bionty.Protein(name: str | None, uniprotkb_id: str | None, synonyms: str | None, length: int | None, gene_symbol: str | None, ensembl_gene_ids: str | None, organism: Organism | None, source: Source | None)

Bases: BioRecord, TracksRun, TracksUpdates

Proteins - UniProt.

Notes

For more info, see the tutorials Manage biological registries and Protein.

Bulk create records via from_values().

Example:

import bionty as bt

record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")

Simple fields

uid: str

A universal id (base62-encoded hash of defining fields).
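The uid is described only as a base62-encoded hash of defining fields. A minimal sketch of that idea follows. It assumes SHA-256 as the hash, "|" as the field separator, a digits-uppercase-lowercase alphabet, and an 8-character uid; none of these details are stated here, and `make_uid` is a hypothetical helper, not part of bionty.

```python
import hashlib

# Assumed alphabet order: digits, then uppercase, then lowercase letters.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def make_uid(*defining_fields: str, length: int = 8) -> str:
    """Hypothetical: hash the joined defining fields and keep `length` base62 characters."""
    payload = "|".join(defining_fields).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return base62_encode(int.from_bytes(digest, "big"))[:length]


uid = make_uid("Synaptotagmin-15B", "human")
print(uid, len(uid))
```

Because the hash is deterministic, the same defining fields always yield the same uid in this sketch.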

name: str | None

Unique name of a protein.

uniprotkb_id: str | None

UniProt protein ID, 6 alphanumeric characters, possibly suffixed by 4 more.
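The "6 characters, possibly suffixed by 4 more" structure can be checked with the accession pattern that UniProt documents for UniProtKB accessions. This is a minimal sketch; `is_uniprot_accession` is a hypothetical helper, and bionty is not assumed to validate `uniprotkb_id` this way.

```python
import re

# Accession pattern from the UniProt documentation (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(value: str) -> bool:
    """Return True if the whole string is a UniProtKB accession."""
    return UNIPROT_ACCESSION.fullmatch(value) is not None


print(is_uniprot_accession("Q8N6N3"))      # True: 6 characters
print(is_uniprot_accession("A0A023GPI8"))  # True: 6 + 4 characters
print(is_uniprot_accession("SYT15B"))      # False: a gene symbol, not an accession
```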

synonyms: str | None

Bar-separated (|) synonyms that correspond to this protein.

description: str | None

Description of the protein.

length: int | None

Length of the protein sequence.

gene_symbol: str | None

The primary gene symbol that corresponds to this protein.

ensembl_gene_ids: str | None

Bar-separated (|) Ensembl Gene IDs that correspond to this protein.
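Both `synonyms` and `ensembl_gene_ids` pack several values into one string separated by `|`. A minimal sketch for converting between that representation and a Python list, reusing the synonym string from the add_synonym() example; `split_bar_separated` and `join_bar_separated` are hypothetical helpers, and mapping an empty list to `None` is an assumption.

```python
from __future__ import annotations


def split_bar_separated(value: str | None) -> list[str]:
    """Split a "|"-separated field into a list; None or "" yields []."""
    if not value:
        return []
    # Skip empty pieces produced by stray separators such as "a||b".
    return [item for item in value.split("|") if item]


def join_bar_separated(items: list[str]) -> str | None:
    """Join items with "|"; an empty list yields None (assumed empty-field convention)."""
    return "|".join(items) if items else None


synonyms = "T-cell|T lymphocyte|T-lymphocyte"
print(split_bar_separated(synonyms))  # ['T-cell', 'T lymphocyte', 'T-lymphocyte']
print(join_bar_separated(split_bar_separated(synonyms)) == synonyms)  # True
```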

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

source: Source | None

Source this record associates with.

organism: Organism

Organism this protein associates with.

artifacts: Artifact

Artifacts linked to the protein.

schemas: Schema

Schemas (feature sets) linked to this protein.

Class methods

classmethod add_source(source, df=None)

Add a source of the entity.

Parameters:
  • source (Source) – Source record to add (this can be from another entity).

  • df (DataFrame | None, default: None) – DataFrame to add to the source.dataframe_artifact.

  • currently_used – Whether to set this source as currently used.

Return type:

Source

Returns:

A Source record with this entity.

Example:

import bionty as bt
import pandas as pd

internal_source = bt.Source(
    entity="bionty.Gene",
    name="internal",
    version="0.0.1",
    organism="rabbit",
    description="internal gene reference",
).save()

source_df = pd.DataFrame(
    {
        "ensembl_gene_id": ["ENSOCUG00000017195"],
        "symbol": ["SEL1L3"],
        "description": ["SEL1L family member 3"],
    }
)

# pass the DataFrame defined above
bt.Gene.add_source(internal_source, source_df)

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the include or features arguments to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
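The expressions follow Django's `field__lookup=value` convention, as in `name__startswith="my"` above. To show how such a keyword is interpreted, the sketch below applies a few lookups to plain Python dicts. It is an illustration only: Django and LaminDB translate these expressions into SQL, and `filter_records` and its small lookup table are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable

# A few Django-style lookups, evaluated on plain Python values.
LOOKUPS: dict[str, Callable[[Any, Any], bool]] = {
    "exact": lambda value, arg: value == arg,
    "iexact": lambda value, arg: str(value).lower() == str(arg).lower(),
    "startswith": lambda value, arg: str(value).startswith(arg),
    "contains": lambda value, arg: arg in str(value),
    "in": lambda value, arg: value in arg,
}


def filter_records(records: list[dict[str, Any]], **expressions: Any) -> list[dict[str, Any]]:
    """Keep records matching every `field__lookup=value` expression (combined with AND).

    Related-field paths such as `created_by__name` are not supported in this sketch.
    """
    predicates = []
    for key, arg in expressions.items():
        field, _, lookup = key.partition("__")
        # A bare field name such as `name=` means an exact match.
        predicates.append((field, LOOKUPS[lookup or "exact"], arg))
    return [
        record
        for record in records
        if all(matches(record.get(field), arg) for field, matches, arg in predicates)
    ]


labels = [{"name": "my label"}, {"name": "your label"}]
print(filter_records(labels, name__startswith="my"))  # [{'name': 'my label'}]
```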
classmethod from_source(*, name=None, uniprotkb_id=None, gene_symbol=None, organism=None, source=None, mute=False, **kwargs)

Create a Protein record from source based on a single identifying field.

Parameters:
  • name (str | None, default: None) – Protein name (e.g. “Synaptotagmin-15B”)

  • uniprotkb_id (str | None, default: None) – UniProt protein ID (e.g. “Q8N6N3”)

  • gene_symbol (str | None, default: None) – Gene symbol (e.g. “SYT15B”)

  • organism (str | Organism | None, default: None) – Organism name or Organism record.

  • source (Source | None, default: None) – Optional Source record to use.

  • mute (bool, default: False) – Whether to suppress logging.

Return type:

Protein | list[Protein] | None

Returns:

A single Protein record, list of Protein records, or None if not found

Example:

import bionty as bt

record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
record = bt.Protein.from_source(uniprotkb_id="Q8N6N3")
record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")

classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier (such as a name or an id).

Parameters:
  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g., bt.CellMarker.name.

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (Record | str | None, default: None) – A bionty.Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record to validate against and to create records from.

  • mute (bool, default: False) – Whether to mute logging.

Return type:

RecordList

Returns:

A list of validated records. For bionty registries, also returns knowledge-coupled records.

Notes

For more info, see tutorial: Manage biological registries.

Example:

import bionty as bt
import lamindb as ln

# Bulk creating from non-validated values logs warnings and returns an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# Pass create=True to create new records for the non-validated values
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from a public reference
bt.CellType.from_values(["T cell", "B cell"]).save()

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

Record

Examples:

import lamindb as ln

ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")

classmethod import_source(source=None, update_records=False, *, organism=None, ignore_conflicts=True)

Bulk save records from a Bionty ontology.

Use this method to initialize your registry with a public ontology.

Parameters:
  • source (Source | None, default: None) – Source record to import records from.

  • update_records (bool, default: False) – Whether to update existing records with the new source.

  • organism (str | Record | None, default: None) – Organism name or record. Required for entities with a required organism foreign key when no source is passed.

  • ignore_conflicts (bool, default: True) – Whether to ignore conflicts during bulk record creation.

Example:

import bionty as bt

# source1 and source2 are existing bt.Source records for this entity

# import all records from a source
bt.CellType.import_source(source1)

# update existing records with a new source
bt.CellType.import_source(source2, update_records=True)

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's term names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry:
      - If False, validation will include all records in the registry, ignoring the specified source.
      - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Return type:

InspectResult

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]
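inspect() only counts exact matches, which is why FANCD1 lands in non_validated even though standardize() maps it to BRCA2. A minimal sketch of that partitioning over an in-memory set of registry values; `InspectResultSketch` and `inspect_values` are hypothetical, and the sketch ignores public sources, organisms, and `strict_source`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InspectResultSketch:
    """Hypothetical stand-in exposing `validated` and `non_validated` lists."""

    validated: list[str] = field(default_factory=list)
    non_validated: list[str] = field(default_factory=list)


def inspect_values(values: list[str], registry_values: set[str]) -> InspectResultSketch:
    """Partition `values` by exact membership in `registry_values`, keeping input order."""
    result = InspectResultSketch()
    for value in values:
        if value in registry_values:
            result.validated.append(value)
        else:
            result.non_validated.append(value)
    return result


registry_symbols = {"A1CF", "A1BG", "BRCA2"}
result = inspect_values(["A1CF", "A1BG", "FANCD1", "FANCD20"], registry_symbols)
print(result.validated)      # ['A1CF', 'A1BG']
print(result.non_validated)  # ['FANCD1', 'FANCD20']
```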
classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to the first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
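The examples suggest that values are turned into attribute names by lowercasing and replacing characters such as "-" with "_" ("ADGB-DT" becomes `adgb_dt`). The sketch below implements that assumed rule with a namedtuple plus a `dict()` converter. `to_attribute_name` and `make_lookup` are hypothetical; here each attribute holds the value string itself, which resembles setting return_field to the looked-up field, whereas the real lookup returns whole records by default.

```python
from __future__ import annotations

import re
from collections import namedtuple


def to_attribute_name(value: str) -> str:
    """Assumed rule: collapse runs of non-alphanumerics into "_" and lowercase."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", value).strip("_").lower()


def make_lookup(values: list[str]):
    """Return a namedtuple whose attributes are normalized names of `values`.

    Values that normalize to duplicates, Python keywords, or names starting with
    a digit make namedtuple raise ValueError; a real implementation must handle these.
    """
    base = namedtuple("Lookup", [to_attribute_name(value) for value in values])

    class Lookup(base):
        __slots__ = ()

        def dict(self):
            # Key by the original, un-normalized values.
            return {value: value for value in self}

    return Lookup(*values)


genes = make_lookup(["ADGB-DT", "A1CF"])
print(genes.adgb_dt)            # ADGB-DT
print(genes.dict()["ADGB-DT"])  # ADGB-DT
```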
classmethod public(organism=None, source=None)

The corresponding bionty.base.PublicOntology object.

Note that the source is auto-configured and tracked via bionty.Source.

Return type:

PublicOntology | StaticReference

Example:

import bionty as bt

celltype_pub = bt.CellType.public()
celltype_pub
#> PublicOntology
#> Entity: CellType
#> Organism: all
#> Source: cl, 2023-04-20
#> #terms: 2698

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search for records whose field values match a string.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, sorted by match quality.

See also

filter(), lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
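To illustrate what a ranked, limited, optionally case-sensitive search involves, the sketch below scores candidate strings with Python's `difflib.SequenceMatcher`. This scoring is a stand-in chosen for the example; LaminDB's own ranking is not described here and will differ. `search_values` is a hypothetical helper.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def search_values(
    string: str,
    values: list[str],
    *,
    limit: int | None = 20,
    case_sensitive: bool = False,
) -> list[tuple[str, float]]:
    """Rank `values` by similarity to `string`, best match first, as (value, score) pairs."""
    query = string if case_sensitive else string.lower()
    scored = []
    for value in values:
        candidate = value if case_sensitive else value.lower()
        # SequenceMatcher.ratio() lies in [0, 1]; scale it to a 0-100 score.
        score = round(100 * SequenceMatcher(None, query, candidate).ratio(), 1)
        scored.append((value, score))
    # list.sort is stable (also with reverse=True), so ties keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if limit is None else scored[:limit]


labels = ["ULabel1", "ULabel2", "ULabel3"]
results = search_values("ULabel2", labels)
print(results)
```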
classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
      - "first": returns the first mapped standardized name
      - "last": returns the last mapped standardized name
      - False: returns all mapped standardized names

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry:
      - If False, validation will include all records in the registry, ignoring the specified source.
      - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
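In the example above, FANCD1 is returned as BRCA2 because it maps to that standardized name, while FANCD20 matches nothing and passes through unchanged. The sketch below reproduces that behavior on a toy dictionary of names and bar-separated synonyms. `build_synonym_map` and `standardize_values` are hypothetical; the sketch only approximates keep="first" and does not consult public sources.

```python
from __future__ import annotations


def build_synonym_map(records: dict[str, str | None], case_sensitive: bool = False) -> dict[str, str]:
    """Map every standardized name and each of its "|"-separated synonyms to that name."""
    mapping: dict[str, str] = {}
    for name, synonyms in records.items():
        for term in [name, *(synonyms or "").split("|")]:
            if not term:
                continue
            key = term if case_sensitive else term.lower()
            # Approximates keep="first": the first name registered for a term wins.
            mapping.setdefault(key, name)
    return mapping


def standardize_values(
    values: list[str],
    records: dict[str, str | None],
    *,
    return_mapper: bool = False,
    case_sensitive: bool = False,
) -> list[str] | dict[str, str]:
    """Replace mappable synonyms with standardized names; other values pass through."""
    mapping = build_synonym_map(records, case_sensitive)
    mapper = {}
    for value in values:
        standardized = mapping.get(value if case_sensitive else value.lower())
        if standardized is not None and standardized != value:
            mapper[value] = standardized
    if return_mapper:
        return mapper
    return [mapper.get(value, value) for value in values]


# Toy registry: standardized name -> bar-separated synonyms (illustrative data only).
records = {"A1CF": None, "A1BG": None, "BRCA2": "FANCD1"}
print(standardize_values(["A1CF", "A1BG", "FANCD1", "FANCD20"], records))
# ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
print(standardize_values(["A1CF", "FANCD1"], records, return_mapper=True))
# {'FANCD1': 'BRCA2'}
```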
classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: it only asserts exact matches.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's term names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | Record | None, default: None) – An Organism name or record.

  • source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry:
      - If False, validation will include all records in the registry, ignoring the specified source.
      - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])
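The strict_source parameter only changes which registry records count as matches. The sketch below models registry records as hypothetical (value, source_name) pairs and returns one boolean per input, as a plain list rather than the ndarray the real method returns. `validate_values` and the source names are illustrative, and ignoring strict_source when no source is given is an assumption.

```python
from __future__ import annotations


def validate_values(
    values: list[str],
    registry: list[tuple[str, str]],
    *,
    source: str | None = None,
    strict_source: bool = False,
) -> list[bool]:
    """Return one boolean per value: True if it exactly matches a registry record.

    With strict_source=True and a source given, only records linked to that source
    count; otherwise every record counts, regardless of its source.
    """
    if strict_source and source is not None:
        allowed = {value for value, record_source in registry if record_source == source}
    else:
        allowed = {value for value, _ in registry}
    return [value in allowed for value in values]


registry = [("A1CF", "source_a"), ("A1BG", "source_a"), ("BRCA2", "source_b")]
symbols = ["A1CF", "A1BG", "BRCA2", "FANCD20"]
print(validate_values(symbols, registry))  # [True, True, True, False]
print(validate_values(symbols, registry, source="source_a", strict_source=True))
# [True, True, False, False]
```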

Methods

add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"
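Both add_synonym() and remove_synonym() edit the bar-separated synonyms string. The sketch below shows that string manipulation in isolation. `add_synonyms` and `remove_synonyms` are hypothetical helpers: they keep the existing order and prepend new entries, whereas the bionty output above reorders existing synonyms, and they do not implement the `force` check against other records.

```python
from __future__ import annotations


def add_synonyms(current: str | None, new: str | list[str]) -> str:
    """Prepend synonyms that are not yet present, keeping the existing order."""
    existing = [s for s in (current or "").split("|") if s]
    incoming = [new] if isinstance(new, str) else list(new)
    additions: list[str] = []
    for synonym in incoming:
        if "|" in synonym:
            # "|" is the separator, so this sketch rejects it inside a synonym.
            raise ValueError(f"synonym must not contain '|': {synonym!r}")
        if synonym and synonym not in existing and synonym not in additions:
            additions.append(synonym)
    return "|".join(additions + existing)


def remove_synonyms(current: str | None, to_remove: str | list[str]) -> str | None:
    """Drop the given synonyms; returns None when no synonyms remain."""
    drop = {to_remove} if isinstance(to_remove, str) else set(to_remove)
    kept = [s for s in (current or "").split("|") if s and s not in drop]
    return "|".join(kept) or None


synonyms = "T-cell|T lymphocyte|T-lymphocyte"
print(add_synonyms(synonyms, "T cells"))    # T cells|T-cell|T lymphocyte|T-lymphocyte
print(remove_synonyms(synonyms, "T-cell"))  # T lymphocyte|T-lymphocyte
```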
delete()

Delete the record.

Return type:

None

query_children()

Query children in an ontology.

Return type:

QuerySet

query_parents()

Query parents in an ontology.

Return type:

QuerySet
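query_parents() and query_children() walk an ontology hierarchy in opposite directions, and view_parents() limits what it shows with a `distance` parameter. The sketch below models both ideas with a breadth-first walk over a plain dict of child-to-parents edges. bionty stores these relationships in the database; `walk_within` and `invert` are hypothetical helpers, the hierarchy is a simplified toy, and whether query_parents() returns only direct parents or all ancestors is not stated here.

```python
from __future__ import annotations

from collections import deque


def walk_within(term: str, edges: dict[str, list[str]], distance: int = 5) -> dict[str, int]:
    """Breadth-first walk along `edges` from `term`; returns {reached_term: hops} for hops <= distance."""
    reached: dict[str, int] = {}
    queue = deque([(term, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == distance:
            continue  # do not expand beyond the maximum distance
        for neighbor in edges.get(current, []):
            # BFS reaches each term first along a shortest path, so keep the first hop count.
            if neighbor != term and neighbor not in reached:
                reached[neighbor] = hops + 1
                queue.append((neighbor, hops + 1))
    return reached


def invert(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn child -> parents edges into parent -> children edges."""
    inverted: dict[str, list[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted


# Simplified toy hierarchy (child -> parents), not taken from a real ontology release.
parents = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
}
print(walk_within("T cell", parents))              # {'lymphocyte': 1, 'leukocyte': 2, 'cell': 3}
print(walk_within("T cell", parents, distance=2))  # {'lymphocyte': 1, 'leukocyte': 2}
print(walk_within("lymphocyte", invert(parents)))  # {'T cell': 1, 'B cell': 1}
```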

remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"

save(*args, **kwargs)

Save the record and its parents recursively.

Example:

import bionty as bt

record = bt.CellType.from_source(name="T cell")
record.save()

Return type:

BioRecord

set_abbr(value)

Set the value of the abbr field and add it to the synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Example:

import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"

view_parents(field=None, with_children=False, distance=5)

View parents in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on the graph.

  • with_children (bool, default: False) – Whether to also show children.

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)