bionty.Protein

class bionty.Protein(name: str | None, uniprotkb_id: str | None, synonyms: str | None, length: int | None, gene_symbol: str | None, ensembl_gene_ids: str | None, organism: Organism | None, source: Source | None)

Bases: BioRecord, TracksRun, TracksUpdates

Proteins - UniProt.

Notes

For more info, see tutorials Manage biological ontologies and Protein.

Bulk create records via from_values().

Example:

import bionty as bt

record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")

Simple fields

uid: str

A universal id (base62-encoded hash of defining fields).

name: str | None

Unique name of a protein.

uniprotkb_id: str | None

UniProt protein ID, 6 alphanumeric characters, possibly suffixed by 4 more.
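
The length rule above can be checked before querying the registry. The following is a minimal sketch that assumes only the shape described here (6 alphanumeric characters, optionally followed by 4 more). `looks_like_uniprotkb_id` is a hypothetical helper, not part of bionty, and the official UniProt accession pattern is stricter than this check.

```python
import re

# Simplified shape check: 6 uppercase alphanumerics, optionally 4 more.
# Assumption: this mirrors only the length rule stated above, not the full
# UniProt accession grammar.
_UNIPROTKB_SHAPE = re.compile(r"[A-Z0-9]{6}(?:[A-Z0-9]{4})?")


def looks_like_uniprotkb_id(value: str) -> bool:
    """Return True if `value` has the 6- or 10-character shape."""
    return _UNIPROTKB_SHAPE.fullmatch(value) is not None


print(looks_like_uniprotkb_id("Q8N6N3"))      # 6 characters
print(looks_like_uniprotkb_id("A0A024R161"))  # 10 characters
print(looks_like_uniprotkb_id("Q8N6"))        # too short
```

A check like this only filters out obviously malformed input; whether an ID actually exists is still decided by the source.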

synonyms: str | None

Bar-separated (|) synonyms that correspond to this protein.
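
Because synonyms are stored as a single bar-separated string, it is often convenient to work with them as a list. The sketch below shows one way to do that in plain Python; `split_bar_separated` and `join_bar_separated` are hypothetical helpers, not bionty functions. The same format is used for ensembl_gene_ids.

```python
from typing import List, Optional


def split_bar_separated(value: Optional[str]) -> List[str]:
    """Split a bar-separated string into its parts; None or "" gives []."""
    if not value:
        return []
    return [part for part in value.split("|") if part]


def join_bar_separated(parts: List[str]) -> str:
    """Join parts back into the bar-separated form."""
    return "|".join(parts)


synonyms = "T-cell|T lymphocyte|T-lymphocyte"  # example value in the stored format
parts = split_bar_separated(synonyms)
print(parts)
print(join_bar_separated(parts) == synonyms)
```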

description: str | None

Description of the protein.

length: int | None

Length of the protein sequence.

gene_symbol: str | None

The primary gene symbol that corresponds to this protein.

ensembl_gene_ids: str | None

Bar-separated (|) Ensembl Gene IDs that correspond to this protein.

is_locked: bool

Whether the record is locked for edits.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

branch: Branch

Whether record is on a branch or in another “special state”.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

run: Run | None

Run that created record.

source: Source

Source this record associates with.

organism: Organism

Organism this protein associates with.

artifacts: Artifact

Artifacts linked to the protein.

records: Record

Records linked to the protein.

schemas: Schema

Schemas (feature sets) linked to this protein.

Class methods

classmethod from_source(cls, *, name=None, uniprotkb_id=None, gene_symbol=None, organism=None, source=None, mute=False, **kwargs)

Create a Protein record from source based on a single identifying field.

Parameters:
  • name (str | None, default: None) – Protein name (e.g. “Synaptotagmin-15B”)

  • uniprotkb_id (str | None, default: None) – UniProt protein ID (e.g. “Q8N6N3”)

  • gene_symbol (str | None, default: None) – Gene symbol (e.g. “SYT15B”)

  • organism (str | Organism | None, default: None) – Organism name or Organism record

  • source (Source | None, default: None) – Optional Source record to use

  • mute (bool, default: False) – Whether to suppress logging

Return type:

Protein | list[Protein] | None

Returns:

A single Protein record, list of Protein records, or None if not found

Example:

import bionty as bt

record = bt.Protein.from_source(name="Synaptotagmin-15B", organism="human")
record = bt.Protein.from_source(uniprotkb_id="Q8N6N3")
record = bt.Protein.from_source(gene_symbol="SYT15B", organism="human")

classmethod import_source(source=None, update_records=False, *, organism=None, ignore_conflicts=True)

Bulk save records from a Bionty ontology.

Use this method to initialize your registry with public ontology.

Parameters:
  • source (Source | None, default: None) – Source record to import records from.

  • update_records (bool, default: False) –

    If True, update existing records with the new source.

    • If a record has the same metadata in the new source, link the record to the new source.

    • If a record has no artifacts associated, update its metadata and link to the new source.

    • If a record has associated artifacts but a different name in the new source, create a new record with the new source.

  • organism (str | SQLRecord | None, default: None) – Organism name or record. Required for entities with a required organism foreign key when no source is passed.

  • ignore_conflicts (bool, default: True) – Whether to ignore conflicts during bulk record creation.
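
The three update_records rules listed above can be read as a small decision procedure. The sketch below restates them in plain Python for illustration only. `plan_record_update` and its return labels are hypothetical and are not how bionty implements the update. The case of a record that has artifacts and changed metadata but an unchanged name is not covered by the rules, so the sketch reports it as unspecified.

```python
def plan_record_update(same_metadata: bool, has_artifacts: bool, same_name: bool) -> str:
    """Return the action the documented rules suggest for one existing record.

    Hypothetical helper: the labels are illustrative, not bionty values.
    """
    if same_metadata:
        # Rule 1: identical metadata in the new source -> only relink.
        return "link_to_new_source"
    if not has_artifacts:
        # Rule 2: nothing depends on the record -> update it in place.
        return "update_metadata_and_link"
    if not same_name:
        # Rule 3: artifacts exist and the name changed -> keep the old record,
        # create a new one for the new source.
        return "create_new_record"
    # Not described by the documented rules.
    return "unspecified"


print(plan_record_update(same_metadata=True, has_artifacts=True, same_name=True))
print(plan_record_update(same_metadata=False, has_artifacts=False, same_name=False))
print(plan_record_update(same_metadata=False, has_artifacts=True, same_name=False))
```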

Examples:

import bionty as bt

# import all records from a default source
default_sources = bt.Source.filter(entity="bionty.CellType", currently_used=True).to_dataframe()
bt.CellType.import_source()

# import all records from a specific source
source = bt.Source.get(entity="bionty.CellType", source="cl", version="2022-08-16")
bt.CellType.import_source(source)
bt.CellType.to_dataframe()  # all records from the source are now in the registry

# update existing records with a new source (version update)
source = bt.Source.get(entity="bionty.CellType", source="cl", version="2024-08-16")
bt.CellType.import_source(source, update_records=True)

classmethod add_source(source, *, df=None, version=None, organism=None)

Link a source record to the entity with a reference DataFrame.

Creates or retrieves a Source record for the entity and optionally associates it with a DataFrame artifact containing the ontology data. If the source already exists with a DataFrame artifact, returns the existing source.

Parameters:
  • source (Source | PublicOntology | str) –

    Source specification. Can be:

    • Source record: Existing bionty.Source instance

    • PublicOntology: PublicOntology object with source metadata

    • str: Source name (e.g., “mondo”, “cl”, “go”)

  • df (DataFrame | None, default: None) – Optional DataFrame containing ontology data to store as Artifact. If None and source is a PublicOntology, uses the ontology’s DataFrame.

  • version (str | None, default: None) – Source version string. Required when source is str and no existing source found. Examples: “2025-06-03”, “v1.0”, “release-112”

  • organism (str | None, default: None) – Organism identifier. Required for organism-specific entities when source is str. Use “all” for cross-organism ontologies.

Return type:

Source

Examples

Add a source by name with version and organism:

import bionty as bt
source = bt.Disease.add_source("mondo", version="2025-06-03", organism="all")

Add a source to an entity with a custom DataFrame:

import pandas as pd
df = pd.DataFrame({"name": ["disease1"], "ontology_id": ["MONDO:123"]})
source = bt.Source(
    entity="bionty.Disease",
    name="new mondo",
    version="99.999",
    organism="human",
)
source = bt.Disease.add_source(source=source, df=df)

Add from existing PublicOntology:

pub_ont = bt.Disease.public()
source = bt.Disease.add_source(pub_ont)

Add organism-specific source:

source = bt.Gene.add_source("ensembl", version="release-112", organism="human")

classmethod public(organism=None, source=None)

The corresponding bionty.base.PublicOntology object.

Note that the source is auto-configured and tracked via bionty.Source.

Parameters:
  • organism (str | SQLRecord | None, default: None) – Organism name or record to filter by

  • source (Source | None, default: None) – Source record to use instead of default

Return type:

PublicOntology | StaticReference

Example:

import bionty as bt

# default source
celltype_pub = bt.CellType.public()
celltype_pub
#> PublicOntology
#> Entity: CellType
#> Organism: all
#> Source: cl, 2023-04-20
#> #terms: 2698

# default source of an organism
gene_pub = bt.Gene.public(organism="mouse")
gene_pub
#> PublicOntology
#> Entity: Gene
#> Organism: mouse
#> Source: ensembl, release-112
#> #terms: 57510

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

Examples

>>> import lamindb as ln
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").to_dataframe()

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

Examples

import lamindb as ln

ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")

classmethod to_dataframe(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If a list of feature names, filters Feature down to these features. If True, prints all features with dtypes in the core schema module. If "queryset", infers the features used within the set of artifacts or records. Only available for Artifact and Record.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> import lamindb as ln
>>> ln.ULabel.to_dataframe(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.to_dataframe(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, sorted by match quality.

See also

filter() lookup()

Examples

>>> import lamindb as ln
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name", create=True)
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep –

    When multiple records are found for a lookup, how to return the records.

    • "first": return the first record.

    • "last": return the last record.

    • False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")

classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, from_source=True, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) –

    Determines the validation behavior against records in the registry.

    • If False, validation will include all records in the registry, ignoring the specified source.

    • If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

bionty.base.dev.InspectResult

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]

classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) –

    Determines the validation behavior against records in the registry.

    • If False, validation will include all records in the registry, ignoring the specified source.

    • If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.
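
The boolean vector lines up with the input values position by position, so it can be used to split the input into validated and non-validated values. The sketch below shows this in plain Python. `partition_by_mask` is a hypothetical helper, and the hard-coded `mask` is a stand-in for the array that validate() would return, so no database is needed to follow it.

```python
from typing import List, Sequence, Tuple


def partition_by_mask(
    values: Sequence[str], mask: Sequence[bool]
) -> Tuple[List[str], List[str]]:
    """Split `values` into (validated, non_validated) using a boolean mask."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    validated = [value for value, ok in zip(values, mask) if ok]
    non_validated = [value for value, ok in zip(values, mask) if not ok]
    return validated, non_validated


gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
mask = [True, True, False, False]  # stand-in for the array returned by validate()

validated, non_validated = partition_by_mask(gene_symbols, mask)
print(validated)      # ['A1CF', 'A1BG']
print(non_validated)  # ['FANCD1', 'FANCD20']
```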

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])

classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier (such as a name or an id).

Parameters:
  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A SQLRecord field to look up, e.g., bt.CellMarker.name.

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (SQLRecord | str | None, default: None) – A bionty.Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record to validate against when creating records.

  • mute (bool, default: False) – Whether to mute logging.

Return type:

SQLRecordList

Returns:

A list of validated records. For bionty registries, also returns knowledge-coupled records.

Notes

For more info, see tutorial: Manage biological ontologies.

Example:

import bionty as bt
import lamindb as ln

# Bulk create from non-validated values logs warnings and returns an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# With create=True, records are created for values that don't exist yet
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from public reference
bt.CellType.from_values(["T cell", "B cell"]).save()

classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:

    • "first": returns the first mapped standardized name

    • "last": returns the last mapped standardized name

    • False: returns all mapped standardized names

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) –

    Determines the validation behavior against records in the registry.

    • If False, validation will include all records in the registry, ignoring the specified source.

    • If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
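
To make the two return shapes concrete, the sketch below applies a mapper dictionary to a list of values in plain Python. The `mapper` shown is a hand-written stand-in for what standardize(..., return_mapper=True) might return for these inputs (an assumption, not captured output), and `apply_mapper` is a hypothetical helper.

```python
from typing import Dict, List


def apply_mapper(values: List[str], mapper: Dict[str, str]) -> List[str]:
    """Replace mappable synonyms by their standardized names; keep the rest."""
    return [mapper.get(value, value) for value in values]


gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]

# Stand-in mapper: only mappable synonyms appear as keys.
mapper = {"FANCD1": "BRCA2"}

standardized = apply_mapper(gene_synonyms, mapper)
print(standardized)  # ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
```

Values that are already standardized, or that cannot be mapped at all, pass through unchanged, which is why the list form has the same length as the input.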

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']

Methods

save(*args, **kwargs)

Save the record and its parents recursively.

Example:

import bionty as bt

record = bt.CellType.from_source(name="T cell")
record.save()

Return type:

BioRecord

restore()

Restore from trash onto the main branch.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None

Examples

For any SQLRecord object record, call:

>>> record.delete()

view_parents(field=None, with_children=False, distance=5)

View parents in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • with_children (bool, default: False) – Whether to also show children.

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)

view_children(field=None, distance=5)

View children in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_children()

query_parents()

Query parents in an ontology.

Return type:

QuerySet

query_children()

Query children in an ontology.

Return type:

QuerySet

add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"

remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"

set_abbr(value)

Set value for abbr field and add to synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Example:

import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"