lamindb.core.QuerySet

class lamindb.core.QuerySet(model=None, query=None, using=None, hints=None)

Bases: QuerySet

Sets of records returned by queries.

See also

Django QuerySet

Examples

>>> ln.ULabel(name="my label").save()
>>> queryset = ln.ULabel.filter(name="my label")
>>> queryset

Attributes

property db

Return the database used if this query is executed now.

property ordered

Return True if the QuerySet is ordered – i.e. has an order_by() clause or a default ordering on the model (or is empty).

property query

Class methods

classmethod as_manager()

Methods

async aaggregate(*args, **kwargs)
async abulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
async abulk_update(objs, fields, batch_size=None)
async acontains(obj)
async acount()
async acreate(**kwargs)
async aearliest(*fields)
async aexists()
async aexplain(*, format=None, **options)
async afirst()
async aget(*args, **kwargs)
async aget_or_create(defaults=None, **kwargs)
aggregate(*args, **kwargs)

Return a dictionary containing the calculations (aggregation) over the current queryset.

If args is present the expression is passed as a kwarg using the Aggregate object’s default alias.
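The alias rule can be illustrated with a small pure-Python sketch. Aggregate and normalize_aggregate_args are hypothetical stand-ins, not Django or lamindb classes; only the alias format <field>__<function> follows Django's convention (for example, aggregate(Count("id")) returns a dictionary keyed by "id__count").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical stand-in for an aggregate expression such as Count("id")."""

    function: str  # aggregate function name, e.g. "Count" or "Avg"
    field: str     # name of the field being aggregated, e.g. "id"

    @property
    def default_alias(self) -> str:
        # Django-style default alias: "<field>__<lowercased function name>"
        return f"{self.field}__{self.function.lower()}"


def normalize_aggregate_args(*args: Aggregate, **kwargs: Aggregate) -> dict:
    """Merge positional aggregates into the keyword mapping under their default alias."""
    merged = dict(kwargs)
    for arg in args:
        merged[arg.default_alias] = arg
    return merged


mapping = normalize_aggregate_args(Aggregate("Count", "id"), total=Aggregate("Sum", "size"))
print(sorted(mapping))  # ['id__count', 'total']
```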

async ain_bulk(id_list=None, *, field_name='pk')
async aiterator(chunk_size=2000)

An asynchronous iterator over the results from applying this QuerySet to the database.

async alast()
async alatest(*fields)
alias(*args, **kwargs)

Return a query set with added aliases for extra data or aggregations.

all()

Return a new QuerySet that is a copy of the current one. This allows a QuerySet to proxy for a model manager in some cases.

annotate(*args, **kwargs)

Return a query set in which the returned objects have been annotated with extra data or aggregations.

async aupdate(**kwargs)
async aupdate_or_create(defaults=None, create_defaults=None, **kwargs)
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)

Insert each of the instances into the database. This does not call save() on each of the instances, does not send any pre/post_save signals, and does not set the primary key attribute if it is an autoincrement field (except if features.can_return_rows_from_bulk_insert=True). Multi-table models are not supported.

bulk_update(objs, fields, batch_size=None)

Update the given fields in each of the given objects in the database.

complex_filter(filter_obj)

Return a new QuerySet instance with filter_obj added to the filters.

filter_obj can be a Q object or a dictionary of keyword lookup arguments.

This exists to support framework features such as ‘limit_choices_to’, and usually it will be more natural to use other methods.

contains(obj)

Return True if the QuerySet contains the provided obj, False otherwise.

count()

Perform a SELECT COUNT(*) and return the number of records as an integer.

If the QuerySet is already fully cached, return the length of the cached results set to avoid multiple SELECT COUNT(*) calls.
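The caching rule above can be sketched with Python's built-in sqlite3 module. ToyQuerySet is a hypothetical class that only models the decision count() makes (cached length versus a SELECT COUNT(*) round trip); it is not the Django or lamindb implementation, and the table name is interpolated into SQL only because it is trusted in this example.

```python
import sqlite3


class ToyQuerySet:
    """Toy model of QuerySet.count() caching; not the Django or lamindb implementation."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table          # trusted table name, interpolated for brevity
        self._result_cache = None   # filled once the rows have been fetched
        self.queries_run = 0        # number of SQL round trips, for illustration

    def _fetch_all(self) -> list:
        if self._result_cache is None:
            self.queries_run += 1
            self._result_cache = self.conn.execute(f"SELECT * FROM {self.table}").fetchall()
        return self._result_cache

    def __iter__(self):
        return iter(self._fetch_all())

    def count(self) -> int:
        if self._result_cache is not None:
            # Already evaluated: answer from memory, no extra query
            return len(self._result_cache)
        self.queries_run += 1
        (n,) = self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()
        return n


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ulabel (name TEXT)")
conn.executemany("INSERT INTO ulabel VALUES (?)", [("a",), ("b",), ("c",)])

qs = ToyQuerySet(conn, "ulabel")
first_count = qs.count()    # issues SELECT COUNT(*)
rows = list(qs)             # evaluates and caches the rows
second_count = qs.count()   # served from the cache
print(first_count, second_count, qs.queries_run)  # 3 3 2
```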

create(**kwargs)

Create a new object with the given kwargs, saving it to the database and returning the created object.

dates(field_name, kind, order='ASC')

Return a list of date objects representing all available dates for the given field_name, scoped to ‘kind’.
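The kind argument sets the granularity of the returned dates. The following pure-Python sketch assumes Django's kinds "year", "month", "week", and "day", with weeks starting on Monday; truncate and distinct_dates are hypothetical helpers operating on in-memory dates, whereas the real method delegates the truncation and de-duplication to the database.

```python
from datetime import date, timedelta


def truncate(d: date, kind: str) -> date:
    """Truncate a date to the start of its year, month, ISO week (Monday), or day."""
    if kind == "year":
        return date(d.year, 1, 1)
    if kind == "month":
        return date(d.year, d.month, 1)
    if kind == "week":
        return d - timedelta(days=d.weekday())  # weekday() is 0 for Monday
    if kind == "day":
        return d
    raise ValueError(f"unsupported kind: {kind!r}")


def distinct_dates(values, kind: str, order: str = "ASC") -> list:
    """Distinct truncated dates, sorted ascending or descending."""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    return sorted({truncate(v, kind) for v in values}, reverse=(order == "DESC"))


created = [date(2024, 3, 5), date(2024, 3, 20), date(2023, 12, 31)]
months = distinct_dates(created, "month")
```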

datetimes(field_name, kind, order='ASC', tzinfo=None)

Return a list of datetime objects representing all available datetimes for the given field_name, scoped to ‘kind’.

defer(*fields)

Defer the loading of data for certain fields until they are accessed. Add the set of deferred fields to any existing set of deferred fields. The only exception to this is if None is passed in as the only parameter, in which case all deferrals are removed.

delete(*args, **kwargs)

Delete all records in the query set.

df(include=None, join='inner')

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use parameter include to include other fields.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of the form "labels__name", "cell_types__name", etc. or a list of such strings.

  • join (str, default: 'inner') – The join parameter of pandas.

  • limit – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

>>> labels = [ln.ULabel(name=f"Label {i}") for i in range(3)]
>>> ln.save(labels)
>>> ln.ULabel.filter().df(include=["created_by__name"])

difference(*other_qs)
distinct(*field_names)

Return a new QuerySet instance that will select only distinct results.

earliest(*fields)
exclude(*args, **kwargs)

Return a new QuerySet instance with NOT (args) ANDed to the existing set.

exists()

Return True if the QuerySet would have any results, False otherwise.

explain(*, format=None, **options)

Run an EXPLAIN on the SQL query this QuerySet would perform and return the results.

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

Add extra SQL fragments to the query.

filter(*queries, **expressions)

Query a set of records.

Return type:

QuerySet

first()

If non-empty, the first result in the query set, otherwise None.

Return type:

Record | None

Examples

>>> queryset.first()

get(idlike=None, **expressions)

Query a single record. Raises an error if there is more than one match or none.

Return type:

Record

get_or_create(defaults=None, **kwargs)

Look up an object with the given kwargs, creating one if necessary. Return a tuple of (object, created), where created is a boolean specifying whether an object was created.

in_bulk(id_list=None, *, field_name='pk')

Return a dictionary mapping each of the given IDs to the object with that ID. If id_list isn’t provided, evaluate the entire QuerySet.
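A plain-Python sketch of the in_bulk() mapping, assuming records are dicts. The helper is hypothetical; Django additionally requires field_name to be a unique field, so duplicate keys cannot occur there, whereas this sketch would silently keep the last duplicate.

```python
def in_bulk(records, id_list=None, field_name="pk"):
    """Map each value of field_name to its record, optionally restricted to id_list."""
    if id_list is None:
        # No ids given: evaluate every record
        return {rec[field_name]: rec for rec in records}
    wanted = set(id_list)
    return {rec[field_name]: rec for rec in records if rec[field_name] in wanted}


records = [{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}, {"pk": 3, "name": "c"}]
subset = in_bulk(records, [1, 3])
```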

inspect(values, field=None, **kwargs)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (List[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to inspect against.

See also

validate()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
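Exact-match inspection reduces to splitting the input into values present in the registry field and values that are not. inspect_values and InspectResult are hypothetical names, and the registry values are passed in as a plain list rather than queried; any synonym hints the real method may report are not modeled, so FANCD1 simply lands in non_validated, consistent with the output above.

```python
from typing import Iterable, NamedTuple


class InspectResult(NamedTuple):
    validated: list
    non_validated: list


def inspect_values(values: Iterable[str], registry_values: Iterable[str]) -> InspectResult:
    """Split values into exact matches and non-matches against the registry's field values."""
    known = set(registry_values)
    validated, non_validated = [], []
    for value in values:
        (validated if value in known else non_validated).append(value)
    return InspectResult(validated, non_validated)


result = inspect_values(["A1CF", "A1BG", "FANCD1", "FANCD20"], ["A1CF", "A1BG", "BRCA2"])
```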
intersection(*other_qs)
iterator(chunk_size=None)

An iterator over the results from applying this QuerySet to the database. chunk_size must be provided for QuerySets that prefetch related objects. Otherwise, a default chunk_size of 2000 is supplied.
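Chunked iteration can be sketched with sqlite3's Cursor.fetchmany(). iterate_in_chunks is a hypothetical helper; like QuerySet.iterator(), it yields rows as they are fetched instead of building a result cache (Django documents that iterator() reads results without caching them at the QuerySet level).

```python
import sqlite3
from typing import Iterator


def iterate_in_chunks(conn: sqlite3.Connection, sql: str, chunk_size: int = 2000) -> Iterator[tuple]:
    """Yield rows one at a time while fetching them from the cursor in batches."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    cursor = conn.execute(sql)
    while True:
        rows = cursor.fetchmany(chunk_size)  # next batch of at most chunk_size rows
        if not rows:
            break
        yield from rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO record (id) VALUES (?)", [(i,) for i in range(1, 6)])
ids = [row[0] for row in iterate_in_chunks(conn, "SELECT id FROM record ORDER BY id", chunk_size=2)]
```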

last()

Return the last object of a query or None if no match is found.

latest(*fields)

Return the latest object according to fields (if given) or by the model’s Meta.get_latest_by.

latest_version()

Filter to the latest version of each version family.

Return type:

QuerySet
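How lamindb groups records into version families is not described here, so this sketch assumes a hypothetical family key and an integer version number per record, and keeps the highest version in each family.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedRecord:
    family: str   # hypothetical key shared by all versions of one record
    version: int  # hypothetical, increasing with each new version
    name: str


def latest_version(records):
    """Keep only the highest-version record of each family."""
    latest = {}
    for rec in records:
        current = latest.get(rec.family)
        if current is None or rec.version > current.version:
            latest[rec.family] = rec
    return list(latest.values())


records = [
    VersionedRecord("a", 1, "a-v1"),
    VersionedRecord("a", 2, "a-v2"),
    VersionedRecord("b", 1, "b-v1"),
]
print(sorted(r.name for r in latest_version(records)))  # ['a-v2', 'b-v1']
```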

list(field=None)

Populate a list with the results.

Return type:

list[Record]

Examples

>>> queryset.list()  # list of records
>>> queryset.list("name")  # list of values

lookup(field=None, **kwargs)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple whose attributes are the field values, with a dict() method that converts it to a dictionary keyed by the original values.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
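The example above accesses "ADGB-DT" as the attribute adgb_dt, which implies the field values are sanitized into identifiers. This sketch builds such an object with collections.namedtuple; to_attr and build_lookup are hypothetical, records are plain dicts, and the sanitizing rules (lowercasing, replacing non-word characters with underscores, prefixing values that start with a digit) are assumptions that may differ from lamindb's.

```python
import keyword
import re
from collections import namedtuple


def to_attr(value: str) -> str:
    """Hypothetical sanitizer turning a field value into an attribute name, e.g. 'ADGB-DT' -> 'adgb_dt'."""
    name = re.sub(r"\W+", "_", value.strip().lower()).strip("_")
    if not name or name[0].isdigit():
        name = f"x_{name}"  # the 'x_' prefix is an assumption for values starting with a digit
    if keyword.iskeyword(name):
        name += "_"
    return name


def build_lookup(records, field, return_field=None):
    """Build an auto-complete namedtuple over records (plain dicts here)."""
    by_attr, by_value = {}, {}
    for rec in records:
        value = rec[field]
        target = rec if return_field is None else rec[return_field]
        by_value[value] = target
        by_attr.setdefault(to_attr(value), target)  # first record wins if two values sanitize alike

    class Lookup(namedtuple("Lookup", list(by_attr))):
        __slots__ = ()

        def dict(self):
            # Converter keyed by the original, unsanitized field values
            return by_value.copy()

    return Lookup(**by_attr)


genes = [{"symbol": "ADGB-DT", "ensembl_gene_id": "ENSG_HYPOTHETICAL_1"}]  # made-up ID
lookup = build_lookup(genes, "symbol")
by_ensembl = build_lookup(genes, "ensembl_gene_id", return_field="symbol")
```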
none()

Return an empty QuerySet.

one()

Return exactly one result. Raises an error if there is more than one result or none.

Return type:

Record

one_or_none()

At most one result. Returns it if there is one, otherwise returns None.

Return type:

Record | None

Examples

>>> ln.ULabel.filter(name="benchmark").one_or_none()
>>> ln.ULabel.filter(name="non existing label").one_or_none()
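The cardinality contract of one() and one_or_none() can be sketched in pure Python over any iterable. The exception classes are hypothetical placeholders rather than lamindb's own exception types, and raising on several matches in one_or_none() is an assumption read from "at most one result".

```python
from itertools import islice
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


class NoResultFound(Exception):
    """Hypothetical: raised when exactly one result was required but none matched."""


class MultipleResultsFound(Exception):
    """Hypothetical: raised when at most one result was allowed but several matched."""


def _first_two(results: Iterable[T]) -> list:
    # Two items are enough to tell "none", "one", and "more than one" apart
    return list(islice(results, 2))


def one(results: Iterable[T]) -> T:
    items = _first_two(results)
    if not items:
        raise NoResultFound("expected exactly one result, got none")
    if len(items) > 1:
        raise MultipleResultsFound("expected exactly one result, got several")
    return items[0]


def one_or_none(results: Iterable[T]) -> Optional[T]:
    items = _first_two(results)
    if len(items) > 1:
        raise MultipleResultsFound("expected at most one result, got several")
    return items[0] if items else None
```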
only(*fields)

Essentially, the opposite of defer(). Only the fields passed into this method and that are not already specified as deferred are loaded immediately when the queryset is evaluated.

order_by(*field_names)

Return a new QuerySet instance with the ordering changed.

prefetch_related(*lookups)

Return a new QuerySet instance that will prefetch the specified Many-To-One and Many-To-Many related objects when the QuerySet is evaluated.

When prefetch_related() is called more than once, append to the list of prefetch lookups. If prefetch_related(None) is called, clear the list.

raw(raw_query, params=(), translations=None, using=None)
resolve_expression(*args, **kwargs)
reverse()

Reverse the ordering of the QuerySet.

search(string, **kwargs)

Search records for matches to string in one or more string fields.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field – The field or fields to search. Search all string fields by default.

  • limit – Maximum number of top results to return.

  • case_sensitive – Whether the match is case sensitive.

Returns:

A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
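The documented method does not state its scoring function, so this sketch swaps in difflib.SequenceMatcher from the standard library to rank values by similarity on a 0 to 100 scale. search here is a hypothetical stand-alone function, not lamindb's implementation, and it returns (value, score) pairs rather than a DataFrame.

```python
from difflib import SequenceMatcher


def search(values, string, limit=None, case_sensitive=False):
    """Rank values by similarity to string; returns (value, score) pairs, best first."""

    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    scored = [
        (value, round(100 * SequenceMatcher(None, norm(string), norm(value)).ratio(), 1))
        for value in values
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored if limit is None else scored[:limit]


results = search(["ULabel1", "ULabel2", "ULabel3"], "ULabel2")
```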
select_for_update(nowait=False, skip_locked=False, of=(), no_key=False)

Return a new QuerySet instance that will select objects with a FOR UPDATE lock.

select_related(*fields)

Return a new QuerySet instance that will select related objects.

If fields are specified, they must be ForeignKey fields and only those related objects are included in the selection.

If select_related(None) is called, clear the list.

standardize(values, field=None, **kwargs)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field – The field to return. Defaults to field.

  • return_mapper – If True, returns {input_value: standardized_name}.

  • case_sensitive – Whether the mapping is case sensitive.

  • mute – Whether to mute logging.

  • public_aware – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.

  • keep

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    • "first": returns the first mapped standardized name

    • "last": returns the last mapped standardized name

    • False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field – A field containing the concatenated synonyms.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
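The following sketch mirrors the example above with in-memory data. standardize here is a hypothetical function; it assumes each registry entry stores its synonyms as one pipe-separated string and implements only case_sensitive, keep, and return_mapper. Values that match neither a name nor a synonym pass through unchanged, as FANCD20 does in the example.

```python
def standardize(values, registry, case_sensitive=False, keep="first", return_mapper=False):
    """Map synonyms to standardized names; registry entries are dicts with 'name' and 'synonyms'."""
    if keep not in ("first", "last") and keep is not False:
        raise ValueError("keep must be 'first', 'last', or False")

    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    names = {norm(entry["name"]): entry["name"] for entry in registry}
    synonyms: dict = {}
    for entry in registry:
        # Assumption: synonyms are stored as one pipe-separated string
        for syn in filter(None, (entry.get("synonyms") or "").split("|")):
            synonyms.setdefault(norm(syn), []).append(entry["name"])

    standardized, mapper = [], {}
    for value in values:
        key = norm(value)
        if key in names:
            std = names[key]
        elif key in synonyms:
            matches = synonyms[key]
            if keep == "first":
                std = matches[0]
            elif keep == "last":
                std = matches[-1]
            else:  # keep is False: expose every candidate as a nested list
                std = matches[0] if len(matches) == 1 else list(matches)
        else:
            std = value  # unmappable values pass through unchanged
        if std != value:
            mapper[value] = std
        standardized.append(std)
    return mapper if return_mapper else standardized


genes = [
    {"name": "A1CF", "synonyms": ""},
    {"name": "A1BG", "synonyms": ""},
    {"name": "BRCA2", "synonyms": "FANCD1"},
]
standardized_names = standardize(["A1CF", "A1BG", "FANCD1", "FANCD20"], genes)
```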
union(*other_qs, all=False)
update(**kwargs)

Update all elements in the current QuerySet, setting all the given fields to the appropriate values.

update_or_create(defaults=None, create_defaults=None, **kwargs)

Look up an object with the given kwargs, updating one with defaults if it exists, otherwise create a new one. Optionally, an object can be created with different values than defaults by using create_defaults. Return a tuple (object, created), where created is a boolean specifying whether an object was created.
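A toy version of update_or_create() over a list of dicts makes the split between defaults and create_defaults concrete. The function is hypothetical; it follows Django's behavior of falling back to defaults when create_defaults is None and of combining the lookup kwargs with the create values for a new object, but it performs no database writes or locking.

```python
def update_or_create(store, defaults=None, create_defaults=None, **kwargs):
    """Toy update_or_create over a list of dicts; returns (obj, created)."""
    update_values = defaults or {}
    create_values = update_values if create_defaults is None else create_defaults
    matches = [obj for obj in store if all(obj.get(k) == v for k, v in kwargs.items())]
    if len(matches) > 1:
        raise LookupError("lookup matched more than one object")
    if matches:
        matches[0].update(update_values)  # existing object: apply defaults
        return matches[0], False
    new_obj = {**kwargs, **create_values}  # new object: lookup kwargs plus create values
    store.append(new_obj)
    return new_obj, True


store = []
obj, created = update_or_create(store, defaults={"color": "red"}, create_defaults={"color": "blue"}, name="x")
obj_again, created_again = update_or_create(store, defaults={"color": "red"}, name="x")
```

The first call creates the object with the create_defaults value; the second finds it and applies defaults in place.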

using(alias)

Select which database this QuerySet should execute against.

validate(values, field=None, **kwargs)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches are considered valid.

Parameters:
  • values (List[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's names.

  • mute – Whether to mute logging.

  • organism – An Organism name or record.

  • source – A bionty.Source record that specifies the version to validate against.

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
array([ True,  True, False, False])
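Strict validation amounts to a per-value exact membership test. validate here is a hypothetical stand-alone function that returns a list of booleans where the documented method returns an array; a case difference alone is enough to make a value fail.

```python
def validate(values, registry_values):
    """Strict validation: True only where a value exactly equals an existing field value."""
    known = set(registry_values)
    return [value in known for value in values]


flags = validate(["A1CF", "A1BG", "FANCD1", "FANCD20"], ["A1CF", "A1BG", "BRCA2"])
strict = validate(["a1cf"], ["A1CF"])  # case differences fail: exact matches only
```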
values(*fields, **expressions)
values_list(*fields, flat=False, named=False)