###### Ethnicity

lamindb provides access to the following public Ethnicity ontologies
through bionty:

1. Human Ancestry Ontology (HANCESTRO)

Here we show how to access and search Ethnicity ontologies to
standardize new data.

 import bionty as bt
 import pandas as pd

##### PublicOntology objects

Let us create a public ontology accessor with the ".public" method, which
chooses a default public ontology source from "Source". The result is a
"PublicOntology" object, which you can think of as a public registry:

 ethnicitys = bt.Ethnicity.public(organism="human")
 ethnicitys

As with registries, you can export the ontology as a "DataFrame":

 df = ethnicitys.to_dataframe()
 df.head()

Unlike registries, you can also export it as a Pronto object via the
".ontology" attribute (here, "ethnicitys.ontology").

##### Look up terms

As with registries, terms can be looked up with auto-complete:

 lookup = ethnicitys.lookup()

The "." accessor provides normalized terms (lower case, containing only
alphanumeric characters and underscores):

 lookup.american
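To make the normalization rule concrete, here is a rough sketch. It assumes the rule is just "lower-case the term, then replace each run of characters other than ASCII letters, digits and underscores with one underscore". "to_lookup_key" is a hypothetical helper, not part of bionty, and bionty's own rule may handle edge cases (such as terms starting with a digit) differently.

```python
import re


def to_lookup_key(term: str) -> str:
    """Hypothetical approximation of how a term becomes a "." lookup key."""
    # Lower-case first, then collapse every run of characters that are not
    # ASCII letters, digits or underscores into a single underscore.
    key = re.sub(r"[^0-9a-z_]+", "_", term.lower())
    # Remove underscores left at either end (e.g. from trailing punctuation).
    return key.strip("_")


print(to_lookup_key("American"))        # american
print(to_lookup_key("South Asian"))     # south_asian
print(to_lookup_key("HANCESTRO:0463"))  # hancestro_0463
```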

To look up the exact original strings, convert the lookup object to a
dict and use the "[]" accessor:

 lookup_dict = lookup.dict()
 lookup_dict["American"]
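The dict exists because original names such as "South Asian" are not valid Python identifiers, so they cannot be reached with the "." accessor. The toy object below illustrates this split. "make_lookup" is hypothetical, and its values are plain name strings, whereas a bionty lookup returns full records.

```python
import re
from types import SimpleNamespace

# Hypothetical term names, reused from examples elsewhere on this page.
terms = ["American", "South Asian", "Arab"]


def make_lookup(names):
    """Toy stand-in for a lookup object (not bionty's implementation)."""
    # Attribute access needs valid identifiers, so attribute keys are normalized.
    attrs = {re.sub(r"[^0-9a-z_]+", "_", n.lower()).strip("_"): n for n in names}
    lookup = SimpleNamespace(**attrs)
    # A plain dict keeps the exact original strings, which may contain spaces.
    lookup.dict = lambda: {n: n for n in names}
    return lookup


lookup = make_lookup(terms)
print(lookup.south_asian)            # South Asian
print(lookup.dict()["South Asian"])  # South Asian
```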

By default, the "name" field is used to generate lookup keys. You can
specify another field instead:

 lookup = ethnicitys.lookup(ethnicitys.ontology_id)

 lookup.hancestro_0463
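Choosing another field only changes which column the keys are built from. The sketch below indexes toy records by a normalized version of any chosen field. "build_lookup" is a hypothetical helper, and the records and their IDs are placeholders, not real HANCESTRO terms.

```python
import re

# Placeholder records; the names and IDs are illustrative, not ontology data.
records = [
    {"name": "Term A", "ontology_id": "EXAMPLE:0001"},
    {"name": "Term B", "ontology_id": "EXAMPLE:0002"},
]


def build_lookup(rows, field="name"):
    """Index rows by a normalized version of `field` (defaults to "name")."""

    def normalize(value: str) -> str:
        # Same assumed rule as for names: lower case, non-alphanumerics -> "_".
        return re.sub(r"[^0-9a-z_]+", "_", value.lower()).strip("_")

    return {normalize(row[field]): row for row in rows}


by_id = build_lookup(records, field="ontology_id")
print(by_id["example_0002"]["name"])  # Term B
```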

##### Search terms

Search behaves in the same way as it does for registries:

 ethnicitys.search("American").head(3)
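Search returns a table ranked by match quality, which is why ".head(3)" shows the best hits. The sketch below shows the same idea on a toy table. It swaps in "difflib.SequenceMatcher" as the scorer, which is not what bionty uses, so real rankings and scores will differ. "toy_search" is hypothetical.

```python
from difflib import SequenceMatcher

import pandas as pd

# Toy reference table; names reuse examples from this page.
reference = pd.DataFrame(
    {"name": ["American", "Caucasian", "European", "South Asian", "Arab"]}
)


def toy_search(df: pd.DataFrame, query: str, field: str = "name") -> pd.DataFrame:
    """Rank rows by case-insensitive similarity of `field` to `query`."""
    scores = df[field].map(
        lambda value: SequenceMatcher(None, query.lower(), value.lower()).ratio()
    )
    # Best matches first, mirroring how search results are ordered.
    return df.assign(score=scores).sort_values("score", ascending=False)


print(toy_search(reference, "American").head(3))
```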

By default, search also covers synonyms and all other fields
containing strings:

 ethnicitys.search("Caucasian").head(3)
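Covering synonyms means a query can match a term whose name is different. The sketch below assumes the "synonyms" column stores "|"-separated strings. It uses exact, case-insensitive matching in place of bionty's fuzzy scoring. The table rows and "matches_name_or_synonym" are hypothetical.

```python
import pandas as pd

# Toy table with placeholder names; "synonyms" is assumed to hold
# "|"-separated strings, and the pairings are illustrative only.
reference = pd.DataFrame(
    {
        "name": ["Term A", "Term B"],
        "synonyms": ["Caucasian|Placeholder synonym", ""],
    }
)


def matches_name_or_synonym(df: pd.DataFrame, query: str) -> pd.Series:
    """True where `query` equals the name or any synonym, ignoring case."""
    q = query.lower()

    def row_matches(row) -> bool:
        candidates = [row["name"], *row["synonyms"].split("|")]
        # Skip the empty string produced by splitting an empty synonyms cell.
        return any(c.lower() == q for c in candidates if c)

    return df.apply(row_matches, axis=1)


print(reference[matches_name_or_synonym(reference, "Caucasian")])
```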

To search only one specific field, pass it via the "field" argument:

 ethnicitys.search(
     "General characterisation of the Ancestry of a population",
     field=ethnicitys.definition,
 ).head()
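Restricting the field means the other columns are ignored entirely. The sketch below shows that restriction with a plain case-insensitive substring match on one column of a toy table. "search_field" and the table contents are hypothetical, and this is not bionty's ranking algorithm.

```python
import pandas as pd

# Toy table with placeholder names and definitions.
reference = pd.DataFrame(
    {
        "name": ["Term A", "Term B"],
        "definition": [
            "General characterisation of the Ancestry of a population",
            "A different placeholder definition",
        ],
    }
)


def search_field(df: pd.DataFrame, query: str, field: str) -> pd.DataFrame:
    """Return rows whose `field` contains `query`, ignoring case (no scoring)."""
    mask = df[field].str.contains(query, case=False, regex=False)
    return df[mask]


print(search_field(reference, "ancestry of a population", field="definition"))
```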

##### Standardize Ethnicity identifiers

Let us generate a "DataFrame" that stores a number of Ethnicity
identifiers, including one that does not exist in the ontology:

 df_orig = pd.DataFrame(
     index=[
         "Mende",
         "European",
         "South Asian",
         "Arab",
         "This ethnicity does not exist",
     ]
 )
 df_orig

We can check which of our values validate against the ontology
reference and list the ones that do not:

 validated = ethnicitys.validate(df_orig.index, ethnicitys.name)
 df_orig.index[~validated]
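"validate" returns one boolean per value, so "~validated" selects the values that failed. The sketch below reproduces that pattern with "pandas.Index.isin" against a hypothetical reference set. For illustration it simply assumes that the first four values are in the reference; in bionty, the reference comes from the ontology itself.

```python
import pandas as pd

# Hypothetical reference vocabulary, assumed here for illustration only.
reference_names = {"Mende", "European", "South Asian", "Arab"}

df_orig = pd.DataFrame(
    index=["Mende", "European", "South Asian", "Arab", "This ethnicity does not exist"]
)

# One boolean per index value: True if the value appears in the reference.
validated = df_orig.index.isin(reference_names)
# Invert the mask to list the values that did not validate.
print(df_orig.index[~validated])
```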

##### Ontology source versions

For any given entity, we can choose from several ontology sources and
versions:

 bt.Source.filter(entity="bionty.Ethnicity").to_dataframe()

 # only lists the sources that are currently used
 bt.Source.filter(entity="bionty.Ethnicity", currently_used=True).to_dataframe()
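Passing several keyword arguments to "filter" narrows the table to rows that satisfy all of them. The pandas sketch below shows that combined condition on a hypothetical stand-in for the "Source" table. The rows and version labels are placeholders, not actual bionty sources.

```python
import pandas as pd

# Hypothetical stand-in for the Source table; versions are placeholders.
sources = pd.DataFrame(
    {
        "entity": ["bionty.Ethnicity", "bionty.Ethnicity", "bionty.Tissue"],
        "name": ["hancestro", "hancestro", "uberon"],
        "version": ["v1", "v2", "v1"],
        "currently_used": [False, True, True],
    }
)

# Both conditions must hold: Ethnicity sources that are currently in use.
in_use = sources[(sources["entity"] == "bionty.Ethnicity") & sources["currently_used"]]
print(in_use)
```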

When instantiating a "PublicOntology" object, we can choose a specific
source, and with it a specific version:

 source = bt.Source.filter(
     name="hancestro", organism="human"
 ).first()
 ethnicitys = bt.Ethnicity.public(source=source)
 ethnicitys

The currently used ontologies can be displayed using:

 bt.Source.filter(currently_used=True).to_dataframe()