Query using tiledbsoma

The first guide queried metadata and h5ad artifacts directly through LaminDB.

This guide uses the TileDB-SOMA API to run similar queries.

Setup

Load your LaminDB instance for querying data:

!lamin load laminlabs/cellxgene
💡 connected lamindb: laminlabs/cellxgene
import lamindb as ln
import bionty as bt
import tiledbsoma

census_version = "2024-07-01"
💡 connected lamindb: laminlabs/cellxgene

Create lookup objects

We use metadata records in the laminlabs/cellxgene instance to generate lookups:

human = "homo_sapiens"

features = ln.Feature.lookup(return_field="name")
assays = bt.ExperimentalFactor.lookup(return_field="name")
cell_types = bt.CellType.lookup(return_field="name")
tissues = bt.Tissue.lookup(return_field="name")
ulabels = ln.ULabel.lookup()
suspension_types = ulabels.is_suspension_type.children.all().lookup(return_field="name")

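Query data

Combine the lookups into a value filter string, then read the matching rows of the cell metadata (obs):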

value_filter = (
    f'{features.tissue} == "{tissues.brain}" and {features.cell_type} in'
    f' ["{cell_types.microglial_cell}", "{cell_types.neuron}"] and'
    f' {features.suspension_type} == "{suspension_types.cell}" and {features.assay} =='
    f' "{assays.ln_10x_3_v3}"'
)
value_filter
'tissue == "brain" and cell_type in ["microglial cell", "neuron"] and suspension_type == "cell" and assay == "10x 3\' v3"'
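The filter is an ordinary string, so it can also be assembled programmatically when the conditions come from a dictionary. The following is a minimal sketch, not part of LaminDB or tiledbsoma: `build_value_filter` and `_quote` are hypothetical helpers, and the column names and values are written out literally instead of coming from the lookups.

```python
def _quote(value):
    # Assumption: values never contain a double quote; reject them
    # instead of guessing at an escaping rule.
    if '"' in value:
        raise ValueError(f"cannot quote value containing a double quote: {value!r}")
    return f'"{value}"'


def build_value_filter(conditions):
    """Join column conditions into a value filter string.

    A str value becomes `column == "value"`; a list or tuple becomes
    `column in ["a", "b"]`. All conditions are combined with `and`.
    """
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, str):
            clauses.append(f"{column} == {_quote(value)}")
        else:
            members = ", ".join(_quote(v) for v in value)
            clauses.append(f"{column} in [{members}]")
    return " and ".join(clauses)


example_filter = build_value_filter(
    {
        "tissue": "brain",
        "cell_type": ["microglial cell", "neuron"],
        "suspension_type": "cell",
        "assay": "10x 3' v3",
    }
)
print(example_filter)
```

With these inputs the helper reproduces the filter string shown above. Taking the values from the lookups, as the guide does, still has the advantage that a misspelled name fails early.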
census_artifact = ln.Artifact.filter(description=f"Census {census_version}").one()
with census_artifact.open() as census:
    # Read the obs rows that match the filter; this returns an iterator of Arrow tables
    cell_metadata = census["census_data"][human].obs.read(value_filter=value_filter)

    # Concatenate the results into a single pyarrow.Table
    cell_metadata = cell_metadata.concat()

    # Convert to a pandas.DataFrame
    cell_metadata = cell_metadata.to_pandas()
cell_metadata.shape
(66418, 28)
cell_metadata.head()
soma_joinid dataset_id assay assay_ontology_term_id cell_type cell_type_ontology_term_id development_stage development_stage_ontology_term_id disease disease_ontology_term_id ... tissue tissue_ontology_term_id tissue_type tissue_general tissue_general_ontology_term_id raw_sum nnz raw_mean_nnz raw_variance_nnz n_measured_vars
0 48182177 c888b684-6c51-431f-972a-6c963044cef0 10x 3' v3 EFO:0009922 microglial cell CL:0000129 68-year-old human stage HsapDv:0000162 glioblastoma MONDO:0018177 ... brain UBERON:0000955 tissue brain UBERON:0000955 15204.0 3959 3.840364 209.374207 27229
1 48182178 c888b684-6c51-431f-972a-6c963044cef0 10x 3' v3 EFO:0009922 microglial cell CL:0000129 68-year-old human stage HsapDv:0000162 glioblastoma MONDO:0018177 ... brain UBERON:0000955 tissue brain UBERON:0000955 39230.0 5885 6.666100 875.502870 27229
2 48182185 c888b684-6c51-431f-972a-6c963044cef0 10x 3' v3 EFO:0009922 microglial cell CL:0000129 68-year-old human stage HsapDv:0000162 glioblastoma MONDO:0018177 ... brain UBERON:0000955 tissue brain UBERON:0000955 9576.0 2738 3.497443 121.333753 27229
3 48182187 c888b684-6c51-431f-972a-6c963044cef0 10x 3' v3 EFO:0009922 microglial cell CL:0000129 68-year-old human stage HsapDv:0000162 glioblastoma MONDO:0018177 ... brain UBERON:0000955 tissue brain UBERON:0000955 19374.0 4096 4.729980 464.331956 27229
4 48182188 c888b684-6c51-431f-972a-6c963044cef0 10x 3' v3 EFO:0009922 microglial cell CL:0000129 68-year-old human stage HsapDv:0000162 glioblastoma MONDO:0018177 ... brain UBERON:0000955 tissue brain UBERON:0000955 8466.0 2477 3.417844 162.555950 27229

5 rows × 28 columns
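Once the metadata is a pandas DataFrame, ordinary pandas operations apply. The following is a minimal sketch of counting cells per dataset and cell type. It runs on a small hypothetical stand-in frame (`toy_metadata`, with made-up rows, not Census data), so it does not need a Census connection; the same `groupby` call should work on `cell_metadata`, assuming it has the `dataset_id` and `cell_type` columns shown above.

```python
import pandas as pd

# Hypothetical stand-in for cell_metadata; these rows are made up.
toy_metadata = pd.DataFrame(
    {
        "dataset_id": ["dataset-a", "dataset-a", "dataset-b"],
        "cell_type": ["microglial cell", "microglial cell", "neuron"],
    }
)

# Count cells per (dataset_id, cell_type).
# observed=True skips unused category combinations if the columns are categorical.
counts = (
    toy_metadata.groupby(["dataset_id", "cell_type"], observed=True)
    .size()
    .rename("n_cells")
    .reset_index()
)
print(counts)
```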

Create AnnData

with census_artifact.open() as census:
    experiment = census["census_data"][human]

    # Select cells with the same value filter and export the raw counts,
    # keeping only the listed obs columns
    adata = experiment.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter=value_filter),
    ).to_anndata(
        X_name="raw",
        column_names={
            "obs": [
                features.assay,
                features.cell_type,
                features.tissue,
                features.disease,
                features.suspension_type,
            ]
        },
    )
# Index genes by their Ensembl gene ID
adata.var = adata.var.set_index("feature_id")
adata
AnnData object with n_obs × n_vars = 66418 × 60530
    obs: 'assay', 'cell_type', 'tissue', 'disease', 'suspension_type'
    var: 'soma_joinid', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'
adata.var.head()
feature_id       soma_joinid  feature_name  feature_length       nnz  n_measured_obs
ENSG00000000003            0        TSPAN6            4530   4530448        73855064
ENSG00000000005            1          TNMD            1476    236059        61201828
ENSG00000000419            2          DPM1            9276  17576462        74159149
ENSG00000000457            3         SCYL3            6883   9117322        73988868
ENSG00000000460            4      C1orf112            5970   6287794        73636201
adata.obs.head()
       assay        cell_type  tissue       disease  suspension_type
0  10x 3' v3  microglial cell   brain  glioblastoma             cell
1  10x 3' v3  microglial cell   brain  glioblastoma             cell
2  10x 3' v3  microglial cell   brain  glioblastoma             cell
3  10x 3' v3  microglial cell   brain  glioblastoma             cell
4  10x 3' v3  microglial cell   brain  glioblastoma             cell