Design & architecture

LaminDB is a distributed system like git that can be run or hosted anywhere. It only needs a SQLite or Postgres database and a storage location (file system, S3, GCP, HuggingFace, …).

You can create a new local instance with a single command or call:

Shell:

    lamin init --storage ./mydir

Python:

    import lamindb as ln
    ln.setup.init(storage="./mydir")

R:

    library(laminr)
    lamin_init(storage="./mydir")

Or you can let collaborators connect to a cloud-hosted instance:

Shell:

    lamin connect account/instance

Python:

    import lamindb as ln
    ln.connect("account/instance")

R:

    library(laminr)
    ln <- import_module("lamindb")
    ln <- ln$connect("account/instance")

To learn more about creating & hosting LaminDB instances, see Install & setup. LaminDB instances work standalone but can optionally be managed by LaminHub. For an architecture diagram of LaminHub, reach out!

Database schema & API

LaminDB provides a SQL schema for common metadata entities such as Artifact, Collection, Transform, Feature, and ULabel. For the full list, see the API reference or the source code.

The core metadata schema is extendable through modules (see green vs. red entities in graphic), e.g., with basic biological (Gene, Protein, CellLine, etc.) & operational entities (Biosample, Techsample, Treatment, etc.).

What is the metadata schema language?

Data models are defined in Python using the Django ORM, which translates them into SQL tables. Django is one of the most-used & most-starred projects on GitHub (~1M dependents, ~73k stars) and has been robustly maintained for more than 15 years.

On top of the metadata schema, LaminDB provides a Python API that models datasets as artifacts and abstracts over storage & database access, data transformations, and (biological) ontologies.

Note that the schemas of datasets (e.g., .parquet files, .h5ad arrays, etc.) are modeled through the Feature registry, so they can change without requiring database migrations.
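The reason no migration is needed can be sketched with the standard-library `sqlite3` module: if each dataset column is stored as a row in a registry table, a new column becomes an `INSERT` rather than an `ALTER TABLE`. The `feature` table and the `register_features` helper below are hypothetical and far simpler than LaminDB's actual Feature registry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for a feature registry:
# one row per dataset column, rather than one SQL column per dataset column.
conn.execute(
    "CREATE TABLE feature ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT UNIQUE NOT NULL, "
    "dtype TEXT NOT NULL)"
)


def register_features(columns):
    """Register dataset columns as rows; already-known names are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO feature (name, dtype) VALUES (?, ?)",
        list(columns.items()),
    )
    conn.commit()


def table_columns():
    """Return the SQL column names of the registry table itself."""
    return [row[1] for row in conn.execute("PRAGMA table_info(feature)")]


columns_before = table_columns()

# A first dataset, then a later one that introduces a new column.
register_features({"cell_type": "str", "n_genes": "int"})
register_features({"n_genes": "int", "treatment": "str"})

columns_after = table_columns()
feature_names = [
    row[0] for row in conn.execute("SELECT name FROM feature ORDER BY name")
]
```

The registry's own SQL columns are identical before and after, while the set of known features grows with each dataset.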

Schema modules

LaminDB can be extended with schema modules building on the Django ecosystem. Examples are:

  • bionty: Registries for basic biological entities, coupled to public ontologies.

  • wetlab: Registries for samples, treatments, etc.

If you’d like to create your own module:

  1. Create a git repository with registries similar to wetlab

  2. Create & deploy migrations via lamin migrate create and lamin migrate deploy

For more information, see Install & setup.

Repositories

LaminDB and its plugins consist of open-source Python libraries & publicly hosted metadata assets:

  • lamindb: Core package.

  • bionty: Registries for basic biological entities, coupled to public ontologies.

  • wetlab: Registries for samples, treatments, etc.

  • usecases: Use cases as visible on the docs.

All immediate dependencies are available as git submodules here, for instance,

For a comprehensive list of open-sourced software, browse our GitHub account.

LaminHub is not open-sourced.