Design & architecture¶
LaminDB is a distributed system like git that can be run or hosted anywhere. It only needs a SQLite or Postgres database and a storage location (file system, S3, GCP, HuggingFace, …).
You can create a new local instance:
Shell:
lamin init --storage ./mydir
Python:
import lamindb as ln
ln.setup.init(storage="./mydir")
R:
library(laminr)
lamin_init(storage="./mydir")
Or you can let collaborators connect to a cloud-hosted instance:
Shell:
lamin connect account/instance
Python:
import lamindb as ln
ln.connect("account/instance")
R:
library(laminr)
ln <- import_module("lamindb")
ln <- ln$connect("account/instance")
To learn more about creating & hosting LaminDB instances, see Install & setup. LaminDB instances work standalone but can optionally be managed by LaminHub. For an architecture diagram of LaminHub, reach out!
Database schema & API¶
LaminDB provides a SQL schema for common metadata entities: Artifact, Collection, Transform, Feature, ULabel, etc. - see the API reference or the source code.
The core metadata schema is extendable through modules (see green vs. red entities in graphic), e.g., with basic biological (Gene, Protein, CellLine, etc.) & operational entities (Biosample, Techsample, Treatment, etc.).
What is the metadata schema language?
Data models are defined in Python using the Django ORM. Django translates them to SQL tables. Django is one of the most-used & highly-starred projects on GitHub (~1M dependents, ~73k stars) and has been robustly maintained for 15 years.
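To make the idea of "Python class in, SQL table out" concrete, here is a toy sketch that uses only the standard library. It is neither Django nor LaminDB's actual schema: the `ToyArtifact` class, its fields, the type mapping, and the `create_table_sql` helper are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields

# Toy mapping from Python annotations to SQLite column types (not Django's).
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class ToyArtifact:
    """Hypothetical, heavily simplified stand-in for a registry model."""

    id: int
    uid: str
    size: int


def create_table_sql(model) -> str:
    """Derive a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for f in fields(model):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"


# Generate the DDL from the class, create the table, and store one row.
ddl = create_table_sql(ToyArtifact)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO toyartifact (uid, size) VALUES (?, ?)", ("abc123", 1024))
rows = conn.execute("SELECT id, uid, size FROM toyartifact").fetchall()
```

Django does considerably more than this (relations, constraints, and migration files that record each schema change), but the direction is the same: the Python definition is the source of truth and the SQL is derived from it.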
On top of the metadata schema, LaminDB is a Python API that models datasets as artifacts and abstracts over storage & database access, data transformations, and (biological) ontologies.
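The split between a metadata database and a storage location can be sketched with the standard library alone. The snippet below is an illustration under simplifying assumptions, not LaminDB's implementation: the `register_file` helper, the `artifact` table, and its columns are hypothetical. It copies a file into a storage directory and records its key, size, and hash in SQLite.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def register_file(conn, storage_root: Path, source: Path, key: str) -> int:
    """Copy `source` into storage under `key` and record its metadata."""
    target = storage_root / key
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    content = target.read_bytes()
    cursor = conn.execute(
        "INSERT INTO artifact (storage_key, size, hash) VALUES (?, ?, ?)",
        (key, len(content), hashlib.sha256(content).hexdigest()),
    )
    return cursor.lastrowid


# Set up a throwaway storage location and an example file.
workdir = Path(tempfile.mkdtemp())
storage_root = workdir / "mydir"
source = workdir / "measurements.csv"
source.write_text("sample,value\ns1,0.5\n")

# The database holds only metadata; the bytes live in storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, storage_key TEXT UNIQUE, size INTEGER, hash TEXT)"
)
artifact_id = register_file(conn, storage_root, source, "raw/measurements.csv")

# Query metadata in the database, then resolve the key against storage.
storage_key, size = conn.execute(
    "SELECT storage_key, size FROM artifact WHERE id = ?", (artifact_id,)
).fetchone()
stored_path = storage_root / storage_key
```

Because the database stores a key relative to the storage root rather than an absolute path, the same metadata row would stay valid if the storage root pointed to a different backend.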
Note that the schemas of datasets (e.g., .parquet files, .h5ad arrays, etc.) are modeled through the Feature registry and do not require migrations to be updated.
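Why no migrations are needed can be illustrated with a small sketch: if dataset columns are registered as rows in a feature table, a new column means a new row rather than a changed table definition. The table layout and the `validate_columns` helper below are hypothetical simplifications, not LaminDB's actual Feature registry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fixed table describes all dataset columns; its own structure never changes.
conn.execute("CREATE TABLE feature (name TEXT PRIMARY KEY, dtype TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO feature (name, dtype) VALUES (?, ?)",
    [("cell_type", "str"), ("n_genes", "int")],
)


def validate_columns(conn, columns: dict) -> list:
    """Return dataset columns that are unregistered or have a mismatched dtype."""
    registered = dict(conn.execute("SELECT name, dtype FROM feature"))
    return [name for name, dtype in columns.items() if registered.get(name) != dtype]


# A dataset arrives with one column the registry does not know yet.
dataset_columns = {"cell_type": "str", "n_genes": "int", "treatment": "str"}
before = validate_columns(conn, dataset_columns)

# Registering the new column is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO feature (name, dtype) VALUES (?, ?)", ("treatment", "str"))
after = validate_columns(conn, dataset_columns)
```

Here `before` lists the unregistered `treatment` column and `after` is empty, while the `feature` table definition itself is untouched.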
Schema modules¶
LaminDB can be extended with schema modules building on the Django ecosystem. Examples are:
bionty: Registries for basic biological entities, coupled to public ontologies.
wetlab: Registries for samples, treatments, etc.
If you’d like to create your own module:
Create a git repository with registries similar to wetlab
Create & deploy migrations via lamin migrate create and lamin migrate deploy
For more information, see Install & setup.
Repositories¶
LaminDB and its plugins consist of open-source Python libraries & publicly hosted metadata assets:
lamindb: Core package.
bionty: Registries for basic biological entities, coupled to public ontologies.
wetlab: Registries for samples, treatments, etc.
usecases: Use cases as visible on the docs.
All immediate dependencies are available as git submodules here, for instance:
lamindb-setup: Setup & configure LaminDB.
lamin-cli: CLI for lamindb and lamindb-setup.
For a comprehensive list of open-sourced software, browse our GitHub account.
lamin-utils: Generic utilities, e.g., a logger.
readfcs: FCS artifact reader.
nbproject: Light-weight Jupyter notebook tracker.
bionty-assets: Assets for public biological ontologies.
LaminHub is not open-sourced.