Glossary

FAIR

FAIR data is data which meets the principles of findability, accessibility, interoperability, and reusability wikipedia.

GUI

Graphical user interface, for instance, a browser-based data catalog.

feature

A feature is a property of a measurement [Wikipedia]. It’s equivalent to a variable in statistics and is typically equated with a dimension of a dataset.

LaminDB comes with a Feature registry to organize dataset dimensions and equates them with statistical variables.

label

A label refers to a descriptor or tag that is assigned to something to describe, identify, or categorize it.

lakehouse

A data lakehouse combines the flexibility and cost-effectiveness of a data lake with the data management and ACID transaction support of a data warehouse, enabling both structured and unstructured data analytics in a single framework. Lakehouse frameworks include Databrick’s Delta Lake, Google’s BigLake, Amazon’s Lake Formation, Dremio, Starburst and others. Here is a blog post from Google, a blog post from AWS, a glossary entry and a paper from Databricks.

ORM

Object-relational mapper. In LaminDB every sub-class of Record (every instance of Registry) is an ORM that corresponds to a SQL table in the underlying metadata database wikipedia.

observation

In statistics (machine learning), an observation refers to a particular measured instance of a set of random variable.

In biology, an observation typically corresponds to measuring (reading out) a set of properties from a biological sample.

record

A record is a data structure that consists in a sequence of typed fields that hold values [Wikipedia].

In LaminDB, a metadata record is modeled as a Record.

sample

In biology, a sample is an instance or part of a biological system.

In statistics (machine learning), a sample is an observation of a set of random variables (features, labels, metadata).

Depending on the observational unit chosen for representing data, the statistical sample might correspond 1:1 to a biological sample. Often, this choice presents an interesting cases, as variation across physical samples - targeted in the experimental design - can directly be explained by variation across statistical (digital) samples.

variable

We almost always mean “random variable”, when we say “variable”.

Random variables and their observations are core to statistics [Wikipedia].

An independent variable is sometimes called a feature, “predictor variable”, “regressor”, “covariate”, “explanatory variable”, “risk factor”, “input variable”, among others [Wikipedia].

A dependent variable is sometimes called a “response variable”, “regressand”, “criterion”, “predicted variable”, “measured variable”, “explained variable”, “experimental variable”, “responding variable”, “outcome variable”, “output variable”, “target” or “label”.