Glossary¶
- FAIR¶
FAIR data is data which meets the principles of findability, accessibility, interoperability, and reusability wikipedia.
- GUI¶
Graphical user interface, for instance, a browser-based data catalog.
- feature¶
A feature is a property of a measurement [Wikipedia]. It’s equivalent to a variable in statistics and is typically equated with a dimension of a dataset.
LaminDB comes with a
Feature
registry to organize dataset dimensions and equates them with statistical variables.- label¶
A label refers to a descriptor or tag that is assigned to something to describe, identify, or categorize it.
- lakehouse¶
A data lakehouse combines the flexibility and cost-effectiveness of a data lake with the data management and ACID transaction support of a data warehouse, enabling both structured and unstructured data analytics in a single framework. Lakehouse frameworks include Databrick’s Delta Lake, Google’s BigLake, Amazon’s Lake Formation, Dremio, Starburst and others. Here is a blog post from Google, a blog post from AWS, a glossary entry and a paper from Databricks.
- ORM¶
Object-relational mapper. In LaminDB every sub-class of
Record
(every instance ofRegistry
) is an ORM that corresponds to a SQL table in the underlying metadata database wikipedia.- observation¶
In statistics (machine learning), an observation refers to a particular measured instance of a set of random variable.
In biology, an observation typically corresponds to measuring (reading out) a set of properties from a biological sample.
- record¶
A record is a data structure that consists in a sequence of typed fields that hold values [Wikipedia].
In LaminDB, a metadata record is modeled as a
Record
.- sample¶
In biology, a sample is an instance or part of a biological system.
In statistics (machine learning), a sample is an observation of a set of random variables (features, labels, metadata).
Depending on the observational unit chosen for representing data, the statistical sample might correspond 1:1 to a biological sample. Often, this choice presents an interesting cases, as variation across physical samples - targeted in the experimental design - can directly be explained by variation across statistical (digital) samples.
- variable¶
We almost always mean “random variable”, when we say “variable”.
Random variables and their observations are core to statistics [Wikipedia].
An independent variable is sometimes called a feature, “predictor variable”, “regressor”, “covariate”, “explanatory variable”, “risk factor”, “input variable”, among others [Wikipedia].
A dependent variable is sometimes called a “response variable”, “regressand”, “criterion”, “predicted variable”, “measured variable”, “explained variable”, “experimental variable”, “responding variable”, “outcome variable”, “output variable”, “target” or “label”.