#### Glossary

artifact
 Stores a dataset or model as a file or folder.

curator
 * Object class designed to ensure your dataset conforms to a
 desired schema.

 * Helps with validation, standardization (e.g., by fixing typos or
 mapping synonyms), and annotation (linking it against metadata
 entities so that it becomes queryable).
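 The following is a minimal, library-free sketch of two of the curation steps described above: standardization and validation. It is not LaminDB's curator API. The vocabulary, the synonym map, and the functions `standardize` and `curate` are hypothetical, and typo correction is approximated with fuzzy matching from Python's `difflib`.

```python
import difflib

# Hypothetical controlled vocabulary and synonym map (illustration only).
VALID_CELL_TYPES = {"T cell", "B cell", "monocyte"}
SYNONYMS = {"T lymphocyte": "T cell", "B lymphocyte": "B cell"}


def standardize(value: str) -> str:
    """Map synonyms and near-miss typos onto a valid term, if possible."""
    if value in VALID_CELL_TYPES:
        return value
    if value in SYNONYMS:
        return SYNONYMS[value]
    # Approximate typo fixing: take the closest valid term above a similarity cutoff.
    matches = difflib.get_close_matches(value, sorted(VALID_CELL_TYPES), n=1, cutoff=0.8)
    return matches[0] if matches else value


def curate(values: list[str]) -> tuple[list[str], list[str]]:
    """Return the standardized values and those that still fail validation."""
    standardized = [standardize(v) for v in values]
    invalid = [v for v in standardized if v not in VALID_CELL_TYPES]
    return standardized, invalid


standardized, invalid = curate(["T lymphocyte", "B cel", "monocyte", "neuron"])
print(standardized)  # ['T cell', 'B cell', 'monocyte', 'neuron']
print(invalid)  # ['neuron']
```

 In this sketch, values that still fail validation after standardization (here `"neuron"`) are the ones left for a person to review or to add to the vocabulary.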

FAIR
 FAIR data is data that meets the principles of findability,
 accessibility, interoperability, and reusability [Wikipedia].

feature
 A feature is a property of a measurement [Wikipedia]. It's
 equivalent to a *variable* in statistics and is typically equated
 with a dimension of a dataset.

 LaminDB comes with a "Feature" registry to organize dataset
 dimensions and equates them with statistical variables.
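 To illustrate how features relate to dataset dimensions, here is a toy dataset in plain Python. Each row is an observation and each key is a feature. The feature names, types, and values are hypothetical, and the `features` dictionary only stands in for a feature registry. LaminDB's "Feature" registry is a database table, not a dict.

```python
# Toy dataset: a list of observations (rows); each key is a feature (column).
dataset = [
    {"donor_age": 34, "tissue": "lung", "gene_expr": 2.5},
    {"donor_age": 51, "tissue": "liver", "gene_expr": 0.7},
    {"donor_age": 29, "tissue": "lung", "gene_expr": 1.9},
]

# Hypothetical stand-in for a feature registry: one entry per dataset dimension.
features = {"donor_age": int, "tissue": str, "gene_expr": float}

# Each feature indexes one column, i.e. one statistical variable.
columns = {name: [row[name] for row in dataset] for name in features}

n_observations = len(dataset)
n_features = len(features)
print(n_observations, n_features)  # 3 3
print(columns["tissue"])  # ['lung', 'liver', 'lung']
```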

instance
 Shorthand for "LaminDB instance", a database that manages metadata
 for datasets in different storage locations.

label
 A label refers to a descriptor or tag that is assigned to something
 to describe, identify, or categorize it.

lakehouse
 A data lakehouse combines the flexibility and cost-effectiveness of
 a data lake with the data management and ACID transaction support
 of a data warehouse, enabling both structured and unstructured data
 analytics in a single framework. Lakehouse frameworks include
 Databricks' Delta Lake, Google's BigLake, Amazon's Lake Formation,
 Dremio, Starburst, and others. For background, see blog posts from
 Google and AWS, and a glossary entry and a paper from Databricks.

ORM
 Object-relational mapper. In LaminDB, every subclass of "Record"
 (every instance of "Registry") is an ORM that corresponds to a SQL
 table in the underlying metadata database [Wikipedia].
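 The core idea of an ORM, where a class maps to a table and each object maps to a row, can be sketched with only the standard library (`sqlite3` and `dataclasses`). This toy mapper rests on simplifying assumptions (trusted identifiers, no primary keys or relationships) and says nothing about how LaminDB implements its registries. The `Project` class and helper functions are hypothetical.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Python type -> SQLite column type used by this toy mapper.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}


@dataclass
class Project:  # hypothetical model class: one class <-> one table
    name: str
    budget: float


def table_name(cls) -> str:
    return cls.__name__.lower()


def create_table(conn, cls) -> None:
    # One column per dataclass field, typed via SQL_TYPES.
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {table_name(cls)} ({cols})")


def save(conn, obj) -> None:
    # Each object becomes one row; field order matches column order.
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {table_name(type(obj))} VALUES ({placeholders})", astuple(obj))


def filter_by(conn, cls, **conditions):
    # Field names are interpolated, so they must be trusted identifiers;
    # values are passed safely as query parameters.
    sql = f"SELECT * FROM {table_name(cls)}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"{k} = ?" for k in conditions)
    rows = conn.execute(sql, tuple(conditions.values())).fetchall()
    return [cls(*row) for row in rows]  # rows are mapped back to objects


conn = sqlite3.connect(":memory:")
create_table(conn, Project)
save(conn, Project("atlas", 1000.0))
save(conn, Project("screen", 250.0))
result = filter_by(conn, Project, name="atlas")
print(result)  # [Project(name='atlas', budget=1000.0)]
```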

observation
 In statistics (machine learning), an observation refers to a
 particular measured instance of a set of random variables.

 In biology, an observation typically corresponds to measuring
 (reading out) a set of properties from a biological sample.

record
 A record is a data structure that consists of a sequence of typed
 fields that hold values [Wikipedia].

 In LaminDB, a metadata record is modeled as a "SQLRecord" and
 represents a row in a registry (a table in the SQL database).

 It automatically sets up important behaviors and methods (like
 filtering, querying, and converting records to DataFrames) needed
 to interact with the metadata database.
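 A record as a "sequence of typed fields" maps naturally onto a `NamedTuple`. The sketch below uses a hypothetical `CellType` record and mimics the filtering and table-conversion conveniences mentioned above in plain Python. It is not the "SQLRecord" API.

```python
from typing import NamedTuple


class CellType(NamedTuple):  # hypothetical record: a fixed sequence of typed fields
    uid: str
    name: str
    n_cells: int


records = [
    CellType("a1", "T cell", 120),
    CellType("b2", "B cell", 45),
    CellType("c3", "monocyte", 80),
]

# Filtering: keep records whose field values satisfy a condition.
abundant = [r for r in records if r.n_cells >= 50]

# Conversion to a column-oriented table, similar in spirit to a DataFrame.
table = {field: [getattr(r, field) for r in abundant] for field in CellType._fields}
print(table)
# {'uid': ['a1', 'c3'], 'name': ['T cell', 'monocyte'], 'n_cells': [120, 80]}
```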

sample
 In biology, a sample is an instance or part of a biological system.

 In statistics (machine learning), a sample is an observation of a
 set of random variables (features, labels, metadata).

 Depending on the observational unit chosen for representing data,
 the statistical sample might correspond 1:1 to a biological sample.
 Often, this choice presents an interesting case, as variation
 across physical samples (targeted in the experimental design) can
 directly be explained by variation across statistical (digital)
 samples.
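 As an assumed example of how the choice of observational unit matters, consider single-cell-style data where each row is a cell (a statistical sample) and several cells come from the same biological sample. Aggregating per biological sample yields a 1:1 correspondence. The sample IDs and values are made up, and averaging is only one possible aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each row is one cell, tagged with its biological sample.
cells = [
    {"biosample": "S1", "expr": 1.0},
    {"biosample": "S1", "expr": 3.0},
    {"biosample": "S2", "expr": 2.0},
    {"biosample": "S2", "expr": 4.0},
    {"biosample": "S2", "expr": 6.0},
]

# Observational unit = cell: 5 statistical samples drawn from 2 biological samples.
by_biosample = defaultdict(list)
for cell in cells:
    by_biosample[cell["biosample"]].append(cell["expr"])

# Observational unit = biological sample: one aggregated row per sample (1:1).
aggregated = {sample: mean(values) for sample, values in by_biosample.items()}
print(len(cells), len(aggregated))  # 5 2
print(aggregated)  # {'S1': 2.0, 'S2': 4.0}
```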

variable
 When we say "variable", we almost always mean "random variable".

 Random variables and their observations are core to statistics
 [Wikipedia].

 An independent variable is sometimes called a *feature*, "predictor
 variable", "regressor", "covariate", "explanatory variable", "risk
 factor", "input variable", among others [Wikipedia].

 A dependent variable is sometimes called a "response variable",
 "regressand", "criterion", "predicted variable", "measured
 variable", "explained variable", "experimental variable",
 "responding variable", "outcome variable", "output variable",
 "target" or "label".

schema
 A blueprint for your data's structure. A schema is used to curate
 and validate how your data is organized, helping to maintain data
 integrity as the data evolves through various processing steps.
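 A schema can be as simple as a mapping from column names to expected types. The hypothetical `validate` function below checks rows against such a mapping. It is a library-free illustration, not LaminDB's schema implementation.

```python
# Hypothetical schema: column name -> expected Python type.
Schema = dict[str, type]

schema: Schema = {"sample_id": str, "n_reads": int, "qc_pass": bool}


def validate(rows: list[dict], schema: Schema) -> list[str]:
    """Return human-readable errors; an empty list means the data conforms."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, expected in schema.items():
            # Exact type check: bool is a subclass of int, so isinstance would accept True.
            if col in row and type(row[col]) is not expected:
                errors.append(f"row {i}: {col!r} should be {expected.__name__}")
    return errors


rows = [
    {"sample_id": "s1", "n_reads": 1200, "qc_pass": True},
    {"sample_id": "s2", "n_reads": "950", "qc_pass": False},
    {"sample_id": "s3", "qc_pass": True},
]
errors = validate(rows, schema)
print(errors)
# ["row 1: 'n_reads' should be int", "row 2: missing columns ['n_reads']"]
```

 The exact type comparison is a deliberate choice in this sketch. Because `bool` subclasses `int` in Python, an `isinstance` check would silently accept `True` as a read count.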

registry
 A table in a SQL database (SQLite/Postgres) holding records.

transform
 A piece of code (script, notebook, pipeline, function) that can be
 applied to input data to produce output data.
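 A transform can be modeled as a plain function from input data to output data. The sketch below also logs which transform turned which input into which output, identifying data by content hashes. The names `run_transform`, `drop_failed_qc`, and `provenance` are hypothetical and not part of LaminDB.

```python
import hashlib
import json

provenance: list[dict] = []  # hypothetical log of transform runs


def fingerprint(data) -> str:
    """Short content hash of JSON-serializable data (stand-in for a file hash)."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:12]


def run_transform(func, data):
    """Apply a transform to input data and record input/output fingerprints."""
    output = func(data)
    provenance.append(
        {"transform": func.__name__, "input": fingerprint(data), "output": fingerprint(output)}
    )
    return output


def drop_failed_qc(rows):
    """Example transform: keep only rows that passed quality control."""
    return [r for r in rows if r["qc_pass"]]


raw = [{"id": "s1", "qc_pass": True}, {"id": "s2", "qc_pass": False}]
clean = run_transform(drop_failed_qc, raw)
print(clean)  # [{'id': 's1', 'qc_pass': True}]
print(provenance[0]["transform"])  # drop_failed_qc
```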

UI
 Graphical user interface, for instance, a browser-based data
 catalog.