Changelog 2025¶

Note

Get notified by watching releases for git repositories: lamindb, laminhub, laminr, and bionty.

For other years, see: 2024 · 2023 · 2022

2025-08-12 db 1.10.2¶

✨ Write mode for zarr backed AnnDataAccessor PR @Koncopd
✨ Add ProjectRecord for annotating sheets with projects PR @falexwolf
🚸 Ask for additional confirmation when creating storage locations through switching storage settings PR @falexwolf
🐛 Fix AnnDataAccessor for AnnData objects with indices stored as integers PR @Koncopd
🐛 Annotate artifacts passed to Curator with Schema PR @Zethson

2025-08-06 db 1.10.1¶

🐛 Fix selection with a bool mask in AnnDataAccessor with zarr PR @Koncopd
🔒️ Require SSL for remote postgres PR @Koncopd
📝 Add nested dictionary example for Pydantic/Pandera/LaminDB comparison PR @Zethson
📝 Improve arrays tutorial PR @Koncopd

2025-07-29 db 1.10.0¶

Features.

✨ Enable .open() for SpatialData.tables PR @Koncopd
✨ Add curate_from_croissant() to curate from MLCommons Croissant files PR @falexwolf
✨ Allow to receive and add extra parameters for managed buckets PR @Koncopd
✨ Add Collection.describe() PR @Zethson
✨ Enable reverting database migrations PR @falexwolf
✨ Allow getting settings via the CLI PR @falexwolf

UX.

🚸 Stricter hash uniqueness on artifact and more indexes PR @falexwolf
🚸 Hide VitessceConfig artifacts PR @falexwolf
🚸 Move .datasets from .core to .examples PR @falexwolf
🚸 Cache branch and space settings PR @falexwolf
🚸 Warn on failure of updating access token PR @Koncopd

Refactors and bug fixes.

♻️ Adapt huggingface sync to the changes in their API PR @Koncopd
🐛 Fix auto-search for corrupted local storage location PR @falexwolf

2025-07-23 db 1.9.1¶

🐛 Fix keep-artifacts-local mode when no local storage location is found PR @falexwolf
🐛 Enable anonymous users to access public folders on AWS S3 PR @Koncopd
🚸 Create records under subtype via Curator.add_new_from PR @sunnyosun
🚸 On transform delete, delete TransformProject links because they might be protected through a run of the same transform PR @falexwolf

2025-07-21 db 1.9.0 | bionty 1.6.1¶

Features.

✨ Enable validating & annotating datasets with a CELLxGENE schema. Corresponding new guide. PR @Zethson
✨ Add lamin annotate, enable string-based annotation with non-ulabels, overhaul CLI docs PR @falexwolf

UX.

🚸 Enable ln.track() and ln.finish() for notebooks running on remote servers PR @falexwolf
🚸 Improve UX of working in keep-artifacts-local mode PR @falexwolf
🚸 Expose storage, branch, and space in the Artifact constructor PR @falexwolf

Bug fixes.

🐛 Fix search_local_root in keep-artifacts-local for storage roots without access PR @Koncopd
🐛 Fix sync for cache synchronization for timestamps with a fractional part PR PR @Koncopd
🐛 Rework get_storage_region() to make it reliable and use it in upath.to_url() PR @Koncopd
🐛 Fix creation of a Schema with is_type=True PR @sunnyosun
🐛 Fix index hash calculation for Schema PR @sunnyosun

Performance.

⚡️ Speed up describe() by 6x PR @falexwolf
⚡️ Speed up lamin connect by 2x PR @falexwolf
⚡️ Enable retries for hub requests PR @Koncopd
⚡️ Implement performant synchronization for directories PR @Koncopd

Docs.

📝 Overhaul the Curate datasets guide PR @sunnyosun
📝 Add an actual README.md PR @falexwolf

Bionty.

♻️ New error message when url of ontology doesn’t exist PR @namsaraeva
♻️ Remove Disease constructor overloads PR @namsaraeva
♻️ Update visibility of source.dataframe_artifact PR @sunnyosun
♻️ Make sure ensembl organism has a synonyms column PR @sunnyosun
♻️ Adapt to renaming of UPath.synchronize() to UPath.synchronize_to() PR PR @Koncopd
♻️ Remove pronto warning filters PR @namsaraeva
♻️ Added logging to Ontology.to_df() PR @namsaraeva

2025-07-14 db 1.8.0¶

🚸 Improve the experience of working in keep-artifacts-local mode PR @falexwolf
⚡️ Use native polars Object Store by default PR @Koncopd
🚸 Improve data type issue errors PR @Zethson
🚸 Improve error message when attempting to curate against unsaved schema PR @Zethson
🚸 Improve UX for labeling unsaved records from other instances PR @Zethson
🐛 Properly ignore ln.track() and tracking warnings on read-only connections PR @Koncopd

2025-07-07 db 1.7.1¶

🚸 Cache uid and name of branch and space in setup settings to speed up lamin info PR @Koncopd
🐛 Do not import lamindb within CLI delete when deleting an instance PR @Koncopd

2025-07-06 db 1.7.0 | bionty 1.6.0¶

Features.

✨ Enable switching branches and spaces via the CLI PR @falexwolf
✨ Enable specifying a system-wide cache path PR @Koncopd
✨ Upload the R environment tracked in LaminR PR @falexwolf
✨ Create export functionality for records, enable annotating artifacts by records, enable tracking lineage of records, treat sheets as a record type PR PR PR @falexwolf
✨ Support cat_filters to enable specifying Source versions PR @Zethson
✨ Correctly deal with renamed instances PR PR @Koncopd
✨ Add dtype path PR @sunnyosun

UX improvements.

🚸 Improved storage location management: conveniently create & delete storage locations PR PR PR PR PR @falexwolf @Koncopd
🚸 Introduce run status codes, save source code & run environment of scripts upon ln.track() instead of ln.finish() PR @falexwolf
🚸 Simplify default transfer mode: no longer transfer annotations PR @falexwolf
🚸 Add settings.is_connected to check if the current instance is properly connected for use PR @Koncopd
🚸 No longer persist instance settings on ln.connect() PR @Koncopd
🚸 Case-insensitive uniqueness and lower-case pre-defined branch and space names PR @falexwolf
🚸 Warn when no categorical values are validated rather than throwing an error, e.g. when validating column names of a dataframe or labels of a label vector PR @sunnyosun
🚸 Introduce two-column layout in Artifact.describe() and add information like space, branch, and kind PR @falexwolf
🚸 Support nested sub types in curation PR PR @sunnyosun @falexwolf
🚸 Add Settings pretty display PR @Zethson
🚸 Enable inferring all features of a queryset of records or artifacts by passing features="queryset" to .df() PR @falexwolf
🚸 Mark data transfers via transform key __lamindb_transfer__ PR @falexwolf
🚸 More aggressive checks for anon_public requests to S3 for public instances PR @Koncopd
🚸 Mark upload failures of artifacts via a private boolean indicating a successful save event: artifact._is_saved_to_storage_location PR @Koncopd
🚸 Improve similar names found message PR @Zethson
🚸 Correct treatment of carriage returns in logs PR @Koncopd
🚸 Add keep with default "first" to Lookup class PR @sunnyosun
🚸 Batched bulk saving for persisting large numbers of records efficiently PR @sunnyosun
🚸 Define all exceptions in lamindb_setup.errors PR @falexwolf
🚸 Leverage panderas lazy validation PR @falexwolf
🚸 Enable setting overwrite_versions in Artifact creator PR @falexwolf
🚸 Default to hiding artifacts with kind = "__lamindb_run__" in regular queries PR @falexwolf
🚸 Mark as py.typed to allow mypy inspection and fix return type of BaseSQLRecord.save PR PR @ap–
🚸 Enable passing branch and space everywhere PR @falexwolf

Bug fixes.

🐛 Auto-version scripts in absence of ln.track() PR @falexwolf
🐛 Fix updates of re-loaded Schema auxiliary fields PR @Zethson
🐛 Fix a bug preventing backwards compatibility of instance settings files PR @Koncopd
🐛 Respect anon=True in _tiledb_config_s3 PR @Koncopd
🐛 Have transfer comply with settings.annotation.n_max_records and fix a bug related to repeated schema transfer PR @falexwolf
🐛 Fix transfer from instance with superset of module to instance with subset of modules in presence of schema-annotation in the additional modules PR @falexwolf

Bionty changes.

🚸 Allow using public references without prefix filters PR @sunnyosun
⬆️ Add new ontology versions & fix registration bugs PR PR @Zethson
♻️ Downgrade sources versions in sources.yaml PR @namsaraeva
♻️ Do not upload dataframe if instance is bionty-assets PR @sunnyosun
♻️ Make treatment of Record link tables consistent with lamindb PR @falexwolf
🐛 Fix import_source with non-existing parents in the reference PR @namsaraeva

Deprecated code.

🔥 Remove long-deprecated ln.setup.load() and LaminDB v1 migration logic PR @falexwolf

2025-06-10 db 1.6.2¶

🐛 Use standard Auth URL to fix API key login (fix not necessary after fixing bot detection server side) PR @falexwolf
🚸 Mark as py.typed and improve typing of ln.setup.connect() PR @ap–
🚸 Expand user directory by default PR @Zethson

2025-06-03 db 1.6.1 | bionty 1.5.0¶

Bionty.

✨ Flexible ontology sources PR @sunnyosun

LaminDB.

🚸 Enable passing --branch and --space to lamin save PR @falexwolf
🐛 Fix query of feature-associated labels from non-ULabel registries PR @sunnyosun

2025-06-01 db 1.6.0 | bionty 1.4.0¶

Changes to registries.

🏗️ Integrate the Param into the Feature registry PR @falexwolf – the change is backward compatible on the Python/R level – on the SQL level, records are transferred from the lamindb_param table to the lamindb_feature table during migrations
✨ Introduce a Branch registry PR @falexwolf
♻️ Rename Record to SQLRecord PR PR @falexwolf
✨ Introduce a flexible Record registry to manage any kind of entity without database migrations PR @falexwolf

Data curation.

✨ Add schema-based TiledbsomaExperimentCurator PR PR @Zethson
✨ Support curating lists as values in DataFrameCurator PR @sunnyosun

Bug fixes.

🐛 Fix transfer for cases in which genes are insufficiently populated PR @falexwolf

Dependency changes.

⬆️ Add compatibility with zarr v3 PR @Koncopd
⬆️ Add a lower bound on pandera PR PR @Koncopd @Zethson
⬆️ No longer install contenttypes PR @falexwolf

UX improvements.

🚸 Do no longer duplicate tracking of predecessors through the corresponding link table on Transform PR @falexwolf
🚸 Add is_run_input to Artifact.get() and Collection.get() PR @Koncopd
🚸 Improve suffix mismatch error message PR @Zethson
🚸 Clearer error in parse_cat_dtype if cat dtype contains a module name and the module is not found PR @Koncopd
🚸 Better error message when user passes manual uid to track() + anticipate that the user might want to create new transforms in some cases also if hash matches PR @falexwolf
🚸 Improve setting relationships of unsaved records UX PR @Zethson
🚸 Improve DoesNotExist error message upon DBRecord.get() PR @Zethson
♻️ Set current space when transferring records PR @Koncopd
♻️ Mark internal lamindb-produced artifacts with kind="__lamindb__" instead of _branch_code=0 PR @falexwolf

2025-05-13 db 1.5.3¶

🐛 Fix schema transfer PR @Zethson

2025-05-13 db 1.5.2¶

🐛 Reset SpatialData path when access in-memory representation PR @Zethson
🚸 Do not validate twice within Artifact.from_X(...) when passing schema PR @falexwolf

2025-05-08 db 1.5.1¶

🐛 Fix a too strict unique constraint in composite schemas PR @falexwolf
🐛 Fix display of parents & children in view_parents(with_children) PR @falexwolf
⬆️ Adapt save_tiledbsoma_experiment to tiledbsoma==1.16.2 PR @Koncopd

2025-05-07 db 1.5.0 | bionty 1.3.2¶

Data lineage.

🚸 Make notebook & script tracking via ln.track() robust to renames PR @falexwolf
✨ Enable executing notebooks via jupyter nbconvert --execute PR @falexwolf

CLI updates.

✨ Enable cloud paths for lamin save PR @Koncopd
```
lamin save s3://my-bucket/my-file.txt
```
✨ Enable labeling with project during lamin save PR @falexwolf
```
lamin save ./my-folder --project my-project
```

Streaming artifacts.

✨ Enable polars in Artifact.open() and Collection.open() PR @Koncopd
✨ Enable .load(), .open(), and .mapped() on query sets of artifacts PR @Koncopd

Curation & schemas.

✨ Enable curating the index of a dataframe PR @falexwolf

schema = ln.Schema(
    features=[
        ln.Feature(name="required_feature", dtype=str).save(),
    ],
    index=ln.Feature(name="sample", dtype=ln.ULabel).save(),
).save()

🚸 Enable passing a ULabel type to dtype PR @falexwolf

perturbation_type = ln.ULabel.get(name="Perturbation")  # perturbation_type.is_type is True
ln.Feature(name="perturbation", dtype=perturbation_type)

🚸 Handle schema updates decently PR @falexwolf
🚸 Do not annotate with more than n_max_records = 1000 PR @falexwolf
🚸 Introduce a submodule lamindb.examples with schemas PR @falexwolf
🚸 Enable validating against nested dicts in spatialdata PR @falexwolf
🚸 Better handle validation of ensembl gene IDs and add curator representation PR @Zethson
🚸 Prettier Schema.describe() PR @sunnyosun
🚸 AnnData: enable explicit transposition in var schema definition PR @falexwolf
🚚 Rename the components argument of Schema() to slots PR @falexwolf
🐛 Fix respecting schema.ordered_set in DataFrame validation PR @sunnyosun

Bulk annotation with features & queries via features.

✨ Support feature dtype dict PR @falexwolf

ln.Feature(name="metadata_details", dtype=dict).save()

🚸 For artifacts, improve (1) bulk annotation with features + (2) queries by features PR @falexwolf

General UX improvements.

🚸 Do not raise exceptions on problems with copy_or_move_to_cache within Artifact.save PR @Koncopd
🚸 Allow passing key to save_vitessce_config() PR @namsaraeva
🚸 Add .pt and .ckpt to valid suffixes PR @Zethson

Docs.

📝 Document uid generation, prettify API reference docs PR @falexwolf

2025-04-25 bionty 1.3.1¶

🐛 Fixed downloading old Ensembl versions. PR @sunnyosun

2025-04-24 R 1.1.0¶

New features.

✨ Improved Python dependency management with reticulate, deprecated install_lamindb() PR @lazappi
✨ Add tracking of the R environment using pak lockfiles PR @lazappi
✨ Enable artifact$view_lineage() PR @lazappi

Bug fixes.

🐛 Enable setting wrapped object slots like artifact$description, artifact$key, etc. PR @lazappi
🐛 Fix an issue that was preventing lamin_connect() from being run multiple times with the same instance PR @lazappi
🐛 Properly clear and delete temporary instances created using lamin_init_temp() PR @lazappi

Other changes.

♻️ Set minimum reticulate dependency >= 1.38.0 PR @lazappi
🚸 Improve inheritance of arguments when wrapping and overwriting Python functions PR @lazappi
♻️ Dispatch CI from pre-release events in lamindb PR @falexwolf

2025-04-15 db 1.4.0 | bionty 1.3.0¶

✨ Add schema as an argument to Artifact.from_X(). PR @falexwolf

artifact = ln.Artifact.from_df(df, key="my_dataset.parquet", schema=schema).save()

✨ Enable defining simple schemas that merely enforce a feature identifier type. PR @falexwolf

schema = ln.Schema(itype=ln.Feature).save()  # <-- enforce valid feature identifiers, no need to define specific required features

✨ Enable defining optional features on a per-schema level & improve schema hash calculation. PR @sunnyosun

schema = ln.Schema(
  features=[
    ln.Feature(name="sample_id", dtype=str).save()  # required
    ln.Feature(name="sample_name", dtype=str).with_config(optional=True)  # optional
  ],
).save()

✨ Introduce lamin run with a Modal backend. PR @ragyhaddad

lamin run my_script.py --project my_project  # <-- will run the script on Modal

✨ Support auto-download of Ensembl genes of all organisms. Guide PR @sunnyosun

gene_ontology = bt.base.Gene(source="ensembl", organism="rabbit", version='release-103')
gene_ontology.register_source_in_lamindb()  # register the new ontology source in lamindb
source = bt.Source.get(entity="bionty.Gene", name="ensembl", organism="rabbit", version='release-103')
bt.Gene.import_source(source=source)  # import all genes from that source

🚸 Enable querying by features & params through Artifact.filter() and Run.filter(). Guide PR @falexwolf

ln.Artifact.filter(scientist="Barbara McClintock")

User experience.

🚸 from_source no longer returns None but throws a NoResultFound exception if the look up in the public ontology fails PR @sunnyosun
🚸 Allow renaming artifacts & transforms within the same version family PR @falexwolf
🚸 Better support minimal_set, maximal_set, ordered_set in curators PR @sunnyosun
🚸 Enable passing the stem uid to lamin save PR @falexwolf
🚸 No longer throw an error but merely print a warning when attempting to update a schema PR @falexwolf
🚸 Enable plain notebook uploads by making a default run for notebook in case no run is found PR @falexwolf
🚸 Enable to authenticate and set the current instance through environment variables PR @falexwolf
🚸 Show link to hub in view_lineage() and render lineage through graphviz also in scripts PR @falexwolf
🚸 Order IsVersioned.versions query set PR @falexwolf
🚸 Do not print warning about missing schema modules PR @falexwolf

Refactors.

♻️ Eliminate duplicated parsing & record creation during curation PR @falexwolf
♻️ Remove verbosity and organism arguments on CatManager level PR PR @falexwolf
♻️ Organize categorical curation code with CatColumn PR @sunnyosun
♻️ Add return_graph argument to view_lineage() PR @lazappi
♻️ Suppress aiobotocore traceback logging PR @Koncopd
⬆️ Upgrade supabase to <2.15.0 PR @Koncopd
⬆️ Upgrade anndata to 0.11.4 PR @Koncopd

Docs.

📝 Compare lamindb with pydantic and pandera in an FAQ doc PR @falexwolf
📝 Document access any Ensembl genes PR @sunnyosun

Bugs.

🐛 Fix validation of var_index PR @sunnyosun
🐛 Fix numcodecs==0.16.0 incompatibility with zarr v2 PR @Koncopd
🐛 Fix SpatialData and MuData check PR @Zethson
🐛 Fix organism passing to from_source PR @sunnyosun
🐛 Return an empty set not a dict for modules in instance settings PR @falexwolf

Bionty.

🚸 Make the default organism "human" instead of None PR @falexwolf
⬆️ Support Python 3.13 & remove support for Python 3.9 PR @Zethson
♻️ Improve Ensembl prefix detection PR @sunnyosun
♻️ Use UPath.synchronize in s3_bionty_assets PR @Koncopd

2025-03-27 db 1.3.2 | bionty 1.2.1¶

🐛 Fix bionty ontology sources sync through reticulate PR @falexwolf
🐛 Fix data transfer through when target instances has no schema modules PR @falexwolf

2025-03-26 db 1.3.1 | bionty 1.2.0¶

In Bionty, you can now add custom ontology sources through the Source registry.

df = pd.read_csv("./our_inhouse_genes.csv")  # a csv describing gene metadata e.g. from parsing a GTF file
custom_source = bt.Source(entity="bionty.Gene", organism="human", name="Our genes", version="2025-04-01").save()
bt.Gene.add_source(custom_source, df=df)  # couple the custom source to the Gene registry

LaminDB changes.

🐛 Fix incompatibility with gotrue==2.12.0 PR @Koncopd
🐛 Enable transferring features pointing to multiple labels PR @sunnyosun
🐛 More extensive validation for updates to artifact.key and artifact.suffix PR @falexwolf
🚸 Refactor conventions for files written during init: the SQLite file is now .lamindb/lamin.db and the storage marker is .lamindb/storage_uid.txt PR @falexwolf
🚸 Make upload of large directories more robust by reducing batch size PR @Koncopd
🚸 Avoid requiring coerce_dtype for "int" and "float" in case an integer or float pd.Series.dtype only deviates by numerical precision/range PR @falexwolf
🚸 In AnnDataCurator, make 'obs' schema optional and allow 'uns' schema PR @falexwolf

2025-03-16 db 1.3.0¶

New features.

✨ Add schema-based SpatialDataCurator PR1 PR2 PR3 @Zethson
✨ Add schema-based MuDataCurator PR @sunnyosun
✨ Add lamin get for artifacts and lamin load for collections PR @Zethson @falexwolf

Other changes.

⬆️ Python 3.13 support PR @Zethson
⬆️ Support CELLxGENE schema 5.2.0 PR1 PR2 @sunnyosun
🚸 Skip ln.track() when connected in read-only mode PR @falexwolf
🚸 Error if trying to register an instance without a storage in the hub PR @Koncopd
🚸 Refactor organism constraints during validation PR @sunnyosun
🚸 Add more constructor signatures and specific inherited types PR @falexwolf
🚸 No logging message if database is behind by minor version PR @falexwolf
📝 Re-structure curation guides PR1 PR2 @falexwolf
📝 Integrate tutorials into introduction guide PR @falexwolf

2025-03-10 R 1.0.0¶

✨ laminr now has feature parity with lamindb. PR @lazappi

Run install_lamindb(), which will ensure lamindb >= 1.2 in the Python environment used by reticulate.
Replace db <- connect() with ln <- import_module("lamindb") and see the “Detailed changes” dropdown.

The ln object is largely similar to the db object in laminr < v1 and matches lamindb’s Python API (. → $).

Detailed changes

What	Before	After
Connect to the default LaminDB instance	`db <- connect()`	`ln <- import_module("lamindb")`
Start tracking	`db$track()`	`ln$track()`
Get an artifact from another instance	`new_instance <- connect("another/instance"); new_instance$Artifact$get(...)`	`ln$Artifact$using("another/instance")$get(...)`
Create an artifact from a path	`db$Artifact$from_path(path)`	`ln$Artifact(path)`
Finish tracking	`db$finish()`	`ln$finish()`

See the updated “Get started” vignette for more information.

User-facing changes:

Add an import_module() function to import Python modules with additional functionality, e.g., import_module("lamindb") for lamindb
Add functions for accessing more lamin CLI commands
Add a new “Introduction” vignette that replicates the code from the Python lamindb introduction guide

Internal changes:

Add an internal wrap_python() function to wrap Python objects while replacing Python methods with R methods as needed, leaving most work to {reticulate}
Update the internal check_requires() function to handle Python packages
Add custom cache()/load() methods to the Artifact class
Add custom track()/finish() methods to the lamindb module

2025-03-09 db 1.2.0¶

✨ Enable to auto-link entities to projects. Guide PR @falexwolf

ln.track(project="My project")

🚸 Better support for spatialdata with Artifact.from_spatialdata() and artifact.load(). PR1 PR2 @Zethson

🚸 Introduce .slots in Schema, Curator, and artifact.features to access schemas and curators by dataset slot. PR @sunnyosun

schema.slots["obs"]  # -> schema for .obs slot of AnnData
curator.slots["obs"]  # -> curator for .obs slot of AnnData
artifact.features["obs"]  # -> feature set for .obs slot of AnnData

🏗️ Re-structured the internal API away from monkey-patching Django models. PR @falexwolf

🚸 When re-creating an Artifact, link subsequent runs instead of updating .run and linking previous runs. PR @falexwolf

On the hub.

More details here. @chaichontat

Before	After
An artifact is only shown as an output for the latest run that created the artifact. Previous runs don’t show it.	All runs that (re-)create an artifact show it as an output.

More changes:

✨ Support R2 PR @Koncopd
✨ Enable Artifact.open() and Artifact.load() for .gz files PR @Koncopd
🐛 Fix passing a path to ln.track() when no path found by nbproject PR @Koncopd
🐛 Do not overwrite ._state_db of records when the current instance is passed to .using PR @Koncopd
🚸 Do not show track warning for read-only connections PR @Koncopd
🚸 Raise NotImplementedError in Artifact.load() if there is no loader PR @Koncopd

2025-02-27 db 1.1.1¶

🚸 Make the obs and var DataFrameCurator objects accessible via AnnDataCurator.slots PR @sunnyosun
🚸 Better error message upon re-creation of schema with same name and different hash PR @falexwolf
🚸 Raise consistency error if a source path suffix doesn’t match the artifact key suffix PR @falexwolf
🚸 Automatically add missing columns upon DataFrameCurator.standardize() if nullable is True PR @falexwolf
🚸 Allow specifying fsspec upload options in Artifact.save PR @Koncopd
🚸 Populate Artifact.n_observations in Artifact.from_df() PR @Koncopd
🐛 Fix UPath.view_tree on first call on gs PR @Koncopd
🐛 Fix .add_new_from message PR @Zethson
🐛 Run pip freeze with current python interpreter PR @ap–
🐛 Do not resolve http links when registering PR @Koncopd
🐛 Fix notebook re-run with same hash PR @falexwolf

2025-02-18 db 1.1.0¶

✨ Conveniently track functions including inputs, outputs, and parameters with a decorator: ln.tracked(). PR1 PR2 @falexwolf

@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,  # all arguments tracked as parameters of the function run
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    df = artifact.load()  # auto-tracked as input
    new_df = df.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_df, key=output_artifact_key).save()  # auto-tracked as output

✨ Make sub-types of ULabel, Feature, Schema, Project, Param, and Reference. PR @falexwolf

perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation).save()
ln.ULabel(name="IFNG", type=perturbation).save()

✨ Use an overhauled dataset curation flow. @falexwolf @Zethson @sunnyosun

support persisting validation constraints as a pandera-compatible schema
support validating any feature type, no longer just categoricals
make the relationship between features, dataset schema, and curator evident

Detailed changes for the overhauled curation flow.

⚠️ The API gained the lamindb.curators module as the new way to access Curator classes for different data structures.

This release introduces the schema-based DataFrameCurator and AnnDataCurator
The old-style curation flow for categoricals based on lamindb.Curator.from_objecttype() continues to work

Before	After

Key PRs.

✨ Overhaul curation guides + enable default values and filters on valid categories for features PR @falexwolf
✨ Schema-based curators: AnnDataCurator PR @falexwolf
✨ Schema-based curators: DataFrameCurator PR @falexwolf

Enabling PRs.

✨ Allow passing artifact to Curator PR @sunnyosun
🎨 A ManyToMany between Schema.components and .composites PR @falexwolf
♻️ Mark Schema fields as non-editable PR @falexwolf
✨ Add auxiliary field nullable to Feature PR @falexwolf
♻️ Prettify AnnDataCurator implementation PR @falexwolf
🚸 Better error for malformed categorical dtype PR @falexwolf
🎨 A ManyToMany between Schema.components and .composites PR @falexwolf
🚚 Restore .feature_sets as a ManyToManyField PR @falexwolf
🚚 Rename CatCurator to CatManager PR @falexwolf
🎨 Let Curator.validate() throw an error PR @falexwolf
♻️ Re-purpose BaseCurator as Curator, introduce CatCurator and consolidate shared logic under CatCurator PR @falexwolf
♻️ Refactor organism handling in curators PR @falexwolf
🔥 Eliminate all logic related to using_key in curators PR @falexwolf
🚚 Bulk-rename old-style curators to CatCurator PR @falexwolf
🎨 Self-contained definition of CellxGene schema / validation constraints PR @falexwolf
🚚 Move PertCurator from wetlab here and add CellxGene Curator test PR @falexwolf
🚚 Move CellXGene Curator from cellxgene-lamin here PR @falexwolf

schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="CD8A", dtype=int).save(),  # integer counts for CD8A marker
        ln.Feature(name="perturbation", dtype=ln.ULabel).save(),  # a categorical feature that validates against the ULabel registry
        ln.Feature(name="sample_note", dtype=str).save(),   # a note for the sample
    ],
).save()

df = pd.DataFrame({
    "CD8A": [1, 4, 0],
    "perturbation": ["DMSO", ],
    "sample_note": ["value_1", "value_2", "value_3"],
    "temperature": [22.2, 25.7, 27.3],
})
curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")  # validates compliance with schema, annotates with metadata
assert artifact.schema == schema  # the validating schema

✨ Easily filter on a validating schema. @falexwolf @Zethson @sunnyosun

schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")  # get a schema
ln.Artifact.filter(schema=schema).df()  # filter all datasets that were validated by the schema

✨ Collection.open() returns a pyarrow dataset. PR @Koncopd

df = pd.DataFrame({"feat1": [0, 0, 1, 1], "feat2": [6, 7, 8, 9]})
df[:2].to_parquet("df1.parquet", engine="pyarrow")
df[2:].to_parquet("df2.parquet", engine="pyarrow")

artifact1 = ln.Artifact(shard1, key="df1.parquet").save()
artifact2 = ln.Artifact(shard2, key="df2.parquet").save()
collection = ln.Collection([artifact1, artifact2], key="parquet_col")

dataset = collection.open() # backed by files in the cloud storage
dataset.to_table().to_pandas().head()

✨ Support s3-compatible endpoint urls, say your on-prem MinIO deployment. PR @Koncopd

Speed up instance creation through squashed migrations.

⚡ Squash migrations PR1 PR2 @falexwolf

Tiledbsoma.

✨ Support endpoint_url in operations with tiledbsoma PR1 PR2 @Koncopd
✨ Add Artifact.from_tiledbsoma to populate n_observations PR @Koncopd

MappedCollection.

🐛 Allow filtering on np.nan in obs_filter of MappedCollection PR @Koncopd
🐛 Fix labels for NaN in categorical columns for MappedCollection PR @Koncopd

SpatialDataCurator.

🐛 Fix var_index standardization of SpatialDataCurator PR1 PR2 @Zethson
🐛 Fix sample level metadata optional in SpatialDataCatManager PR @Zethson

Core functionality.

✨ Allow checking the need for syncing without actually syncing PR @Koncopd
✨ Check for corrupted cache in Artifact.load() & Artifact.open() PR PR @Koncopd
✨ Infer n_observations in Artifact.from_anndata PR @Koncopd
🐛 Account for VSCode appending languageid to markdown cell in notebook tracking PR @falexwolf
🐛 Fix dangling folders on upload failures PR @Koncopd
🐛 Normalize module names for robust checking in _check_instance_setup() PR @Koncopd
🐛 Fix idempotency of Feature creation when description is passed and improve filter and get error behavior PR @Zethson
🐛 Fix caching logic in Artifact.open() PR @Koncopd
🚸 Make new version upon passing existing key to Collection PR @falexwolf
🚸 Throw better error upon checking instance.modules when loading a lamindb schema module PR @Koncopd
🚸 Validate existing records in the DB irrespective of whether an ontology source is passed or not PR @sunnyosun
🚸 Full guarantee of avoiding duplicating Transform, Artifact & Collection in concurrent runs PR @falexwolf
🚸 Fix RemovedInDjango60Warning PR @Zethson
🚸 Better user feedback during keyword validation in Record constructor PR @Zethson
🚸 Fix warning about artifacts in trash PR @ap–
🚸 Improved error message when saving via CLI PR @Zethson
🚸 Improve local storage not found warning message PR @Zethson
🚸 Better error message when attempting to save a file while not being connected to an instance PR @Zethson
🚸 Error for non-keyword parameters for Artifact.from_x methods PR @Zethson

Housekeeping.

🚸 Error at runtime with old s3fs PR @Koncopd
🚸 Safer resolve in check_path_is_child_of_root() PR @Koncopd
⬆️ Upgrade fsspec packages (s3fs, gcsfs, universal_pathlib) PR @Koncopd
➕ Add pyyaml to dependencies PR @Koncopd

2025-01-23 db 1.0.5¶

🚸 No longer throw a NotebookNotSaved error in ln.finish() but wait for the user or gracefully exit PR @falexwolf
🚸 Resolve save FutureWarning PR @Zethson
🐛 Fix Artifact.replace() for folder-like artifacts PR @Koncopd
🐛 Filter the latest transform on saving by filename PR @Koncopd

2025-01-21 db 1.0.4¶

🚚 Revert Collection.description back to unlimited length TextField. PR @falexwolf

2025-01-21 db 1.0.3¶

🚸 In track(), improve logging in RStudio sessions. PR @falexwolf

2025-01-20 R 0.4.0¶

🚚 Migrate to lamindb v1 PR @falexwolf
🚸 Improve the user experience for setting up Python & reticulate PR @lazappi

2025-01-20 db 1.0.2¶

🚚 Improvments for lamindb v1 migrations. PR @falexwolf

add a .description field to Schema
enable labeling Run with ULabel
add a .predecessors and .successors field to Project akin to what’s present on Transform
make .uid fields not editable

2025-01-18 db 1.0.1¶

🐛 Block non-admin users from confirming the dialogue for integrating lnschema-core. PR @falexwolf

2025-01-17 db 1.0.0¶

This release makes the API consistent, integrates lnschema_core & ourprojects into the lamindb package, and introduces a breadth of database migrations to enable future features without disruption. You’ll now need at least Python 3.10.

Your code will continue to run as is, but you will receive warnings about a few renamed API components.

What	Before	After
Dataset vs. model	`Artifact.type`	`Artifact.kind`
Python object for `Artifact`	`Artifact._accessor`	`Artifact.otype`
Number of files	`Artifact.n_objects`	`Artifact.n_files`
`name` arg of `Transform`	`Transform(name="My notebook", key="my-notebook.ipynb")`	`Transform(key="my-notebook.ipynb", description="My notebook")`
`name` arg of `Collection`	`Collection(name="My collection")`	`Collection(key="My collection")`
Consecutiveness field	`Run.is_consecutive`	`Run._is_consecutive`
Run initiator	`Run.parent`	`Run.initiated_by_run`
`--schema` arg	`lamin init --schema bionty,wetlab`	`lamin init --modules bionty,wetlab`

Migration guide:

Upon lamin connect account/instance you will be prompted to confirm migrating away from lnschema_core
After that, you will be prompted to call lamin migrate deploy to apply database migrations

New features:

✨ Allow http storage backend for Artifact PR @Koncopd
✨ Add SpatialDataCurator PR @Zethson
✨ Allow filtering by multiple obs columns in MappedCollection PR @Koncopd
✨ In git sync, also search git blob hash in non-default branches PR @Zethson
✨ Add relationship with Project to everything except Run, Storage & User so that you can easily filter for the entities relevant to your project PR @falexwolf
✨ Capture logs of scripts during ln.track() PR1 PR2 @falexwolf @Koncopd
✨ Support "|"-seperated multi-values in Curator PR @sunnyosun
🚸 Accept None in connect() and improve migration dialogue PR @falexwolf

UX improvements:

🚸 Simplify the ln.track() experience PR @falexwolf
1. you can omit the uid argument
2. you can organize transforms in folders
3. versioning is fully automated (requirement for 1.)
4. you can save scripts and notebooks without running them (corollary of 1.)
5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)
6. you are no longer required to add a title in a notebook
🚸 Raise error when modifying Artifact.key in problematic ways PR1 PR2 @sunnyosun @Koncopd
🚸 Better error message on running ln.track() within Python terminal PR @Koncopd
🚸 Hide traceback for InstanceNotEmpty using Click Exception PR @Zethson
🚸 Hide underscore attributes in __repr__ PR @Zethson
🚸 Only auto-search ._name_field in sub-classes of CanCurate PR @falexwolf
🚸 Simplify installation & API overview PR @falexwolf
🚸 Make lamin_run_uid categorical in tiledbsoma stores PR @Koncopd
🚸 Add defensive check for organism arg PR @Zethson
🚸 Raise ValueError when trying to search a None value PR @Zethson

Bug fixes:

🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd
🐛 Let SOMACurator() validate and annotate all .obs columns PR @falexwolf
🐛 Fix renaming of feature sets PR @sunnyosun
🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd
🐛 Only map synonyms when field is name PR @sunnyosun
🐛 Fix source in .from_values PR @sunnyosun
🐛 Fix creating instances with storage in the current local working directory PR @Koncopd
🐛 Fix NA values in Curator.add_new_from() PR @sunnyosun

Refactors, renames & maintenance:

🏗️ Integrate lnschema-core into lamindb PR1 PR2 @falexwolf @Koncopd
🏗️ Integrate ourprojects into lamindb PR @falexwolf
♻️ Manage created_at, updated_at on the database-level, make created_by not editable PR @falexwolf
🚚 Rename transform type “glue” to “linker” PR @falexwolf
🚚 Deprecate the --schema argument of lamin init in favor of --modules PR @falexwolf
⬆️ Compatibility with tiledbsoma==1.15.0 PR @Koncopd

DevOps:

👷 Isolate curator tests PR @Zethson

Detailed list of database migrations

Those not yet announced above will be announced with the functionality they enable.

♻️ Add contenttypes Django plugin PR @falexwolf
🚚 Prepare introduction of persistable Curator objects by renaming FeatureSet to Schema on the database-level PR @falexwolf
🚚 Add a .type foreign key to ULabel, Feature, FeatureSet, Reference, Param PR @falexwolf
🚚 Introduce RunData, TidyTable, and TidyTableData in the database PR @falexwolf

All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.

remove _source_code_artifact from Transform, it’s been deprecated since 0.75
- data migration: for all transforms that have _source_code_artifact populated, populate source_code
rename Transform.name to Transform.description because it’s analogous to Artifact.description
- backward compat:
  - in the Transform constructor use name to populate key in all cases in which only name is passed
  - return the same transform based on key in case source_code is None via ._name_field = "key"
- data migrations:
  - there already was a legacy description field that was never exposed on the constructor; to be safe, we concatenated potential data in it on the new description field
  - for all transforms that have key=None and name!=None, use name to pre-populate key
rename Collection.name to Collection.key for consistency with Artifact & Transform and the high likelihood of you wanting to organize them hierarchically
a _branch_code integer on every record to model pull requests
- include visibility within that code
- repurpose visibility=0 as _branch_code=0 as “archive”
- put an index on it
- code a “draft” as _branch_code = 2, and “draft prs” as negative branch codes
rename values "number" to "num" in dtype
an ._aux json field on Record
a SmallInteger run._status_code that allows to write finished_at in clean up operations so that there is a run time also for aborted runs
rename Run.is_consecutive to Run._is_consecutive
a _template_id FK to store the information of the generating template (whether a record is a template is coded via _branch_code)
rename _accessor to otype to publicly declare the data format as suffix, accessor
rename Artifact.type to Artifact.kind
a FK to artifact run._logfile which holds logs
a hash field on ParamValue and FeatureValue to enforce uniqueness without running the danger of failure for large dictionaries
add a boolean field ._expect_many to Feature/Param that defaults to True/False and indicates whether values for this feature/param are expected to occur a single or multiple times for every single artifact/run
- for feature
  - if it’s True (default), the values come from an observation-level aggregation and a dtype of datetime on the observation-level mean set[datetime] on the artifact-level
  - if it’s False it’s an artifact-level value and datetime means datetime; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated (“one just happens to measure a single cell line in that artifact”)
- for param
  - if it’s False (default), the values mean artifact/run-level values and datetime means datetime
  - if it’s True, the values would be from an aggregation, this seems like an edge case but say when characterizing a model ensemble trained with different parameters it could be relevant
remove the .transform foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX
store provenance metadata for TransformULabel, RunParamValue, ArtifactParamValue
enable linking projects & references to transforms & collections
rename Run.parent to Run.initiated_by_run
introduce a boolean flag on artifact that’s called _overwrite_versions, which indicates whether versions are overwritten or stored separately; it defaults to False for file-like artifacts and to True for folder-like artifacts
Rename n_objects to n_files for more clarity
Add a Space registry to lamindb with an FK on every BasicRecord
add a name column to Run so that a specific run can be used as a named specific analysis
remove _previous_runs field on everything except Artifact & Collection