Changelog 2025

Note

Get notified by watching releases for git repositories: lamindb, laminhub, laminr, and bionty.

🪜 For other years, see: 2024 · 2023 · 2022

2025-01-21 db 1.0.3

🚚 Revert Collection.description back to unlimited length TextField. PR @falexwolf

2025-01-20 db 1.0.2

🚚 Improvements for lamindb v1 migrations. PR @falexwolf

  • add a .description field to Schema

  • enable labeling Run with ULabel

  • add a .predecessors and .successors field to Project akin to what’s present on Transform

  • make .uid fields not editable

2025-01-18 db 1.0.1

🐛 Block non-admin users from confirming the dialogue for integrating lnschema-core. PR @falexwolf

2025-01-17 db 1.0.0

This release makes the API consistent, integrates lnschema_core & ourprojects into the lamindb package, and introduces a broad set of database migrations to enable future features without disruption. You’ll now need at least Python 3.10.

Your code will continue to run as is, but you will receive warnings about a few renamed API components.

| What | Before | After |
| --- | --- | --- |
| Dataset vs. model | Artifact.type | Artifact.kind |
| Python object for Artifact | Artifact._accessor | Artifact.otype |
| Number of files | Artifact.n_objects | Artifact.n_files |
| name arg of Transform | Transform(name="My notebook", key="my-notebook.ipynb") | Transform(key="my-notebook.ipynb", description="My notebook") |
| name arg of Collection | Collection(name="My collection") | Collection(key="My collection") |
| Consecutiveness field | Run.is_consecutive | Run._is_consecutive |
| Run initiator | Run.parent | Run.initiated_by_run |
| --schema arg | lamin init --schema bionty,wetlab | lamin init --modules bionty,wetlab |
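
To illustrate how a renamed attribute can keep working while emitting a warning, here is a minimal sketch in plain Python. The class is a hypothetical stand-in, not lamindb's actual Artifact, and the warning category and messages are assumptions. Only the old and new attribute names are taken from the renames listed above.

```python
import warnings


class Artifact:
    """Hypothetical stand-in used for illustration; not lamindb's Artifact."""

    def __init__(self, kind=None, n_files=None):
        # New attribute names from the "After" column.
        self.kind = kind
        self.n_files = n_files

    @property
    def type(self):
        # Old name from the "Before" column: still readable, but warns.
        warnings.warn(
            "`type` was renamed to `kind`", DeprecationWarning, stacklevel=2
        )
        return self.kind

    @property
    def n_objects(self):
        # Old name from the "Before" column: still readable, but warns.
        warnings.warn(
            "`n_objects` was renamed to `n_files`", DeprecationWarning, stacklevel=2
        )
        return self.n_files
```

With a shim of this shape, code that reads the old names keeps returning the same values, and the warnings point to the names to switch to.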

Migration guide:

  1. Upon lamin connect account/instance you will be prompted to confirm migrating away from lnschema_core

  2. After that, you will be prompted to call lamin migrate deploy to apply database migrations

New features:

  • ✨ Allow http storage backend for Artifact PR @Koncopd

  • ✨ Add SpatialDataCurator PR @Zethson

  • ✨ Allow filtering by multiple obs columns in MappedCollection PR @Koncopd

  • ✨ In git sync, also search git blob hash in non-default branches PR @Zethson

  • ✨ Add relationship with Project to everything except Run, Storage & User so that you can easily filter for the entities relevant to your project PR @falexwolf

  • ✨ Capture logs of scripts during ln.track() PR1 PR2 @falexwolf @Koncopd

  • ✨ Support "|"-separated multi-values in Curator PR @sunnyosun

  • 🚸 Accept None in connect() and improve migration dialogue PR @falexwolf
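
The "|"-separated multi-values mentioned in the list above can be pictured with a small sketch in plain Python. The helper name `split_multi_values` is hypothetical and is not part of the Curator API. Stripping whitespace and dropping empty pieces are assumptions, since the changelog does not say how Curator treats them.

```python
def split_multi_values(cell, sep="|"):
    """Split one delimited cell into its individual values.

    Hypothetical helper for illustration; not lamindb's implementation.
    """
    if cell is None:
        return []
    # Assumption: strip surrounding whitespace and drop empty pieces ("a||b").
    return [part.strip() for part in str(cell).split(sep) if part.strip()]


values = split_multi_values("T cell|B cell")
```

Each resulting value could then be validated on its own, in the same way as a cell that holds a single value.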

UX improvements:

  • 🚸 Simplify the ln.track() experience PR @falexwolf

    1. you can omit the uid argument

    2. you can organize transforms in folders

    3. versioning is fully automated (requirement for 1.)

    4. you can save scripts and notebooks without running them (corollary of 1.)

    5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)

    6. you are no longer required to add a title in a notebook

  • 🚸 Raise error when modifying Artifact.key in problematic ways PR1 PR2 @sunnyosun @Koncopd

  • 🚸 Better error message on running ln.track() within Python terminal PR @Koncopd

  • 🚸 Hide traceback for InstanceNotEmpty using Click Exception PR @Zethson

  • 🚸 Hide underscore attributes in __repr__ PR @Zethson

  • 🚸 Only auto-search ._name_field in sub-classes of CanCurate PR @falexwolf

  • 🚸 Simplify installation & API overview PR @falexwolf

  • 🚸 Make lamin_run_uid categorical in tiledbsoma stores PR @Koncopd

  • 🚸 Add defensive check for organism arg PR @Zethson

  • 🚸 Raise ValueError when trying to search a None value PR @Zethson

Bug fixes:

  • 🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd

  • 🐛 Let SOMACurator() validate and annotate all .obs columns PR @falexwolf

  • 🐛 Fix renaming of feature sets PR @sunnyosun

  • 🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd

  • 🐛 Only map synonyms when field is name PR @sunnyosun

  • 🐛 Fix source in .from_values PR @sunnyosun

  • 🐛 Fix creating instances with storage in the current local working directory PR @Koncopd

  • 🐛 Fix NA values in Curator.add_new_from() PR @sunnyosun

Refactors, renames & maintenance:

  • 🏗️ Integrate lnschema-core into lamindb PR1 PR2 @falexwolf @Koncopd

  • 🏗️ Integrate ourprojects into lamindb PR @falexwolf

  • ♻️ Manage created_at, updated_at on the database-level, make created_by not editable PR @falexwolf

  • 🚚 Rename transform type “glue” to “linker” PR @falexwolf

  • 🚚 Deprecate the --schema argument of lamin init in favor of --modules PR @falexwolf

  • ⬆️ Compatibility with tiledbsoma==1.15.0 PR @Koncopd

DevOps:

Detailed list of database migrations

Those not yet announced above will be announced with the functionality they enable.

  • ♻️ Add contenttypes Django plugin PR @falexwolf

  • 🚚 Prepare introduction of persistable Curator objects by renaming FeatureSet to Schema on the database-level PR @falexwolf

  • 🚚 Add a .type foreign key to ULabel, Feature, FeatureSet, Reference, Param PR @falexwolf

  • 🚚 Introduce RunData, TidyTable, and TidyTableData in the database PR @falexwolf

All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.

  • remove _source_code_artifact from Transform, it’s been deprecated since 0.75

    • data migration: for all transforms that have _source_code_artifact populated, populate source_code

  • rename Transform.name to Transform.description because it’s analogous to Artifact.description

    • backward compat:

      • in the Transform constructor use name to populate key in all cases in which only name is passed

      • return the same transform based on key in case source_code is None via ._name_field = "key"

    • data migrations:

      • there already was a legacy description field that was never exposed on the constructor; to be safe, we appended any data it held to the new description field

      • for all transforms that have key=None and name!=None, use name to pre-populate key

  • rename Collection.name to Collection.key for consistency with Artifact & Transform and the high likelihood of you wanting to organize them hierarchically

  • a _branch_code integer on every record to model pull requests

    • include visibility within that code

    • repurpose visibility=0 as _branch_code=0 as “archive”

    • put an index on it

    • code a “draft” as _branch_code = 2, and “draft prs” as negative branch codes

  • rename values "number" to "num" in dtype

  • an ._aux json field on Record

  • a SmallInteger run._status_code that allows writing finished_at in clean-up operations so that aborted runs also have a run time

  • rename Run.is_consecutive to Run._is_consecutive

  • a _template_id FK to store the information of the generating template (whether a record is a template is coded via _branch_code)

  • rename _accessor to otype to publicly declare the data format as suffix, accessor

  • rename Artifact.type to Artifact.kind

  • an FK run._logfile to an artifact that holds logs

  • a hash field on ParamValue and FeatureValue to enforce uniqueness without running the danger of failure for large dictionaries

  • add a boolean field ._expect_many to Feature/Param that defaults to True for Feature and False for Param and indicates whether values for this feature/param are expected to occur once or multiple times for every single artifact/run

    • for feature

      • if it’s True (default), the values come from an observation-level aggregation and a dtype of datetime on the observation-level means set[datetime] on the artifact-level

      • if it’s False it’s an artifact-level value and datetime means datetime; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated (“one just happens to measure a single cell line in that artifact”)

    • for param

      • if it’s False (default), the values mean artifact/run-level values and datetime means datetime

      • if it’s True, the values would come from an aggregation; this seems like an edge case, but it could be relevant, say, when characterizing a model ensemble trained with different parameters

  • remove the .transform foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX

  • store provenance metadata for TransformULabel, RunParamValue, ArtifactParamValue

  • enable linking projects & references to transforms & collections

  • rename Run.parent to Run.initiated_by_run

  • introduce a boolean flag on artifact that’s called _overwrite_versions, which indicates whether versions are overwritten or stored separately; it defaults to False for file-like artifacts and to True for folder-like artifacts

  • rename n_objects to n_files for more clarity

  • add a Space registry to lamindb with an FK on every BasicRecord

  • add a name column to Run so that a specific run can be used as a named specific analysis

  • remove _previous_runs field on everything except Artifact & Collection
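
The hash field on ParamValue and FeatureValue listed above follows a common pattern: enforce uniqueness on a fixed-length digest rather than on the full value. Below is a minimal sketch of that pattern in plain Python. The function name `hash_value`, the choice of SHA-256, and the JSON serialization are all assumptions, as the changelog does not state which scheme lamindb uses.

```python
import hashlib
import json


def hash_value(value):
    """Return a fixed-length hex digest for a JSON-serializable value.

    Hypothetical helper for illustration; not lamindb's implementation.
    """
    # sort_keys makes the serialization independent of dict insertion order,
    # so two dictionaries with equal content produce the same digest.
    serialized = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


small = hash_value({"lr": 0.01, "epochs": 10})
large = hash_value({f"param_{i}": i for i in range(10_000)})
```

Both digests have the same length regardless of how large the dictionary is, so a uniqueness constraint can sit on the digest column rather than on the dictionary itself.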