Post-run script¶

Nextflow is the most widely used workflow manager in bioinformatics.

We generally recommend using the nf-lamin plugin. However, if lower level LaminDB usage is required, it might be worthwhile writing a custom Python script.

This guide shows how to register a Nextflow run with inputs & outputs for the example of the nf-core/scrnaseq pipeline by running a Python script.

The approach could be automated by deploying the script via

a serverless environment trigger (e.g., AWS Lambda)
a post-run script on the Seqera Platform

!lamin init --storage ./test-nextflow --name test-nextflow

Run the pipeline¶

Let’s download the input data from an S3 bucket.

import lamindb as ln

input_path = ln.UPath("s3://lamindb-test/scrnaseq_input")
input_path.download_to("scrnaseq_input")

→ connected lamindb: testuser1/test-nextflow

And run the nf-core/scrnaseq pipeline.

# the test profile uses all downloaded input files as an input
!nextflow run nf-core/scrnaseq -r 4.0.0 -profile docker,test -resume --outdir scrnaseq_output

Show code cell output Hide code cell output

N E X T F L O W  ~  version 25.10.2

Pulling nf-core/scrnaseq ...

 downloaded from https://github.com/nf-core/scrnaseq.git

WARN: It appears you have never run this project before -- Option `-resume` is ignored

Downloading plugin [email protected]

WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `https://github.com/nf-core/scrnaseq` [elated_mendel] DSL2 - revision: e0ddddbff9 [4.0.0]

Downloading plugin [email protected]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/scrnaseq 4.0.0
------------------------------------------------------
Input/output options
  input                     : https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv
  outdir                    : scrnaseq_output

Mandatory arguments
  aligner                   : star
  protocol                  : 10XV2

Skip Tools
  skip_cellbender           : true

Reference genome options
  fasta                     : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa
  gtf                       : https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf
  save_align_intermeds      : true

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Generic options
  trace_report_suffix       : 2025-12-14_22-40-10

Core Nextflow options
  revision                  : 4.0.0
  runName                   : elated_mendel
  containerEngine           : docker
  launchDir                 : /home/runner/work/nf-lamin/nf-lamin/docs
  workDir                   : /home/runner/work/nf-lamin/nf-lamin/docs/work
  projectDir                : /home/runner/.nextflow/assets/nf-core/scrnaseq
  userName                  : runner
  profile                   : docker,test
  configFiles               : /home/runner/.nextflow/assets/nf-core/scrnaseq/nextflow.config, /home/runner/work/nf-lamin/nf-lamin/docs/nextflow.config

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The pipeline
    https://doi.org/10.5281/zenodo.3568187

* The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
    https://github.com/nf-core/scrnaseq/blob/master/CITATIONS.md

Downloading plugin [email protected]

WARN: The following invalid input values have been detected:

* --validationSchemaIgnoreParams: genomes

[5c/5b1038] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:FASTQC_CHECK:FASTQC (Sample_X)

[7b/f64afa] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:FASTQC_CHECK:FASTQC (Sample_Y)

[b2/872cd9] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:GTF_GENE_FILTER (GRCm38.p6.genome.chr19.fa)

[2d/c4cc5a] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_GENOMEGENERATE (GRCm38.p6.genome.chr19.fa)

[5c/59b422] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (Sample_X)

[87/6b970b] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN (Sample_Y)

[c7/35f331] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_TO_H5AD (Sample_X)

[52/fd0401] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_TO_H5AD (Sample_Y)

[de/f7e550] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_TO_H5AD (Sample_Y)

[db/4b9311] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:MTX_TO_H5AD (Sample_X)

[99/c084a2] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (Sample_X)

[99/dbb1c5] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:MULTIQC

[54/a19291] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (Sample_Y)

[9b/3e67c6] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (Sample_Y)

[19/4b046c] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (Sample_X)

[b2/3437be] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:CONCAT_H5AD (combined)

[34/db53b9] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:CONCAT_H5AD (combined)

[36/2316b7] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (combined)

[95/cd96a8] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:H5AD_CONVERSION:ANNDATAR_CONVERT (combined)

-[nf-core/scrnaseq] Pipeline completed successfully-

Run the registration script¶

After the pipeline has completed, a Python script registers inputs & outputs in LaminDB.

nf-core/scrnaseq run registration¶

import argparse
import lamindb as ln
import json
import re
from pathlib import Path
from lamin_utils import logger


def parse_arguments() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True)
    parser.add_argument("--output", type=str, required=True)
    return parser.parse_args()


def register_pipeline_io(input_dir: str, output_dir: str, run: ln.Run) -> None:
    """Register input and output artifacts for an `nf-core/scrnaseq` run."""
    input_artifacts = ln.Artifact.from_dir(input_dir, run=False)
    ln.save(input_artifacts)
    run.input_artifacts.set(input_artifacts)
    ln.Artifact(f"{output_dir}/multiqc", description="multiqc report", run=run).save()
    ln.Artifact(
        f"{output_dir}/star/mtx_conversions/combined_filtered_matrix.h5ad",
        key="filtered_count_matrix.h5ad",
        run=run,
    ).save()


def register_pipeline_metadata(output_dir: str, run: ln.Run) -> None:
    """Register nf-core run metadata stored in the 'pipeline_info' folder."""
    ulabel = ln.ULabel(name="nextflow").save()
    run.transform.ulabels.add(ulabel)

    # nextflow run id
    content = next(Path(f"{output_dir}/pipeline_info").glob("execution_report_*.html")).read_text()
    match = re.search(r"run id \[([^\]]+)\]", content)
    nextflow_id = match.group(1) if match else ""
    run.reference = nextflow_id
    run.reference_type = "nextflow_id"

    # completed at
    completion_match = re.search(r'<span id="workflow_complete">([^<]+)</span>', content)
    if completion_match:
        from datetime import datetime

        timestamp_str = completion_match.group(1).strip()
        run.finished_at = datetime.strptime(timestamp_str, "%d-%b-%Y %H:%M:%S")

    # execution report and software versions
    for file_pattern, description, run_attr in [
        ("execution_report*", "execution report", "report"),
        ("nf_core_*_software*", "software versions", "environment"),
    ]:
        matching_files = list(Path(f"{output_dir}/pipeline_info").glob(file_pattern))
        if not matching_files:
            logger.warning(f"No files matching '{file_pattern}' in pipeline_info")
            continue

        artifact = ln.Artifact(
            matching_files[0],
            description=f"nextflow run {description} of {nextflow_id}",
            visibility=0,
            run=False,
        ).save()
        setattr(run, run_attr, artifact)

    # nextflow run parameters
    params_path = next(Path(f"{output_dir}/pipeline_info").glob("params*"))
    with params_path.open() as params_file:
        params = json.load(params_file)
    ln.Param(name="params", dtype="dict").save()
    run.features.add_values({"params": params})
    run.save()


args = parse_arguments()
scrnaseq_transform = ln.Transform(
    key="scrna-seq",
    version="4.0.0",
    type="pipeline",
    reference="https://github.com/nf-core/scrnaseq",
).save()
run = ln.Run(transform=scrnaseq_transform).save()
register_pipeline_io(args.input, args.output, run)
register_pipeline_metadata(args.output, run)

!python register_scrnaseq_run.py --input scrnaseq_input --output scrnaseq_output

Data lineage¶

The output data could now be accessed (in a different notebook/script) for analysis with full lineage.

matrix_af = ln.Artifact.get(key__icontains="filtered_count_matrix.h5ad")

matrix_af.view_lineage()

_images/5c752d5dbc1baca1f1b07efe23135d550a3e586dfe326c791b4cc65726082eda.svg

Post-run script¶

Run the pipeline¶

Run the registration script¶

Data lineage¶

View transforms & runs on the hub¶