Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

!lamin init --storage ./test-redun-lamin --schema bionty
→ connected lamindb: testuser1/test-redun-lamin

Amend the workflow

import lamindb as ln
import json
→ connected lamindb: testuser1/test-redun-lamin

Let’s amend a redun workflow.py to register input & output artifacts in LaminDB:

  • To track the workflow run in LaminDB, add (see on GitHub):

    ln.track(params=params)
    
  • To register the output file via LaminDB, add (see on GitHub):

    ln.Artifact(output_path, description="results").save()
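
As a rough sketch of how the two calls above fit together in sequence, the hypothetical helper below wraps them in one function. The name `track_and_register` and the idea of passing the `lamindb` module in as the `ln` argument are assumptions made only so the snippet can run without a LaminDB instance; in `workflow.py` itself you would `import lamindb as ln` and call the two lines directly.

```python
def track_and_register(ln, params, output_path):
    """Start a tracked run, then save the output file as an artifact.

    `ln` stands for the imported lamindb module; it is a parameter here
    (an assumption of this sketch) so the function can be tried without a database.
    """
    # Register the run and its parameters before any work happens.
    ln.track(params=params)
    # ... the workflow would produce the file at output_path here ...
    # Register the output file once it exists.
    artifact = ln.Artifact(output_path, description="results").save()
    return artifact
```

The ordering is the point: `ln.track` runs before the workflow does its work, so that the artifact saved afterwards is linked to that run.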
    

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
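
Outside a notebook, the same invocation with separate stdout and stderr files could be driven from Python. The following is a minimal sketch using only the standard library; the helper name `run_and_capture` is hypothetical, and the actual call is left commented out because it needs redun and `workflow.py` to be available.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd, writing stdout and stderr to separate files; return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


# The redun invocation from above, expressed as an argument list.
redun_cmd = [
    "redun", "run", "workflow.py", "main",
    "--input-dir", "./fasta",
    "--tag", "run=test-run",
]
# exit_code = run_and_capture(redun_cmd, "redun_stdout.txt", "redun_stderr.txt")
```

Passing the command as a list avoids shell quoting issues, and returning the exit code lets the caller decide how to react to a failed run.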

Inspect the output:

!cat redun_stdout.txt
→ connected lamindb: testuser1/test-redun-lamin
→ running outside of synched git repo, cloning https://github.com/laminlabs/redun-lamin into /home/runner/.cache/lamindb/redun-lamin
→ created Transform('taasWKaw'), started new Run('B5bXqvlp') at 2024-11-21 05:38:13 UTC
→ params: input_dir='./fasta' executor='Executor.default' amino_acid='C' enzyme_regex='[KR]' max_length='75' min_length='4' missed_cleavages='0'
! folder is outside existing storage location, will copy files from ./fasta to /home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/fasta
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
→ finished Run('B5bXqvlp') after 0d 0h 0m 6s at 2024-11-21 05:38:20 UTC
File(path=/home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/data/results.tgz, hash=87d2c9d7)

And the last line of the stderr log, where redun writes its own log messages:

!tail -1 redun_stderr.txt
[redun] Execution duration: 9.24 seconds
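
If the duration is needed as a number, for example to store alongside the run, it can be pulled out of the log text with a regular expression. This is a sketch that assumes the line format is exactly as in the output above; the function name `parse_execution_duration` is hypothetical and the pattern may need adjusting for other redun versions.

```python
import re

# Assumed format, taken from the single log line shown above.
DURATION_PATTERN = re.compile(r"Execution duration: ([0-9]+(?:\.[0-9]+)?) seconds")


def parse_execution_duration(log_text):
    """Return the last reported execution duration in seconds, or None if absent."""
    matches = DURATION_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


example_line = "[redun] Execution duration: 9.24 seconds"
print(parse_execution_duration(example_line))  # 9.24
```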

View data lineage:

artifact = ln.Artifact.filter(description="results", suffix=".tgz").one()
artifact.view_lineage()
[Figure: data lineage graph of the results artifact]

Track the redun execution id

To be able to query LaminDB by redun execution ID, export the ID from redun and store it on the LaminDB run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json", "r") as file:
    redun_exec = json.loads(file.readline())
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='B5bXqvlpwmHKWf43dx8F', started_at=2024-11-21 05:38:13 UTC, finished_at=2024-11-21 05:38:20 UTC, is_consecutive=True, reference='79f1262c-3572-43ab-88e7-73c955b64207', reference_type='redun_id', transform_id=1, environment_id=5, created_by_id=1, created_at=2024-11-21 05:38:13 UTC)
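
The cell above reads only the first line of the export and trusts that it contains an `id` field. A slightly more defensive variant could look like the sketch below, which assumes (as the cell does) that the export holds one JSON object per line; the helper name `read_first_execution_id` is hypothetical.

```python
import json


def read_first_execution_id(path):
    """Return the "id" field of the first JSON record in a redun log export."""
    with open(path, "r") as file:
        first_line = file.readline()
    # An empty export usually means that no execution matched the tag filter.
    if not first_line.strip():
        raise ValueError(f"{path} is empty; did the tag match any execution?")
    record = json.loads(first_line)
    if "id" not in record:
        raise KeyError(f"first record in {path} has no 'id' field")
    return record["id"]
```

With this helper, `artifact.run.reference = read_first_execution_id("redun_exec.json")` would replace the manual parsing, and an empty or malformed export fails with a clear message rather than a bare `JSONDecodeError` or `KeyError`.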

Track the redun run report

Attach a run report:

report = ln.Artifact(
    "redun_stderr.txt",
    description=f"Redun run report of {redun_exec['id']}",
    run=False,
    visibility=0,
).save()
artifact.run.report = report
artifact.run.save()
Run(uid='B5bXqvlpwmHKWf43dx8F', started_at=2024-11-21 05:38:13 UTC, finished_at=2024-11-21 05:38:20 UTC, is_consecutive=True, reference='79f1262c-3572-43ab-88e7-73c955b64207', reference_type='redun_id', transform_id=1, report_id=7, environment_id=5, created_by_id=1, created_at=2024-11-21 05:38:13 UTC)

View transforms and runs in LaminHub

[Figure: LaminHub screenshot]

View the database content

ln.view()
****************
* module: core *
****************
Artifact
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
6 lGX0JdqwmtvDs4vK0000 None True results data/results.tgz .tgz None 83820 AS7kma56Et_If0h7cy1BLg None None md5 None 1 False 1 1.0 1.0 2024-11-21 05:38:22.176916+00:00 1
4 Sz4CrH9YtvrNUu6c0000 None True None fasta/KLF4.fasta .fasta None 609 LyuoYkWs4SgYcH7P7JLJtA None None md5 None 1 True 1 NaN NaN 2024-11-21 05:38:14.244412+00:00 1
3 PyzvgSJD1yAU73RS0000 None True None fasta/PO5F1.fasta .fasta None 477 -7iJgveFO9ia0wE1bqVu6g None None md5 None 1 True 1 NaN NaN 2024-11-21 05:38:14.243942+00:00 1
2 2B4vGQeL0HydWW350000 None True None fasta/SOX2.fasta .fasta None 414 C5q_yaFXGk4SAEpfdqBwnQ None None md5 None 1 True 1 NaN NaN 2024-11-21 05:38:14.243279+00:00 1
1 GRTAkaIePM43a4xe0000 None True None fasta/MYC.fasta .fasta None 536 WGbEtzPw-3bQEGcngO_pHQ None None md5 None 1 True 1 NaN NaN 2024-11-21 05:38:14.242099+00:00 1
! No records found
! No records found
! No records found
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id parent_id created_at created_by_id
id
1 B5bXqvlpwmHKWf43dx8F 2024-11-21 05:38:13.525734+00:00 2024-11-21 05:38:20.052577+00:00 True 79f1262c-3572-43ab-88e7-73c955b64207 redun_id 1 7 5 None 2024-11-21 05:38:13.525793+00:00 1
Storage
uid root description type region instance_uid run_id created_at created_by_id
id
1 JFEZ4jhg6iHq /home/runner/work/redun-lamin/redun-lamin/docs... None local None iQlBPgD8uaqR None 2024-11-21 05:37:57.045059+00:00 1
Transform
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
1 taasWKawCiNA0000 0.1.0 True workflow.py workflow.py None script """workflow.py"""\n\n# This code is a copy fro... B36u9mvhSeZwmt4wniwBNg https://github.com/laminlabs/redun-lamin/blob/... url None 2024-11-21 05:38:13.519883+00:00 1
ULabel
uid name description reference reference_type run_id created_at created_by_id
id
1 y8sdzOhx redun None None None 1 2024-11-21 05:38:14.208410+00:00 1
User
uid handle name created_at
id
1 DzTjkKse testuser1 Test User1 2024-11-21 05:37:57.038946+00:00
******************
* module: bionty *
******************

Delete the test instance:

!rm -rf ./test-redun-lamin
!lamin delete --force test-redun-lamin