Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

!lamin init --storage ./test-redun-lamin --schema bionty
→ connected lamindb: testuser1/test-redun-lamin

Amend the workflow

import lamindb as ln
import json
→ connected lamindb: testuser1/test-redun-lamin

Let’s amend a redun workflow.py to register input & output artifacts in LaminDB:

  • To track the workflow run in LaminDB, add (see on GitHub):

    ln.track(params=params)
    
  • To register the output file via LaminDB, add (see on GitHub):

    ln.Artifact(output_path, description="results").save()
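
The two additions above could sit in the workflow roughly as follows. This is a minimal sketch, not the code from the tutorial repository: `track_and_register` and `run_pipeline` are hypothetical names, and the `ln` module is passed in as an argument only so that the sketch can be exercised without a LaminDB instance. The only LaminDB calls it makes are the two shown in the list above.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def track_and_register(
    ln: Any,
    params: Dict[str, Any],
    run_pipeline: Callable[[Dict[str, Any]], Path],
) -> Any:
    """Track a run, execute the pipeline, and register its output file.

    `ln` is expected to be the imported `lamindb` module. `run_pipeline` is a
    hypothetical callable that does the actual work and returns the path of
    the results file.
    """
    # Start tracking first, so that artifacts registered afterwards are
    # linked to this run.
    ln.track(params=params)

    # Hypothetical pipeline step that produces the output file.
    output_path = run_pipeline(params)

    # Register the output file as an artifact and return the saved record.
    return ln.Artifact(output_path, description="results").save()
```

In a real workflow, the call would look like `track_and_register(ln, params, run_pipeline)` after `import lamindb as ln`.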
    

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt

Inspect the output:

!cat redun_stdout.txt
→ connected lamindb: testuser1/test-redun-lamin
→ running outside of synched git repo, cloning https://github.com/laminlabs/redun-lamin into /home/runner/.cache/lamindb/redun-lamin
→ created Transform('taasWKaw'), started new Run('9t4d2Zeh') at 2024-10-19 00:01:45 UTC
→ params: input_dir='./fasta' executor='Executor.default' amino_acid='C' enzyme_regex='[KR]' max_length='75' min_length='4' missed_cleavages='0'
! folder is outside existing storage location, will copy files from ./fasta to /home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/fasta
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━  74% 0:00:01
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
→ finished Run('9t4d2Zeh') after 0:00:05.607783 at 2024-10-19 00:01:50 UTC
File(path=/home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/data/results.tgz, hash=e507bb34)

And the last line of the error log:

!tail -1 redun_stderr.txt
[redun] Execution duration: 8.06 seconds
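
If the duration is worth keeping alongside the run, the last log line can be parsed rather than only displayed. This is a minimal sketch, assuming the line always has the format shown above; `parse_execution_duration` is a hypothetical helper and not part of redun or LaminDB.

```python
import re
from typing import Optional

# Matches the number of seconds in a line such as
# "[redun] Execution duration: 8.06 seconds".
_DURATION_RE = re.compile(r"Execution duration:\s*([0-9]+(?:\.[0-9]+)?)\s*seconds")


def parse_execution_duration(line: str) -> Optional[float]:
    """Return the duration in seconds, or None if the line does not match."""
    match = _DURATION_RE.search(line)
    return float(match.group(1)) if match else None
```

Returning `None` instead of raising keeps the helper usable on logs whose last line is something else, for example when the run failed.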

View data lineage:

artifact = ln.Artifact.filter(description="results", suffix=".tgz").one()
artifact.view_lineage()
[Image: data lineage graph of the results artifact]

Track the redun execution id

To be able to query LaminDB by redun execution ID, export the execution record from redun and store its ID on the LaminDB run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json", "r") as file:
    redun_exec = json.loads(file.readline())
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='9t4d2ZehITvPrmaTxJ4R', started_at=2024-10-19 00:01:45 UTC, finished_at=2024-10-19 00:01:50 UTC, is_consecutive=True, reference='73e01a79-1308-489e-8ee7-d16aaf0db3e6', reference_type='redun_id', transform_id=1, environment_id=5, created_by_id=1, created_at=2024-10-19 00:01:45 UTC)
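
The loading step can be wrapped in a small helper that fails with a clear message when the export is empty, for instance because no execution matched the tag. This is a sketch under the assumption, taken from the cell above, that the first line of `redun_exec.json` is a JSON object with an `id` key; `read_redun_exec_id` is a hypothetical name and not part of redun or LaminDB.

```python
import json
from pathlib import Path
from typing import Union


def read_redun_exec_id(path: Union[str, Path]) -> str:
    """Return the `id` field of the first JSON record in the exported file."""
    with open(path, "r") as file:
        first_line = file.readline()
    # An empty first line means the export contains no execution record.
    if not first_line.strip():
        raise ValueError(f"{path} contains no execution record")
    record = json.loads(first_line)
    return record["id"]
```

The returned value can then be assigned to `artifact.run.reference` exactly as in the cell above.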

Track the redun run report

Attach a run report:

report = ln.Artifact(
    "redun_stderr.txt",
    description=f"Redun run report of {redun_exec['id']}",
    run=False,
    visibility=0,
).save()
artifact.run.report = report
artifact.run.save()
Run(uid='9t4d2ZehITvPrmaTxJ4R', started_at=2024-10-19 00:01:45 UTC, finished_at=2024-10-19 00:01:50 UTC, is_consecutive=True, reference='73e01a79-1308-489e-8ee7-d16aaf0db3e6', reference_type='redun_id', transform_id=1, report_id=7, environment_id=5, created_by_id=1, created_at=2024-10-19 00:01:45 UTC)

View transforms and runs in LaminHub

[Image: hub]

View the database content

ln.view()
****************
* module: core *
****************
Artifact
uid version is_latest description key suffix type size hash n_objects n_observations _hash_type _accessor visibility _key_is_virtual storage_id transform_id run_id created_at created_by_id
id
6 mcFprHnwpiExd4mE0000 None True results data/results.tgz .tgz None 83817 XoD-eic1p0F7zks0oh6AXA None None md5 None 1 False 1 1.0 1.0 2024-10-19 00:01:52.825895+00:00 1
4 52fcKnDFjDNayDi50000 None True None fasta/KLF4.fasta .fasta None 609 LyuoYkWs4SgYcH7P7JLJtA None None md5 None 1 True 1 NaN NaN 2024-10-19 00:01:45.829212+00:00 1
3 dVoPlr9tLFRe0b430000 None True None fasta/PO5F1.fasta .fasta None 477 -7iJgveFO9ia0wE1bqVu6g None None md5 None 1 True 1 NaN NaN 2024-10-19 00:01:45.828756+00:00 1
2 16hIFIjIgx6ZB6Eo0000 None True None fasta/SOX2.fasta .fasta None 414 C5q_yaFXGk4SAEpfdqBwnQ None None md5 None 1 True 1 NaN NaN 2024-10-19 00:01:45.828063+00:00 1
1 jFKdmCBTCZyutN6x0000 None True None fasta/MYC.fasta .fasta None 536 WGbEtzPw-3bQEGcngO_pHQ None None md5 None 1 True 1 NaN NaN 2024-10-19 00:01:45.826920+00:00 1
! No records found
! No records found
! No records found
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id parent_id created_at created_by_id
id
1 9t4d2ZehITvPrmaTxJ4R 2024-10-19 00:01:45.249691+00:00 2024-10-19 00:01:50.857474+00:00 True 73e01a79-1308-489e-8ee7-d16aaf0db3e6 redun_id 1 7 5 None 2024-10-19 00:01:45.249742+00:00 1
Storage
uid root description type region instance_uid run_id created_at created_by_id
id
1 PDU6RsujGTpM /home/runner/work/redun-lamin/redun-lamin/docs... None local None iQlBPgD8uaqR None 2024-10-19 00:01:32.424936+00:00 1
Transform
uid version is_latest name key description type source_code hash reference reference_type _source_code_artifact_id created_at created_by_id
id
1 taasWKawCiNA0000 0.1.0 True workflow.py workflow.py None script """workflow.py"""\n\n# This code is a copy fro... B36u9mvhSeZwmt4wniwBNg https://github.com/laminlabs/redun-lamin/blob/... url None 2024-10-19 00:01:45.247268+00:00 1
ULabel
uid name description reference reference_type run_id created_at created_by_id
id
1 BFfCfpw5 redun None None None 1 2024-10-19 00:01:45.800095+00:00 1
User
uid handle name created_at
id
1 DzTjkKse testuser1 Test User1 2024-10-19 00:01:32.421062+00:00
******************
* module: bionty *
******************

Delete the test instance:

!rm -rf ./test-redun-lamin
!lamin delete --force test-redun-lamin