Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

!lamin init --storage ./test-redun-lamin --schema bionty
💡 connected lamindb: testuser1/test-redun-lamin

Amend the workflow

import lamindb as ln
import json
💡 connected lamindb: testuser1/test-redun-lamin

Let’s amend a redun workflow.py to register input & output artifacts in LaminDB:

  • To track the workflow run in LaminDB, add (see on GitHub):

    ln.track(params=params)
    
  • To register the output file via LaminDB, add (see on GitHub):

    ln.Artifact(output_path, description="results").save()
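The two additions above depend on order: tracking has to start before the workflow does its work, and the output can only be registered once it exists. Here is a minimal sketch of that ordering. The helper name `run_tracked`, the injected `ln` argument, and the `produce_output` callable are hypothetical and not part of the tutorial's `workflow.py`; only the `ln.track(params=params)` and `ln.Artifact(...).save()` calls come from this guide.

```python
def run_tracked(ln, params, produce_output):
    """Track a run, compute the output, then register it as an artifact.

    `ln` is the imported lamindb module, passed in as an argument so the
    sketch can be exercised without a database (an illustrative choice).
    `produce_output` is a hypothetical callable that runs the pipeline steps
    and returns the path of the results file.
    """
    # Start tracking first, so that everything registered afterwards
    # is linked to this run.
    ln.track(params=params)

    # Run the actual workflow steps and obtain the output path.
    output_path = produce_output()

    # Register the output file, using the same call as in this guide.
    artifact = ln.Artifact(output_path, description="results")
    artifact.save()
    return artifact


# Usage inside workflow.py (assumes `import lamindb as ln`):
# run_tracked(ln, params, lambda: "data/results.tgz")
```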
    

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run  1> redun_stdout.txt 2>redun_stderr.txt
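Outside a notebook, the same redirection of stdout and stderr into two files can be done from Python. The following is a minimal sketch, assuming `redun` is available on the `PATH`; the helper name `run_and_capture` is hypothetical.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run `cmd`, writing stdout and stderr to separate files.

    Returns the process exit code instead of raising, so the caller
    can decide how to handle a failed run.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


# The redun invocation from this guide, expressed as an argument list.
redun_cmd = [
    "redun", "run", "workflow.py", "main",
    "--input-dir", "./fasta",
    "--tag", "run=test-run",
]

# Example call (not executed here):
# exit_code = run_and_capture(redun_cmd, "redun_stdout.txt", "redun_stderr.txt")
```

Passing the command as a list avoids shell quoting issues, and keeping the two streams in separate files matches the `1>` and `2>` redirection used above.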

Inspect the output:

!cat redun_stdout.txt
💡 connected lamindb: testuser1/test-redun-lamin
💡 saved: Transform(uid='taasWKawCiNA6zf0', version='0.1.0', name='workflow.py', key='workflow.py', type='script', reference='https://github.com/laminlabs/redun-lamin/blob/9b3b4b43b3c99bbbacd3b269f2fc67dc81d69746/docs/workflow.py', reference_type='url', created_by_id=1, updated_at='2024-07-06 12:50:22 UTC')
💡 saved: Run(uid='aXjDMymFfzDDBot5zm7t', transform_id=1, created_by_id=1)
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
❗ folder is outside existing storage location, will copy files from ./fasta to /home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/fasta
File(path=/home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/data/results.tgz, hash=bad25fd0)

And the last line of the redun log, captured from stderr:

!tail -1 redun_stderr.txt
[redun] Execution duration: 11.96 seconds
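If the duration is needed as a number, for instance to compare runs, it can be extracted from the log text. This is a minimal sketch that assumes the log line has exactly the format shown above; other redun versions may word it differently. The function name `parse_execution_duration` is hypothetical.

```python
import re

# Matches e.g. "[redun] Execution duration: 11.96 seconds"
DURATION_PATTERN = re.compile(
    r"Execution duration:\s*([0-9]+(?:\.[0-9]+)?)\s*seconds"
)


def parse_execution_duration(log_text):
    """Return the last reported execution duration in seconds, or None."""
    matches = DURATION_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


example_line = "[redun] Execution duration: 11.96 seconds"
print(parse_execution_duration(example_line))  # 11.96
```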

View data lineage:

artifact = ln.Artifact.filter(description="results", suffix=".tgz").one()
artifact.view_lineage()
_images/be8bdc75f31d2a82b305936c9a6c96e52e993981425eeebbb0f63cb3114852cc.svg

Track the redun execution id

To query LaminDB by redun execution ID, export the execution record from redun and store its ID on the LaminDB run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json", "r") as file:
    redun_exec = json.loads(file.readline())
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='aXjDMymFfzDDBot5zm7t', started_at='2024-07-06 12:50:22 UTC', finished_at='2024-07-06 12:50:32 UTC', is_consecutive=True, reference='ed6d7af7-2f4a-4efe-9f84-170f557d81df', reference_type='redun_id', transform_id=1, created_by_id=1, environment_id=6)
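The cell above reads the first line of `redun_exec.json` and assumes it is a JSON object with an `id` field. A slightly more defensive version of that step could look like the sketch below, which skips blank lines and fails with a clear message if no record or no `id` is found. The function name `read_first_execution_id` is hypothetical, and the one-object-per-line layout is an assumption based on the `readline()` call above.

```python
import json


def read_first_execution_id(path):
    """Return the `id` of the first JSON record in a line-delimited file."""
    with open(path, "r") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip blank lines before the first record
            record = json.loads(line)
            if "id" not in record:
                raise KeyError(f"first record in {path} has no 'id' field")
            return record["id"]
    raise ValueError(f"no execution record found in {path}")


# Example call (not executed here):
# artifact.run.reference = read_first_execution_id("redun_exec.json")
```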

Track the redun run report

Attach a run report:

report = ln.Artifact(
    "redun_stderr.txt",
    description=f"Redun run report of {redun_exec['id']}",
    run=False,
    visibility=0,
).save()
artifact.run.report = report
artifact.run.save()
Run(uid='aXjDMymFfzDDBot5zm7t', started_at='2024-07-06 12:50:22 UTC', finished_at='2024-07-06 12:50:32 UTC', is_consecutive=True, reference='ed6d7af7-2f4a-4efe-9f84-170f557d81df', reference_type='redun_id', transform_id=1, created_by_id=1, report_id=8, environment_id=6)

View transforms and runs in LaminHub

hub

View the database content

ln.view()
****************
* module: core *
****************
Artifact
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
7 ZEq1X4o7hrw8kpuECrgu None results data/results.tgz .tgz dataset None 83640 dpupdCa5RjbGI_O3rCp2-g md5 None None 1 False 1 1.0 1.0 1 2024-07-06 12:50:34.327207+00:00
4 c91bjJHRWwe8DtvqekBg None None fasta/MYC.fasta .fasta dataset None 536 WGbEtzPw-3bQEGcngO_pHQ md5 None None 1 True 1 NaN NaN 1 2024-07-06 12:50:23.291413+00:00
3 PufozrqnGUJpZMt4BzVU None None fasta/KLF4.fasta .fasta dataset None 609 LyuoYkWs4SgYcH7P7JLJtA md5 None None 1 True 1 NaN NaN 1 2024-07-06 12:50:23.290832+00:00
2 sq28b4irx9qxk62rNPwG None None fasta/SOX2.fasta .fasta dataset None 414 C5q_yaFXGk4SAEpfdqBwnQ md5 None None 1 True 1 NaN NaN 1 2024-07-06 12:50:23.290083+00:00
1 OdS4oB7LKzjRx7V4slf4 None None fasta/PO5F1.fasta .fasta dataset None 477 -7iJgveFO9ia0wE1bqVu6g md5 None None 1 True 1 NaN NaN 1 2024-07-06 12:50:23.288848+00:00
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id created_by_id
id
1 aXjDMymFfzDDBot5zm7t 2024-07-06 12:50:22.724038+00:00 2024-07-06 12:50:32.325586+00:00 True ed6d7af7-2f4a-4efe-9f84-170f557d81df redun_id 1 8 6 1
Storage
uid root description type region instance_uid run_id created_by_id updated_at
id
1 vuaw4EC2Y6xt /home/runner/work/redun-lamin/redun-lamin/docs... None local None iQlBPgD8uaqR None 1 2024-07-06 12:49:58.709353+00:00
Transform
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
1 taasWKawCiNA6zf0 0.1.0 workflow.py workflow.py None script https://github.com/laminlabs/redun-lamin/blob/... url None 5 1 2024-07-06 12:50:32.327560+00:00
ULabel
uid name description reference reference_type run_id created_by_id updated_at
id
1 7b7aDgeA redun None None None 1 1 2024-07-06 12:50:23.266968+00:00
User
uid handle name updated_at
id
1 DzTjkKse testuser1 Test User1 2024-07-06 12:49:58.704052+00:00
******************
* module: bionty *
******************
Organism
uid name ontology_id scientific_name public_source_id run_id created_by_id updated_at
id
1 1dpCL6Td human NCBITaxon:9606 homo_sapiens 1 1 1 2024-07-06 12:50:24.298444+00:00
Protein
uid name uniprotkb_id synonyms length gene_symbol ensembl_gene_ids organism_id public_source_id run_id created_by_id updated_at
id
4 36jnmKHdiT9m MYC_HUMAN Myc proto-oncogene protein P01106 454 MYC ENST00000377970.6 [P01106-1]|ENST00000524013.2... 1 22 1 1 2024-07-06 12:50:32.304506+00:00
3 6ThKerPbf6DR KLF4_HUMAN Krueppel-like factor 4 O43474 513 KLF4 ENST00000374672.5 [O43474-1] 1 22 1 1 2024-07-06 12:50:30.200795+00:00
2 38rbzWPtKmb2 SOX2_HUMAN Transcription factor SOX-2 P48431 317 SOX2 ENST00000325404.3 1 22 1 1 2024-07-06 12:50:28.294744+00:00
1 3qNrC4hwnDC9 PO5F1_HUMAN POU domain, class 5, transcription... Q01860 transcription factor 1 (Octamer-binding protei... 360 POU5F1 ENST00000259915.13 [Q01860-1]|ENST00000376243.... 1 22 1 1 2024-07-06 12:50:26.438113+00:00
PublicSource
uid entity organism currently_used source source_name version url md5 source_website run_id created_by_id updated_at
id
73 5JnV BioSample all True ncbi NCBI BioSample attributes 2023-09 s3://bionty-assets/df_all__ncbi__2023-09__BioS... 918db9bd1734b97c596c67d9654a4126 https://www.ncbi.nlm.nih.gov/biosample/docs/at... None 1 2024-07-06 12:49:58.824098+00:00
72 3Tlc Ethnicity human True hancestro Human Ancestry Ontology 3.0 https://github.com/EBISPOT/hancestro/raw/3.0/h... 76dd9efda9c2abd4bc32fc57c0b755dd https://github.com/EBISPOT/hancestro None 1 2024-07-06 12:49:58.823926+00:00
71 16tR DevelopmentalStage mouse True mmusdv Mouse Developmental Stages 2020-03-10 http://aber-owl.net/media/ontologies/MMUSDV/9/... 5bef72395d853c7f65450e6c2a1fc653 https://github.com/obophenotype/developmental-... None 1 2024-07-06 12:49:58.820665+00:00
70 7CRn DevelopmentalStage human True hsapdv Human Developmental Stages 2020-03-10 http://aber-owl.net/media/ontologies/HSAPDV/11... 52181d59df84578ed69214a5cb614036 https://github.com/obophenotype/developmental-... None 1 2024-07-06 12:49:58.820505+00:00
69 3TI0 Drug all False dron Drug Ontology 2023-03-10 https://data.bioontology.org/ontologies/DRON/s... 75e86011158fae76bb46d96662a33ba3 https://bioportal.bioontology.org/ontologies/DRON None 1 2024-07-06 12:49:58.820345+00:00
68 5alK Drug all True dron Drug Ontology 2024-03-02 https://data.bioontology.org/ontologies/DRON/s... 84138459de4f65034e979f4e46783747 https://bioportal.bioontology.org/ontologies/DRON None 1 2024-07-06 12:49:58.820185+00:00
67 3rm9 BFXPipeline all True lamin Bioinformatics Pipeline 1.0.0 s3://bionty-assets/bfxpipelines.json a7eff57a256994692fba46e0199ffc94 https://lamin.ai None 1 2024-07-06 12:49:58.820023+00:00

Delete the test instance:

!rm -rf ./test-redun-lamin
!lamin delete --force test-redun-lamin