Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

!lamin init --storage ./test-redun-lamin --schema bionty
💡 connected lamindb: testuser1/test-redun-lamin

Amend the workflow

import lamindb as ln
import json
💡 connected lamindb: testuser1/test-redun-lamin

Let’s amend a redun workflow.py to register input & output artifacts in LaminDB:

  • To track the workflow run in LaminDB, add (see on GitHub):

    ln.track(params=params)
    
  • To register the output file via LaminDB, add (see on GitHub):

    ln.Artifact(output_path, description="results").save()
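
The two additions can be combined in one helper. The following is a minimal sketch, not the code in workflow.py: the function name track_and_register and the ln parameter are hypothetical. Passing the lamindb module in explicitly is an assumption of this sketch, made so that the helper can be exercised with a stub instead of a configured instance. It relies only on the two calls shown above.

```python
def track_and_register(ln, output_path, params, description="results"):
    """Track the current run and register one output file.

    `ln` is expected to be the imported `lamindb` module. It is passed in
    explicitly (an assumption of this sketch) so that a stub can replace it.
    """
    # Start tracking the workflow run; `params` holds the workflow parameters.
    ln.track(params=params)
    # Register the output file as an artifact and save it.
    return ln.Artifact(output_path, description=description).save()
```

Inside the workflow, this would be called as track_and_register(ln, output_path, params) after import lamindb as ln.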
    

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run  1> redun_stdout.txt 2>redun_stderr.txt
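
Outside a notebook, the same call can be made from Python. This is a rough sketch using only the standard library: the helper name run_and_capture is hypothetical, and the argument list simply mirrors the shell command above.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run `cmd` and write its stdout and stderr to separate files."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        # check=False: return the exit code instead of raising on failure.
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


# The redun invocation from above, written as an argument list.
redun_cmd = [
    "redun", "run", "workflow.py", "main",
    "--input-dir", "./fasta",
    "--tag", "run=test-run",
]

# Uncomment to run the workflow (requires redun and workflow.py):
# exit_code = run_and_capture(redun_cmd, "redun_stdout.txt", "redun_stderr.txt")
```

Returning the exit code rather than raising leaves it to the caller to decide whether a failed run should stop the surrounding script.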

Inspect the output:

!cat redun_stdout.txt
💡 connected lamindb: testuser1/test-redun-lamin
💡 running outside of synched git repo, cloning https://github.com/laminlabs/redun-lamin into /home/runner/.cache/lamindb/redun-lamin
💡 saved: Transform(uid='taasWKawCiNA6zf0', version='0.1.0', name='workflow.py', key='workflow.py', type='script', reference='https://github.com/laminlabs/redun-lamin/blob/9b3b4b43b3c99bbbacd3b269f2fc67dc81d69746/docs/workflow.py', reference_type='url', created_by_id=1, updated_at='2024-07-26 14:36:42 UTC')
💡 saved: Run(uid='Mq8iwFKNePPKRfLPFtxk', transform_id=1, created_by_id=1)
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
❗ folder is outside existing storage location, will copy files from ./fasta to /home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/fasta
File(path=/home/runner/work/redun-lamin/redun-lamin/docs/test-redun-lamin/data/results.tgz, hash=a413c0f6)

And the last line of the stderr log, which reports the execution duration:

!tail -1 redun_stderr.txt
[redun] Execution duration: 12.83 seconds
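
If the duration is needed as a number, it can be extracted from that line. The sketch below assumes the line always has the shape shown above, which is inferred from this single example rather than from a documented redun format. The function name is hypothetical.

```python
import re


def parse_execution_duration(line):
    """Return the duration in seconds from a redun log line, or None."""
    match = re.search(
        r"Execution duration:\s*([0-9]+(?:\.[0-9]+)?)\s*seconds", line
    )
    return float(match.group(1)) if match else None


print(parse_execution_duration("[redun] Execution duration: 12.83 seconds"))
# prints 12.83
```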

View data lineage:

artifact = ln.Artifact.filter(description="results", suffix=".tgz").one()
artifact.view_lineage()
[Figure: data lineage graph of the results artifact]

Track the redun execution id

To make the redun execution ID queryable in LaminDB, export it from redun and store it on the run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json", "r") as file:
    redun_exec = json.loads(file.readline())
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='Mq8iwFKNePPKRfLPFtxk', started_at='2024-07-26 14:36:42 UTC', finished_at='2024-07-26 14:36:53 UTC', is_consecutive=True, reference='caa56866-913d-4db1-9b62-e438f339a734', reference_type='redun_id', transform_id=1, created_by_id=1, environment_id=6)
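
The loading step can be made slightly more defensive. This is a sketch that assumes, as the code above does, that the export holds one JSON object per line and that the first record carries the execution ID under the key "id". The function name is hypothetical.

```python
import json


def read_redun_execution_id(path):
    """Return the `id` field of the first non-empty JSON line in `path`."""
    with open(path, "r") as file:
        for line in file:
            # Skip blank lines that may precede the first record.
            if line.strip():
                return json.loads(line)["id"]
    raise ValueError(f"no JSON record found in {path}")
```

With the file exported above, read_redun_execution_id("redun_exec.json") would return the value that is assigned to artifact.run.reference.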

Track the redun run report

Attach a run report:

report = ln.Artifact(
    "redun_stderr.txt",
    description=f"Redun run report of {redun_exec['id']}",
    run=False,
    visibility=0,
).save()
artifact.run.report = report
artifact.run.save()
Run(uid='Mq8iwFKNePPKRfLPFtxk', started_at='2024-07-26 14:36:42 UTC', finished_at='2024-07-26 14:36:53 UTC', is_consecutive=True, reference='caa56866-913d-4db1-9b62-e438f339a734', reference_type='redun_id', transform_id=1, created_by_id=1, report_id=8, environment_id=6)

View transforms and runs in LaminHub

[Screenshot: transforms and runs in LaminHub]

View the database content

ln.view()
****************
* module: core *
****************
Artifact
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
7 eU0RWORwwc0Lbof0sCGW None results data/results.tgz .tgz dataset None 83892 vLIgl4MqH9AlJgwmF58hew md5 None None 1 False 1 1.0 1.0 1 2024-07-26 14:36:55.051269+00:00
4 Eq71OWB33IGtAEgfmJl7 None None fasta/SOX2.fasta .fasta dataset None 414 C5q_yaFXGk4SAEpfdqBwnQ md5 None None 1 True 1 NaN NaN 1 2024-07-26 14:36:43.399238+00:00
3 Rw3ujzMGrSBIz1YBbUdl None None fasta/MYC.fasta .fasta dataset None 536 WGbEtzPw-3bQEGcngO_pHQ md5 None None 1 True 1 NaN NaN 1 2024-07-26 14:36:43.398658+00:00
2 wnDG5zgjYc8gBrVNdI4D None None fasta/KLF4.fasta .fasta dataset None 609 LyuoYkWs4SgYcH7P7JLJtA md5 None None 1 True 1 NaN NaN 1 2024-07-26 14:36:43.397900+00:00
1 Vt5zocplJcXDknyDW7PH None None fasta/PO5F1.fasta .fasta dataset None 477 -7iJgveFO9ia0wE1bqVu6g md5 None None 1 True 1 NaN NaN 1 2024-07-26 14:36:43.396614+00:00
Run
uid started_at finished_at is_consecutive reference reference_type transform_id report_id environment_id created_by_id
id
1 Mq8iwFKNePPKRfLPFtxk 2024-07-26 14:36:42.828558+00:00 2024-07-26 14:36:53.052234+00:00 True caa56866-913d-4db1-9b62-e438f339a734 redun_id 1 8 6 1
Storage
uid root description type region instance_uid run_id created_by_id updated_at
id
1 iifPKVuGs2qC /home/runner/work/redun-lamin/redun-lamin/docs... None local None None None 1 2024-07-26 14:36:31.180860+00:00
Transform
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
1 taasWKawCiNA6zf0 0.1.0 workflow.py workflow.py None script https://github.com/laminlabs/redun-lamin/blob/... url None 5 1 2024-07-26 14:36:53.054365+00:00
ULabel
uid name description reference reference_type run_id created_by_id updated_at
id
1 JL67MXW2 redun None None None 1 1 2024-07-26 14:36:43.376696+00:00
User
uid handle name updated_at
id
1 DzTjkKse testuser1 Test User1 2024-07-26 14:36:31.177137+00:00
******************
* module: bionty *
******************
Organism
uid name ontology_id scientific_name synonyms description source_id run_id created_by_id updated_at
id
1 1dpCL6Td human NCBITaxon:9606 homo_sapiens None None 1 1 1 2024-07-26 14:36:44.471538+00:00
Protein
uid name uniprotkb_id synonyms description length gene_symbol ensembl_gene_ids organism_id source_id run_id created_by_id updated_at
id
4 38rbzWPtKmb2 SOX2_HUMAN Transcription factor SOX-2 P48431 317 SOX2 ENST00000325404.3; 1 22 1 1 2024-07-26 14:36:53.030986+00:00
3 36jnmKHdiT9m MYC_HUMAN Myc proto-oncogene protein P01106 Class E basic helix-loop-helix protein 39|bHLH... 454 MYC ENST00000377970.6 [P01106-1];ENST00000524013.2... 1 22 1 1 2024-07-26 14:36:50.826416+00:00
2 6ThKerPbf6DR KLF4_HUMAN Krueppel-like factor 4 O43474 Epithelial zinc finger protein EZF|Gut-enriche... 513 KLF4 ENST00000374672.5 [O43474-1]; 1 22 1 1 2024-07-26 14:36:48.967931+00:00
1 3qNrC4hwnDC9 PO5F1_HUMAN POU domain, class 5, transcription... Q01860 Octamer-binding protein 3|Oct-3|Octamer-bindin... class 5, transcription factor 1 360 POU5F1 ENST00000259915.13 [Q01860-1];ENST00000376243.... 1 22 1 1 2024-07-26 14:36:46.949710+00:00
Source
uid entity organism source version in_db currently_used source_name url md5 source_website df_id run_id created_by_id updated_at
id
75 3pvh BioSample all ncbi 2023-09 False True NCBI BioSample attributes s3://bionty-assets/df_all__ncbi__2023-09__BioS... 918db9bd1734b97c596c67d9654a4126 https://www.ncbi.nlm.nih.gov/biosample/docs/at... None None 1 2024-07-26 14:36:31.300251+00:00
74 5kwU Ethnicity human hancestro 3.0 False True Human Ancestry Ontology https://github.com/EBISPOT/hancestro/raw/3.0/h... 76dd9efda9c2abd4bc32fc57c0b755dd https://github.com/EBISPOT/hancestro None None 1 2024-07-26 14:36:31.300076+00:00
73 4hcb DevelopmentalStage mouse mmusdv 2020-03-10 False True Mouse Developmental Stages http://aber-owl.net/media/ontologies/MMUSDV/9/... 5bef72395d853c7f65450e6c2a1fc653 https://github.com/obophenotype/developmental-... None None 1 2024-07-26 14:36:31.299900+00:00
72 238S DevelopmentalStage human hsapdv 2020-03-10 False True Human Developmental Stages http://aber-owl.net/media/ontologies/HSAPDV/11... 52181d59df84578ed69214a5cb614036 https://github.com/obophenotype/developmental-... None None 1 2024-07-26 14:36:31.299725+00:00
71 1auD Drug all dron 2023-03-10 False False Drug Ontology https://data.bioontology.org/ontologies/DRON/s... 75e86011158fae76bb46d96662a33ba3 https://bioportal.bioontology.org/ontologies/DRON None None 1 2024-07-26 14:36:31.299547+00:00
70 4uDt Drug all dron 2024-03-02 False True Drug Ontology https://data.bioontology.org/ontologies/DRON/s... 84138459de4f65034e979f4e46783747 https://bioportal.bioontology.org/ontologies/DRON None None 1 2024-07-26 14:36:31.299369+00:00
69 5e83 BFXPipeline all lamin 1.0.0 False True Bioinformatics Pipeline s3://bionty-assets/bfxpipelines.json a7eff57a256994692fba46e0199ffc94 https://lamin.ai None None 1 2024-07-26 14:36:31.299187+00:00

Delete the test instance:

!rm -rf ./test-redun-lamin
!lamin delete --force test-redun-lamin

The storage directory has to be removed first: lamin delete raises InstanceNotEmpty as long as the instance's .lamindb storage still contains objects.