
Redun

Here, we’ll see how to track redun workflow runs with LaminDB.

Note

This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.

!lamin init --storage . --name redun-lamin-fasta --schema bionty
💡 connected lamindb: testuser1/redun-lamin-fasta

Register the workflow

import lamindb as ln
import json
💡 connected lamindb: testuser1/redun-lamin-fasta

Let’s amend a redun workflow.py to register input & output artifacts in LaminDB:

  • To track the workflow run in LaminDB, we added the following lines:

    # query & track the workflow run
    # (optional) pass params
    ln.track(params=params)
    
  • To register the output file via LaminDB, we added the following line:

    ln.Artifact(output_path, description="results").save()
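
The two additions can be read together as a single step: start tracking, then register the output. The following is a minimal sketch, not the tutorial's actual workflow.py. The helper name `track_and_register` is hypothetical, and `ln` is passed in as an argument (it stands in for the imported `lamindb` module) so that the order of calls is explicit. Only `ln.track(params=...)` and `ln.Artifact(...).save()` are taken from the lines above.

```python
def track_and_register(ln, output_path, params=None):
    """Track the current run, then register one output file.

    `ln` is expected to be the imported `lamindb` module; this helper is
    hypothetical and only illustrates the order of the two calls.
    """
    # start tracking first, so that artifacts saved afterwards can be
    # associated with this run
    ln.track(params=params)
    # register the output file, mirroring the line added to workflow.py
    artifact = ln.Artifact(output_path, description="results")
    artifact.save()
    return artifact
```

In a real workflow this would be called as `track_and_register(ln, output_path, params)` after `import lamindb as ln`.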
    

Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
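
Outside a notebook, the same invocation can be issued from Python. The sketch below uses only the standard library; `run_and_capture` is a hypothetical helper name, and the argument list simply restates the command above. It assumes `redun` is on the PATH when the command is actually run.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run `cmd`, writing stdout and stderr to separate files.

    Returns the process exit code instead of raising on failure.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


# the redun invocation from above, expressed as an argument list
redun_cmd = [
    "redun", "run", "workflow.py", "main",
    "--input-dir", "./fasta",
    "--tag", "run=test-run",
]
```

Calling `run_and_capture(redun_cmd, "redun_stdout.txt", "redun_stderr.txt")` would then produce the same two files as the shell redirection.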

Inspect the output:

!cat redun_stdout.txt
💡 connected lamindb: testuser1/redun-lamin-fasta
💡 saved: Transform(uid='taasWKawCiNA6zf0', version='0.1.0', name='workflow.py', key='workflow.py', type='script', reference='https://github.com/laminlabs/redun-lamin-fasta/blob/44bfbe9463675c7dabdf1dfc1d572010ee4e7485/docs/workflow.py', reference_type='url', created_by_id=1, updated_at='2024-06-20 15:21:29 UTC')
💡 saved: Run(uid='WVPqUiCFCMRkrFhTHdc0', transform_id=1, created_by_id=1)
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
downloading... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/data/results.tgz, hash=19d58f7a)

And the last line of the stderr log:

!tail -1 redun_stderr.txt
[redun] Execution duration: 12.52 seconds
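
If the duration is needed as a number, for example to store it alongside the run, it can be parsed from that line. This is a small sketch that assumes the log line has exactly the format shown above; `parse_duration` is a hypothetical helper name.

```python
import re

# matches lines such as "[redun] Execution duration: 12.52 seconds"
DURATION_RE = re.compile(r"\[redun\] Execution duration: ([0-9.]+) seconds")


def parse_duration(line):
    """Return the duration in seconds as a float, or None if absent."""
    match = DURATION_RE.search(line)
    return float(match.group(1)) if match else None


print(parse_duration("[redun] Execution duration: 12.52 seconds"))  # 12.52
```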

View data lineage:

artifact = ln.Artifact.filter(description="results", suffix=".tgz").one()
artifact.view_lineage()
[Figure: data lineage graph of the results artifact]

Register the redun execution id

To make the redun execution ID queryable in LaminDB, export it from redun and store it on the LaminDB run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json", "r") as file:
    redun_exec = json.loads(file.readline())
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()
Run(uid='WVPqUiCFCMRkrFhTHdc0', started_at='2024-06-20 15:21:29 UTC', finished_at='2024-06-20 15:21:39 UTC', is_consecutive=True, reference='be4743fe-dac1-4d99-9ef7-85a4bff4dd59', reference_type='redun_id', transform_id=1, created_by_id=1, environment_id=6)
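
The JSON-reading step can be made a little more defensive. The sketch below assumes, as the code above does, that the first line of the export is a JSON object with an `id` key; `read_first_execution_id` is a hypothetical helper name, and it raises a clear error when the export is empty, for instance because no execution matched the tag.

```python
import json


def read_first_execution_id(path):
    """Return the "id" field of the first JSON line in `path`."""
    with open(path, "r") as file:
        first_line = file.readline()
    if not first_line.strip():
        raise ValueError(f"{path} is empty: no execution record found")
    return json.loads(first_line)["id"]
```

With this helper, the assignment above could be written as `artifact.run.reference = read_first_execution_id("redun_exec.json")`.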

Run report

Attach a run report:

report = ln.Artifact(
    "redun_stderr.txt",
    description=f"Redun run report of {redun_exec['id']}",
    run=False,
    visibility=0,
).save()
artifact.run.report = report
artifact.run.save()
Run(uid='WVPqUiCFCMRkrFhTHdc0', started_at='2024-06-20 15:21:29 UTC', finished_at='2024-06-20 15:21:39 UTC', is_consecutive=True, reference='be4743fe-dac1-4d99-9ef7-85a4bff4dd59', reference_type='redun_id', transform_id=1, created_by_id=1, report_id=8, environment_id=6)

View transforms and runs in LaminHub

https://lamin.ai/laminlabs/lamindata/transform/taasWKawCiNA6zf0

View the database content

ln.view()
****************
* module: core *
****************
Artifact
id | uid | version | description | key | suffix | type | accessor | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | storage_id | transform_id | run_id | created_by_id | updated_at
7 | 37qFq3tlhjwITKUtNwhD | None | results | data/results.tgz | .tgz | dataset | None | 83498 | 3yHv_aNHgFxYqrX7vJmtHQ | md5 | None | None | 1 | False | 1 | 1.0 | 1.0 | 1 | 2024-06-20 15:21:41.472119+00:00
4 | YkhxNf3p4hH3aUJOolnk | None | None | fasta/MYC.fasta | .fasta | dataset | None | 536 | WGbEtzPw-3bQEGcngO_pHQ | md5 | None | None | 1 | False | 1 | NaN | NaN | 1 | 2024-06-20 15:21:30.014741+00:00
3 | kaJrqGC8fk9JWi8zK3Uy | None | None | fasta/KLF4.fasta | .fasta | dataset | None | 609 | LyuoYkWs4SgYcH7P7JLJtA | md5 | None | None | 1 | False | 1 | NaN | NaN | 1 | 2024-06-20 15:21:30.014161+00:00
2 | bd8nt8jZYZjLJR2KoxDA | None | None | fasta/SOX2.fasta | .fasta | dataset | None | 414 | C5q_yaFXGk4SAEpfdqBwnQ | md5 | None | None | 1 | False | 1 | NaN | NaN | 1 | 2024-06-20 15:21:30.013360+00:00
1 | 5ooVQ2g9CPa8AKEsfotJ | None | None | fasta/PO5F1.fasta | .fasta | dataset | None | 477 | -7iJgveFO9ia0wE1bqVu6g | md5 | None | None | 1 | False | 1 | NaN | NaN | 1 | 2024-06-20 15:21:30.012147+00:00
Run
id | uid | started_at | finished_at | is_consecutive | reference | reference_type | transform_id | report_id | environment_id | created_by_id
1 | WVPqUiCFCMRkrFhTHdc0 | 2024-06-20 15:21:29.308029+00:00 | 2024-06-20 15:21:39.474088+00:00 | True | be4743fe-dac1-4d99-9ef7-85a4bff4dd59 | redun_id | 1 | 8 | 6 | 1
Storage
id | uid | root | description | type | region | instance_uid | run_id | created_by_id | updated_at
1 | cvEGObL5jXCA | /home/runner/work/redun-lamin-fasta/redun-lami... | None | local | None | 8SgWe7slTFKk | None | 1 | 2024-06-20 15:21:19.765106+00:00
Transform
id | uid | version | name | key | description | type | reference | reference_type | latest_report_id | source_code_id | created_by_id | updated_at
1 | taasWKawCiNA6zf0 | 0.1.0 | workflow.py | workflow.py | None | script | https://github.com/laminlabs/redun-lamin-fasta... | url | None | 5 | 1 | 2024-06-20 15:21:39.476212+00:00
ULabel
id | uid | name | description | reference | reference_type | run_id | created_by_id | updated_at
1 | uLlNktEB | redun | None | None | None | 1 | 1 | 2024-06-20 15:21:29.991112+00:00
User
id | uid | handle | name | updated_at
1 | DzTjkKse | testuser1 | Test User1 | 2024-06-20 15:21:19.759578+00:00
******************
* module: bionty *
******************
Organism
id | uid | name | ontology_id | scientific_name | public_source_id | run_id | created_by_id | updated_at
1 | 1dpCL6Td | human | NCBITaxon:9606 | homo_sapiens | 1 | 1 | 1 | 2024-06-20 15:21:31.066972+00:00
Protein
uid name uniprotkb_id synonyms length gene_symbol ensembl_gene_ids organism_id public_source_id run_id created_by_id updated_at
id
4 36jnmKHdiT9m MYC_HUMAN Myc proto-oncogene protein P01106 454 MYC ENST00000377970.6 [P01106-1]|ENST00000524013.2... 1 22 1 1 2024-06-20 15:21:39.454070+00:00
3 6ThKerPbf6DR KLF4_HUMAN Krueppel-like factor 4 O43474 513 KLF4 ENST00000374672.5 [O43474-1] 1 22 1 1 2024-06-20 15:21:37.504092+00:00
2 38rbzWPtKmb2 SOX2_HUMAN Transcription factor SOX-2 P48431 317 SOX2 ENST00000325404.3 1 22 1 1 2024-06-20 15:21:35.503591+00:00
1 3qNrC4hwnDC9 PO5F1_HUMAN POU domain, class 5, transcription... Q01860 transcription factor 1 (Octamer-binding protei... 360 POU5F1 ENST00000259915.13 [Q01860-1]|ENST00000376243.... 1 22 1 1 2024-06-20 15:21:33.491776+00:00
PublicSource
id | uid | entity | organism | currently_used | source | source_name | version | url | md5 | source_website | run_id | created_by_id | updated_at
73 | 5JnV | BioSample | all | True | ncbi | NCBI BioSample attributes | 2023-09 | s3://bionty-assets/df_all__ncbi__2023-09__BioS... | 918db9bd1734b97c596c67d9654a4126 | https://www.ncbi.nlm.nih.gov/biosample/docs/at... | None | 1 | 2024-06-20 15:21:19.881173+00:00
72 | 3Tlc | Ethnicity | human | True | hancestro | Human Ancestry Ontology | 3.0 | https://github.com/EBISPOT/hancestro/raw/3.0/h... | 76dd9efda9c2abd4bc32fc57c0b755dd | https://github.com/EBISPOT/hancestro | None | 1 | 2024-06-20 15:21:19.880998+00:00
71 | 16tR | DevelopmentalStage | mouse | True | mmusdv | Mouse Developmental Stages | 2020-03-10 | http://aber-owl.net/media/ontologies/MMUSDV/9/... | 5bef72395d853c7f65450e6c2a1fc653 | https://github.com/obophenotype/developmental-... | None | 1 | 2024-06-20 15:21:19.877513+00:00
70 | 7CRn | DevelopmentalStage | human | True | hsapdv | Human Developmental Stages | 2020-03-10 | http://aber-owl.net/media/ontologies/HSAPDV/11... | 52181d59df84578ed69214a5cb614036 | https://github.com/obophenotype/developmental-... | None | 1 | 2024-06-20 15:21:19.877350+00:00
69 | 3TI0 | Drug | all | False | dron | Drug Ontology | 2023-03-10 | https://data.bioontology.org/ontologies/DRON/s... | 75e86011158fae76bb46d96662a33ba3 | https://bioportal.bioontology.org/ontologies/DRON | None | 1 | 2024-06-20 15:21:19.877188+00:00
68 | 5alK | Drug | all | True | dron | Drug Ontology | 2024-03-02 | https://data.bioontology.org/ontologies/DRON/s... | 84138459de4f65034e979f4e46783747 | https://bioportal.bioontology.org/ontologies/DRON | None | 1 | 2024-06-20 15:21:19.877026+00:00
67 | 3rm9 | BFXPipeline | all | True | lamin | Bioinformatics Pipeline | 1.0.0 | s3://bionty-assets/bfxpipelines.json | a7eff57a256994692fba46e0199ffc94 | https://lamin.ai | None | 1 | 2024-06-20 15:21:19.876862+00:00

Delete the test instance:

!lamin delete --force redun-lamin-fasta
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/bin/lamin", line 8, in <module>
    sys.exit(main())
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamin_cli/__main__.py", line 103, in delete
    return delete(instance, force=force)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamindb_setup/_delete.py", line 98, in delete
    n_objects = check_storage_is_empty(
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamindb_setup/core/upath.py", line 779, in check_storage_is_empty
    raise InstanceNotEmpty(message)
lamindb_setup.core.upath.InstanceNotEmpty: Storage /home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/.lamindb contains 1 objects ('_is_initialized' ignored) - delete them prior to deleting the instance
['/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/.lamindb/2VBFNU1jF6FRCeAQbdIp.txt', '/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/.lamindb/_is_initialized']