Snakemake
Snakemake is a workflow management system for running scientific workflows in a scalable, portable, and reproducible way across platforms.
Here, we’ll run snakemake-workflows/rna-seq-star-deseq2 to perform differential gene expression analysis with STAR and DESeq2 (reference).
Setup
Let’s create a test instance:
!lamin init --storage . --name snakemake-bulkrna
✅ saved: User(id='DzTjkKse', handle='testuser1', email='[email protected]', name='Test User1', updated_at=2023-09-20 21:53:07)
✅ saved: Storage(id='t6Qgo37v', root='/home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs', type='local', updated_at=2023-09-20 21:53:07, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/snakemake-bulkrna
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
💡 loaded instance: testuser1/snakemake-bulkrna (lamindb 0.54.0)
Download test data
The Snakemake pipeline ships with test data, so we clone the whole repository with git:
!git clone https://github.com/snakemake-workflows/rna-seq-star-deseq2 --single-branch --branch v2.0.0
Cloning into 'rna-seq-star-deseq2'...
remote: Enumerating objects: 759, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (92/92), done.
remote: Total 759 (delta 68), reused 105 (delta 52), pack-reused 608
Receiving objects: 100% (759/759), 16.95 MiB | 34.16 MiB/s, done.
Resolving deltas: 100% (379/379), done.
Note: switching to 'e103c1cc78feba97cc3cebe8d7f2a51c8958ab96'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
root_dir = "rna-seq-star-deseq2"
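Outside a notebook, where the `!` shell magic is unavailable, the clone step can be scripted. The following is a minimal sketch using only the standard library; `build_clone_command` and `clone_if_missing` are hypothetical helper names, not part of LaminDB or Snakemake, and the flags are the ones used in the command above.

```python
import shlex
import subprocess
from pathlib import Path


def build_clone_command(repo_url: str, tag: str) -> list[str]:
    """Return the argument list for a single-branch clone of `tag`."""
    return ["git", "clone", "--single-branch", "--branch", tag, repo_url]


def clone_if_missing(repo_url: str, tag: str, target: Path) -> bool:
    """Clone into `target` unless it already exists; return True if a clone ran."""
    if target.exists():
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status
    subprocess.run(build_clone_command(repo_url, tag) + [str(target)], check=True)
    return True


command = build_clone_command(
    "https://github.com/snakemake-workflows/rna-seq-star-deseq2", "v2.0.0"
)
# Print the command in a copy-pasteable form without executing it
print(shlex.join(command))
```

Calling `clone_if_missing(url, "v2.0.0", Path(root_dir))` would skip the clone when the directory is already present, which keeps repeated notebook runs from failing on an existing checkout.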
Track the download:
download = ln.Transform(name="Download")
download_url = "https://github.com/snakemake-workflows/rna-seq-star-deseq2"
# create global run containing the download_url
ln.track(download, reference=download_url, reference_type="url")
💡 Transform(id='hCfEkAZ94EeCNt', name='Download', type=notebook, updated_at=2023-09-20 21:53:10, created_by_id='DzTjkKse')
💡 Run(id='XltwcXAZ7H4twaCujeqX', run_at=2023-09-20 21:53:10, reference='https://github.com/snakemake-workflows/rna-seq-star-deseq2', reference_type='url', transform_id='hCfEkAZ94EeCNt', created_by_id='DzTjkKse')
Register the input files; they are automatically linked to the download run:
sample_sheet = ln.File(f"{root_dir}/.test/config_basic/samples.tsv")
ln.save(sample_sheet)
input_fastqs = ln.File.from_dir(f"{root_dir}/.test/ngs-test-data/reads/")
ln.save(input_fastqs)
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ there are different file ids with the same hashes, dropping 2 duplicates out of 10 files:
File(id='giidtfhayfrI68utcbKw', key='rna-seq-star-deseq2/.test/ngs-test-data/reads/a.scerevisiae.2.fq', suffix='.fq', size=2218894, hash='B4hGJwLbEtEx8GCEdcvjgA', hash_type='md5', storage_id='t6Qgo37v', transform_id='hCfEkAZ94EeCNt', run_id='XltwcXAZ7H4twaCujeqX', created_by_id='DzTjkKse')
File(id='lvWwIPa0ZMhEiiqbIwRW', key='rna-seq-star-deseq2/.test/ngs-test-data/reads/b.scerevisiae.1.fq', suffix='.fq', size=2218894, hash='DqfThx982Ai4akCcx-HikA', hash_type='md5', storage_id='t6Qgo37v', transform_id='hCfEkAZ94EeCNt', run_id='XltwcXAZ7H4twaCujeqX', created_by_id='DzTjkKse')
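The two kinds of warnings above can be reproduced with the standard library. This is a sketch of the idea only, not LaminDB's implementation: `content_hash` and `scan_dir` are hypothetical names, and the hash encoding (MD5 digest, base64url without padding) is an assumption based on the 22-character `md5` hashes in the output.

```python
import base64
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """MD5 of the file content, base64url-encoded without padding (assumed format)."""
    digest = hashlib.md5(path.read_bytes()).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


def scan_dir(directory: Path) -> tuple[list[Path], list[Path], list[Path]]:
    """Return (unique files, dropped duplicates, files with more than one suffix)."""
    seen: dict[str, Path] = {}
    duplicates: list[Path] = []
    multi_suffix: list[Path] = []
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        # A name like 'a.scerevisiae.1.fq' has several suffixes; only the last is '.fq'
        if len(path.suffixes) > 1:
            multi_suffix.append(path)
        file_hash = content_hash(path)
        if file_hash in seen:
            # Same content as an earlier file: keep the first, drop this one
            duplicates.append(path)
        else:
            seen[file_hash] = path
    return list(seen.values()), duplicates, multi_suffix


example = Path("a.scerevisiae.1.fq")
print(example.suffixes, example.suffix)
```

Running `scan_dir(Path(root_dir) / ".test/ngs-test-data/reads")` on the cloned repository would list which read files share identical content, which is why fewer file records than files end up registered.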
Visualize data lineage for one of the files:
sample_sheet.view_flow()
Track Snakemake run
(We’d start here if input files were tracked in the cloud with LaminDB rather than downloaded through git.)
Track the Snakemake workflow and its run:
transform = ln.Transform(
name="snakemake-workflows/rna-seq-star-deseq2",
version="2.0.0",
type="pipeline",
reference="https://github.com/laminlabs/snakemake-lamin-usecases",
)
ln.track(transform)
run = ln.dev.run_context.run # let's grab the global run record
💡 Transform(id='0wOgM2TBtkMV6M', name='snakemake-workflows/rna-seq-star-deseq2', version='2.0.0', type='pipeline', reference='https://github.com/laminlabs/snakemake-lamin-usecases', updated_at=2023-09-20 21:53:10, created_by_id='DzTjkKse')
💡 Run(id='O0aXDnvbt4EHLqbc0RO6', run_at=2023-09-20 21:53:10, transform_id='0wOgM2TBtkMV6M', created_by_id='DzTjkKse')
If we now stage input files, they’ll be tracked as run inputs.
(In this test case, data is already locally available and staging won’t download anything.)
input_sample_sheet_path = sample_sheet.stage()
input_paths = [input_fastq.stage() for input_fastq in input_fastqs]
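Before launching a long pipeline, it can be worth confirming that every staged path exists locally. A minimal pre-flight sketch, assuming the staged values are local filesystem paths; `find_missing` is a hypothetical helper, not a LaminDB function.

```python
from pathlib import Path


def find_missing(paths) -> list[Path]:
    """Return the subset of `paths` that does not exist on the local filesystem."""
    return [Path(p) for p in paths if not Path(p).exists()]


# Demonstration with the current directory and a made-up file name
print(find_missing([".", "no-such-staged-file.fq"]))
```

In this notebook, the check would be `find_missing([input_sample_sheet_path, *input_paths])`, and an empty list means the pipeline inputs are in place.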
All data is now locally available, and we can run the Snakemake pipeline:
!snakemake \
--directory rna-seq-star-deseq2/.test \
--snakefile rna-seq-star-deseq2/workflow/Snakefile \
--configfile rna-seq-star-deseq2/.test/config_basic/config.yaml \
--use-conda \
--show-failed-logs \
--cores 2 \
--conda-frontend conda \
--conda-cleanup-pkgs cache
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule star_index is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment ../workflow/envs/deseq2.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/deseq2.yaml created (location: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_)
Creating conda environment ../workflow/envs/biomart.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/biomart.yaml created (location: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_)
Creating conda environment ../workflow/envs/gffutils.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/gffutils.yaml created (location: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_)
Creating conda environment ../workflow/envs/pandas.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/pandas.yaml created (location: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_)
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/index/environment.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/index/environment.yaml created (location: .snakemake/conda/d02df01c1a926fffbd5dd7a2048236cd_)
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/cutadapt/pe/environment.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/cutadapt/pe/environment.yaml created (location: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_)
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/align/environment.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/align/environment.yaml created (location: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_)
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/reference/ensembl-annotation/environment.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/reference/ensembl-annotation/environment.yaml created (location: .snakemake/conda/b989f3f8888314c661109a11e4951d06_)
Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/multiqc/environment.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/multiqc/environment.yaml created (location: .snakemake/conda/fdd167354aa72861c97737721eeeb361_)
Creating conda environment ../workflow/envs/rseqc.yaml...
Downloading and installing remote packages.
Cleaning up conda package tarballs and package cache.
Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/rseqc.yaml created (location: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_)
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job                          count
-------------------------  -------
align                            4
all                              1
count_matrix                     1
cutadapt_pe                      4
cutadapt_pipe                    8
deseq2                           1
deseq2_init                      1
gene_2_symbol                    3
get_annotation                   1
get_genome                       1
multiqc                          1
pca                              1
rseqc_gtf2bed                    1
rseqc_infer                      4
rseqc_innerdis                   4
rseqc_junction_annotation        4
rseqc_junction_saturation        4
rseqc_readdis                    4
rseqc_readdup                    4
rseqc_readgc                     4
rseqc_stat                       4
star_index                       1
total                           61
Select jobs to execute...
[Wed Sep 20 22:13:01 2023]
group job 7c81e762-1a4a-4e53-a41e-3fb22085eb04 (jobs in lexicogr. order):
[Wed Sep 20 22:13:01 2023]
rule cutadapt_pe:
input: pipe/cutadapt/B2/1.fq1.fastq, pipe/cutadapt/B2/1.fq2.fastq
output: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz, results/trimmed/B2_1.paired.qc.txt
log: logs/cutadapt/B2_1.log
jobid: 21
reason: Missing output files: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/B2/1.fq2.fastq, pipe/cutadapt/B2/1.fq1.fastq
wildcards: sample=B2, unit=1
threads: 2
resources: tmpdir=/tmp
[Wed Sep 20 22:13:01 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/b.scerevisiae.1.fq
output: pipe/cutadapt/B2/1.fq1.fastq (pipe)
log: logs/pipe-fastqs/catadapt/B2_1.fq1.fastq.log
jobid: 22
reason: Missing output files: pipe/cutadapt/B2/1.fq1.fastq
wildcards: sample=B2, unit=1, fq=fq1, ext=fastq
threads: 0
resources: tmpdir=/tmp
[Wed Sep 20 22:13:01 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/b.scerevisiae.2.fq
output: pipe/cutadapt/B2/1.fq2.fastq (pipe)
log: logs/pipe-fastqs/catadapt/B2_1.fq2.fastq.log
jobid: 23
reason: Missing output files: pipe/cutadapt/B2/1.fq2.fastq
wildcards: sample=B2, unit=1, fq=fq2, ext=fastq
threads: 0
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
[Wed Sep 20 22:13:03 2023]
Finished job 22.
[Wed Sep 20 22:13:03 2023]
Finished job 23.
[Wed Sep 20 22:13:03 2023]
Finished job 21.
3 of 61 steps (5%) done
Select jobs to execute...
[Wed Sep 20 22:13:03 2023]
group job da2000d7-ffd9-4031-b802-451cbb89a149 (jobs in lexicogr. order):
[Wed Sep 20 22:13:03 2023]
rule cutadapt_pe:
input: pipe/cutadapt/A1/1.fq1.fastq, pipe/cutadapt/A1/1.fq2.fastq
output: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz, results/trimmed/A1_1.paired.qc.txt
log: logs/cutadapt/A1_1.log
jobid: 6
reason: Missing output files: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/A1/1.fq1.fastq, pipe/cutadapt/A1/1.fq2.fastq
wildcards: sample=A1, unit=1
threads: 2
resources: tmpdir=/tmp
[Wed Sep 20 22:13:03 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/a.scerevisiae.1.fq
output: pipe/cutadapt/A1/1.fq1.fastq (pipe)
log: logs/pipe-fastqs/catadapt/A1_1.fq1.fastq.log
jobid: 7
reason: Missing output files: pipe/cutadapt/A1/1.fq1.fastq
wildcards: sample=A1, unit=1, fq=fq1, ext=fastq
threads: 0
resources: tmpdir=/tmp
[Wed Sep 20 22:13:03 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/a.scerevisiae.2.fq
output: pipe/cutadapt/A1/1.fq2.fastq (pipe)
log: logs/pipe-fastqs/catadapt/A1_1.fq2.fastq.log
jobid: 8
reason: Missing output files: pipe/cutadapt/A1/1.fq2.fastq
wildcards: sample=A1, unit=1, fq=fq2, ext=fastq
threads: 0
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
[Wed Sep 20 22:13:04 2023]
Finished job 7.
[Wed Sep 20 22:13:04 2023]
Finished job 8.
[Wed Sep 20 22:13:04 2023]
Finished job 6.
6 of 61 steps (10%) done
Select jobs to execute...
[Wed Sep 20 22:13:04 2023]
group job 6288b35b-0f54-4cfe-bd22-4dbdc36e76b6 (jobs in lexicogr. order):
[Wed Sep 20 22:13:04 2023]
rule cutadapt_pe:
input: pipe/cutadapt/A2/1.fq1.fastq, pipe/cutadapt/A2/1.fq2.fastq
output: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz, results/trimmed/A2_1.paired.qc.txt
log: logs/cutadapt/A2_1.log
jobid: 13
reason: Missing output files: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/A2/1.fq2.fastq, pipe/cutadapt/A2/1.fq1.fastq
wildcards: sample=A2, unit=1
threads: 2
resources: tmpdir=/tmp
[Wed Sep 20 22:13:04 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/c.scerevisiae.1.fq
output: pipe/cutadapt/A2/1.fq1.fastq (pipe)
log: logs/pipe-fastqs/catadapt/A2_1.fq1.fastq.log
jobid: 14
reason: Missing output files: pipe/cutadapt/A2/1.fq1.fastq
wildcards: sample=A2, unit=1, fq=fq1, ext=fastq
threads: 0
resources: tmpdir=/tmp
[Wed Sep 20 22:13:04 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/c.scerevisiae.2.fq
output: pipe/cutadapt/A2/1.fq2.fastq (pipe)
log: logs/pipe-fastqs/catadapt/A2_1.fq2.fastq.log
jobid: 15
reason: Missing output files: pipe/cutadapt/A2/1.fq2.fastq
wildcards: sample=A2, unit=1, fq=fq2, ext=fastq
threads: 0
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
[Wed Sep 20 22:13:05 2023]
Finished job 14.
[Wed Sep 20 22:13:05 2023]
Finished job 15.
[Wed Sep 20 22:13:05 2023]
Finished job 13.
9 of 61 steps (15%) done
Select jobs to execute...
[Wed Sep 20 22:13:05 2023]
group job 38b55461-5e6f-47b3-9387-a9dcaf8b3b3c (jobs in lexicogr. order):
[Wed Sep 20 22:13:05 2023]
rule cutadapt_pe:
input: pipe/cutadapt/B1/1.fq1.fastq, pipe/cutadapt/B1/1.fq2.fastq
output: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz, results/trimmed/B1_1.paired.qc.txt
log: logs/cutadapt/B1_1.log
jobid: 17
reason: Missing output files: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/B1/1.fq2.fastq, pipe/cutadapt/B1/1.fq1.fastq
wildcards: sample=B1, unit=1
threads: 2
resources: tmpdir=/tmp
[Wed Sep 20 22:13:05 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/c.scerevisiae.1.fq
output: pipe/cutadapt/B1/1.fq1.fastq (pipe)
log: logs/pipe-fastqs/catadapt/B1_1.fq1.fastq.log
jobid: 18
reason: Missing output files: pipe/cutadapt/B1/1.fq1.fastq
wildcards: sample=B1, unit=1, fq=fq1, ext=fastq
threads: 0
resources: tmpdir=/tmp
[Wed Sep 20 22:13:05 2023]
rule cutadapt_pipe:
input: ngs-test-data/reads/c.scerevisiae.2.fq
output: pipe/cutadapt/B1/1.fq2.fastq (pipe)
log: logs/pipe-fastqs/catadapt/B1_1.fq2.fastq.log
jobid: 19
reason: Missing output files: pipe/cutadapt/B1/1.fq2.fastq
wildcards: sample=B1, unit=1, fq=fq2, ext=fastq
threads: 0
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_
[Wed Sep 20 22:13:06 2023]
Finished job 18.
[Wed Sep 20 22:13:06 2023]
Finished job 19.
[Wed Sep 20 22:13:06 2023]
Finished job 17.
12 of 61 steps (20%) done
Select jobs to execute...
[Wed Sep 20 22:13:06 2023]
rule get_genome:
output: resources/genome.fasta
log: logs/get-genome.log
jobid: 10
reason: Missing output files: resources/genome.fasta
resources: tmpdir=/tmp
[Wed Sep 20 22:13:06 2023]
rule get_annotation:
output: resources/genome.gtf
log: logs/get_annotation.log
jobid: 11
reason: Missing output files: resources/genome.gtf
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/b989f3f8888314c661109a11e4951d06_
Activating conda environment: .snakemake/conda/b989f3f8888314c661109a11e4951d06_
[Wed Sep 20 22:13:09 2023]
Finished job 11.
13 of 61 steps (21%) done
Select jobs to execute...
[Wed Sep 20 22:13:09 2023]
rule rseqc_gtf2bed:
input: resources/genome.gtf
output: results/qc/rseqc/annotation.bed, results/qc/rseqc/annotation.db
log: logs/rseqc_gtf2bed.log
jobid: 28
reason: Missing output files: results/qc/rseqc/annotation.bed; Input files updated by another job: resources/genome.gtf
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_
Activating conda environment: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_
[Wed Sep 20 22:13:13 2023]
Finished job 10.
14 of 61 steps (23%) done
Select jobs to execute...
[Wed Sep 20 22:13:15 2023]
Finished job 28.
15 of 61 steps (25%) done
Removing temporary output results/qc/rseqc/annotation.db.
[Wed Sep 20 22:13:15 2023]
rule star_index:
input: resources/genome.fasta, resources/genome.gtf
output: resources/star_genome
log: logs/star_index_genome.log
jobid: 9
reason: Missing output files: resources/star_genome; Input files updated by another job: resources/genome.fasta, resources/genome.gtf
threads: 2
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/d02df01c1a926fffbd5dd7a2048236cd_
[Wed Sep 20 22:13:45 2023]
Finished job 9.
16 of 61 steps (26%) done
Select jobs to execute...
[Wed Sep 20 22:13:45 2023]
rule align:
input: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
output: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/ReadsPerGene.out.tab
log: logs/star/B1_1.log
jobid: 16
reason: Missing output files: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/ReadsPerGene.out.tab; Input files updated by another job: results/trimmed/B1_1_R1.fastq.gz, resources/star_genome, resources/genome.gtf, results/trimmed/B1_1_R2.fastq.gz
wildcards: sample=B1, unit=1
threads: 2
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_
[Wed Sep 20 22:14:33 2023]
Finished job 16.
17 of 61 steps (28%) done
Select jobs to execute...
[Wed Sep 20 22:14:33 2023]
rule rseqc_readdis:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B1_1.readdistribution.txt
log: logs/rseqc/rseqc_readdis/B1_1.log
jobid: 50
reason: Missing output files: results/qc/rseqc/B1_1.readdistribution.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:33 2023]
rule rseqc_junction_annotation:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B1_1.junctionanno.junction.bed
log: logs/rseqc/rseqc_junction_annotation/B1_1.log
jobid: 30
reason: Missing output files: results/qc/rseqc/B1_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/B1_1.log; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:35 2023]
Finished job 30.
18 of 61 steps (30%) done
Select jobs to execute...
[Wed Sep 20 22:14:35 2023]
rule rseqc_junction_saturation:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf
log: logs/rseqc/rseqc_junction_saturation/B1_1.log
jobid: 34
reason: Missing output files: results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:36 2023]
Finished job 50.
19 of 61 steps (31%) done
Select jobs to execute...
[Wed Sep 20 22:14:36 2023]
rule rseqc_infer:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B1_1.infer_experiment.txt
log: logs/rseqc/rseqc_infer/B1_1.log
jobid: 38
reason: Missing output files: results/qc/rseqc/B1_1.infer_experiment.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:38 2023]
Finished job 38.
20 of 61 steps (33%) done
Select jobs to execute...
[Wed Sep 20 22:14:38 2023]
rule rseqc_innerdis:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt
log: logs/rseqc/rseqc_innerdis/B1_1.log
jobid: 46
reason: Missing output files: results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:38 2023]
Finished job 34.
21 of 61 steps (34%) done
Select jobs to execute...
[Wed Sep 20 22:14:38 2023]
rule rseqc_readdup:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf
log: logs/rseqc/rseqc_readdup/B1_1.log
jobid: 54
reason: Missing output files: results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:40 2023]
Finished job 54.
22 of 61 steps (36%) done
Select jobs to execute...
[Wed Sep 20 22:14:40 2023]
rule rseqc_readgc:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B1_1.readgc.GC_plot.pdf
log: logs/rseqc/rseqc_readgc/B1_1.log
jobid: 58
reason: Missing output files: results/qc/rseqc/B1_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:40 2023]
Finished job 46.
23 of 61 steps (38%) done
Select jobs to execute...
[Wed Sep 20 22:14:40 2023]
rule rseqc_stat:
input: results/star/B1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B1_1.stats.txt
log: logs/rseqc/rseqc_stat/B1_1.log
jobid: 42
reason: Missing output files: results/qc/rseqc/B1_1.stats.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:14:42 2023]
Finished job 42.
24 of 61 steps (39%) done
Select jobs to execute...
[Wed Sep 20 22:14:42 2023]
Finished job 58.
25 of 61 steps (41%) done
[Wed Sep 20 22:14:42 2023]
rule align:
input: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
output: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/ReadsPerGene.out.tab
log: logs/star/A2_1.log
jobid: 12
reason: Missing output files: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/ReadsPerGene.out.tab; Input files updated by another job: results/trimmed/A2_1_R1.fastq.gz, resources/star_genome, results/trimmed/A2_1_R2.fastq.gz, resources/genome.gtf
wildcards: sample=A2, unit=1
threads: 2
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_
[Wed Sep 20 22:15:30 2023]
Finished job 12.
26 of 61 steps (43%) done
Select jobs to execute...
[Wed Sep 20 22:15:30 2023]
rule rseqc_infer:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A2_1.infer_experiment.txt
log: logs/rseqc/rseqc_infer/A2_1.log
jobid: 37
reason: Missing output files: results/qc/rseqc/A2_1.infer_experiment.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:30 2023]
rule rseqc_innerdis:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt
log: logs/rseqc/rseqc_innerdis/A2_1.log
jobid: 45
reason: Missing output files: results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:32 2023]
Finished job 37.
27 of 61 steps (44%) done
Select jobs to execute...
[Wed Sep 20 22:15:32 2023]
rule rseqc_readdis:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A2_1.readdistribution.txt
log: logs/rseqc/rseqc_readdis/A2_1.log
jobid: 49
reason: Missing output files: results/qc/rseqc/A2_1.readdistribution.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:33 2023]
Finished job 45.
28 of 61 steps (46%) done
Select jobs to execute...
[Wed Sep 20 22:15:33 2023]
rule rseqc_junction_annotation:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A2_1.junctionanno.junction.bed
log: logs/rseqc/rseqc_junction_annotation/A2_1.log
jobid: 29
reason: Missing output files: results/qc/rseqc/A2_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/A2_1.log; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:35 2023]
Finished job 49.
29 of 61 steps (48%) done
Select jobs to execute...
[Wed Sep 20 22:15:35 2023]
rule rseqc_junction_saturation:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf
log: logs/rseqc/rseqc_junction_saturation/A2_1.log
jobid: 33
reason: Missing output files: results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:35 2023]
Finished job 29.
30 of 61 steps (49%) done
Select jobs to execute...
[Wed Sep 20 22:15:35 2023]
rule rseqc_stat:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A2_1.stats.txt
log: logs/rseqc/rseqc_stat/A2_1.log
jobid: 41
reason: Missing output files: results/qc/rseqc/A2_1.stats.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:37 2023]
Finished job 41.
31 of 61 steps (51%) done
Select jobs to execute...
[Wed Sep 20 22:15:37 2023]
rule rseqc_readdup:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf
log: logs/rseqc/rseqc_readdup/A2_1.log
jobid: 53
reason: Missing output files: results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:37 2023]
Finished job 33.
32 of 61 steps (52%) done
Select jobs to execute...
[Wed Sep 20 22:15:37 2023]
rule rseqc_readgc:
input: results/star/A2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A2_1.readgc.GC_plot.pdf
log: logs/rseqc/rseqc_readgc/A2_1.log
jobid: 57
reason: Missing output files: results/qc/rseqc/A2_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:40 2023]
Finished job 53.
33 of 61 steps (54%) done
Select jobs to execute...
[Wed Sep 20 22:15:40 2023]
Finished job 57.
34 of 61 steps (56%) done
[Wed Sep 20 22:15:40 2023]
rule align:
input: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
output: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/star/B2_1/ReadsPerGene.out.tab
log: logs/star/B2_1.log
jobid: 20
reason: Missing output files: results/star/B2_1/ReadsPerGene.out.tab, results/star/B2_1/Aligned.sortedByCoord.out.bam; Input files updated by another job: resources/star_genome, results/trimmed/B2_1_R1.fastq.gz, resources/genome.gtf, results/trimmed/B2_1_R2.fastq.gz
wildcards: sample=B2, unit=1
threads: 2
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_
[Wed Sep 20 22:15:46 2023]
Finished job 20.
35 of 61 steps (57%) done
Select jobs to execute...
[Wed Sep 20 22:15:46 2023]
rule rseqc_junction_saturation:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf
log: logs/rseqc/rseqc_junction_saturation/B2_1.log
jobid: 35
reason: Missing output files: results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:46 2023]
rule rseqc_infer:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B2_1.infer_experiment.txt
log: logs/rseqc/rseqc_infer/B2_1.log
jobid: 39
reason: Missing output files: results/qc/rseqc/B2_1.infer_experiment.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:48 2023]
Finished job 39.
36 of 61 steps (59%) done
Select jobs to execute...
[Wed Sep 20 22:15:48 2023]
rule rseqc_innerdis:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt
log: logs/rseqc/rseqc_innerdis/B2_1.log
jobid: 47
reason: Missing output files: results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:48 2023]
Finished job 35.
37 of 61 steps (61%) done
Select jobs to execute...
[Wed Sep 20 22:15:48 2023]
rule rseqc_readdis:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B2_1.readdistribution.txt
log: logs/rseqc/rseqc_readdis/B2_1.log
jobid: 51
reason: Missing output files: results/qc/rseqc/B2_1.readdistribution.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:51 2023]
Finished job 47.
38 of 61 steps (62%) done
Select jobs to execute...
[Wed Sep 20 22:15:51 2023]
rule rseqc_junction_annotation:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/B2_1.junctionanno.junction.bed
log: logs/rseqc/rseqc_junction_annotation/B2_1.log
jobid: 31
reason: Missing output files: logs/rseqc/rseqc_junction_annotation/B2_1.log, results/qc/rseqc/B2_1.junctionanno.junction.bed; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:51 2023]
Finished job 51.
39 of 61 steps (64%) done
Select jobs to execute...
[Wed Sep 20 22:15:51 2023]
rule rseqc_stat:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B2_1.stats.txt
log: logs/rseqc/rseqc_stat/B2_1.log
jobid: 43
reason: Missing output files: results/qc/rseqc/B2_1.stats.txt; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:53 2023]
Finished job 43.
40 of 61 steps (66%) done
Select jobs to execute...
[Wed Sep 20 22:15:53 2023]
rule rseqc_readgc:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B2_1.readgc.GC_plot.pdf
log: logs/rseqc/rseqc_readgc/B2_1.log
jobid: 59
reason: Missing output files: results/qc/rseqc/B2_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:53 2023]
Finished job 31.
41 of 61 steps (67%) done
Select jobs to execute...
[Wed Sep 20 22:15:53 2023]
rule rseqc_readdup:
input: results/star/B2_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf
log: logs/rseqc/rseqc_readdup/B2_1.log
jobid: 55
reason: Missing output files: results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
wildcards: sample=B2, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:15:56 2023]
Finished job 59.
42 of 61 steps (69%) done
Select jobs to execute...
[Wed Sep 20 22:15:56 2023]
Finished job 55.
43 of 61 steps (70%) done
[Wed Sep 20 22:15:56 2023]
rule align:
input: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
output: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/star/A1_1/ReadsPerGene.out.tab
log: logs/star/A1_1.log
jobid: 5
reason: Missing output files: results/star/A1_1/ReadsPerGene.out.tab, results/star/A1_1/Aligned.sortedByCoord.out.bam; Input files updated by another job: resources/star_genome, resources/genome.gtf, results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz
wildcards: sample=A1, unit=1
threads: 2
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_
[Wed Sep 20 22:16:02 2023]
Finished job 5.
44 of 61 steps (72%) done
Select jobs to execute...
[Wed Sep 20 22:16:02 2023]
rule rseqc_infer:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A1_1.infer_experiment.txt
log: logs/rseqc/rseqc_infer/A1_1.log
jobid: 36
reason: Missing output files: results/qc/rseqc/A1_1.infer_experiment.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:02 2023]
rule rseqc_junction_saturation:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf
log: logs/rseqc/rseqc_junction_saturation/A1_1.log
jobid: 32
reason: Missing output files: results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:04 2023]
Finished job 36.
45 of 61 steps (74%) done
Select jobs to execute...
[Wed Sep 20 22:16:04 2023]
rule rseqc_junction_annotation:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A1_1.junctionanno.junction.bed
log: logs/rseqc/rseqc_junction_annotation/A1_1.log
jobid: 27
reason: Missing output files: logs/rseqc/rseqc_junction_annotation/A1_1.log, results/qc/rseqc/A1_1.junctionanno.junction.bed; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:04 2023]
Finished job 32.
46 of 61 steps (75%) done
Select jobs to execute...
[Wed Sep 20 22:16:04 2023]
rule rseqc_innerdis:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt
log: logs/rseqc/rseqc_innerdis/A1_1.log
jobid: 44
reason: Missing output files: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:07 2023]
Finished job 27.
47 of 61 steps (77%) done
Select jobs to execute...
[Wed Sep 20 22:16:07 2023]
rule rseqc_readdis:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
output: results/qc/rseqc/A1_1.readdistribution.txt
log: logs/rseqc/rseqc_readdis/A1_1.log
jobid: 48
reason: Missing output files: results/qc/rseqc/A1_1.readdistribution.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:07 2023]
Finished job 44.
48 of 61 steps (79%) done
Select jobs to execute...
[Wed Sep 20 22:16:07 2023]
rule rseqc_stat:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A1_1.stats.txt
log: logs/rseqc/rseqc_stat/A1_1.log
jobid: 40
reason: Missing output files: results/qc/rseqc/A1_1.stats.txt; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:09 2023]
Finished job 40.
49 of 61 steps (80%) done
Select jobs to execute...
[Wed Sep 20 22:16:09 2023]
rule rseqc_readdup:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf
log: logs/rseqc/rseqc_readdup/A1_1.log
jobid: 52
reason: Missing output files: results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:09 2023]
Finished job 48.
50 of 61 steps (82%) done
Select jobs to execute...
[Wed Sep 20 22:16:09 2023]
rule rseqc_readgc:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam
output: results/qc/rseqc/A1_1.readgc.GC_plot.pdf
log: logs/rseqc/rseqc_readgc/A1_1.log
jobid: 56
reason: Missing output files: results/qc/rseqc/A1_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
wildcards: sample=A1, unit=1
priority: 1
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_
[Wed Sep 20 22:16:12 2023]
Finished job 52.
51 of 61 steps (84%) done
Select jobs to execute...
[Wed Sep 20 22:16:12 2023]
rule count_matrix:
input: results/star/A1_1/ReadsPerGene.out.tab, results/star/A2_1/ReadsPerGene.out.tab, results/star/B1_1/ReadsPerGene.out.tab, results/star/B2_1/ReadsPerGene.out.tab
output: results/counts/all.tsv
log: logs/count-matrix.log
jobid: 4
reason: Missing output files: results/counts/all.tsv; Input files updated by another job: results/star/B2_1/ReadsPerGene.out.tab, results/star/B1_1/ReadsPerGene.out.tab, results/star/A1_1/ReadsPerGene.out.tab, results/star/A2_1/ReadsPerGene.out.tab
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_
[Wed Sep 20 22:16:12 2023]
Finished job 56.
52 of 61 steps (85%) done
Select jobs to execute...
[Wed Sep 20 22:16:12 2023]
rule multiqc:
input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/A1_1.junctionanno.junction.bed, results/qc/rseqc/A2_1.junctionanno.junction.bed, results/qc/rseqc/B1_1.junctionanno.junction.bed, results/qc/rseqc/B2_1.junctionanno.junction.bed, results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A1_1.infer_experiment.txt, results/qc/rseqc/A2_1.infer_experiment.txt, results/qc/rseqc/B1_1.infer_experiment.txt, results/qc/rseqc/B2_1.infer_experiment.txt, results/qc/rseqc/A1_1.stats.txt, results/qc/rseqc/A2_1.stats.txt, results/qc/rseqc/B1_1.stats.txt, results/qc/rseqc/B2_1.stats.txt, results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A1_1.readdistribution.txt, results/qc/rseqc/A2_1.readdistribution.txt, results/qc/rseqc/B1_1.readdistribution.txt, results/qc/rseqc/B2_1.readdistribution.txt, results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A1_1.readgc.GC_plot.pdf, results/qc/rseqc/A2_1.readgc.GC_plot.pdf, results/qc/rseqc/B1_1.readgc.GC_plot.pdf, results/qc/rseqc/B2_1.readgc.GC_plot.pdf, logs/rseqc/rseqc_junction_annotation/A1_1.log, logs/rseqc/rseqc_junction_annotation/A2_1.log, logs/rseqc/rseqc_junction_annotation/B1_1.log, logs/rseqc/rseqc_junction_annotation/B2_1.log
output: results/qc/multiqc_report.html
log: logs/multiqc.log
jobid: 26
reason: Missing output files: results/qc/multiqc_report.html; Input files updated by another job: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt, results/star/A2_1/Aligned.sortedByCoord.out.bam, logs/rseqc/rseqc_junction_annotation/A1_1.log, logs/rseqc/rseqc_junction_annotation/B1_1.log, results/qc/rseqc/B1_1.stats.txt, results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A1_1.junctionanno.junction.bed, results/qc/rseqc/A1_1.stats.txt, results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B2_1.readgc.GC_plot.pdf, results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.readdistribution.txt, results/qc/rseqc/A1_1.readgc.GC_plot.pdf, results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B1_1.infer_experiment.txt, logs/rseqc/rseqc_junction_annotation/A2_1.log, results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B1_1.readdistribution.txt, results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/B2_1.log, results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A1_1.readdistribution.txt, results/qc/rseqc/A2_1.junctionanno.junction.bed, results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.stats.txt, results/qc/rseqc/A1_1.infer_experiment.txt, results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A2_1.infer_experiment.txt, results/qc/rseqc/B1_1.readgc.GC_plot.pdf, results/qc/rseqc/A2_1.stats.txt, results/qc/rseqc/A2_1.readgc.GC_plot.pdf, results/qc/rseqc/B2_1.infer_experiment.txt, results/qc/rseqc/B1_1.junctionanno.junction.bed, results/qc/rseqc/A2_1.readdistribution.txt, results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_
Activating conda environment: .snakemake/conda/fdd167354aa72861c97737721eeeb361_
Activating conda environment: .snakemake/conda/fdd167354aa72861c97737721eeeb361_
[Wed Sep 20 22:16:13 2023]
Finished job 4.
53 of 61 steps (87%) done
Select jobs to execute...
[Wed Sep 20 22:16:13 2023]
rule deseq2_init:
input: results/counts/all.tsv
output: results/deseq2/all.rds, results/deseq2/normcounts.tsv
log: logs/deseq2/init.log
jobid: 3
reason: Missing output files: results/deseq2/all.rds, results/deseq2/normcounts.tsv; Input files updated by another job: results/counts/all.tsv
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_
[Wed Sep 20 22:16:16 2023]
Finished job 26.
54 of 61 steps (89%) done
Select jobs to execute...
[Wed Sep 20 22:16:16 2023]
rule gene_2_symbol:
input: results/counts/all.tsv
output: results/counts/all.symbol.tsv
log: logs/gene2symbol/results/counts/all.log
jobid: 25
reason: Missing output files: results/counts/all.symbol.tsv; Input files updated by another job: results/counts/all.tsv
wildcards: prefix=results/counts/all
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.3
✔ purrr   1.0.1
✔ tibble  3.2.1
✔ dplyr   1.1.3
✔ tidyr   1.3.0
✔ stringr 1.5.0
✔ readr   2.1.4
✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks biomaRt::select()
Possible SSL connectivity problems detected.
Please report this issue at https://github.com/grimbough/biomaRt/issues
Error in curl::curl_fetch_memory(url, handle = handle) :
SSL peer certificate or SSH remote key was not OK: [uswest.ensembl.org] SSL certificate problem: certificate has expired
[Wed Sep 20 22:16:26 2023]
Finished job 3.
55 of 61 steps (90%) done
Select jobs to execute...
[Wed Sep 20 22:16:26 2023]
rule deseq2:
input: results/deseq2/all.rds
output: results/diffexp/treated-vs-untreated.diffexp.tsv, results/diffexp/treated-vs-untreated.ma-plot.svg
log: logs/deseq2/treated-vs-untreated.diffexp.log
jobid: 2
reason: Missing output files: results/diffexp/treated-vs-untreated.diffexp.tsv; Input files updated by another job: results/deseq2/all.rds
wildcards: contrast=treated-vs-untreated
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_
Ensembl site unresponsive, trying asia mirror
[Wed Sep 20 22:16:36 2023]
Finished job 2.
56 of 61 steps (92%) done
Select jobs to execute...
[Wed Sep 20 22:16:36 2023]
rule gene_2_symbol:
input: results/diffexp/treated-vs-untreated.diffexp.tsv
output: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv
log: logs/gene2symbol/results/diffexp/treated-vs-untreated.diffexp.log
jobid: 1
reason: Missing output files: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv; Input files updated by another job: results/diffexp/treated-vs-untreated.diffexp.tsv
wildcards: prefix=results/diffexp/treated-vs-untreated.diffexp
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.3
✔ purrr   1.0.1
✔ tibble  3.2.1
✔ dplyr   1.1.3
✔ tidyr   1.3.0
✔ stringr 1.5.0
✔ readr   2.1.4
✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks biomaRt::select()
Ensembl site unresponsive, trying www mirror
Ensembl site unresponsive, trying www mirror
Batch submitting query [===============>---------------] 50% eta: 5s
[Wed Sep 20 22:17:00 2023]
Finished job 25.
57 of 61 steps (93%) done
Select jobs to execute...
[Wed Sep 20 22:17:00 2023]
rule pca:
input: results/deseq2/all.rds
output: results/pca.condition.svg
log: logs/pca.condition.log
jobid: 60
reason: Missing output files: results/pca.condition.svg; Input files updated by another job: results/deseq2/all.rds
wildcards: variable=condition
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_
[Wed Sep 20 22:17:10 2023]
Finished job 60.
58 of 61 steps (95%) done
Select jobs to execute...
[Wed Sep 20 22:17:10 2023]
rule gene_2_symbol:
input: results/deseq2/normcounts.tsv
output: results/deseq2/normcounts.symbol.tsv
log: logs/gene2symbol/results/deseq2/normcounts.log
jobid: 24
reason: Missing output files: results/deseq2/normcounts.symbol.tsv; Input files updated by another job: results/deseq2/normcounts.tsv
wildcards: prefix=results/deseq2/normcounts
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_
[Wed Sep 20 22:17:12 2023]
Finished job 1.
59 of 61 steps (97%) done
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.3
✔ purrr   1.0.1
✔ tibble  3.2.1
✔ dplyr   1.1.3
✔ tidyr   1.3.0
✔ stringr 1.5.0
✔ readr   2.1.4
✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks biomaRt::select()
Ensembl site unresponsive, trying www mirror
[Wed Sep 20 22:17:38 2023]
Finished job 24.
60 of 61 steps (98%) done
Select jobs to execute...
[Wed Sep 20 22:17:38 2023]
localrule all:
input: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv, results/deseq2/normcounts.symbol.tsv, results/counts/all.symbol.tsv, results/qc/multiqc_report.html, results/pca.condition.svg, results/pca.condition.svg
jobid: 0
reason: Input files updated by another job: results/counts/all.symbol.tsv, results/qc/multiqc_report.html, results/diffexp/treated-vs-untreated.diffexp.symbol.tsv, results/pca.condition.svg, results/deseq2/normcounts.symbol.tsv
resources: tmpdir=/tmp
[Wed Sep 20 22:17:38 2023]
Finished job 0.
61 of 61 steps (100%) done
Complete log: .snakemake/log/2023-09-20T215311.271697.snakemake.log
Register outputs¶
QC¶
multiqc_file = ln.File(f"{root_dir}/.test/results/qc/multiqc_report.html")
multiqc_file.save()
How would I register all QC files?
multiqc_results = ln.File.from_dir(f"{root_dir}/.test/results/qc/multiqc_report_data/")
ln.save(multiqc_results)
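Before registering a whole directory, it can help to preview which files it contains. The following is a minimal sketch using only the standard library; `list_files_recursively` is a hypothetical helper and the demo file names are made up. It says nothing about how `from_dir` itself traverses directories, which is an assumption you would need to verify against the lamindb documentation.

```python
import tempfile
from pathlib import Path


def list_files_recursively(directory):
    """Return all regular files below `directory`, sorted for stable output."""
    return sorted(p for p in Path(directory).rglob("*") if p.is_file())


# Demo on a throwaway directory; the file names below are made up.
with tempfile.TemporaryDirectory() as tmp:
    qc_dir = Path(tmp) / "qc"
    (qc_dir / "nested").mkdir(parents=True)
    (qc_dir / "report.html").touch()
    (qc_dir / "nested" / "stats.txt").touch()
    found = [p.relative_to(qc_dir).as_posix() for p in list_files_recursively(qc_dir)]

print(found)
```

Pointing the helper at the QC directory of the pipeline run would list the candidate files without touching the database.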
Count matrix¶
count_matrix = ln.File(f"{root_dir}/.test/results/counts/all.symbol.tsv")
count_matrix.save()
❗ file has more than one suffix (path.suffixes), using only last suffix: '.tsv'
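The warning above is triggered because the file name contains two dot-separated suffixes. A small sketch with the standard library `pathlib` (not lamindb internals, which are not shown here) illustrates the difference between `suffixes` and `suffix`:

```python
from pathlib import PurePosixPath

# Path of the count matrix registered above.
path = PurePosixPath("rna-seq-star-deseq2/.test/results/counts/all.symbol.tsv")

all_suffixes = path.suffixes  # every dot-separated suffix of the file name
last_suffix = path.suffix     # only the final suffix, as mentioned in the warning

print(all_suffixes, last_suffix)
```

This matches the `.tsv` value shown in the suffix column of the File table.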
Track Snakemake ID¶
Snakemake does not expose an easily accessible ID for a run. Therefore, we extract one from the name of the most recent log file in .snakemake/log. We’re planning to simplify this process in the future.
import pathlib
from datetime import datetime
PATH_TO_DOT_SNAKEMAKE_LOG = "rna-seq-star-deseq2/.test/.snakemake/log"
log_files_file_names = list(
    map(
        lambda lf: str(lf).split("/")[-1],
        list(pathlib.Path(PATH_TO_DOT_SNAKEMAKE_LOG).glob("*.snakemake.log")),
    )
)
timestamps = [
    datetime.strptime(filename.split(".")[0], "%Y-%m-%dT%H%M%S")
    for filename in log_files_file_names
]
snakemake_id = log_files_file_names[timestamps.index(max(timestamps))].split(".")[1]
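The same extraction can be wrapped in a reusable function. This is a sketch under the assumption that log file names follow the pattern seen in this run (`2023-09-20T215311.271697.snakemake.log`); `latest_snakemake_id` is a hypothetical helper name, and the first file name in the demo is made up.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def latest_snakemake_id(log_dir):
    """Return the numeric part of the most recent Snakemake log file name.

    Assumes names like '2023-09-20T215311.271697.snakemake.log'.
    """
    log_names = [p.name for p in Path(log_dir).glob("*.snakemake.log")]
    if not log_names:
        raise FileNotFoundError(f"no Snakemake logs found in {log_dir}")
    # The part before the first dot encodes the start time of the run.
    newest = max(
        log_names,
        key=lambda name: datetime.strptime(name.split(".")[0], "%Y-%m-%dT%H%M%S"),
    )
    return newest.split(".")[1]


# Demo on a throwaway directory with two log file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "2023-09-19T101500.123456.snakemake.log",  # made-up older run
        "2023-09-20T215311.271697.snakemake.log",  # name from the log above
    ):
        (Path(tmp) / name).touch()
    demo_id = latest_snakemake_id(tmp)

print(demo_id)
```

Using `Path.name` avoids splitting on "/" by hand, and the explicit error makes an empty log directory easier to diagnose than a failing `max()` call.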
Let’s add the Snakemake ID to our run record:
run.reference = snakemake_id
run.reference_type = "snakemake_id"
run.save()
Link biological entities¶
To make the count matrix queryable by biological entities (genes, experimental metadata, etc.), we can now proceed with: Bulk RNA-seq
Visualize¶
View data lineage:
count_matrix.view_flow()
View the database content:
ln.view()
File
id | storage_id | key | suffix | accessor | description | version | size | hash | hash_type | transform_id | run_id | initial_version_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
M8klgyfZO619dYLGwUVy | t6Qgo37v | rna-seq-star-deseq2/.test/results/counts/all.s... | .tsv | None | None | None | 115658 | Zf_hnhy4E3w4b30mRmhS2w | md5 | 0wOgM2TBtkMV6M | O0aXDnvbt4EHLqbc0RO6 | None | 2023-09-20 22:17:38 | DzTjkKse |
hzj9JOr81f7F4MsM04pO | t6Qgo37v | rna-seq-star-deseq2/.test/results/qc/multiqc_r... | .html | None | None | None | 1125890 | YPAwYAd7A-mhaHCXV_aWXw | md5 | 0wOgM2TBtkMV6M | O0aXDnvbt4EHLqbc0RO6 | None | 2023-09-20 22:17:38 | DzTjkKse |
ajrQiMQ0VjTnbTY1EziY | t6Qgo37v | rna-seq-star-deseq2/.test/ngs-test-data/reads/... | .fq | None | None | None | 2159449 | ofhOQDhdGWvkyzMgeuVh1g | md5 | hCfEkAZ94EeCNt | XltwcXAZ7H4twaCujeqX | None | 2023-09-20 21:53:10 | DzTjkKse |
NfeSeRde79nEawAOjdQx | t6Qgo37v | rna-seq-star-deseq2/.test/ngs-test-data/reads/... | .fq | None | None | None | 935120 | 27JbZ5KW0JsMRkICIMVoAQ | md5 | hCfEkAZ94EeCNt | XltwcXAZ7H4twaCujeqX | None | 2023-09-20 21:53:10 | DzTjkKse |
qc1Rwx7x3Kw1ZrBWfjfr | t6Qgo37v | rna-seq-star-deseq2/.test/ngs-test-data/reads/... | .fq | None | None | None | 2218894 | DqfThx982Ai4akCcx-HikA | md5 | hCfEkAZ94EeCNt | XltwcXAZ7H4twaCujeqX | None | 2023-09-20 21:53:10 | DzTjkKse |
VMlHiTDcEtWocndgXCUG | t6Qgo37v | rna-seq-star-deseq2/.test/ngs-test-data/reads/... | .fq | None | None | None | 925634 | B1MLHnWgnl4yOok0kkYIvA | md5 | hCfEkAZ94EeCNt | XltwcXAZ7H4twaCujeqX | None | 2023-09-20 21:53:10 | DzTjkKse |
Fz3z3IP3CAQevQzHtxfL | t6Qgo37v | rna-seq-star-deseq2/.test/ngs-test-data/reads/... | .fq | None | None | None | 2159449 | zg7RgcXv7ue_dHb_Q7-1LQ | md5 | hCfEkAZ94EeCNt | XltwcXAZ7H4twaCujeqX | None | 2023-09-20 21:53:10 | DzTjkKse |
Run
id | transform_id | run_at | created_by_id | reference | reference_type |
---|---|---|---|---|---|
XltwcXAZ7H4twaCujeqX | hCfEkAZ94EeCNt | 2023-09-20 21:53:10 | DzTjkKse | https://github.com/snakemake-workflows/rna-seq... | url |
O0aXDnvbt4EHLqbc0RO6 | 0wOgM2TBtkMV6M | 2023-09-20 21:53:10 | DzTjkKse | 271697 | snakemake_id |
Storage
id | root | type | region | updated_at | created_by_id |
---|---|---|---|---|---|
t6Qgo37v | /home/runner/work/snakemake-lamin-usecases/sna... | local | None | 2023-09-20 21:53:07 | DzTjkKse |
Transform
id | name | short_name | version | type | reference | reference_type | initial_version_id | updated_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|
0wOgM2TBtkMV6M | snakemake-workflows/rna-seq-star-deseq2 | None | 2.0.0 | pipeline | https://github.com/laminlabs/snakemake-lamin-u... | None | None | 2023-09-20 22:17:38 | DzTjkKse |
hCfEkAZ94EeCNt | Download | None | None | notebook | None | None | None | 2023-09-20 21:53:10 | DzTjkKse |
User
id | handle | email | name | updated_at |
---|---|---|---|---|
DzTjkKse | testuser1 | [email protected] | Test User1 | 2023-09-20 21:53:07 |
Clean up the test instance:
!lamin delete --force snakemake-bulkrna
💡 deleting instance testuser1/snakemake-bulkrna
✅ deleted instance settings file: /home/runner/.lamin/instance--testuser1--snakemake-bulkrna.env
✅ instance cache deleted
✅ deleted '.lndb' sqlite file
❗ consider manually deleting your stored data: /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs