There will be no exercise for you to complete in nextflow this week. Instead, I have designed a simple nextflow script that can perform both single-end and paired-end analysis with kallisto by using if/else statements.
Furthermore, the script can download the reference cDNA file and run kallisto index if neither the cDNA file nor an index file is provided.
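As a rough illustration of the if/else idea, the sketch below switches between the two channel factories depending on a flag. The parameter name params.single_end and the paired-end glob mentioned in the comment are assumptions made for this example and may not match what main.nf actually uses; both channel factories are explained in the walk-through that follows.

```nextflow
#!/usr/bin/env nextflow

// Hypothetical parameters for illustration; the names used in main.nf may differ.
params.input      = "/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/*.fastq.gz"
params.single_end = true

if (params.single_end) {
    // Single-end: one file per sample, keyed by the file's simple name.
    ch_reads = Channel
        .fromPath(params.input)
        .map { f -> [f.simpleName, f] }
} else {
    // Paired-end: params.input must then be a glob that captures both mates,
    // e.g. a pattern of the form "*_R{1,2}.fastq.gz" (hypothetical naming).
    ch_reads = Channel.fromFilePairs(params.input)
}

// Either branch emits items of the form [sample_id, file(s)].
ch_reads.view()
```

Because both branches emit a key followed by the reads, downstream processes can use the same input declaration regardless of the library type.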
Before we walk through the script, you need to know how to handle single-end fastq files with nextflow. In the example below, we specify the paths to the fastq files using fromPath(). This returns a channel that emits the path of each fastq file.
Save the below script and run it:
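#!/usr/bin/env nextflow
// Glob pattern matching every gzipped fastq file in the data directory
params.input = "/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/*.fastq.gz"
// Create a channel that emits one path per matching file
Channel
    .fromPath(params.input)
    .set{ ch_reads }
// Print each item emitted by the channel
ch_reads.view()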
Output (the order of the files may differ between runs):
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_2.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_1.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_3.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_1.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_2.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_3.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_1.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_2.fastq.gz
/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_3.fastq.gz
The channel outputs bare file paths. We want it to mimic fromFilePairs() and emit a tuple containing a key and the file path, so that we can name the output directories when running kallisto quant on single-end data.
Save the below script and run it:
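#!/usr/bin/env nextflow
params.input = "/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/*.fastq.gz"
Channel
    .fromPath(params.input)
    // Pair each path with its simpleName (the file name with all extensions removed)
    .map{ file -> [file.simpleName, file] }
    .set{ ch_reads }
// Print each [key, path] tuple
ch_reads.view()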
Output (the order of the tuples may differ between runs):
[ctrl_2, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_2.fastq.gz]
[ctrl_1, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_1.fastq.gz]
[ctrl_3, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/ctrl_3.fastq.gz]
[A375_1, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_1.fastq.gz]
[A375_2, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_2.fastq.gz]
[A375_3, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A375_3.fastq.gz]
[A549_1, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_1.fastq.gz]
[A549_2, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_2.fastq.gz]
[A549_3, /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/A549_3.fastq.gz]
Now we can use the input declaration tuple val(base), file(reads) from ch_reads in the process that runs kallisto quant.
Read here for descriptions of how to get unique IDs based on filenames using simpleName and baseName.
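To show where that input declaration fits, here is a minimal sketch of a single-end quantification process. It is not the process from main.nf: the process name, the params.index path, the output directory and the fragment length values are all placeholders chosen for this example. It is written in the same DSL1 style as the text (the from keyword), so it assumes a nextflow version that still supports DSL1.

```nextflow
#!/usr/bin/env nextflow

params.input = "/data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/*.fastq.gz"
// Hypothetical path to a pre-built kallisto index; replace with your own.
params.index = "transcriptome.idx"

Channel
    .fromPath(params.input)
    .map { f -> [f.simpleName, f] }
    .set { ch_reads }

// Load the index as a file object so it is staged into every task.
index_file = file(params.index)

process kallisto_quant_single {
    // Copy each sample's result directory into kallisto_out/ (placeholder name).
    publishDir "kallisto_out", mode: 'copy'

    input:
    tuple val(base), file(reads) from ch_reads
    file(index) from index_file

    output:
    file("${base}") into ch_quant

    script:
    // --single requires a fragment length (-l) and its standard deviation (-s);
    // 200 and 20 are placeholder values, not recommendations for this dataset.
    """
    kallisto quant --single -l 200 -s 20 -i ${index} -o ${base} ${reads}
    """
}
```

The key from the tuple (base) is what names each output directory, which is why the channel was reshaped to emit [key, path] pairs.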
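The difference between the two attributes is easiest to see on a file with a double extension. The short sketch below uses a made-up relative file name; file() returns a path object even if the file does not exist.

```nextflow
#!/usr/bin/env nextflow

// Hypothetical file name for illustration only.
f = file("ctrl_1.fastq.gz")

println f.name        // ctrl_1.fastq.gz
println f.baseName    // ctrl_1.fastq  (only the last extension is removed)
println f.simpleName  // ctrl_1        (everything from the first dot onward is removed)
```

For .fastq.gz files, simpleName therefore gives the cleaner sample ID, which is why it is used in the map step above.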
Go to the main nextflow script available at this link: https://github.com/BarryDigby/barrydigby.github.io/blob/master/RNA-Seq/main.nf.
Save the contents as main.nf and invoke the help message by running nextflow run main.nf --help.
Note: the fastq files for the practical are located at: /data/MA5112/Practicals/RNA-Seq/Kallisto_Practical/Data/