A description of the kallisto pipeline has been provided for you below.
There is no need to run this for this week's tutorial; it has been posted as a learning resource.
Kallisto requires an indexed transcriptome file for downstream quantification.
kallisto index -i GRCh38.idx Homo_sapiens.GRCh38.cdna.all.fa
-i: Output filename
Kallisto index produces an indexed transcriptome file.
Kallisto can perform quantification using either single-end or paired-end fastq files.
# Paired-end reads: pass the two FASTQ files separated by a space
kallisto quant \
         -i GRCh38.idx \
         -t 2 \
         -o outDir/ \
         --bias \
         FASTQ_1 FASTQ_2
# Single-end reads: --single requires -l and -s
kallisto quant \
         --single \
         -l 200 \
         -s 30 \
         -i GRCh38.idx \
         -t 2 \
         -o outDir/ \
         --bias \
         FASTQ
-i: Indexed transcriptome file produced by kallisto index
--single: Indicate input is single-end reads (requires -l and -s)
-t: Number of threads to use
-o: Output directory of the sample, containing 3 files. Do not name the directory outDir; name it according to the sample name, e.g. CTRL_2.fastq should have the directory name CTRL_2/.
-l: Estimated average fragment length
-s: Estimated standard deviation of fragment length
--bias: Perform sequence-based bias correction
Note: Estimated fragment lengths and standard deviation must be retrieved from the sequencing center. If using public data, try to find this information online. Using incorrect values will greatly influence the outputs. 200 and 30 are typically used for average fragment length and standard deviation, respectively.
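The naming advice for -o can be automated when there are several samples. The following is a minimal sketch, not part of the original pipeline: sample_name and quant_command are hypothetical helper names, GRCh38.idx is the index built in the indexing step, and 200 and 30 are placeholder fragment-length values that should be replaced with the real ones for your data. The script only prints the commands (a dry run), so it can be checked before anything is executed.

```shell
#!/bin/sh
# Derive a sample name from a FASTQ path by dropping the directory
# and the .fastq / .fq suffix (optionally followed by .gz).
sample_name() {
    local base
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.fastq}
    base=${base%.fq}
    printf '%s\n' "$base"
}

# Print (do not run) the single-end kallisto quant command for one FASTQ file.
# Assumptions: index file GRCh38.idx; -l 200 and -s 30 are placeholders.
quant_command() {
    local fastq sample
    fastq=$1
    sample=$(sample_name "$fastq")
    printf 'kallisto quant --single -l 200 -s 30 -i GRCh38.idx -t 2 -o %s/ --bias %s\n' \
        "$sample" "$fastq"
}

# Dry run over two example file names; each sample gets its own directory.
for fq in CTRL_1.fastq CTRL_2.fastq.gz; do
    quant_command "$fq"
done
```

Once the printed commands look right, they can be pasted into the terminal or the output can be piped to sh.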
Kallisto quant will output a directory for each sample containing:
abundance.h5
abundance.tsv
run_info.json
The two abundance files contain transcript quantification information. For downstream analysis we can use either the .h5 file or the .tsv file. The difference is that the .h5 file also contains bootstrapping information if bootstrapping was requested during kallisto quant. This is for downstream analysis using Sleuth (not covered in this tutorial).
We did not specify the -b (--bootstrap-samples) option, thus we can use either the .h5 or the .tsv file for analysis in R, as they contain the same abundance estimates.
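Before moving to R, it can help to take a quick look at abundance.tsv on the command line. The sketch below assumes the usual tab-separated layout with one header row and the columns target_id, length, eff_length, est_counts and tpm. count_expressed and top_tpm are hypothetical helper names, and sort -g (general numeric sort, available in GNU and BSD sort) is used so that values written in scientific notation are ordered correctly.

```shell
#!/bin/sh
# Count transcripts with a TPM greater than zero (column 5, header skipped).
count_expressed() {
    awk -F '\t' 'NR > 1 && $5 > 0 { n++ } END { print n + 0 }' "$1"
}

# Print target_id and tpm for the n transcripts with the highest TPM.
# Usage: top_tpm FILE [N]   (N defaults to 10)
top_tpm() {
    local file n tab
    file=$1
    n=${2:-10}
    tab=$(printf '\t')
    tail -n +2 "$file" \
        | LC_ALL=C sort -t "$tab" -k5,5gr \
        | head -n "$n" \
        | cut -f1,5
}
```

For a sample quantified into CTRL_2/, the calls would be count_expressed CTRL_2/abundance.tsv and top_tpm CTRL_2/abundance.tsv 5.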