A description of the kallisto pipeline is provided below.
There is no need to run it for this week's tutorial; it has been posted as a learning resource.
Kallisto requires an indexed transcriptome file for downstream quantification.
kallisto index -i GRCh38.idx Homo_sapiens.GRCh38.cdna.all.fa
-i: Output filename
Kallisto index produces an indexed transcriptome file.
Kallisto can perform quantification using either single-end or paired-end fastq files.
Paired-end reads:
kallisto quant \
-i GRCh38.idx \
-t 2 \
-o outDir/ \
--bias \
FASTQ_1 FASTQ_2
Single-end reads:
kallisto quant \
--single \
-l 200 \
-s 30 \
-i GRCh38.idx \
-t 2 \
-o outDir/ \
--bias \
FASTQ
-i: Indexed transcriptome file produced by kallisto index
--single: Indicate that the input is single-end reads (requires -l and -s)
-t: Number of threads to use
-o: Output directory for the sample, which will contain 3 files. Do not name the directory outDir; name it after the sample, e.g. CTRL_2.fastq should have the directory name CTRL_2/.
-l: Estimated average fragment length
-s: Estimated standard deviation of fragment length
--bias: Perform sequence-based bias correction
Note: For single-end reads, the estimated fragment length and standard deviation should be obtained from the sequencing center. If using public data, try to find this information online. Using incorrect values will greatly influence the outputs. When the true values cannot be found, 200 and 30 are typically used for the average fragment length and standard deviation, respectively.
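The naming advice for -o can be sketched as a small shell loop. This is only an illustration under stated assumptions: the helper names sample_name and quant_command are hypothetical, CTRL_1.fastq is an assumed file name alongside the CTRL_2.fastq example, and the commands are printed rather than executed so the sketch can be tried without kallisto installed.

```shell
# Hypothetical helper: strip the directory and a .fastq/.fq suffix
# (optionally gzipped) to get the sample name.
# The ( ... ) function body runs in a subshell, so "name" stays local.
sample_name() (
    name=$(basename "$1")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    printf '%s\n' "$name"
)

# Hypothetical helper: print (not run) the single-end quant command for one
# FASTQ file. The index defaults to GRCh38.idx, the file built by kallisto index.
quant_command() (
    fastq=$1
    index=${2:-GRCh38.idx}
    printf 'kallisto quant --single -l 200 -s 30 -i %s -t 2 -o %s/ --bias %s\n' \
        "$index" "$(sample_name "$fastq")" "$fastq"
)

# Dry run over two example file names (assumed, not from a real dataset).
for fq in CTRL_1.fastq CTRL_2.fastq; do
    quant_command "$fq"
done
```

Once the printed commands look right, they could be run directly, with -l and -s replaced by the values reported for the library.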
kallisto quant will output a directory for each sample containing:
abundance.h5
abundance.tsv
run_info.json
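Before moving on to downstream analysis, it can be worth confirming that each sample directory holds all three files. The following is a minimal sketch, not part of kallisto itself: check_outputs is a hypothetical helper, and the demonstration uses empty placeholder files in a temporary directory rather than real kallisto output.

```shell
# Hypothetical helper: succeed only if the given sample directory contains
# all three files that kallisto quant is expected to write.
check_outputs() {
    for f in abundance.h5 abundance.tsv run_info.json; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            return 1
        fi
    done
}

# Demonstration on a throwaway directory with empty placeholder files.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/CTRL_2"
touch "$demo_dir/CTRL_2/abundance.h5" \
      "$demo_dir/CTRL_2/abundance.tsv" \
      "$demo_dir/CTRL_2/run_info.json"

check_outputs "$demo_dir/CTRL_2" && echo "CTRL_2: all outputs present"
```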
The two abundance files contain transcript quantification information. For downstream analysis we can use either the .h5 file or the .tsv file. The difference between them is that the .h5 file also contains bootstrapping information if the option was specified during kallisto quant. This is for downstream analysis using Sleuth (not covered in this tutorial).
We did not specify the bootstrap option (-b, --bootstrap-samples), thus we can use either the .h5 or the .tsv file for analysis in R, as they contain the same abundance estimates.
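The .tsv file is plain tab-separated text, so it can be inspected from the shell before loading it into R. The sketch below assumes kallisto's usual column layout (target_id, length, eff_length, est_counts, tpm). The helper name expressed_transcripts, the default threshold of 1 TPM, and the toy transcript IDs and values are all made up for illustration.

```shell
# Hypothetical helper: list transcripts whose TPM (column 5) is at least a
# threshold (second argument, default 1). NR > 1 skips the header line.
expressed_transcripts() {
    awk -F '\t' -v min="${2:-1}" \
        'NR > 1 && $5 + 0 >= min + 0 { print $1 "\t" $5 }' "$1"
}

# Toy abundance.tsv with made-up values, for demonstration only.
demo_tsv=$(mktemp)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > "$demo_tsv"
printf 'tx1\t1500\t1301\t120\t35.2\n' >> "$demo_tsv"
printf 'tx2\t900\t701\t0\t0\n' >> "$demo_tsv"
printf 'tx3\t2100\t1901\t4\t0.8\n' >> "$demo_tsv"

# Prints only tx1, the single toy transcript at or above 1 TPM.
expressed_transcripts "$demo_tsv"
```

On real output, the same call would be pointed at a sample's file, for example CTRL_2/abundance.tsv.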