Abstract

Genome aligners are an important tool in bioinformatics research: they can be used to detect gene variants that can increase crop yields, detect abnormal gene production in cancer cell lines, or identify weaknesses in a newly discovered pathogen. Aligners take sequenced DNA or RNA reads and map them to their corresponding locations in a reference genome. Although aligners are beneficial, choosing one for a project is often difficult because many tools are available and each claims to be the best at what it does. The goal of this project is to determine which aligner performs best in a controlled environment, using the default settings of six of the most widely used genome aligners: Bowtie2 (in both end-to-end and local alignment modes), Burrows-Wheeler Aligner (BWA), Hierarchical Indexing for Spliced Alignment of Transcripts (HISAT2), MUMmer4, Spliced Transcripts Alignment to a Reference (STAR), and TopHat2. Each aligner was run on 48 geographically distinct samples of Erysiphe necator, more commonly known as powdery mildew. Alignment results were assessed on three major criteria: 1) the number of reads successfully mapped to the reference genome, 2) runtime using varying numbers of cores, and 3) the percentage of the full transcriptome covered. Aligners were further analyzed for potential biases in the types of genes that could not be mapped. The results for each aligner were compared against one another to determine which aligner performed best on the provided dataset. The two best-performing aligners were BWA, which achieved the highest alignment rate, and HISAT2, which achieved the fastest runtime. Overall, HISAT2 was determined to be the better of the two, because both aligners had similar transcriptome coverage regardless of alignment rate.

Library of Congress Subject Headings

Genomics--Data processing; Nucleotide sequence--Data processing; Sequence alignment (Bioinformatics)

Publication Date

5-18-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Michael V. Osier

Advisor/Committee Member

Lance Cadle-Davidson

Advisor/Committee Member

Andre O. Hudson

Campus

RIT – Main Campus

Plan Codes

BIOINFO-MS
