Gene finding is an important aspect of biological research. Many approaches exist, yet the problem remains largely unsolved. The various signals involved in gene location and modification offer an opportunity for accurate gene prediction, and many algorithms break the problem into smaller parts that each focus on particular signals and properties. Studying these signals individually is therefore warranted. This work focuses on splice site prediction, specifically acceptor splice site prediction. Current approaches, namely weight matrix models and Markov models, are used alongside a novel approach known as the log odds ratio. The log odds ratio is found to double the positive predictive value obtained with the other methods. In agreement with similar work by Lukas Habegger, the log odds ratio models that incorporate 2nd order Markov models perform favorably. A maximum dependency decomposition is also performed; consistent with Habegger's findings, it highlights a position close to the branch point sequence as a position of maximum dependency. These results suggest that maximum dependency decomposition may be a novel method for examining the elusive branch point sequence in eukaryotic organisms. Habegger observed a stronger maximum dependency in Leishmania major, most likely because of differences in spliceosome function between lower and higher eukaryotes.
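For readers unfamiliar with the terms above, the following is a minimal sketch of one common way to combine a weight matrix model with a log odds ratio and to compute a positive predictive value. It is not taken from the thesis: the function names, the pseudocount, the use of a decoy-trained background model, and the toy sequences are all assumptions made for illustration, and the thesis's own log odds ratio formulation may differ.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_wmm(seqs, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length windows.

    The pseudocount (an assumed value) keeps unseen bases from getting
    probability zero, which would make the log ratio undefined.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model


def log_odds(seq, signal_model, background_model):
    """Sum over positions of log2(P(base | signal) / P(base | background))."""
    return sum(
        math.log2(signal_model[i][b] / background_model[i][b])
        for i, b in enumerate(seq)
    )


def positive_predictive_value(true_pos, false_pos):
    """PPV = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


# Hypothetical toy windows centred on an AG dinucleotide (not real data).
true_sites = ["TTTCAGG", "CTTCAGA", "TCCCAGG"]
decoy_sites = ["ACGTAGC", "GGATAGA", "CATGAGT"]

signal = train_wmm(true_sites)
background = train_wmm(decoy_sites)

# A window would be called an acceptor site when its score exceeds a threshold.
score_true = log_odds("TTTCAGG", signal, background)
score_decoy = log_odds("ACGTAGC", signal, background)
```

In this sketch a positive score means the window looks more like the true-site training set than the decoy set. A higher-order variant would replace the per-position probabilities with probabilities conditioned on the preceding bases.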
Library of Congress Subject Headings
Genomes--Data processing; Vertebrates--Genetics; Gene targeting; RNA splicing
Department, Program, or Center
Biomedical Sciences (CHST)
Foster, Eric, "Acceptor splice site prediction" (2007). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus