Intrinsically disordered proteins (IDPs) are polypeptide sequences that do not form a rigid three-dimensional structure when isolated in the cytosol of the cell. These sequences are common in most genomes and are usually involved in many protein-protein interactions. IDPs also play a key role in many diseases, including cancer and Huntington's disease. However, IDPs are difficult to study because of their amorphous shape, high mutation rate, and unique amino acid composition. These obstacles make homology studies especially difficult.
This study focused on generating a new substitution matrix designed to aid in homology studies of IDPs. The matrix was generated using a genetic algorithm (GA). GAs are alternative hill-climbing methods for finding solutions in complex problem spaces. A GA models the evolutionary process found in nature by "breeding" candidate solutions to the problem until one of sufficient quality is produced.
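The "breeding" loop described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the matrix representation, the score range of -4 to 4, the ungapped toy fitness function, truncation selection, uniform crossover, and every function name below are assumptions chosen only to make the idea concrete.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical


def random_matrix(rng):
    """Symmetric matrix stored as a dict keyed by (a, b) with a <= b.
    The integer score range [-4, 4] is an assumption for illustration."""
    return {(a, b): rng.randint(-4, 4)
            for i, a in enumerate(AMINO_ACIDS) for b in AMINO_ACIDS[i:]}


def score(matrix, s, t):
    """Ungapped score of two equal-length sequences (a simplification)."""
    return sum(matrix[(a, b) if a <= b else (b, a)] for a, b in zip(s, t))


def fitness(matrix, homologous, non_homologous):
    """Toy fitness (assumed): mean score of homologous pairs minus
    mean score of non-homologous pairs."""
    pos = sum(score(matrix, s, t) for s, t in homologous) / len(homologous)
    neg = sum(score(matrix, s, t) for s, t in non_homologous) / len(non_homologous)
    return pos - neg


def crossover(p1, p2, rng):
    """Uniform crossover: each entry is copied from one parent at random."""
    return {k: (p1[k] if rng.random() < 0.5 else p2[k]) for k in p1}


def mutate(matrix, rng, rate=0.05):
    """Nudge a fraction of entries by +/-1, clamped to [-4, 4]."""
    return {k: (max(-4, min(4, v + rng.choice((-1, 1))))
                if rng.random() < rate else v)
            for k, v in matrix.items()}


def evolve(homologous, non_homologous, pop_size=30, generations=40, seed=0):
    """Breed matrices for a fixed number of generations and return the fittest."""
    rng = random.Random(seed)
    population = [random_matrix(rng) for _ in range(pop_size)]

    def key(m):
        return fitness(m, homologous, non_homologous)

    for _ in range(generations):
        population.sort(key=key, reverse=True)
        parents = population[:pop_size // 2]  # fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = parents + children
    return max(population, key=key)


if __name__ == "__main__":
    # Made-up toy sequences, not data from the study.
    hom = [("ACDE", "ACDE"), ("KLMN", "KLMQ")]
    non = [("ACDE", "WYVT")]
    best = evolve(hom, non)
    print(fitness(best, hom, non))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. A realistic fitness function would score gapped alignments of real homologous and non-homologous protein sets rather than the toy ungapped comparison used here.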
The GA implemented in this study produced a substitution matrix for use in differentiating between homologous and non-homologous proteins containing disordered regions. The matrix showed some correlation with the patterns of evolution found in disordered proteins and their general sequence makeup. However, when compared to a commonly used substitution matrix, BLOSUM, the GA's solution did not show significant improvement. Nevertheless, the results provide a general proof of concept and suggest that, given modifications to the GA, more time, or more resources, a substitution matrix capable of outperforming BLOSUM may be possible.
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Handen, Adam, "Determining a new alignment scoring matrix for disordered proteins" (2013). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus