Generating motifs from known active sites and matching those motifs to an uncharacterized protein is a classic way of determining protein function. Until now, motif generation has been based purely on enzymatic function. This approach does not account for cases in which very different active sites arrive at the same function through processes such as convergent evolution. A secondary metric on which to base motif generation is therefore necessary. Such a metric exists in the form of UniProt designations, which identify homologous proteins on a global scale, or Pfam designations, which identify homologous proteins at the active-site level.
Here, we describe a tool that generates highly selective motifs using the aforementioned metrics. We were able to collapse a large number of proteins into their representative motifs with little loss in sensitivity, creating an “average” representation of each motif. These motifs will aid in characterizing proteins of known structure but unknown function.
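The grouping idea described above can be illustrated with a minimal sketch. It is not the tool itself: the record layout, the function name `group_by_function_and_family`, and all labels are hypothetical, and it only shows how adding a homology label to the grouping key keeps unrelated active sites with the same function in separate groups, each of which could then be collapsed into its own representative motif.

```python
from collections import defaultdict


def group_by_function_and_family(sites):
    """Group active-site records by (function, homology family).

    `sites` is an iterable of (protein_id, function_label, family_label)
    tuples. This layout is an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for protein_id, function_label, family_label in sites:
        # Keying on function alone would merge all three records below;
        # the family label separates non-homologous active sites.
        groups[(function_label, family_label)].append(protein_id)
    return dict(groups)


# Hypothetical records: two homologous proteins and one unrelated protein
# that shares their function (for example, through convergent evolution).
sites = [
    ("protA", "function_X", "family_1"),
    ("protB", "function_X", "family_1"),
    ("protC", "function_X", "family_2"),
]
groups = group_by_function_and_family(sites)
```

In this sketch, `protA` and `protB` land in one group and `protC` in another, so a motif averaged over the first group is not distorted by the structurally different active site of `protC`.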
Library of Congress Subject Headings
Proteins--Analysis--Data processing; Proteomics
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Paul A. Craig
Baker, Cameron, "Homology Based Motif Generation" (2015). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus