I applied cluster analysis to the problem of creating labeled training data for optical character recognition (OCR). Labeling character data by hand is a slow and cumbersome process. I believe that clustering methods can be applied to character data before it is labeled, allowing the operator to label entire groups of characters at once and greatly reducing the time needed to generate labeled character data. This thesis provides a proof of concept as a basis for more in-depth research and, eventually, for a sophisticated application that uses these techniques to generate labeled training data for OCR systems.
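The idea of clustering first and labeling whole groups afterward can be illustrated with a small sketch. It rests on several assumptions that are not taken from the thesis. Each character image is a flattened binary bitmap (here a hypothetical 3x3 grid). The clustering method is plain k-means with a deterministic farthest-first initialization, chosen only for illustration; the subject headings suggest the thesis also explored genetic algorithms, so its actual method may differ. The names `kmeans` and `propagate_labels` are hypothetical. After clustering, the operator supplies one label per cluster, and that label is copied to every member.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: a glyph is a flattened binary bitmap (1 = ink).
Glyph = Sequence[int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(glyphs: List[Glyph], k: int, iterations: int = 20) -> List[int]:
    """Assign each glyph to one of k clusters (illustrative k-means, not the thesis's method)."""
    # Deterministic farthest-first initialization: start from glyph 0, then
    # repeatedly add the glyph farthest from all centroids chosen so far.
    centroids: List[List[float]] = [list(glyphs[0])]
    while len(centroids) < k:
        far = max(glyphs, key=lambda g: min(sq_dist(g, c) for c in centroids))
        centroids.append(list(far))

    assignment: Optional[List[int]] = None
    for _ in range(iterations):
        # Assignment step: each glyph joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(g, centroids[j])) for g in glyphs
        ]
        if new_assignment == assignment:
            break  # converged: no glyph changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [g for g, a in zip(glyphs, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment or []


def propagate_labels(assignment: List[int], cluster_labels: Dict[int, str]) -> List[str]:
    """Copy the operator's single label for each cluster to all of its members."""
    return [cluster_labels[c] for c in assignment]


# Toy data: two vertical bars ("l") and two horizontal bars ("-"), one of each noisy.
glyphs = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # l
    [0, 0, 0, 1, 1, 1, 0, 0, 0],  # -
    [0, 1, 0, 0, 1, 0, 0, 1, 1],  # l with a stray pixel
    [0, 0, 0, 1, 1, 1, 1, 0, 0],  # - with a stray pixel
]
clusters = kmeans(glyphs, k=2)
# The operator inspects each cluster once and types one label per cluster.
labels = propagate_labels(clusters, {clusters[0]: "l", clusters[1]: "-"})
print(clusters, labels)
```

With four glyphs the saving is trivial, but the operator's effort scales with the number of clusters rather than the number of characters, which is the benefit the abstract describes. In practice a cluster may contain mixed characters, so a real tool would also need a way to split or correct clusters.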
Library of Congress Subject Headings
Cluster analysis; Genetic algorithms; Optical character recognition devices
Department, Program, or Center
Computer Science (GCCIS)
Greenwald, Jennifer, "Optical character categorization: Clustering as it applies to OCR" (1997). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus