Document image analysis is the study of converting documents from paper form to an electronic form that captures the information content of the document. Necessary processing includes recognition of document layout (to determine reading order and to distinguish text from diagrams), recognition of text (called Optical Character Recognition, OCR), and processing of diagrams and photographs. The processing of diagrams has been an active research area for several decades. A selection of existing diagram recognition techniques is presented in this paper. Challenging problems in diagram recognition include (1) the great diversity of diagram types, (2) the difficulty of adequately describing the syntax and semantics of diagram notations, and (3) the need to handle imaging noise. The recognition techniques discussed include blackboard systems, stochastic grammars, Hidden Markov Models, and graph grammars.
Department, Program, or Center: Computer Science (GCCIS)
"Treatment of Diagrams in Document Image Analysis," Theory and Application of Diagrams, First International Conference. Held in Edinburgh, Scotland, UK: 1-3 September 2000. Lecture Notes in Computer Science 1889 (2000) 330-344. ISBN: 3-540-67915-4
RIT – Main Campus