Abstract

We describe a symbol classification technique for identifying the expected locations of neighboring symbols in mathematical expressions. We use the seven symbol layout classes of the DRACULAE math notation parser (Zanibbi, Blostein, and Cordy, 2002) to represent these expected locations: Ascender, Descender, Centered, Open Bracket, Non-Script, Variable Range (e.g., integrals), and Square Root. A new feature named layout context, based on shape contexts (Belongie et al., 2002), describes the arrangement of neighboring symbol bounding boxes relative to a reference symbol, and the nearest neighbor rule is used for classification. Our experiments use 1917 mathematical symbols from the University of Washington III document database. Using a leave-one-out estimate, our best classification rate reaches nearly 80%. We find that the size of the symbol neighborhood, and the number and arrangement of the key points representing a symbol, significantly affect performance.
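The evaluation protocol named above (nearest neighbor rule with a leave-one-out estimate) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `leave_one_out_1nn`, the two-dimensional toy feature vectors, and the use of Euclidean distance are all assumptions made for illustration, and the actual layout context features and their distance measure are not reproduced here.

```python
import math


def leave_one_out_1nn(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Each sample is held out in turn and labeled with the class of its
    closest remaining sample (Euclidean distance is an assumption).
    """
    correct = 0
    for i, query in enumerate(features):
        best_dist, best_label = math.inf, None
        for j, other in enumerate(features):
            if j == i:
                continue  # hold out the query sample itself
            d = math.dist(query, other)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: placeholder vectors standing in for real features.
features = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0), (0.05, 0.05)]
labels = ["Ascender", "Ascender", "Descender", "Descender", "Ascender"]
accuracy = leave_one_out_1nn(features, labels)
```

Because every sample serves once as the test case and the rest as training data, the estimate uses the whole data set without a separate held-out split, which suits a collection of modest size.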

Publication Date

2009

Comments

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Document Type

Article

Department, Program, or Center

Center for Advancing the Study of CyberInfrastructure

Campus

RIT – Main Campus
