Author

Lei Hu

Abstract

Math expressions are an essential part of scientific documents. Handwritten math expression recognition can benefit human-computer interaction, especially in education, and is a critical part of document recognition and analysis.

Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and they fall into two groups. The first group is graph-based parsing, which selects a path or sub-graph that obeys some rule to form a possible interpretation of the given expression. The second group is grammar-driven parsing, in which grammars and related parameters are defined manually for different tasks. Both groups have high time complexity, and they often impose strict constraints to reduce computation.

The aim of this thesis is to work toward a straightforward and effective parser with as few constraints as possible. First, we propose using a line-of-sight graph to represent the layout of strokes and symbols in math expressions. It achieves a higher F-score than other graph representations and reduces the search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification, and symbol layout analysis. We obtain a higher symbol segmentation F-score than other systems on the CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds' algorithm, which extracts an MST from the directed line-of-sight graph in two passes: first symbols are segmented, and then symbols and spatial relationships are labeled. The time complexity of our MST-based parsing is lower than that of CYK parsing with context-free grammars. Also, our MST-based parsing obtains a higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Here, a correct structure means that the structure of the symbol layout tree is correct, even if the labels on its edges are wrong. The performance of our math expression recognition system with MST-based parsing is competitive on the CROHME 2012 and 2014 datasets.
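To make the MST step more concrete, the following is a minimal Python sketch of Edmonds' (Chu-Liu/Edmonds) algorithm for extracting a maximum spanning tree from a weighted directed graph. It is a generic textbook version, not the thesis's two-pass parser: the function names, the node names, and the edge weights are all hypothetical, and the weights merely stand in for whatever scores a real system would assign to line-of-sight edges.

```python
def find_cycle(parent):
    """Return the nodes of one cycle in a child -> parent map, or None."""
    for start in parent:
        path, node = [], start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:  # walked back onto the current path
            return path[path.index(node):]
    return None


def max_spanning_arborescence(nodes, edges, root):
    """Simple recursive Chu-Liu/Edmonds sketch (not optimized).

    nodes: list of hashable node ids (hypothetical stroke/symbol ids)
    edges: dict mapping (tail, head) -> weight (hypothetical scores)
    Returns a dict mapping each non-root node to its chosen parent.
    """
    # Step 1: every non-root node greedily picks its heaviest incoming edge.
    parent = {}
    for v in nodes:
        if v == root:
            continue
        incoming = [(w, u) for (u, x), w in edges.items() if x == v and u != v]
        if not incoming:
            raise ValueError(f"node {v!r} is unreachable")
        parent[v] = max(incoming, key=lambda t: t[0])[1]

    cycle = find_cycle(parent)
    if cycle is None:
        return parent  # the greedy choice is already a tree

    # Step 2: contract the cycle into one super-node and re-weight edges
    # entering it by the cycle edge they would replace.
    in_cycle = set(cycle)
    super_node = ("cycle", tuple(cycle))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key, w2 = (u, super_node), w - edges[(parent[v], v)]
        elif u in in_cycle:
            key, w2 = (super_node, v), w
        else:
            key, w2 = (u, v), w
        if key not in new_edges or w2 > new_edges[key]:
            new_edges[key], origin[key] = w2, (u, v)
    new_nodes = [n for n in nodes if n not in in_cycle] + [super_node]
    contracted = max_spanning_arborescence(new_nodes, new_edges, root)

    # Step 3: expand. Map contracted edges back to original edges; the edge
    # entering the super-node breaks the cycle, all other cycle edges stay.
    result = {}
    for v, u in contracted.items():
        orig_u, orig_v = origin[(u, v)]
        result[orig_v] = orig_u
    for v in cycle:
        result.setdefault(v, parent[v])
    return result


# Hypothetical example: "root" plus three symbols with made-up edge scores.
nodes = ["root", "a", "b", "c"]
edges = {
    ("root", "a"): 5, ("root", "b"): 1, ("root", "c"): 0,
    ("a", "b"): 11, ("a", "c"): 4,
    ("b", "a"): 10, ("b", "c"): 5,
    ("c", "a"): 9, ("c", "b"): 8,
}
tree = max_spanning_arborescence(nodes, edges, "root")
print(tree)
```

In this example the greedy first step picks the cycle a -> b -> a, so the contraction and expansion steps are both exercised. For real use, a library implementation such as NetworkX's `maximum_spanning_arborescence` is likely a better starting point than this sketch.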

For future work, more research is needed on how to incorporate symbol classifier results and correct segmentation errors in MST-based parsing.

Library of Congress Subject Headings

Mathematical symbols (Typefaces)--Classification; Image processing--Digital techniques; Optical pattern recognition; Human-computer interaction

Publication Date

5-2016

Document Type

Dissertation

Student Type

Graduate

Degree Name

Computing and Information Sciences (Ph.D.)

Advisor

Richard Zanibbi

Advisor/Committee Member

Nathan Cahill

Advisor/Committee Member

Harold Mouchere

Comments

Physical copy available from RIT's Wallace Library at Z250.6.M3 H8 2016

Campus

RIT – Main Campus
