We have developed a layout-based math retrieval system that indexes pairs of symbols in mathematical expressions. Existing approaches to layout-based retrieval include tree edit distance-based matching on MathML trees (Kamali and Tompa, 2013) and longest common subsequence matching in LaTeX strings (Kumar et al., 2012). In our work, we compare our new layout-based retrieval method with a math retrieval system built on the conventional text-based retrieval system Lucene (Zanibbi and Yuan, 2011), as such systems are commonly used for math search. We show that study participants scored the search results returned by our system as significantly more similar to the query than those of the comparison system, and that our system is fast enough to be used in real time.
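As an illustration of what indexing on symbol pairs can look like, the following is a minimal sketch and not the thesis implementation. It assumes a layout tree in which each node holds one symbol and its children are keyed by a spatial relation label; the labels ("next", "above"), the `Node` class, the function names, and the scoring rule (a plain count of shared pairs) are all hypothetical choices made for this example.

```python
from collections import defaultdict


class Node:
    """One symbol in a layout tree (hypothetical structure for this sketch)."""

    def __init__(self, symbol, children=None):
        self.symbol = symbol
        # Maps an assumed spatial relation label (e.g. "next", "above")
        # to the child Node reached through that relation.
        self.children = children or {}


def symbol_pairs(root):
    """Return (ancestor, descendant, relation_path) for every ancestor/descendant pair."""
    pairs = []

    def walk(node, ancestors):
        # ancestors holds (symbol, path from that ancestor to this node).
        for anc_symbol, path in ancestors:
            pairs.append((anc_symbol, node.symbol, path))
        for rel, child in node.children.items():
            extended = [(s, p + (rel,)) for s, p in ancestors]
            extended.append((node.symbol, (rel,)))
            walk(child, extended)

    walk(root, [])
    return pairs


def build_index(expressions):
    """Build an inverted index from symbol pair to the ids of expressions containing it."""
    index = defaultdict(set)
    for expr_id, root in expressions.items():
        for pair in symbol_pairs(root):
            index[pair].add(expr_id)
    return index


def search(index, query_root):
    """Rank indexed expressions by how many distinct query pairs they share (assumed scoring)."""
    scores = defaultdict(int)
    for pair in set(symbol_pairs(query_root)):
        for expr_id in index.get(pair, ()):
            scores[expr_id] += 1
    # Highest score first; ties broken by expression id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under these assumptions, an expression such as x^2 + y would be built as `Node("x", {"above": Node("2"), "next": Node("+", {"next": Node("y")})})`, and a query is answered by looking up each of its pairs in the index rather than by comparing whole trees.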
Library of Congress Subject Headings
Mathematical symbols (Typefaces)--Classification; Information retrieval; Layout (Printing)
Department, Program, or Center
Computer Science (GCCIS)
Stalnaker, David, "Math expression retrieval using symbol pairs in layout trees" (2013). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus