Author

Kyle Dewey

Abstract

Within the biological sciences, spreadsheets are commonly used as a data entry and storage medium. While this practice is simple and generally well understood, the unrestrained flexibility of the spreadsheet medium allows errors to accumulate and potentially propagate. Such errors impede accurate analysis, hindering research. The underlying problem is that the error correction facilities of typical spreadsheet programs are lackluster at best, if they exist at all. For this reason, Error Sentinel was developed. Error Sentinel is a spreadsheet program with programmable error correction facilities. These facilities allow users to define exactly what clean data is, along with corrections for erroneous data. Such rules are specified via a custom visual programming language. Once error correction rules are written, users inputting data need not be familiar with the rules or even have programming skills in order to utilize them. Error Sentinel can be used interactively like a typical spreadsheet program, or non-interactively as with more traditional error correction techniques. To test Error Sentinel's real-world capabilities, it was successfully applied to the correction of the mtHaplogroups data set. This application has shown that Error Sentinel requires far less time and code to perform error correction than previous methods. Benchmarking has shown that such gains come at only a modest cost in performance. While Error Sentinel appears quite simplistic compared to typical spreadsheet programs, its error correction facilities are robust, and it is fully capable of being applied to arbitrary data sets represented in the spreadsheet medium.

Library of Congress Subject Headings

Electronic spreadsheets--Computer programs; Biology--Data processing; Error-correcting codes (Information theory)

Publication Date

6-21-2011

Document Type

Thesis

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Osier, Michael

Advisor/Committee Member

Skuse, Gary

Advisor/Committee Member

Newman, Dina

Comments

Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's Wallace Library at: HF5548.2 .D49 2011

Campus

RIT – Main Campus
