Author

Prabin Dutta

Abstract

The Human Genome Project generated vast amounts of DNA sequenced data scattered in disparate data sources in a variety of formats. Integrating biological data and extracting information held in DNA sequences are major ongoing tasks for biologists and software professionals. This thesis explored issues of finding, extracting, merging and synthesizing information from multiple disparate data sources containing DNA sequenced data, which is composed of 3 billion chemical building blocks of bases. We proposed a biological data integration framework based on typical usage patterns to simplify these issues for biologists. The framework uses a relational database management system at the backend, and provides techniques to extract, store, and manage the data. This framework was implemented, evaluated, and compared with existing biological data integration solutions.

Library of Congress Subject Headings

Human Genome Project--Data processing; Database management; Computational biology; Relational databases

Publication Date

2008

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Rajendra Raj

Advisor/Committee Member

Paul Tymann

Advisor/Committee Member

Warren Carithers

Comments

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's The Wallace Library at: QH447 .D88 2008

Campus

RIT – Main Campus

Share

COinS