Abstract

Researchers across a wide range of fields are beginning to use customized statistical methods to analyze data as hardware and software become cheaper and more widely available. Cluster Rank Analysis (CRA) is a multivariate statistical algorithm that had been implemented as an inefficient service-oriented application. This thesis describes how CRA was optimized and parallelized using an available computing cluster and both open source and custom software. A command-line submission system for CRA jobs and a Web retrieval system for the results of analyses were then developed. A subsequent timing study revealed speedup that rose quickly to 15 with 35 processors and should reach a projected maximum of 19 with over 100 processors. This speedup was limited primarily by the serial portion of the code; the Ethernet communication network was sufficient for this application. With as few as 10 processors, the average runtime dropped from over 100 minutes to approximately 15 minutes, and it fell to 6 minutes with 80 processors. The locations of the bottlenecks suggest that further performance increases are possible through additional parallelization. This work with CRA illustrates (1) the speed with which high-performance in-house applications can be developed and (2) the speed and efficiency with which statistical analyses of complex data structures can be carried out given commodity hardware and software resources.
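The observation that speedup is limited by the serial portion of the code can be illustrated with Amdahl's law. The abstract does not name this model, so the following is only a minimal sketch under that assumption: it treats the reported speedup of 15 at 35 processors as a single data point and derives the serial fraction that an idealized Amdahl model would imply. The function names are hypothetical and are not taken from the thesis.

```python
def amdahl_speedup(serial_fraction, processors):
    """Idealized Amdahl's law: S(p) = 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def serial_fraction_from_speedup(speedup, processors):
    """Invert Amdahl's law: 1/S = s + (1 - s)/p  =>  s = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Data point reported in the abstract: speedup of 15 with 35 processors.
# Treating it as an exact fit to Amdahl's law is an assumption of this sketch.
implied_serial_fraction = serial_fraction_from_speedup(15.0, 35)

# Under the same assumption, the speedup ceiling as p grows is 1 / s.
implied_ceiling = 1.0 / implied_serial_fraction

if __name__ == "__main__":
    print(f"implied serial fraction: {implied_serial_fraction:.4f}")
    print(f"implied speedup ceiling: {implied_ceiling:.1f}")
    for p in (10, 35, 80, 100):
        s_p = amdahl_speedup(implied_serial_fraction, p)
        print(f"p = {p:3d}  modeled speedup = {s_p:.1f}")
```

Because this model ignores communication cost, load imbalance, and measurement variation, its output should not be expected to match the projected maximum or the measured runtimes reported in the thesis; it only shows how a small serial fraction caps the achievable speedup.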

Library of Congress Subject Headings

Cluster analysis--Data processing; Biology--Research--Data processing; Parallel processing (Electronic computers)

Publication Date

2007

Document Type

Thesis

Student Type

Graduate

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Michael Osier

Advisor/Committee Member

Dina Newman

Advisor/Committee Member

Paul Shipman

Comments

Physical copy available from RIT's Wallace Library at QA278 .E77 2007

Recommended Citation

Esposito, Anthony G. Jr., "Parallelizing the Cluster Rank Analysis application" (2007). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/7786

Campus

RIT – Main Campus
