Abstract

Timely access to the most up to date versions of resources, such as data and software, is of paramount importance for researchers in an active field like Biology. We introduce a grid enabled biological data and software collection portal architecture, SALSA (a Scalable Simple Architecture), that is tailored towards fast integration of new computational resources made available by ever faster advancing and diversifying research in this area. We identify two models that guide the design of SALSA: heterogeneous database model and network growth model with preferential attachment. SALSA recognizes the challenges that are noted by the previous research on heterogeneous database model inherent in biological database resources; these resources are autonomously managed and lack a common database schema. SALSA is also guided by a model for the growth of the portal’s collection (of data and associated software to process this data) from previous research on related collections (e.g. citation networks and software package dependencies). This model suggests that in the presence of components that have a higher likelihood of gaining new connections (e.g., popular resources such as BLAST or FASTA sequences), the relationships between components tend to organize in a small-world scale-free network. The growth model helps the portal developers identify important hub components that emerge by taking part in increasing number of tasks as the portal grows. In order to effectively improve the overall user experience, developers can direct expensive development efforts (e.g., query optimization, user interface, documentation, etc.) to hub components, rather than to specialized components that have a lesser likelihood of developing to become hubs. In this paper we discuss a grid enabled web portal implementation that is built to contain a growing collection of biological data and software to process this data. The implementation that we present is a realization of Scalable Simple Architecture (SALSA) that strives to rapidly integrate newly published components into the existing collection in a sustainable fashion. Notably, this implementation uses flexibility of XML for component management, XSL for web user interface, SRB and MCAT for large data storage.

Publication Date

2006

Comments

ACM, IEEE Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Document Type

Article

Department, Program, or Center

Computer Science (GCCIS)

Campus

RIT – Main Campus

Share

COinS