Achieving fault tolerance is an unavoidable problem in distributed systems, and it becomes more challenging in decentralized, heterogeneous, and dynamic environments such as a Grid. When applications are time-critical, allocating resources to jobs in a fault-tolerant manner is an important issue for service delivery. The Water Threat Management project is a research effort to find solutions to contamination incidents in urban water distribution systems, and it involves developing cyberinfrastructure in a Grid environment. To handle such urgent events properly, the deployed system demands real-time processing without failures. Our approach integrates a fault-tolerant framework into the Water Threat Management system and provides fault tolerance at the "queuing stage" rather than the "job-execution stage" by scheduling jobs in fault-tolerant ways. This includes the development of a batch queuing system in the Cyberaide Shell project. In addition, we present a dynamic workflow in the Water Threat Management system that can reduce queue wait time in a changing environment.
Library of Congress Subject Headings
Computational grids (Computer systems); Fault-tolerant computing; Water quality management--Data processing
Department, Program, or Center
Computer Science (GCCIS)
Moon, Young Suk, "Dynamic fault tolerant grid workflow in the water threat management project" (2010). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus