Abstract

In multiprocessor systems, data parallelism is the execution of the same task on data distributed across multiple processors. It involves splitting the data set into smaller partitions or batches. The process of splitting the data among the processors is called "data partitioning", and it is an important factor in the efficiency of data parallel implementations. Data partitioning influences the workload of each processing unit and the network traffic between processes, so poor partition quality can lead to serious performance problems. This research presents a data partitioning method that can be used to improve the performance of data parallel implementations. The proposed method uses an initial screening experiment to run a portion of the data units. Regression is then used to build a prediction model of the processing time of each data unit. Using the estimated processing times, load balancing is achieved with a greedy algorithm that distributes the units across a parallel environment. Discrete event simulation is used as the application for this research. Comparisons between equal data partitioning and the proposed methodology indicate that time savings and a balanced load can be achieved.
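The pipeline described in the abstract (screen a sample, fit a regression, then distribute units greedily) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not state the regression form or the greedy rule, so the sketch assumes a one-predictor least-squares line and a "longest estimated time first, to the least-loaded processor" rule. The function names fit_line and greedy_partition are hypothetical.

```python
import heapq


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (assumed model form, one predictor).

    xs: a size-like attribute of each screened data unit
    ys: the processing time measured for that unit in the screening run
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def greedy_partition(estimated_times, n_processors):
    """Assign units to processors using estimated processing times.

    Assumed greedy rule: take units in decreasing order of estimated
    time and give each one to the processor with the smallest load so far.
    Returns (partitions, loads), where partitions[p] lists unit indices.
    """
    heap = [(0.0, p) for p in range(n_processors)]  # (current load, processor)
    heapq.heapify(heap)
    partitions = [[] for _ in range(n_processors)]
    order = sorted(range(len(estimated_times)),
                   key=lambda i: estimated_times[i], reverse=True)
    for unit in order:
        load, p = heapq.heappop(heap)  # least-loaded processor
        partitions[p].append(unit)
        heapq.heappush(heap, (load + estimated_times[unit], p))
    loads = [sum(estimated_times[u] for u in part) for part in partitions]
    return partitions, loads


if __name__ == "__main__":
    # Illustrative numbers only, not data from the thesis.
    a, b = fit_line([1, 2, 3], [2, 4, 6])          # screening sample
    estimates = [a + b * x for x in [2.5, 1.5, 1.5, 1.0, 1.0, 0.5]]
    partitions, loads = greedy_partition(estimates, 2)
    print(partitions, loads)
```

An equal partition would instead give each processor the same number of units regardless of their estimated times, which is the baseline the abstract compares against.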

Library of Congress Subject Headings

Multiprocessors--Data processing; Statistics; Parallel processing (Electronic computers)

Publication Date

9-2016

Document Type

Thesis

Student Type

Graduate

Degree Name

Industrial and Systems Engineering (MS)

Department, Program, or Center

Industrial and Systems Engineering (KGCOE)

Advisor

Rachel Silvestrini

Advisor/Committee Member

Katie McConky

Comments

Physical copy available from RIT's Wallace Library at QA76.5 .H44 2016

Campus

RIT – Main Campus

Plan Codes

ISEE-MS
