In multiprocessor systems, data parallelism is the execution of the same task on data distributed across multiple processors. It involves splitting the data set into smaller partitions or batches. The process of splitting the data among the processors is called “Data Partitioning”, and it is an important factor in the efficiency of a data parallel implementation. Data partitioning determines the workload of each processing unit and the network traffic between processes, so a poor partition can lead to serious performance problems. This research presents a data partitioning method that can be used to improve the performance of data parallel implementations. The proposed method runs an initial screening experiment on a portion of the data units. Regression is then used to build a prediction model of the processing time of each data unit. Using the estimated processing times, a greedy algorithm distributes the units across the parallel environment to balance the load. Discrete event simulation serves as the application in this research. Comparisons between equal data partitioning and the proposed methodology indicate that time savings and a balanced load can be achieved.
Library of Congress Subject Headings
Multiprocessors--Data processing; Statistics; Parallel processing (Electronic computers)
Industrial and Systems Engineering (MS)
Department, Program, or Center
Industrial and Systems Engineering (KGCOE)
Hidalgo Murillo, Manuel E., "Using Statistical Analysis to Improve Data Partitioning in Algorithms for Data Parallel Processing Implementation" (2016). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus