The quest for higher performance in computationally intensive tasks is and will always be an ongoing effort. General purpose processors (GPP) have not been sufficient for many of these tasks which has led to research focused towards computing on specialty processors and graphics processing units (GPU). While GPU provide sufficient speedups for some tasks, other specialty processors may be better suited, more economical, or more efficient for different types of tasks. Sorting is an important task in many applications and can be computationally intensive when dealing with large data sets. One such specialty processor that has proven to be a viable solution for sorting is the Cell Broadband Engine (CBE). The CBE is being used as the main platform for this thesis since there are already applications for it that require sorting software. The Cell processor is a general purpose processor that combines one master PowerPC core with eight other vector processors connected via a high bandwidth interconnect bus. The user must explicitly manage the communication, scheduling, and load-balancing between the vector processors and the PowerPC processor to achieve the highest efficiency. By optimizing the sorting algorithms for the vector processors, large speedups can be achieved because multiple operations occur simultaneously. Optimized sorting software is often sought when sorting is not the main purpose of the application. This keeps overheads low so that the performance gains can be realized from the actual code that is to be optimized on specialty processors. Often having sorted datasets enable algorithms to run faster and are more predictably. The motivation behind this thesis is that there is currently no standard library of sorting algorithms that have been optimized for the CBE. Lack of standard libraries makes writing code for the CBE difficult. Results from previous works have also not been sufficient in providing specific measurements of sorting performance. This thesis will explore the development and analysis of a variety of optimized parallel sorting algorithms written for the Cell processor. This thesis will focus on the sorting of both individual elements within vectors as well as sorting entire vectors within arrays. The sorting algorithms, written in C++, that will be optimized and analyzed include, but are not limited to bitonic sort, heap sort, merge sort, and quick sort. A communication management framework will also be created as a main focus of this thesis in order to better understand the architecture of the processor.

Library of Congress Subject Headings

Information storage and retrieval systems--Data processing; Classification--Data processing; Computer algorithms; Computer network architectures

Publication Date


Document Type



Not listed


Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in December 2013. Physical copy available through RIT's The Wallace Library at: QA76.9.D37 O44 2010


RIT – Main Campus