We describe source-code-level parallelization of the kira direct gravitational N-body integrator, the workhorse of the starlab production environment for simulating dense stellar systems. The parallelization strategy, called “j-parallelization”, partitions the computational domain by distributing all particles in the system among the available processors. Each processor computes, in parallel, the partial forces that the particles it owns exert on the particles to be advanced, and these partial forces are then summed in a final global operation. Once the total forces are obtained, the computing elements proceed to compute the trajectories of their own particles. We report the results of timing measurements on four different parallel computers and compare them with theoretical predictions. These computers either employ a high-speed interconnect or a NUMA architecture to minimize the communication overhead, or are distributed in a grid. The code scales well in the domain tested, which ranges from 1024 to 65536 stars on 1 to 128 processors, and provides satisfactory speedup. Running the production environment on a grid becomes inefficient for more than 60 processors distributed across three sites.
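The force step of j-parallelization can be illustrated with a minimal serial sketch. It assumes units with G = 1, Plummer softening, and a round-robin assignment of particles to processors. The function names are hypothetical and not part of kira or starlab, and the processors are emulated by a loop rather than run as separate MPI processes. Only the acceleration is computed here.

```python
import math
import random


def partial_accelerations(active, positions, masses, owned, eps2):
    """Acceleration on each active (i) particle due only to the j-particles in `owned`.

    Assumes G = 1 and Plummer softening with eps2 = eps**2 (illustrative choices).
    """
    acc = [[0.0, 0.0, 0.0] for _ in active]
    for k, i in enumerate(active):
        xi = positions[i]
        for j in owned:
            if j == i:
                continue  # skip self-interaction
            dx = [positions[j][d] - xi[d] for d in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for d in range(3):
                acc[k][d] += masses[j] * dx[d] * inv_r3
    return acc


def j_parallel_accelerations(active, positions, masses, n_procs, eps2=1e-4):
    """Hypothetical j-parallel force step, with n_procs processors emulated serially."""
    n = len(masses)
    # Distribute ALL particles (the force sources) round-robin among the processors.
    owned_by = [list(range(p, n, n_procs)) for p in range(n_procs)]
    # Each "processor" computes partial forces from the particles it owns.
    partials = [partial_accelerations(active, positions, masses, owned, eps2)
                for owned in owned_by]
    # Global sum of the partial forces (a collective reduction in an MPI code).
    total = [[0.0, 0.0, 0.0] for _ in active]
    for part in partials:
        for k in range(len(active)):
            for d in range(3):
                total[k][d] += part[k][d]
    return total


if __name__ == "__main__":
    random.seed(1)
    n = 64
    positions = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    masses = [1.0 / n] * n
    active = [0, 5, 17, 42]  # particles due to be advanced at this step
    serial = partial_accelerations(active, positions, masses, range(n), 1e-4)
    parallel = j_parallel_accelerations(active, positions, masses, n_procs=4)
    max_diff = max(abs(a - b) for s, p in zip(serial, parallel) for a, b in zip(s, p))
    print("max |serial - j-parallel| =", max_diff)
```

The j-parallel result matches the single-processor sum up to floating-point rounding, because the partition only changes the order in which the pairwise contributions are added. In a distributed-memory code, the summation loop would typically be replaced by a collective such as `MPI_Allreduce` with `MPI_SUM`, after which each processor updates the particles it owns.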

Also archived in: arXiv:0711.0643 v1, Nov 5, 2007.

Publisher's version can be found here: http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6TJK-4R5F0BG-1-16&_cdi=5313&_user=47004&_orig=browse&_coverDate=07%2F31%2F2008&_sk=999869994&view=c&wchp=dGLzVzz-zSkWA&md5=9e4ca1eec1d155ecf11c05a5bdbf61b2&ie=/sdarticle.pdf

ISSN: 1384-1076

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Department, Program, or Center

School of Physics and Astronomy (COS)


RIT – Main Campus