any advice for parallel programming?

fairly soon, i am getting an awesome opportunity to help program on a high-performance system. okay, well not THAT high-performance because it's not on the TOP500 list, but still cool.

any advice to help me get a very high percentage of my loops to vectorize?

any advice also in general for a MIMD shared-memory system?

if it helps, it will be mostly flow networks and linear algebra.
 
The best rule of thumb (which is probably obvious) is to make sure the iterations of your innermost loops are independent of one another. Keep data dependencies at the outermost loop levels if at all possible.
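To make that rule concrete, here is a minimal sketch in C. It assumes a made-up 1-D relaxation problem (the function name `relax` and the two-buffer layout are illustrative, not anything from the poster's system). The outer loop over sweeps carries the dependency, since each sweep needs the previous one, while the inner loop only reads the old buffer and writes the new one, so its iterations are independent and a vectorizing compiler has a fair chance with it.

```c
#include <assert.h>
#include <stddef.h>

/* Runs `sweeps` Jacobi-style relaxation passes over a 1-D grid.
   a and b must each hold n doubles and must not overlap; a holds the input.
   Returns whichever of the two buffers holds the final result. */
double *relax(double *a, double *b, size_t n, int sweeps) {
    assert(a != NULL && b != NULL && n >= 2 && sweeps >= 0);
    double *cur = a, *next = b;

    /* Outer loop: dependent. Sweep s needs the output of sweep s-1. */
    for (int s = 0; s < sweeps; s++) {
        /* restrict tells the compiler the two buffers do not alias. */
        const double *restrict in = cur;
        double *restrict out = next;

        out[0] = in[0];          /* boundary values stay fixed */
        out[n - 1] = in[n - 1];

        /* Inner loop: independent. out[i] depends only on the old buffer. */
        for (size_t i = 1; i + 1 < n; i++)
            out[i] = 0.5 * (in[i - 1] + in[i + 1]);

        /* Swap roles of the two buffers for the next sweep. */
        double *tmp = cur;
        cur = next;
        next = tmp;
    }
    return cur;
}
```

If the inner loop updated the array in place (writing `a[i]` from `a[i - 1]`), each iteration would depend on the one before it, and the loop would generally not vectorize as written.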

Other than that, it's tough to give much advice without knowing the specific architecture of the system or details of the algorithm itself. How many nodes? What bus is used for shared memory access? What programming interface is used to share memory? What API are you using? MPI? OpenMP? How big are the vector operands?
 
yeah, that's what i know to do now, and that's about all i know about increasing parallelism on a vector proc.

4 nodes + 1 head. the shared memory isn't at the hardware level between processors in the same self-contained system but rather across a gig ethernet LAN. i'll be honest and say that i'm not certain what you mean by "programming interface to share memory."

i'll be using Open MPI, and i'm still looking for lower-level documentation for the vector systems.
 
You're using a distributed-memory architecture. I asked about the programming interface because there are different ways to access memory in a shared-memory architecture. Knowing it's not a shared-memory system and knowing you're using MPI answers my question. Does each node have a single processor with a single core?

Again, it's tough to give any specific advice. (Your question is kind of like "Hey guys, I have to write a program - any advice?" :p ) Vector sizes, operations, and number of vector operands would be important things to look at for starters. Since the system uses distributed memory, also look for places to exploit coarse-grained parallelism - that will probably give you the best performance and scalability.
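As a rough illustration of coarse-grained parallelism on a setup like this, one common approach is to give each MPI process one contiguous block of rows (or vertices) and have it work on that block with as little communication as possible. The helper below is only a sketch: `block_range` and `range_t` are hypothetical names, and in a real program `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`. It is plain C with no MPI calls so it can be tried on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) owned by one process. */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/* Splits n items into `size` contiguous blocks whose lengths differ by
   at most one, and returns the block for process `rank`.
   The first (n % size) ranks each take one extra item. */
range_t block_range(size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    size_t base = n / (size_t)size;    /* items every rank gets */
    size_t extra = n % (size_t)size;   /* leftover items to spread out */
    size_t r = (size_t)rank;

    range_t out;
    out.begin = r * base + (r < extra ? r : extra);
    out.end = out.begin + base + (r < extra ? 1 : 0);
    return out;
}
```

With 10 rows over the 4 compute nodes mentioned earlier in the thread, ranks 0 through 3 would get rows [0, 3), [3, 6), [6, 8) and [8, 10). Each process then loops only over its own range, and over gigabit Ethernet the fewer and larger the messages between those blocks, the better it tends to scale.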
 
thank you very much for your advice on an extremely ambiguous question. i've gotten some good leads from you.

enjoy the upcoming holiday if you're in the US.
 