parallel processing - What are the common causes of non-scalability in shared memory programs?
Whenever one parallelizes an application, the expected outcome is a decent speedup, but that is not always the case. It is quite common that a program that runs in x seconds, when parallelized to use 8 cores, does not achieve x/8 seconds (the optimal speedup). In extreme cases, it even takes more time than the original sequential program. Why? And, most importantly, how can I improve scalability?
There are a few common causes of non-scalability:
- Too much synchronization: some problems (and some over-conservative programmers) require lots of synchronization between parallel tasks, which eliminates most of the parallelism in the algorithm and makes it slower.
1.1. Make sure you use the minimum amount of synchronization your algorithm allows. With OpenMP, for instance, a simple change from a `critical` section to an `atomic` update can make a relevant difference.
1.2. A worse sequential algorithm might offer better parallelism opportunities, so if you have the chance to try a different one, it might be worth a shot.
- Memory bandwidth limitation: it is common for a "trivial" implementation of an algorithm to not be optimized for locality, which implies heavy communication costs between the processors and main memory.
2.1. Optimize for locality: this means knowing where the application will run, what cache memories are available, and how to change your data structures to maximize cache usage.
- Too much parallelization overhead: when the parallel task is too "small", the overhead of thread/process creation is big compared to the parallel region's total time, which causes poor speedup or even speed-down.