gpu - The impact of the goto instruction on intra-warp divergence in CUDA code

For simple intra-warp thread divergence in CUDA, what I know is that the SM selects a re-convergence point (a PC address) and executes the instructions on both (or all) paths, while disabling the effects of execution for the threads that haven't taken the current path.
For example, in the piece of code below:

    if (threadIdx.x < 16) {
    A:
        // do something.
    } else {
    B:
        // do something else.
    }
    C: // rest of the code.

C is the re-convergence point. The warp scheduler schedules the instructions at both A and B, disabling the instructions at A for the upper half-warp and the instructions at B for the lower half-warp. When the warp reaches C, instructions are enabled again for all threads in the warp.
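To make the mechanism above concrete, here is a minimal host-side C++ sketch of a reconvergence stack. It assumes the classic pre-Volta model in which each stack entry holds a next PC, an active mask and a reconvergence PC; the names `Entry`, `Step` and `runIfElse` are hypothetical, and the hardware exposes no such interface. On Volta and later GPUs, independent thread scheduling changes this picture, so treat it as a model of the idea, not of any specific chip.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a pre-Volta SIMT reconvergence stack (a simplification,
// not a description of any specific NVIDIA GPU).
struct Entry {
    char pc;             // next block to execute: 'A', 'B' or 'C'
    std::uint32_t mask;  // bit i set => lane i is active for this entry
    char rpc;            // reconvergence point at which this entry is popped
};

struct Step {
    char block;          // block that was issued
    std::uint32_t mask;  // lanes whose results were kept
};

// Models one 32-lane warp running
//     if (threadIdx.x < 16) { A } else { B }  C
// assuming lane == threadIdx.x (first warp of a 1-D block) and that the
// compiler picked C as the reconvergence point.
std::vector<Step> runIfElse() {
    const std::uint32_t all = 0xFFFFFFFFu;
    std::uint32_t taken = 0;
    for (unsigned lane = 0; lane < 32; ++lane)
        if (lane < 16) taken |= 1u << lane;

    // At the branch, the current entry waits at C with the full mask,
    // and one entry per path is pushed; each path stops when it reaches C.
    std::vector<Entry> stack = {
        {'C', all, '\0'},
        {'B', all & ~taken, 'C'},
        {'A', taken, 'C'},  // the model runs the taken path first
    };

    std::vector<Step> trace;
    while (!stack.empty()) {
        Entry& top = stack.back();
        if (top.pc == top.rpc) {   // this path reached the reconvergence point
            stack.pop_back();
            continue;
        }
        assert(top.mask != 0);     // both paths have active lanes in this example
        trace.push_back({top.pc, top.mask});
        if (top.pc == 'C') break;  // C is the last block in this example
        top.pc = 'C';              // A and B both fall through to C
    }
    return trace;
}
```

Calling `runIfElse()` yields three steps: A with mask `0x0000FFFF`, B with mask `0xFFFF0000`, then C with the full mask `0xFFFFFFFF`. The order in which real hardware runs the two paths is not something code should rely on.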

My question is: will the SM be able to handle code that includes goto instructions in the same way? Or is there no guarantee that the chosen re-convergence point is optimal?
For instance, suppose I have the control flow below in my CUDA code, implemented using goto:

    A:
        // some code here.
    B:
        // some code here too.
        if (threadIdx.x < 16) {
        C:
            // do something.
            goto A;
        }
        // do something else.
        goto B;

Will the SM be smart enough to choose B as the re-convergence point for the intra-warp divergence caused by the if instruction?

In general, goto is unstructured control flow that interferes with many compiler optimizations, regardless of the platform. The CUDA C compiler should handle code with goto in a functionally correct way, but performance may be suboptimal.
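The goto control flow from the question can be expressed with two nested structured loops, which makes explicit that both sides of the branch meet again at B: the lower half-warp via C and A, the upper half-warp via the else code. The sketch below is plain C++ that models a single lane on the host so the equivalence can be checked without a GPU. The `budget` step limit (needed because the original loop never exits), the label `'E'` for the else code, and both function names are hypothetical additions. Whether nvcc actually places the convergence point at B for either form is something to confirm in the generated SASS.

```cpp
#include <cassert>
#include <string>

// Both functions record, for one lane, the sequence of labels it visits:
// 'A', 'B', 'C' as in the question, and 'E' for the "else" code.
// `budget` is a hypothetical step limit so the otherwise endless loop ends.

std::string gotoVersion(unsigned tid, int budget) {
    assert(tid < 32 && "lane index within one warp");
    std::string t;
A:
    if (budget-- == 0) return t;
    t += 'A';
B:
    if (budget-- == 0) return t;
    t += 'B';
    if (tid < 16) {
        if (budget-- == 0) return t;
        t += 'C';
        goto A;
    }
    if (budget-- == 0) return t;
    t += 'E';
    goto B;
}

std::string structuredVersion(unsigned tid, int budget) {
    assert(tid < 32 && "lane index within one warp");
    std::string t;
    for (;;) {                   // outer loop: starts at label A
        if (budget-- == 0) return t;
        t += 'A';
        for (;;) {               // inner loop: starts at label B
            if (budget-- == 0) return t;
            t += 'B';
            if (tid < 16) {
                if (budget-- == 0) return t;
                t += 'C';
                break;           // "goto A": leave the inner loop, restart the outer one
            }
            if (budget-- == 0) return t;
            t += 'E';            // else code; "goto B" is the next inner iteration
        }
    }
}
```

For example, with a budget of 7 steps, lane 3 visits `ABCABCA` and lane 20 visits `ABEBEBE` in both versions. In the structured form, B is the header of the inner loop that both paths return to.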

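Part of the suboptimal performance may come from the compiler's placement of convergence points. You can examine the convergence points in the generated machine code (SASS) by running cuobjdump --dump-sass on the compiled binary. An SSY instruction records a convergence point, and a .S suffix on an instruction indicates that control is transferred to the last recorded convergence point. (This describes the pre-Volta instruction sets; Volta and later GPUs use different convergence instructions, such as BSSY/BSYNC.)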
