gpu - The impact of the goto instruction on intra-warp divergence in CUDA code
For simple intra-warp thread divergence in CUDA, what I know is that the SM selects a re-convergence point (a PC address) and executes the instructions in both/multiple paths, while disabling the effects of execution for the threads that haven't taken the path.
For example, in the piece of code below:
    if( threadIdx.x < 16 ) {
        a: // do something.
    } else {
        b: // do something else.
    }
    c: // rest of code.
Here c is the re-convergence point. The warp scheduler schedules the instructions at both a and b, while disabling the instructions at a for the upper half-warp and disabling the instructions at b for the lower half-warp. When execution reaches c, the instructions are enabled for all threads inside the warp.
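The behaviour described above can be observed on a single warp with a minimal sketch like the following. It is not from the original post: the kernel name half_warp_branch, the host helper run_half_warp_branch, and the use of __activemask() (CUDA 9 or later) to record which lanes execute together on each path are assumptions made for illustration. Error checking is omitted for brevity, and the compiler may replace such a short if/else with predicated instructions, in which case no real branch is emitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one warp, each half-warp takes a different path.
__global__ void half_warp_branch(int *value, unsigned *mask)
{
    int lane = threadIdx.x;
    if (lane < 16) {
        // Path a: only the lower half-warp has any effect here.
        mask[lane]  = __activemask();
        value[lane] = lane * 2;
    } else {
        // Path b: only the upper half-warp has any effect here.
        mask[lane]  = __activemask();
        value[lane] = lane + 100;
    }
    // Point c: rest of the code, executed by every thread of the warp.
    value[lane] += 1;
}

// Hypothetical host helper: launches one warp and returns the per-lane values.
// If masks is non-null, it also receives the mask recorded by each lane.
std::vector<int> run_half_warp_branch(std::vector<unsigned> *masks = nullptr)
{
    const int n = 32;  // one warp
    int *d_value = nullptr;
    unsigned *d_mask = nullptr;
    cudaMalloc(&d_value, n * sizeof(int));
    cudaMalloc(&d_mask, n * sizeof(unsigned));

    half_warp_branch<<<1, n>>>(d_value, d_mask);

    std::vector<int> value(n);
    std::vector<unsigned> mask(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(value.data(), d_value, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(mask.data(), d_mask, n * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_value);
    cudaFree(d_mask);

    if (masks) *masks = mask;
    return value;
}
```

Calling run_half_warp_branch from main and printing the recorded masks will typically show one value for lanes 0-15 and another for lanes 16-31. Note that __activemask() only reports which lanes happened to be executing together at that instruction, so it is an observation rather than a guarantee about where the hardware re-converges.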
My question is: would the SM be able to handle code that includes a goto instruction in the same way as above? Or is there no guarantee that the chosen re-convergence point is the optimum?
For instance, suppose I have the control flow below in CUDA code, implemented using goto:
    a: // code here.
    b: // code here too.
    if( threadIdx.x < 16 ) {
        c: // do something.
        goto a;
    }
    // else.
    goto b;
Will the SM be smart enough to decide that b is the re-convergence point for the intra-warp divergence caused by the if instruction?
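As written, the control flow in the question never leaves the a/b cycle, so it cannot be run directly. A minimal compilable sketch of the same shape, assuming a hypothetical pass counter (passes, bounded by rounds) and a hypothetical exit label done that are not part of the original question, might look like this. The arithmetic on acc is only there to make each path observable.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel mirroring the goto-based control flow of the question.
__global__ void goto_flow(int *out, int rounds)
{
    int acc = 0;
    int passes = 0;  // assumption: bounds the cycle so the kernel terminates
a:
    acc += 1;        // "code here"
b:
    acc += 10;       // "code here too"
    if (++passes >= rounds) goto done;  // assumption: exit not in the original
    if (threadIdx.x < 16) {
        acc += 100;  // block c: "do something"
        goto a;      // lower half-warp jumps back to a
    }
    goto b;          // upper half-warp jumps back to b
done:
    out[threadIdx.x] = acc;
}

// Hypothetical host helper: runs one warp and returns each lane's result.
std::vector<int> run_goto_flow(int rounds)
{
    const int n = 32;  // one warp
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    goto_flow<<<1, n>>>(d_out, rounds);

    std::vector<int> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return out;
}
```

With rounds = 3, lanes 0-15 pass through a and b three times and through c twice, while lanes 16-31 pass through a once and b three times. The sketch only demonstrates that the code is functionally well defined; it says nothing about where the compiler or hardware chooses to re-converge the two half-warps.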
In general, goto is unstructured control flow that interferes with many compiler optimizations, regardless of platform. The CUDA C compiler should handle code with goto in a functionally correct way, but the performance may be suboptimal.
Part of the suboptimal performance may be due to the compiler's placement of convergence points. You can examine the convergence points in the generated machine code (SASS) with cuobjdump --dump-sass. The SSY instruction records convergence points, and a .S suffix on an instruction indicates that control is transferred to the last recorded convergence point.
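To try this out, a small kernel with a branch that is hard to predicate away gives the compiler a reason to emit real divergence handling. The sketch below is an illustration only: the file name divergent.cu, the kernel divergent_loop, the helper run_divergent_loop, and the sm_60 target in the build comment are all assumptions, and the target should be replaced with an architecture your toolkit and GPU support. As far as I know, SSY appears in SASS for pre-Volta architectures, while newer ones use different instructions for the same purpose, so the exact mnemonics in your listing may differ.

```cuda
// divergent.cu (hypothetical file name)
// Compile the device code only, then dump its SASS:
//   nvcc -arch=sm_60 -cubin -o divergent.cubin divergent.cu
//   cuobjdump --dump-sass divergent.cubin
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: only the lower half-warp enters the branch, and the
// loop trip count depends on the lane, which makes predication unattractive.
__global__ void divergent_loop(int *out)
{
    int lane = threadIdx.x;
    int acc = lane;
    if (lane < 16) {
        for (int i = 0; i < lane; ++i)
            acc = acc * 3 + i;
    }
    // Code after the branch, executed by every thread.
    out[lane] = acc;
}

// Hypothetical host helper so the kernel can also be run and checked.
std::vector<int> run_divergent_loop()
{
    const int n = 32;  // one warp
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    divergent_loop<<<1, n>>>(d_out);

    std::vector<int> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return out;
}
```

In the dumped listing, look for the instruction that records the convergence point before the branch and for the instructions on each path that transfer control back to it.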