int atomic_add(volatile __global int *p, int val)
unsigned int atomic_add(volatile __global unsigned int *p, unsigned int val)
int atomic_add(volatile __local int *p, int val)
unsigned int atomic_add(volatile __local unsigned int *p, unsigned int val)
| p | val |
|---|---|
| pointer to the memory cell to update | value added to the cell |
Performance depends on the number of atomic operations and on the memory space they target. Serializing work almost always slows kernel execution, because a GPU is an array of SIMD units, and the lanes in each unit must wait for one another whenever they are not executing the same work.
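
As a minimal sketch of how the `__global` overload might be used, the kernel below copies the even elements of an input buffer into a compact output buffer. It relies on `atomic_add` returning the value the cell held before the addition, so each matching work-item gets a unique output slot. The names `compact_even`, `run_compact_even` and `shim_gid` are hypothetical, and the `#ifndef __OPENCL_VERSION__` sections are a serial stand-in (an assumption made only so the file also builds as plain C), not part of OpenCL.

```c
/* Sketch only: the kernel is OpenCL C. The guarded sections are a
   hypothetical serial shim so the same file also compiles as plain C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_gid;  /* id of the work-item currently being emulated */
static size_t get_global_id(unsigned dim) { (void)dim; return shim_gid; }
/* Serial stand-in: adds val to *p and returns the old value, as atomic_add does. */
static int atomic_add(volatile int *p, int val)
{
    int old = *p;
    *p = old + val;
    return old;
}
#endif

/* Each work-item inspects one element. A matching work-item reserves a
   unique output slot by atomically incrementing a shared counter. */
__kernel void compact_even(__global const int *data,
                           __global int *out,
                           volatile __global int *counter)
{
    size_t i = get_global_id(0);
    if (data[i] % 2 == 0) {
        int slot = atomic_add(counter, 1);  /* old value is the reserved index */
        out[slot] = data[i];
    }
}

#ifndef __OPENCL_VERSION__
/* Hypothetical driver: runs the kernel once per work-item, in order,
   and returns how many elements were written to out. */
static int run_compact_even(const int *data, size_t n, int *out)
{
    int counter = 0;
    for (shim_gid = 0; shim_gid < n; ++shim_gid)
        compact_even(data, out, &counter);
    return counter;
}
#endif
```

On a real device the work-items run concurrently, so the order of the elements in `out` is not deterministic; only the final count and the set of values are. Every matching work-item also issues one global atomic, which is the kind of serialization the note above warns about.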