Let's look at a gamma correction kernel:
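__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE |
                               CLK_FILTER_NEAREST; // read_imageui supports only nearest filtering

// gamma is a pointer because kernel arguments in __constant memory must be pointers.
__kernel void Gamma(__read_only image2d_t input, __write_only image2d_t output, __constant float *gamma) {
    int2 gid = (int2)(get_global_id(0), get_global_id(1));
    int2 size = get_image_dim(input);
    // Skip work-items that fall outside the image.
    if (all(gid < size)) {
        uint4 pixel = read_imageui(input, sampler, gid);
        // Normalize the 8-bit channels to [0, 1] before applying the exponent.
        float4 color = convert_float4(pixel) / 255.0f;
        color = pow(color, (float4)(*gamma));
        // Scale back to [0, 255], rounding to the nearest even integer.
        pixel = convert_uint4_rte(color * 255.0f);
        write_imageui(output, gid, pixel);
    }
}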
Now let's walk through that code step by step. The first line creates a variable of type sampler_t in the __constant memory region. The sampler specifies how our image data is accessed: here it uses unnormalized (pixel) coordinates and clamps out-of-range coordinates to the image edge. Please refer to the Khronos documentation for the full list of sampler options.
We allocated the input as read-only and the output as write-only before we called our kernel, so we add the matching __read_only and __write_only qualifiers here.
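image2d_t and image3d_t objects are always allocated in global memory, so we can omit the __global qualifier here. Our gamma value is located in __constant memory, so we specify that too; keep in mind that OpenCL C requires kernel arguments in the __constant address space to be declared as pointers.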
Then we get our thread ID, which identifies the pixel we are going to gamma correct. We also query the image size to make sure that our thread does not access memory outside the image. Forgetting this check leads to undefined behavior and can crash your kernel.
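One common reason a thread can land outside the image is that the host rounds the global work size up to a multiple of the work-group size. The following is a minimal host-side sketch in C of that rounding. The helper round_up and the 16 x 16 work-group size used below are illustrative assumptions, not part of the code above.

```c
#include <stddef.h>

/* Hypothetical helper: round `value` up to the next multiple of `multiple`.
   Assumes `multiple` is greater than zero. */
static size_t round_up(size_t value, size_t multiple) {
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}
```

For a 1000 x 600 image and 16 x 16 work-groups, round_up gives a global size of 1008 x 608. The threads with x >= 1000 or y >= 600 have no pixel to process, and the size check in the kernel makes them do nothing.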
After we made sure that we are a legitimate thread, we read our pixel from the input image. We then convert it to float to avoid losing the fractional part, normalize it to the range [0, 1], raise it to the power of gamma, scale it back to [0, 255], convert it to an integer again and write it to the output.