Let's build a kernel that converts an image to grayscale. We will use image data that stores each component as a uint, in RGBA order.
    __constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                                   CLK_ADDRESS_CLAMP_TO_EDGE |
                                   CLK_FILTER_NEAREST; // read_imageui is only defined for nearest filtering

    __kernel void Grayscale(__read_only image2d_t input, __write_only image2d_t output) {
        int2 gid = (int2)(get_global_id(0), get_global_id(1));
        int2 size = get_image_dim(input);

        // Skip work-items that fall outside the image.
        if (all(gid < size)) {
            uint4 pixel = read_imageui(input, sampler, gid);
            // Normalize the components to [0, 1].
            float4 color = convert_float4(pixel) / 255.0f;
            // Rec. 709 luma weights; float literals avoid double arithmetic.
            color.xyz = 0.2126f * color.x + 0.7152f * color.y + 0.0722f * color.z;
            pixel = convert_uint4_rte(color * 255.0f);
            write_imageui(output, gid, pixel);
        }
    }
Now let's walk through that code step by step. The first statement creates a variable of type sampler_t in the __constant memory region. The sampler specifies how the image data is accessed: whether coordinates are normalized, how out-of-range coordinates are handled, and which filter is applied. Please refer to the Khronos documentation for the full details.
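To make the addressing part of the sampler more concrete, here is a minimal sketch in plain C of what CLK_ADDRESS_CLAMP_TO_EDGE does to a single unnormalized integer coordinate. The function name clamp_to_edge is hypothetical and only meant for illustration; the real clamping happens inside the OpenCL implementation.

```c
/* Hypothetical CPU-side illustration, not part of the OpenCL API:
   an out-of-range coordinate is moved to the nearest valid texel index. */
int clamp_to_edge(int coord, int size) {
    if (coord < 0) {
        return 0;            /* left of / above the image */
    }
    if (coord > size - 1) {
        return size - 1;     /* right of / below the image */
    }
    return coord;            /* already inside the image */
}
```

For an image that is 10 pixels wide, reading at x = 12 would therefore return the pixel at x = 9.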
We allocated the input as read-only and the output as write-only before we called our kernel, so we add the matching __read_only and __write_only qualifiers here.
image2d_t and image3d_t objects are always allocated in global memory, so we can omit the __global qualifier here.
Then we get the global ID of our thread, which identifies the pixel we are going to convert to grayscale. We also query the image size to make sure that our thread does not access memory outside the image. Writing outside the image is undefined behavior and can crash your kernel, so do not forget this check.
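Threads outside the image usually appear because the host rounds the global work size up to a multiple of the work-group size. The following sketch in plain C illustrates that situation; round_up and is_inside_image are hypothetical helper names, and the work-group size of 16 in the usage note is only an assumed example value.

```c
#include <stddef.h>

/* Hypothetical host-side helper: round value up to the next multiple. */
size_t round_up(size_t value, size_t multiple) {
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

/* CPU-side mirror of the kernel's all(gid < size) guard for one thread. */
int is_inside_image(int gid_x, int gid_y, int width, int height) {
    return gid_x < width && gid_y < height;
}
```

With a 500 pixel wide image and an assumed work-group width of 16, round_up(500, 16) launches 512 threads per row, so the threads with x from 500 to 511 must be skipped by the guard.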
Once we know that our thread maps to a valid pixel, we read that pixel from the input image. We convert it to float so that the weighted sum does not lose its fractional part, compute the gray value, convert the result back to uint, and write it to the output image.
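To check the kernel's results on the host, a CPU reference for a single pixel can be handy. The sketch below follows the same steps in plain C. The PixelRGBA struct and the grayscale_reference function are hypothetical names, the components are assumed to be in the range 0 to 255, and the rounding is round-half-up, which differs from the kernel's round-to-nearest-even only for exact .5 ties.

```c
#include <stdint.h>

/* Hypothetical pixel type: one unsigned integer per component, RGBA order. */
typedef struct {
    uint32_t r, g, b, a;
} PixelRGBA;

/* CPU reference for one pixel, assuming components in [0, 255]. */
PixelRGBA grayscale_reference(PixelRGBA in) {
    /* Normalize to [0, 1], like convert_float4(pixel) / 255.0f. */
    float r = (float)in.r / 255.0f;
    float g = (float)in.g / 255.0f;
    float b = (float)in.b / 255.0f;

    /* Same weighted sum as in the kernel. */
    float luma = 0.2126f * r + 0.7152f * g + 0.0722f * b;

    /* Scale back to [0, 255] and round half up. */
    uint32_t gray = (uint32_t)(luma * 255.0f + 0.5f);

    /* The gray value replaces R, G and B; alpha is passed through unchanged. */
    PixelRGBA out = { gray, gray, gray, in.a };
    return out;
}
```

Comparing a few pixels of the kernel output against this function is a simple way to catch mistakes such as a wrong channel order.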