Tutorial by Examples

The key to parallelism is to use multiple threads to solve a problem (duh), but there are some differences from classical multithreaded programming in how the threads are organized. First, let's talk about your typical GPU; for simplicity's sake I'll focus on…

A GPU has many processing cores, which make it…
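
To make that thread organization concrete, here is a minimal sketch in CUDA terms (the kernel name addOne and the 4 × 256 launch configuration are just illustrative choices): threads are grouped into blocks, blocks form a grid, and each thread derives its own global index from blockIdx, blockDim and threadIdx.

#include <cuda_runtime.h>

__global__ void addOne(int *data, int n)
{
    // blockIdx, blockDim and threadIdx describe where this thread sits
    // in the grid; together they give it a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const int n = 1024;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    // 4 blocks of 256 threads each cover the 1024 elements.
    addOne<<<4, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}

The <<<blocks, threads>>> launch replaces the "create N threads" loop of classical CPU multithreading: you describe the shape of the thread hierarchy and the hardware schedules the blocks onto its cores.
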
The GPU offers six different memory regions. They differ in their latency, size, and accessibility from different threads.

Global Memory: the largest memory available and one of the few that can be used to exchange data with the host. This memory has the highest latency and is accessible to all threads. Cons…
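
As a rough sketch of where some of those regions show up in code (the variable and kernel names below are made up for illustration; it covers constant, global, shared and register storage, not all six regions):

#include <cuda_runtime.h>

__constant__ float scale;        // constant memory: read-only inside kernels, cached
__device__   float lastValue;    // global memory, declared at file scope

__global__ void scaleArray(const float *in, float *out, int n)
{
    __shared__ float tile[256];               // shared memory: one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = 0.0f;                           // local variable, normally held in a register

    if (i < n)
        x = in[i] * scale;                    // 'in' and 'out' point into global memory

    tile[threadIdx.x] = x;
    __syncthreads();                          // make shared-memory writes visible block-wide

    if (i < n)
        out[i] = tile[threadIdx.x];
    if (i == 0)
        lastValue = x;                        // write to the file-scope global variable
}

int main()
{
    const int n = 1024;
    float h_scale = 2.0f;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));  // fill constant memory from the host

    scaleArray<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
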
The typical scenario for your memory usage is to store the source data and the processed data in global memory. When a thread block starts, it first copies all relevant parts into shared memory before loading its parts into registers. Memory access latency also depends on your memory s…
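
A minimal sketch of that staging pattern, assuming a simple neighbour-sum kernel (the name sumNeighbours and the TILE size are illustrative): each block copies its slice of global memory into shared memory, each thread then pulls the values it needs into registers, and the result goes back out to global memory.

#include <cuda_runtime.h>

#define TILE 256

__global__ void sumNeighbours(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // 1) Stage this block's slice of global memory into shared memory.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // 2) Pull the values this thread needs into registers and compute.
    //    (For simplicity, neighbours outside this block's tile are treated as zero.)
    float left  = (threadIdx.x > 0)        ? tile[threadIdx.x - 1] : 0.0f;
    float self  = tile[threadIdx.x];
    float right = (threadIdx.x < TILE - 1) ? tile[threadIdx.x + 1] : 0.0f;

    // 3) Write the result back to global memory.
    if (i < n)
        out[i] = left + self + right;
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    sumNeighbours<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
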
