Tutorial by Examples

The key to parallelism is to use multiple threads to solve a problem (duh), but there are some differences from classical multithreaded programming in how the threads are organized. First, let's talk about your typical GPU; for simplicity's sake I'll focus on…

A GPU has many processing cores, which make it…
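
To make that thread organization concrete, here is a minimal sketch in CUDA terms (the kernel name addOne and the 4 × 256 launch configuration are just illustrative choices): threads are grouped into blocks, blocks form a grid, and each thread derives its own global index from blockIdx, blockDim and threadIdx.

#include <cuda_runtime.h>

__global__ void addOne(int *data, int n)
{
    // blockIdx, blockDim and threadIdx describe where this thread sits
    // in the grid; together they give it a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const int n = 1024;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    // 4 blocks of 256 threads each cover the 1024 elements.
    addOne<<<4, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}

The <<<blocks, threads>>> launch replaces the "create N threads" loop of classical CPU multithreading: you describe the shape of the thread hierarchy and the hardware schedules the blocks onto its cores.
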
The GPU offers six different memory regions. They differ in their latency, size, and accessibility from different threads.

Global Memory: the largest memory available and one of the few that can be used to exchange data with the host. This memory has the highest latency and is accessible to all threads. Cons…
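
As a rough sketch of where some of those regions show up in code (the variable and kernel names below are made up for illustration; it covers constant, global, shared and register storage, not all six regions):

#include <cuda_runtime.h>

__constant__ float scale;        // constant memory: read-only inside kernels, cached
__device__   float lastValue;    // global memory, declared at file scope

__global__ void scaleArray(const float *in, float *out, int n)
{
    __shared__ float tile[256];               // shared memory: one copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = 0.0f;                           // local variable, normally held in a register

    if (i < n)
        x = in[i] * scale;                    // 'in' and 'out' point into global memory

    tile[threadIdx.x] = x;
    __syncthreads();                          // make shared-memory writes visible block-wide

    if (i < n)
        out[i] = tile[threadIdx.x];
    if (i == 0)
        lastValue = x;                        // write to the file-scope global variable
}

int main()
{
    const int n = 1024;
    float h_scale = 2.0f;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));  // fill constant memory from the host

    scaleArray<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
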
The typical scenario for your memory usage is to store the source data and the processed data in global memory. When a thread block starts, it first copies all relevant parts into shared memory before loading its parts into registers. Memory access latency also depends on your memory s…
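
A minimal sketch of that staging pattern, assuming a simple neighbour-sum kernel (the name sumNeighbours and the TILE size are illustrative): each block copies its slice of global memory into shared memory, each thread then pulls the values it needs into registers, and the result goes back out to global memory.

#include <cuda_runtime.h>

#define TILE 256

__global__ void sumNeighbours(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // 1) Stage this block's slice of global memory into shared memory.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // 2) Pull the values this thread needs into registers and compute.
    //    (For simplicity, neighbours outside this block's tile are treated as zero.)
    float left  = (threadIdx.x > 0)        ? tile[threadIdx.x - 1] : 0.0f;
    float self  = tile[threadIdx.x];
    float right = (threadIdx.x < TILE - 1) ? tile[threadIdx.x + 1] : 0.0f;

    // 3) Write the result back to global memory.
    if (i < n)
        out[i] = left + self + right;
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    sumNeighbours<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
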
