Example
The GPU offers six different memory regions. They differ in latency, size, and which threads can access them.
- Global Memory: The largest memory region and one of the few that can be used to exchange data with the host. It has the highest latency and is accessible to all threads.
- Constant Memory: A read-only part of the global memory. Kernels can only read it; the host writes it. Because it is cached, its latency is lower than that of global memory, especially when all threads of a warp read the same address.
- Texture Memory: Also a read-only, cached part of device memory, specifically designed for textures, that is, for access patterns with 2D spatial locality.
- Shared Memory: This memory region is placed on-chip, close to the SM, and can only be accessed by the threads of a single thread block. It offers far lower latency than global memory and slightly lower latency than constant memory.
- Registers: Accessible only by a single thread and the fastest memory of them all. If the compiler detects that the kernel needs more registers than are available, it spills variables to local memory.
- Local Memory: A part of the global memory region that is private to a single thread. It serves as overflow space for registers and should be avoided where possible.
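
The following CUDA sketch shows how four of these regions (global, constant, shared, and registers) typically appear in source code. It is an illustration under stated assumptions, not a tuned implementation: the names `c_scale`, `scaledBlockSum`, `scaledSum`, and `kBlockSize` are hypothetical, the block size of 256 is an arbitrary choice, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; a power of two so the tree reduction below works.
constexpr int kBlockSize = 256;

// Constant memory: written by the host, read-only inside kernels.
__constant__ float c_scale;

// Each block scales its slice of `in` and writes one partial sum to `out`.
// `in` and `out` point to global memory.
__global__ void scaledBlockSum(const float* in, float* out, int n)
{
    // Shared memory: one array per thread block, visible to all its threads.
    __shared__ float tile[kBlockSize];

    // Scalar locals such as `idx` and `value` are normally kept in registers.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float value = (idx < n) ? in[idx] * c_scale : 0.0f;  // global + constant read

    tile[threadIdx.x] = value;
    __syncthreads();  // make every thread's write to `tile` visible

    // Tree reduction inside shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];  // one global write per block
}

// Host helper (no error checking): returns scale * sum(data).
float scaledSum(const std::vector<float>& data, float scale)
{
    if (data.empty()) return 0.0f;

    const int n = static_cast<int>(data.size());
    const int blocks = (n + kBlockSize - 1) / kBlockSize;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));        // global memory
    cudaMalloc(&d_out, blocks * sizeof(float));  // global memory

    cudaMemcpy(d_in, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));  // host writes constant memory

    scaledBlockSum<<<blocks, kBlockSize>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `scaledSum` from a host program with a vector of 1000 ones and a scale of `2.0f` should return `2000.0f` on a working CUDA device. Texture and local memory are not shown: texture access needs additional setup, and local memory is not requested explicitly but is used by the compiler when registers run out.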