When allocating Memory you have the option to choose between different modes:
Read-only memory is allocated in the __constant memory region, while the other two are allocated in the normal __global region.
In addition to the accessibility you can define where your memory is allocated.
Speed-wise, access to device global memory is the fastest one. But you also need to call it twice to copy data. Using the Host-pointer is the slowest one, while Alloc_host_ptr offers a higher speed.
When using use_host_ptr, the device does exactly that: It uses your data in the system ram, which of course is paged by the os. So every memory call has to go through the cpu to handle potential pagefaults. When the data is available, the cpu copies it into pinned memory and passes it to the DMA controller using precious cpu clock cycles. On the contrary, alloc_host_ptr allocates pinned memory in the system ram. This memory is placed outside of the pageswap mechanism and therefore has a guaranteed availability. Therefore the device can skip the cpu entirely when accessing system ram and utilize DMA to quickly copy data to the device.