Let's launch a single CUDA thread to say hello


This simple CUDA program demonstrates how to write a function that will execute on the GPU (aka "device"). The CPU, or "host", creates CUDA threads by calling special functions called "kernels". CUDA programs are C++ programs with additional syntax.

To see how it works, put the following code in a CUDA source file (by convention, one with a .cu extension):

#include <stdio.h>

// __global__ functions, or "kernels", execute on the device
__global__ void hello_kernel(void)
{
  printf("Hello, world from the device!\n");
}

int main(void)
{
  // greet from the host
  printf("Hello, world from the host!\n");

  // launch a kernel with a single thread to greet from the device
  hello_kernel<<<1,1>>>();

  // wait for the device to finish so that we see the message
  cudaDeviceSynchronize();

  return 0;
}

(Note that in order to use the printf function on the device, you need a device that has a compute capability of at least 2.0. See the versions overview for details.)
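
If you are unsure whether your device meets the compute capability requirement, you can ask the CUDA runtime. The following is a minimal sketch, not part of the original example: the helper name supports_device_printf is hypothetical, and it assumes you want to check device 0. It uses only the runtime calls cudaGetDeviceCount and cudaGetDeviceProperties.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the given device reports
// compute capability 2.0 or higher (the minimum for device-side printf).
static bool supports_device_printf(int device)
{
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
    return false;  // invalid device index, or no usable CUDA driver

  printf("Device %d: %s, compute capability %d.%d\n",
         device, prop.name, prop.major, prop.minor);
  return prop.major >= 2;
}

int main(void)
{
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
    printf("No CUDA device found\n");
    return 1;
  }

  // Assumption: device 0 is the one the program will run on.
  return supports_device_printf(0) ? 0 : 1;
}
```

The program exits with status 0 when device 0 can run the hello example and with status 1 otherwise.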

Now let's compile the program using the NVIDIA compiler and run it:

$ nvcc -o hello
$ ./hello
Hello, world from the host!
Hello, world from the device!

Some additional information about the above example:

  • nvcc stands for "NVIDIA CUDA Compiler". It separates source code into host and device components.
  • __global__ is a CUDA keyword used in function declarations indicating that the function runs on the GPU device and is called from the host.
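  • Triple angle brackets (<<<,>>>) mark a call from host code to device code (also called a "kernel launch"). The two numbers within the triple brackets are the number of thread blocks to launch and the number of threads in each block, so the kernel body executes blocks × threads-per-block times in parallel. For example, <<<1,1>>> launches one block containing a single thread.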
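
To make the launch configuration concrete, here is a small sketch that launches more than one thread. It is an illustration rather than part of the original example: the names whoami_kernel and launch_whoami are hypothetical, and the configuration of 2 blocks with 3 threads each is an arbitrary choice. Each thread prints the built-in blockIdx.x and threadIdx.x values that identify it.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread prints its own coordinates within the launch.
__global__ void whoami_kernel(void)
{
  printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Hypothetical host helper: launch the kernel and report any error.
static cudaError_t launch_whoami(int blocks, int threads_per_block)
{
  whoami_kernel<<<blocks, threads_per_block>>>();

  // A bad configuration is reported at launch time
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
    return err;

  // Wait for the device to finish; also reports errors raised during execution
  return cudaDeviceSynchronize();
}

int main(void)
{
  // 2 blocks of 3 threads each: the kernel body runs 6 times
  cudaError_t err = launch_whoami(2, 3);
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
```

This should print six lines, one per thread. The order in which different blocks print is not guaranteed, so the output may differ from run to run.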