apache-flink Tutorial => Checkpointing

Introduction

(tested on Flink 1.2 and below)

Every function, source or operator in Flink can be stateful. Checkpoints allow Flink to recover state and positions in the streams to give the application the same semantics as a failure-free execution. It is the mecanism behind the guarantees of fault tolerance and exactly-once processing.

Read this article to understand the internals.

Remarks

Checkpoints are only useful when a failure happens in the cluster, for example when a taskmanager fails. They do not persist after the job itself failed or was canceled.

To be able to resume a stateful job after failure/cancellation, have a look at savepoints or externalized checkpoints (flink 1.2+).