A savepoint stores two things: (a) the positions of all datasources, (b) the states of operators. Savepoints are useful in many circonstances:
As of version 1.3 (also valid for earlier version):
checkpoint must be enabled for the savepoints to be possible. If you forget to explicitly enable checkpoint using:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.enableCheckpointing(checkpointInterval);
you will get:
java.lang.IllegalStateException: Checkpointing disabled. You can enable it via the execution environment of your job
when using window operations, it is crucial to use event-time (vs ingestion or processing time) to yield proper results;
to be able to upgrade a program and reuse savepoints, manual uid must be set. This is because, by default, Flink changes the operator's UID after any change in their code;
Chained operators are identified by the ID of the first task. It’s not possible to manually assign an ID to an intermediate chained task, e.g. in the chain [ a -> b -> c ] only a can have its ID assigned manually, but not b or c. To work around this, you can manually define the task chains. If you rely on the automatic ID assignment, a change in the chaining behaviour will also change the IDs (see point above).
More info is available in the FAQ.