What:
Caching can speed up computation in Spark. It stores data in memory and is a special case of persistence. What follows explains what happens when you cache an RDD in Spark.
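As a minimal sketch of that relationship (the object name, app name, and data are made up for illustration), cache() is just shorthand for persist(StorageLevel.MEMORY_ONLY), and neither stores anything until an action runs:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-demo").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 1000000)
    val squares = sc.parallelize(1 to 1000000).map(x => x.toLong * x)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
    numbers.cache()
    squares.persist(StorageLevel.MEMORY_ONLY)   // equivalent storage level

    // Nothing is stored yet: caching is lazy. The data is materialized
    // in memory the first time an action runs on each RDD.
    println(numbers.count())
    println(squares.count())

    sc.stop()
  }
}
```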
Why:
Basically, caching saves an interim partial result of your original data, usually the output of one or more transformations. So, the next time that result is needed, Spark can serve it from memory instead of recomputing the whole lineage from the original data.
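Here is a small sketch of that pattern (the input path, column layout, and "ERROR" filter are hypothetical): an interim result produced by transformations is cached, the first action materializes it, and a second action reuses it without re-reading the file or redoing the transformations.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InterimResultDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("interim-demo").setMaster("local[*]"))

    val lines = sc.textFile("hdfs:///logs/access.log")   // assumed input path

    // Interim partial result after transformations: filtered, parsed records.
    val errors = lines
      .filter(_.contains("ERROR"))
      .map(_.split("\t"))
      .cache()                       // keep this partial result in memory

    // First action: runs the full lineage and fills the cache.
    println(s"error count: ${errors.count()}")

    // Second action: served from the cached partitions, so the file is not
    // re-read and the filter/map steps are not recomputed.
    errors.take(10).foreach(row => println(row.mkString(" | ")))

    sc.stop()
  }
}
```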