apache-spark Shared Variables Accumulators

Help us to keep this website almost Ad Free! It takes only 10 seconds of your time:
> Step 1: Go view our video on YouTube: EF Core Bulk Extensions
> Step 2: And Like the video. BONUS: You can also share it!

Example

Accumulators are write-only variables which can be created with SparkContext.accumulator:

val accumulator = sc.accumulator(0, name = "My accumulator") // name is optional

modified with +=:

val someRDD = sc.parallelize(Array(1, 2, 3, 4))
someRDD.foreach(element => accumulator += element)

and accessed with value method:

accumulator.value // 'value' is now equal to 10

Using accumulators is complicated by Spark's run-at-least-once guarantee for transformations. If a transformation needs to be recomputed for any reason, the accumulator updates during that transformation will be repeated. This means that accumulator values may be very different than they would be if tasks had run only once.


Note:

  1. Executors cannot read accumulator's value. Only the driver program can read the accumulator’s value, using its value method.
  2. It is almost similar to counter in Java/MapReduce. So you can relate accumulators to counters to understanding it easily


Got any apache-spark Question?