Tutorial by Examples

Broadcast variables are read-only shared objects which can be created with the SparkContext.broadcast method:

    val broadcastVariable = sc.broadcast(Array(1, 2, 3))

and read using the value method:

    val someRDD = sc.parallelize(Array(1, 2, 3, 4))

    someRDD.map(
      i => broadcastVariable.value.apply(i % broadcastVariable.value.size)
    )
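The snippets above can be combined into a small self-contained program. The SparkConf/SparkContext setup, the object name and the printed result below are illustrative assumptions rather than part of the original example:

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastExample {
      def main(args: Array[String]): Unit = {
        // Assumed local setup so the example can run on a single machine.
        val sc = new SparkContext(
          new SparkConf().setAppName("broadcast-example").setMaster("local[*]"))

        // The lookup array is shipped to every executor once, not with every task.
        val broadcastVariable = sc.broadcast(Array(1, 2, 3))

        val someRDD = sc.parallelize(Array(1, 2, 3, 4))
        val mapped = someRDD.map(i =>
          broadcastVariable.value.apply(i % broadcastVariable.value.size))

        println(mapped.collect().mkString(", ")) // 2, 3, 1, 2
        sc.stop()
      }
    }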
Accumulators are write-only variables from the point of view of tasks: tasks can only add to them, while only the driver can read the result. They can be created with SparkContext.accumulator:

    val accumulator = sc.accumulator(0, name = "My accumulator") // name is optional

modified with +=:

    val someRDD = sc.parallelize(Array(1, 2, 3, 4))
    someRDD.foreach(element => accumulator += element)

and read with the value method:

    accumulator.value
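A minimal runnable sketch of the same pattern, again assuming a local SparkContext and illustrative setup boilerplate:

    import org.apache.spark.{SparkConf, SparkContext}

    object AccumulatorExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("accumulator-example").setMaster("local[*]"))

        // Numeric accumulator starting at 0; the name shows up in the Spark UI.
        val accumulator = sc.accumulator(0, name = "My accumulator")

        // Each task adds its elements; Spark merges the per-task results on the driver.
        sc.parallelize(Array(1, 2, 3, 4)).foreach(element => accumulator += element)

        println(accumulator.value) // 10
        sc.stop()
      }
    }

Updates made inside actions such as foreach are applied exactly once per task; updates made inside transformations such as map may be re-applied if a task or stage is re-executed, so accumulator values obtained that way should be treated as approximate.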
Define an AccumulatorParam:

    import org.apache.spark.AccumulatorParam

    object StringAccumulator extends AccumulatorParam[String] {
      def zero(s: String): String = s
      def addInPlace(s1: String, s2: String): String = s1 + s2
    }

Use it:

    val accumulator = sc.accumulator("")(StringAccumulator)
    sc.parallelize(Array("a", "b", "c")).foreach(accumulator += _)
The same accumulator can be defined in Python. Define an AccumulatorParam:

    from pyspark import AccumulatorParam

    class StringAccumulator(AccumulatorParam):
        def zero(self, s):
            return s
        def addInPlace(self, s1, s2):
            return s1 + s2

    accumulator = sc.accumulator("", StringAccumulator())

    def add(x):
        global accumulator
        accumulator += x

    sc.parallelize(["a", "b", "c"]).foreach(add)
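In both the Scala and Python versions the accumulated string is read back on the driver with accumulator.value; since partition results are merged in no guaranteed order, the concatenation order of "a", "b" and "c" is not deterministic. A minimal Scala sketch tying the definition and usage together (the SparkContext setup is an illustrative assumption):

    import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

    object StringAccumulatorExample {
      // Custom accumulator that merges partial results by string concatenation.
      object StringAccumulator extends AccumulatorParam[String] {
        def zero(s: String): String = s
        def addInPlace(s1: String, s2: String): String = s1 + s2
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("string-accumulator").setMaster("local[*]"))

        val accumulator = sc.accumulator("")(StringAccumulator)
        sc.parallelize(Array("a", "b", "c")).foreach(accumulator += _)

        println(accumulator.value) // e.g. "abc", but the order is not guaranteed
        sc.stop()
      }
    }

Note that newer Spark releases deprecate AccumulatorParam and sc.accumulator in favour of the AccumulatorV2 API.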
