apache-spark Tutorial => Joins

Remarks

Pay attention to the resources you have available relative to the size of the data you are joining. This is where Spark join code is most likely to fail with out-of-memory errors, so configure your Spark jobs according to the size of the data. The following is an example configuration for joining 1.5 million records to 200 million records.

Using Spark-Shell

spark-shell --executor-memory 32g --num-executors 80 --driver-memory 10g --executor-cores 10

Using Spark Submit

spark-submit --executor-memory 32g --num-executors 80 --driver-memory 10g --executor-cores 10 code.jar
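
To get a feel for how much of a cluster the settings above ask for, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: `cluster_footprint` is a hypothetical helper, not part of Spark, and the overhead figure assumes the commonly documented default of 10% of executor memory with a 384 MiB floor, which your cluster manager or Spark version may override.

```python
def cluster_footprint(num_executors, executor_memory_gb, executor_cores,
                      driver_memory_gb, overhead_factor=0.10):
    """Estimate the aggregate resources requested by a set of Spark flags.

    Hypothetical helper for illustration only. overhead_factor and the
    384 MiB floor are assumptions about the off-heap overhead added to
    each executor container; check your own cluster's settings.
    """
    min_overhead_gb = 384 / 1024  # 384 MiB expressed in GiB
    overhead_gb = max(min_overhead_gb, executor_memory_gb * overhead_factor)
    return {
        # Heap memory summed over all executors
        "executor_heap_gb": num_executors * executor_memory_gb,
        # Heap plus assumed overhead, summed over all executors
        "executor_container_gb": num_executors * (executor_memory_gb + overhead_gb),
        # Maximum number of tasks that can run at the same time
        "total_cores": num_executors * executor_cores,
        "driver_memory_gb": driver_memory_gb,
    }


# Values taken from the example flags: 80 executors, 32g each, 10 cores each,
# and a 10g driver.
footprint = cluster_footprint(
    num_executors=80,
    executor_memory_gb=32,
    executor_cores=10,
    driver_memory_gb=10,
)
for name, value in footprint.items():
    print(f"{name}: {value}")
```

With the example flags this comes to 2,560 GB of executor heap and 800 cores before any overhead is counted, so it is worth checking that the cluster can actually grant a request of that size before submitting the job.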

