One thing to note is the size of your cluster resources relative to the data you are joining. This is where a Spark join can fail with out-of-memory errors, so make sure you tune your Spark job's configuration to the size of the data. The following is an example configuration for joining roughly 1.5 million records to 200 million records.
Using spark-shell
spark-shell --executor-memory 32G --num-executors 80 --driver-memory 10g --executor-cores 10
Using spark-submit
spark-submit --executor-memory 32G --num-executors 80 --driver-memory 10g --executor-cores 10 code.jar
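If you prefer not to repeat these flags on every invocation, the same settings can be expressed as Spark configuration properties, for example in `spark-defaults.conf`. This is a sketch of the equivalent properties for the flags above; the property names are standard Spark configuration keys (`--num-executors` maps to `spark.executor.instances`):

```
# Equivalent of the command-line flags above, as spark-defaults.conf entries
spark.executor.memory     32g
spark.executor.instances  80
spark.driver.memory       10g
spark.executor.cores      10
```

Note that these settings request 80 executors of 32 GB each, i.e. about 2.5 TB of executor memory across the cluster, so make sure your cluster actually has that capacity before launching the job. Command-line flags passed to spark-shell or spark-submit take precedence over values in `spark-defaults.conf`.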