Tutorial by Examples | RIP Tutorial

Creating a Scala functions that receives a python RDD

Creating a Scala function that receives an python RDD is easy. What you need to build is a function that get a JavaRDD[Any] import org.apache.spark.api.java.JavaRDD def doSomethingByPythonRDD(rdd :JavaRDD[Any]) = { //do something rdd.map { x => ??? } }

apache-spark • Calling scala jobs from pyspark

Serialize and Send python RDD to scala code

This part of development you should serialize the python RDD to the JVM. This process uses the main development of Spark to call the jar function. from pyspark.serializers import PickleSerializer, AutoBatchedSerializer rdd = sc.parallelize(range(10000)) reserialized_rdd = rdd._reserialize(Aut...

apache-spark • Calling scala jobs from pyspark

How to call spark-submit

To call this code you should create the jar of your scala code. Than you have to call your spark submit like this: spark-submit --master yarn-client --jars ./my-scala-code.jar --driver-class-path ./my-scala-code.jar main.py This will allow you to call any kind of scala code that you need in your...

apache-spark • Calling scala jobs from pyspark