Tutorial by Examples

Creating a Scala function that receives an python RDD is easy. What you need to build is a function that get a JavaRDD[Any] import org.apache.spark.api.java.JavaRDD def doSomethingByPythonRDD(rdd :JavaRDD[Any]) = { //do something rdd.map { x => ??? } }
This part of development you should serialize the python RDD to the JVM. This process uses the main development of Spark to call the jar function. from pyspark.serializers import PickleSerializer, AutoBatchedSerializer rdd = sc.parallelize(range(10000)) reserialized_rdd = rdd._reserialize(Aut...
To call this code you should create the jar of your scala code. Than you have to call your spark submit like this: spark-submit --master yarn-client --jars ./my-scala-code.jar --driver-class-path ./my-scala-code.jar main.py This will allow you to call any kind of scala code that you need in your...

Page 1 of 1