Calling Scala jobs from PySpark: creating a Scala function that receives a Python RDD


Example

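Creating a Scala function that receives a Python RDD is easy. All you need is a function that takes a JavaRDD[Any]. This works because every PySpark RDD wraps a JVM JavaRDD, exposed on the Python side as its _jrdd attribute, and that JavaRDD is the object the Scala code receives.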

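import org.apache.spark.api.java.JavaRDD

// Hypothetical wrapper object: PySpark reaches Scala code through a static
// entry point, so the function lives in an object rather than at top level.
object PythonRDDFunctions {
  def doSomethingByPythonRDD(rdd: JavaRDD[Any]): JavaRDD[String] = {
    // JavaRDD.map expects a Java Function, so map over the underlying Scala RDD
    // and convert back. Replace x.toString with your own per-element logic.
    rdd.rdd.map { x => x.toString }.toJavaRDD()
  }
}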
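What each x holds inside the map depends on how the Python side hands the RDD over. Passing rdd._jrdd unchanged delivers pickled byte arrays. Converting it first with PySpark's internal rdd._to_java_object_rdd() delivers ordinary JVM objects produced by the Pyrolite unpickler. The sketch below assumes the second route and shows a pure Scala helper that inspects such values. The object PythonValues and its describe method are hypothetical. The Python-to-Java type mapping in the comments is an assumption based on Pyrolite's usual behaviour, so verify it against your Spark version.

```scala
// Hypothetical helper, not part of Spark: renders one element received from
// PySpark as a short string. The cases assume Pyrolite's usual mapping:
// int -> Integer or Long, float -> Double, str -> String, bool -> Boolean,
// list -> java.util.ArrayList, tuple -> Object[], bytes -> byte[], None -> null.
object PythonValues {
  def describe(x: Any): String = x match {
    case null                    => "None"
    case b: java.lang.Boolean    => s"bool($b)"
    case i: java.lang.Integer    => s"int($i)"
    case l: java.lang.Long       => s"int($l)"
    case d: java.lang.Double     => s"float($d)"
    case s: String               => s"str($s)"
    case bytes: Array[Byte]      => s"bytes(${bytes.length})"
    // Python lists arrive as java.util.List; describe each element recursively
    case list: java.util.List[_] => list.toArray.map(describe).mkString("[", ", ", "]")
    // Python tuples arrive as Object[]
    case tuple: Array[AnyRef]    => tuple.map(describe).mkString("(", ", ", ")")
    // Anything else (dicts, sets, big integers, ...) falls back to its class name
    case other                   => s"${other.getClass.getName}($other)"
  }
}
```

Inside doSomethingByPythonRDD, a helper like this could replace the placeholder, for example rdd.rdd.map(PythonValues.describe).toJavaRDD(). Because PythonValues is a top-level object, the closure refers to it statically and does not capture any non-serializable state.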