Sample Word Count in PySpark

Example

The example below is the word count example given in the official PySpark documentation.

# Step 1: read the source text file from HDFS into an RDD of lines.
# `sc` is the SparkContext; the pyspark shell creates it automatically.
text_file = sc.textFile("hdfs://...")

# Step 2: count how often each word occurs in the file.
# flatMap, map and reduceByKey are all Spark RDD functions:
#   flatMap splits each line into words, map pairs each word with 1,
#   and reduceByKey sums the 1s for each distinct word.
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Step 3: save the (word, count) pairs as text files under the output path.
counts.saveAsTextFile("hdfs://...")
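
To see what each of the three transformations produces, the same pipeline can be imitated in plain Python on a small in-memory list. This is only a sketch of the semantics and does not use Spark; the sample `lines` are made up for illustration and stand in for the lines of the HDFS file.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample input standing in for the lines read by sc.textFile
lines = ["to be or not to be", "to see or not to see"]

# flatMap: split every line and flatten the per-line lists into one list of words
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map: pair each word with an initial count of 1
pairs = [(word, 1) for word in words]

# reduceByKey: merge the values of identical keys with the function a + b
counts = defaultdict(int)
for word, n in pairs:
    counts[word] = counts[word] + n

print(dict(counts))
```

In Spark the same steps run in parallel across partitions, so the function passed to reduceByKey should be associative and commutative, as addition is, and the order of the output pairs is not guaranteed. Note also that split(" ") produces empty strings when a line contains consecutive spaces, and those would be counted as a word.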
