Tutorial by Examples

Detailed instructions on getting pyspark set up or installed.
The underlying example is the one given in the official PySpark documentation.
# the first step reads the source text file from HDFS
text_file = sc.textFile("hdfs://...")
# this step involves the actual computation for reading...
There are two ways to consume data from an AWS S3 bucket.
Using the sc.textFile (or sc.wholeTextFiles) API: this API can also be used for HDFS and the local file system.
aws_config = {}  # set your AWS credentials here
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessK...
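The credential step for the sc.textFile approach can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the helper apply_s3_credentials and the FakeHadoopConf stand-in are hypothetical names introduced here, and the property names are the standard Hadoop s3n keys, which is an assumption about what the truncated snippet above goes on to set. The stand-in exists only so the sketch runs without a Spark installation.

```python
def apply_s3_credentials(hadoop_conf, access_key, secret_key):
    """Set s3n credentials on a Hadoop-configuration-like object.

    hadoop_conf is anything exposing set(key, value), for example the
    object returned by sc._jsc.hadoopConfiguration() in a live session.
    """
    # Standard Hadoop property names for the s3n filesystem (assumed here;
    # the original snippet is truncated before the full key is shown).
    settings = {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }
    for key, value in settings.items():
        hadoop_conf.set(key, value)
    return settings


class FakeHadoopConf:
    """Hypothetical stand-in for the JVM configuration object."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value


# Placeholder credentials; real ones would come from a secure source.
conf = FakeHadoopConf()
apply_s3_credentials(conf, "MY_ACCESS_KEY", "MY_SECRET_KEY")
print(sorted(conf.values))
```

In a live PySpark session, the same helper would presumably be called with sc._jsc.hadoopConfiguration() in place of the stand-in, after which sc.textFile can be pointed at an s3n:// path of your own.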
