apache-spark Partitions of an RDD


Example

As mentioned in "Remarks", a partition is a part/slice/chunk of an RDD. Below is a minimal example of how to request a specific number of partitions for your RDD:

In [1]: mylistRDD = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2)

In [2]: mylistRDD.getNumPartitions()
Out[2]: 2

Notice in [1] how we passed 2 as the second argument of parallelize(). That argument (numSlices) tells Spark how many partitions to split the collection into, so our RDD has 2 partitions, as getNumPartitions() confirms in [2].
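To make the "slice/chunk" idea concrete, here is a rough pure-Python sketch of cutting a list into a requested number of contiguous slices. It does not call Spark. split_into_slices is a hypothetical helper, and the even-split rule it uses is an assumption modelled on how Spark's Scala side slices a local collection; PySpark batches elements before shipping them, so the exact contents of each partition in a real session may differ.

```python
def split_into_slices(items, num_slices):
    """Hypothetical helper (not a Spark API): cut a list into
    num_slices contiguous chunks of near-equal size."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(items)
    slices = []
    for i in range(num_slices):
        # Integer arithmetic keeps the chunk boundaries evenly spread.
        start = (i * n) // num_slices
        end = ((i + 1) * n) // num_slices
        slices.append(items[start:end])
    return slices


chunks = split_into_slices([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2)
print(len(chunks))  # number of slices, analogous to getNumPartitions()
print(chunks)       # the elements that ended up in each slice
```

In a live PySpark session, mylistRDD.glom().collect() returns the elements grouped by partition, which lets you inspect the real layout instead of relying on this sketch.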