The SparkR package let's you work with distributed data frames on top of a Spark cluster. These allow you to do operations like selection, filtering, aggregation on very large datasets. SparkR overview SparkR package documentation
SparkR