To start working with Spark's distributed DataFrames, you must connect your R program to an existing Spark cluster.
```r
library(SparkR)

sc <- sparkR.init()                # connect to the Spark context
sqlContext <- sparkRSQL.init(sc)   # connect to the SQL context
```
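Once both contexts are available, you can create a distributed DataFrame from a local R `data.frame`. A minimal sketch using the SparkR 1.x API and R's built-in `faithful` dataset (the dataset choice is just for illustration):

```r
# Assumes sc and sqlContext were created as shown above (SparkR 1.x API)
df <- createDataFrame(sqlContext, faithful)  # copy a local data.frame into Spark
head(df)          # collect and show the first rows
printSchema(df)   # show the inferred column names and types
```

`createDataFrame` distributes the local data across the cluster; operations on `df` are then executed by Spark rather than in your local R session.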
Here is information on how to connect your IDE to a Spark cluster.
There is an Apache Spark introduction topic with installation instructions. In short, you can run a Spark cluster locally via Java (see the instructions) or use (non-free) cloud offerings (e.g. Microsoft Azure [topic site], IBM).
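For a quick local setup, a hedged sketch of launching the SparkR shell against a local "cluster" (the `SPARK_HOME` path is an assumption for illustration; adjust it to wherever you unpacked the Spark distribution):

```shell
# Assumption: Spark is already downloaded and unpacked at /opt/spark
export SPARK_HOME=/opt/spark

# Start the SparkR shell in local mode with 2 worker threads;
# this launches R with the Spark context pre-configured
"$SPARK_HOME/bin/sparkR" --master "local[2]"
```

The `local[N]` master URL runs Spark in-process with `N` worker threads, which is convenient for development before pointing `--master` at a real cluster.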