Tutorial by Examples

Window functions are used to perform operations (generally aggregations) on a set of rows collectively called a window. Window functions work in Spark 1.4 or later. Window functions provide more operations than the built-in functions or UDFs, such as substr or round (extensively used before Spark 1.4). Window functions allow users of Spark SQL to calculate results such as the rank of a given row or a moving average over a range of input rows.
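The snippets below reference a DataFrame called sampleData with Role and Salary columns. As a minimal setup sketch, assuming Spark 2.x (SparkSession) and a hypothetical set of employee rows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, sum, lag, lead}

val spark = SparkSession.builder()
  .appName("WindowFunctionExamples")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data providing the Role and Salary columns
// that the window-function examples below assume.
val sampleData = Seq(
  ("bob",   "Developer", 125000),
  ("mark",  "Developer", 108000),
  ("peter", "Developer", 185000),
  ("simon", "Developer",  98000),
  ("carl",  "Tester",     70000),
  ("jon",   "Tester",     65000),
  ("roman", "Tester",     82000)
).toDF("Name", "Role", "Salary")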
To calculate the moving average of the salary of the employees based on their role:

val movAvg = sampleData.withColumn("movingAverage",
  avg(sampleData("Salary")).over(Window.partitionBy("Role").rowsBetween(-1, 1)))

withColumn() creates a new column named movingAverage, performing an average over the Salary column. over() is used to define the window specification. partitionBy() partitions the data over the Role column. rowsBetween(-1, 1) defines the frame of the window relative to the current row: it includes the previous row (-1), the current row (0) and the next row (1), so each average is taken over at most three rows.
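As a quick usage sketch, the result can be inspected with show(); rows at the edge of a partition simply average over fewer rows because the frame is clipped there:

// Inspect the moving average per Role. The first and last row of each
// partition have a clipped frame (two rows instead of three).
movAvg.orderBy("Role", "Salary").show()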
To calculate the cumulative sum of the salary of the employees based on their role:

val cumSum = sampleData.withColumn("cumulativeSum",
  sum(sampleData("Salary")).over(Window.partitionBy("Role").orderBy("Salary")))

orderBy() sorts the Salary column within each Role partition and computes a cumulative sum: because the window specification declares an ordering but no frame, the sum for each row runs from the start of the partition up to the current row.
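When an ordering is given but no frame, Spark defaults to a range-based frame from the start of the partition up to the current row (including peer rows with equal Salary). A sketch of the same cumulative sum with an explicit rows-based frame, which differs only when salaries tie, is:

// Cumulative sum with the frame written out explicitly:
// everything from the start of the partition up to the current row.
val cumSumExplicit = sampleData.withColumn(
  "cumulativeSum",
  sum(sampleData("Salary")).over(
    Window.partitionBy("Role")
      .orderBy("Salary")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)))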
This topic demonstrates how to use functions like withColumn, lead, lag, Level, etc. using Spark. A Spark DataFrame is a SQL abstraction layer on top of Spark core functionality. This enables users to write SQL-style queries on distributed data. Spark SQL supports heterogeneous file formats including JSON, XML, CSV, TSV, etc.
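As a sketch of the lead and lag functions mentioned above (assuming the same hypothetical sampleData and imports from the setup snippet): lag looks at the value of a previous row within the window, lead at a following row's value.

// Previous and next salary within each Role, ordered by Salary.
// The first row of a partition has no previous value, so lag yields null;
// the last row has no next value, so lead yields null.
val roleWindow = Window.partitionBy("Role").orderBy("Salary")
val withNeighbours = sampleData
  .withColumn("prevSalary", lag(sampleData("Salary"), 1).over(roleWindow))
  .withColumn("nextSalary", lead(sampleData("Salary"), 1).over(roleWindow))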
