R Language Reshaping data


Often data comes in tables. Generally one can divide this tabular data in wide and long formats. In a wide format, each variable has its own column.

PersonHeight [cm]Age [yr]

However, sometimes it is more convenient to have a long format, in which all variables are in one column and the values are in a second column.

AlisonHeight [cm]178
BobHeight [cm]174
CarlHeight [cm]182
AlisonAge [yr]20
BobAge [yr]45
CarlAge [yr]31

Base R, as well as third party packages can be used to simplify this process. For each of the options, the mtcars dataset will be used. By default, this dataset is in a long format. In order for the packages to work, we will insert the row names as the first column.

mtcars # shows the dataset
data <- data.frame(observation=row.names(mtcars),mtcars)

Base R

There are two functions in base R that can be used to convert between wide and long format: stack() and unstack().

long <- stack(data)
long # this shows the long format
wide <- unstack(long)    
wide # this shows the wide format

However, these functions can become very complex for more advanced use cases. Luckily, there are other options using third party packages.

The tidyr package

This package uses gather() to convert from wide to long and spread() to convert from long to wide.

long <- gather(data, variable, value, 2:12) # where variable is the name of the 
# variable column, value indicates the name of the value column and 2:12 refers to
# the columns to be converted.
long # shows the long result
wide <- spread(long,variable,value)
wide # shows the wide result (~data)

The data.table package

The data.table package extends the reshape2 functions and uses the function melt() to go from wide to long and dcast() to go from long to wide.

long <- melt(data,'observation',2:12,'variable', 'value')
long # shows the long result
wide <- dcast(long, observation ~ variable)
wide # shows the wide result (~data)