R Language Tutorial => Pivot and unpivot tabular data with...

Example

Convert from wide form to long form

Load data USArrests from datasets.

data("USArrests")
head(USArrests)

           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7

Use ?USArrests to find out more. First, convert to data.table. The names of states are row names in the original data.frame.

library(data.table)
DT <- as.data.table(USArrests, keep.rownames=TRUE)

This is data in the wide form. It has a column for each variable. The data can also be stored in long form without loss of information. The long form has one column that stores the variable names. Then, it has another column for the variable values. The long form of USArrests looks like so.

            State    Crime  Rate
  1:       Alabama   Murder  13.2
  2:        Alaska   Murder  10.0
  3:       Arizona   Murder   8.1
  4:      Arkansas   Murder   8.8
  5:    California   Murder   9.0
 ---                             
196:      Virginia     Rape  20.7
197:    Washington     Rape  26.2
198: West Virginia     Rape   9.3
199:     Wisconsin     Rape  10.8
200:       Wyoming     Rape  15.6

We use the melt function to switch from wide form to long form.

DTm <- melt(DT)
names(DTm) <- c("State", "Crime", "Rate")

By default, melt treats all columns with numeric data as variables with values. In USArrests, the variable UrbanPop represents the percentage urban population of a state. It is different from the other variabes, Murder, Assault and Rape, which are violent crimes reported per 100,000 people. Suppose we want to retain UrbanPop column. We achieve this by setting id.vars as follows.

DTmu <- melt(DT, id.vars=c("rn", "UrbanPop" ), 
             variable.name='Crime', value.name = "Rate")
names(DTmu)[1] <- "State"

Note that we have specified the names of the column containing category names (Murder, Assault, etc.) with variable.name and the column containing the values with value.name. Our data looks like so.

             State UrbanPop  Crime Rate
  1:       Alabama       58 Murder 13.2
  2:        Alaska       48 Murder 10.0
  3:       Arizona       80 Murder  8.1
  4:      Arkansas       50 Murder  8.8
  5:    California       91 Murder  9.0

Generating summaries with with split-apply-combine style approach is a breeze. For example, to summarize violent crimes by state?

DTmu[, .(ViolentCrime = sum(Rate)), by=State]

This gives:

        State ViolentCrime
1:    Alabama        270.4
2:     Alaska        317.5
3:    Arizona        333.1
4:   Arkansas        218.3
5: California        325.6
6:   Colorado        250.6

PDF - Download R Language for free

Previous Next

R Language

Fastest Entity Framework Extensions

Example

Got any R Language Question?

R Language

R Language Pivot and unpivot with data.table Pivot and unpivot tabular data with data.table - I

Fastest Entity Framework Extensions

Example

Got any R Language Question?