Convert from long form to wide form
To recover data from the previous example, use dcast
like so.
DTc <- dcast(DTmu, State + UrbanPop ~ Crime)
This gives the data in the original wide form.
State UrbanPop Murder Assault Rape
1: Alabama 58 13.2 236 21.2
2: Alaska 48 10.0 263 44.5
3: Arizona 80 8.1 294 31.0
4: Arkansas 50 8.8 190 19.5
5: California 91 9.0 276 40.6
Here, the formula notation is used to specify the columns that form a unique record identifier (LHS) and the column containing category labels for new column names (RHS). Which column to use for the numeric values? By default, dcast
uses the first column with numerical values left over when from the formula specification. To make explicit, use the parameter value.var
with column name.
When the operation produces a list of values in each cell, dcast
provides a fun.aggregate
method to handle the situation. Say I am interested in states with similar urban population when investigating crime rates. I add a column Decile
with computed information.
DTmu[, Decile := cut(UrbanPop, quantile(UrbanPop, probs = seq(0, 1, by=0.1)))]
levels(DTmu$Decile) <- paste0(1:10, "D")
Now, casting Decile ~ Crime
produces multiple values per cell. I can use fun.aggregate
to determine how these are handled. Both text and numerical values can be handle this way.
dcast(DTmu, Decile ~ Crime, value.var="Rate", fun.aggregate=sum)
This gives:
dcast(DTmu, Decile ~ Crime, value.var="Rate", fun.aggregate=mean)
This gives:
State UrbanPop Crime Rate Decile
1: Alabama 58 Murder 13.2 4D
2: Alaska 48 Murder 10.0 2D
3: Arizona 80 Murder 8.1 8D
4: Arkansas 50 Murder 8.8 2D
5: California 91 Murder 9.0 10D
There are multiple states in each decile of the urban population. Use fun.aggregate
to specify how these should be handled.
dcast(DTmu, Decile ~ Crime, value.var="Rate", fun.aggregate=sum)
This sums over the data for like states, giving the following.
Decile Murder Assault Rape
1: 1D 39.4 808 62.6
2: 2D 35.3 815 94.3
3: 3D 22.6 451 67.7
4: 4D 54.9 898 106.0
5: 5D 42.4 758 107.6