data.table Reshaping, stacking and splitting melt and cast with data.table


Example

data.table offers a wide range of possibilities to reshape your data both efficiently and easily

For instance, while reshaping from long to wide you can both pass several variables into the value.var and into the fun.aggregate parameters at the same time

library(data.table) #v>=1.9.6
DT <- data.table(mtcars)

Long to wide

dcast(DT, gear ~ cyl, value.var = c("disp", "hp"), fun = list(mean, sum))
   gear disp_mean_4 disp_mean_6 disp_mean_8 hp_mean_4 hp_mean_6 hp_mean_8 disp_sum_4 disp_sum_6 disp_sum_8 hp_sum_4 hp_sum_6 hp_sum_8
1:    3     120.100       241.5    357.6167        97     107.5  194.1667      120.1      483.0     4291.4       97      215     2330
2:    4     102.625       163.8         NaN        76     116.5       NaN      821.0      655.2        0.0      608      466        0
3:    5     107.700       145.0    326.0000       102     175.0  299.5000      215.4      145.0      652.0      204      175      599

This will set gear as the index column, while mean and sum will be calculated for disp and hp for every gear and cyl combination. In case some combinations don't exist you could specify additional parameters such as na.rm = TRUE (which will be passed to mean and sum functions) or specify the builtin fill argument. You can also add margins, drop missing combinations and subset the data. See more in ?data.table::dcast


Wide to long

While reshaping from wide to long, you can pass columns to the measure.vars parameter using regular expressions, for instance

print(melt(DT, c("cyl", "gear"), measure = patterns("^d", "e")), n = 10)
    cyl gear variable value1 value2
 1:   6    4        1 160.00  16.46
 2:   6    4        1 160.00  17.02
 3:   4    4        1 108.00  18.61
 4:   6    3        1 258.00  19.44
 5:   8    3        1 360.00  17.02
---                                
60:   4    5        2   3.77   5.00
61:   8    5        2   4.22   5.00
62:   6    5        2   3.62   5.00
63:   8    5        2   3.54   5.00
64:   4    4        2   4.11   4.00

This will melt the data by cyl and gear as the index columns, while all the values for the variables that begin with d (disp & drat) will be present in value1 and the values for the variables that contain the letter e in them (qsec and gear) will be present in the value2 column.

You can also rename all the column names in the result while specifying variable.name and value.name arguments or decide if you want the character columns to be automatically converted to factors or not while specifying variable.factor and value.factor arguments. See more in ?data.table::melt