data.table Reshaping, stacking and splitting Going from wide to long format using melt


Example

Melting: The basics

Melting is used to transform data from wide to long format.

Starting with a wide data set:

DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

We can melt our data using the melt function in data.table. This returns another data.table in long format:

melt(DT, id.vars = c("ID","Age"))
1:  a  20     OB_A     1
2:  b  21     OB_A     2
3:  c  22     OB_A     3
4:  a  20     OB_B     4
5:  b  21     OB_B     5
6:  c  22     OB_B     6
7:  a  20     OB_C     7
8:  b  21     OB_C     8
9:  c  22     OB_C     9

class(melt(DT, id.vars = c("ID","Age")))
# "data.table" "data.frame"

Any columns not set in the id.vars parameter are assumed to be variables. Alternatively, we can set these explicitly using the measure.vars argument:

melt(DT, measure.vars = c("OB_A","OB_B","OB_C"))
   ID Age variable value
1:  a  20     OB_A     1
2:  b  21     OB_A     2
3:  c  22     OB_A     3
4:  a  20     OB_B     4
5:  b  21     OB_B     5
6:  c  22     OB_B     6
7:  a  20     OB_C     7
8:  b  21     OB_C     8
9:  c  22     OB_C     9

In this case, any columns not set in measure.vars are assumed to be IDs.

If we set both explicitly, it will only return the columns selected:

melt(DT, id.vars = "ID", measure.vars = c("OB_C"))
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8
3:  c     OB_C     9

Naming variables and values in the result

We can manipulate the column names of the returned table using variable.name and value.name

melt(DT,
     id.vars = c("ID"), 
     measure.vars = c("OB_C"), 
     variable.name = "Test", 
     value.name = "Result"
     )
   ID Test Result
1:  a OB_C      7
2:  b OB_C      8
3:  c OB_C      9

Setting types for measure variables in the result

By default, melting a data.table converts all measure.vars to factors:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT[, variable])
# "factor"

To set as character instead, use the variable.factor argument:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), variable.factor = FALSE)
class(M_DT[, variable])
# "character"

Values generally inherit from the data type of the originating column:

class(DT[, value])
# "integer"
class(M_DT[, value])
# "integer"

If there is a conflict, data types will be coerced. For example:

M_DT <- melt(DT,id.vars = c("Age"), measure.vars = c("ID","OB_C"))
class(M_DT[, value])
# "character"

When melting, any factor variables will be coerced to character type:

DT[, OB_C := factor(OB_C)]
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT)
# "character"

To avoid this and preserve the initial typing, use the value.factor argument:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), value.factor = TRUE)
class(M_DT)
# "factor"

Handling missing values

By default, any NA values are preserved in the molten data

DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = c(7:8,NA))
melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8
3:  c     OB_C    NA

If these should be removed from your data, set na.rm = TRUE

melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), na.rm = TRUE)
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8