data.table Tutorial => Going from wide to long format using melt

Example

Melting: The basics

Melting is used to transform data from wide to long format.

Starting with a wide data set:

library(data.table)
DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

We can melt our data using the melt function in data.table. This returns another data.table in long format:

melt(DT, id.vars = c("ID","Age"))
   ID Age variable value
1:  a  20     OB_A     1
2:  b  21     OB_A     2
3:  c  22     OB_A     3
4:  a  20     OB_B     4
5:  b  21     OB_B     5
6:  c  22     OB_B     6
7:  a  20     OB_C     7
8:  b  21     OB_C     8
9:  c  22     OB_C     9

class(melt(DT, id.vars = c("ID","Age")))
# "data.table" "data.frame"
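A quick way to sanity-check the reshaping: each of the 3 rows of DT is repeated once per measure column, so the long table has 3 x 3 = 9 rows, with the values stacked column by column. A small sketch, assuming data.table is installed; the names `long` and `n_measure` are only illustrative:

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)
long <- melt(DT, id.vars = c("ID", "Age"))

# Every column not listed in id.vars becomes a measure column
n_measure <- length(setdiff(names(DT), c("ID", "Age")))

# Rows in long format = rows in wide format x number of measure columns
nrow(long) == nrow(DT) * n_measure
# TRUE

# Values are stacked column by column: all of OB_A, then OB_B, then OB_C
long$value
# 1 2 3 4 5 6 7 8 9
```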

Any columns not listed in id.vars are treated as measure variables, i.e. the columns whose values get stacked. Alternatively, we can list these explicitly using the measure.vars argument:

melt(DT, measure.vars = c("OB_A","OB_B","OB_C"))
   ID Age variable value
1:  a  20     OB_A     1
2:  b  21     OB_A     2
3:  c  22     OB_A     3
4:  a  20     OB_B     4
5:  b  21     OB_B     5
6:  c  22     OB_B     6
7:  a  20     OB_C     7
8:  b  21     OB_C     8
9:  c  22     OB_C     9

In this case, any columns not listed in measure.vars are treated as ID variables.
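Since each argument is inferred from the other, the two calls above describe the same reshaping. A minimal sketch comparing them, assuming data.table is installed (`by_id` and `by_measure` are illustrative names); `all.equal` is used rather than `identical` so that internal data.table attributes do not affect the comparison:

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

# Only ID columns given: OB_A, OB_B, OB_C are inferred as measure columns
by_id <- melt(DT, id.vars = c("ID", "Age"))

# Only measure columns given: ID and Age are inferred as ID columns
by_measure <- melt(DT, measure.vars = c("OB_A", "OB_B", "OB_C"))

isTRUE(all.equal(by_id, by_measure))
# TRUE
```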

If we set both explicitly, only the selected columns are returned:

melt(DT, id.vars = "ID", measure.vars = c("OB_C"))
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8
3:  c     OB_C     9
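The same applies with several measure columns: anything listed in neither argument is simply left out of the result. A short sketch, assuming data.table is installed (`res` is an illustrative name), where Age and OB_B are dropped:

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

res <- melt(DT, id.vars = "ID", measure.vars = c("OB_A", "OB_C"))

# Age and OB_B appear in neither id.vars nor measure.vars, so they are dropped
names(res)
# "ID" "variable" "value"

# 3 IDs x 2 measure columns
nrow(res)
# 6
```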

Naming variables and values in the result

We can set the names of the variable and value columns in the returned table using the variable.name and value.name arguments:

melt(DT,
     id.vars = c("ID"), 
     measure.vars = c("OB_C"), 
     variable.name = "Test", 
     value.name = "Result"
     )
   ID Test Result
1:  a OB_C      7
2:  b OB_C      8
3:  c OB_C      9
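Choosing meaningful names up front makes later queries on the long table easier to read. A sketch, assuming data.table is installed; `res` and `means` are illustrative names, and the per-test mean is just one example of a follow-up query:

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

res <- melt(DT,
            id.vars = "ID",
            measure.vars = c("OB_A", "OB_B"),
            variable.name = "Test",   # replaces the default name "variable"
            value.name = "Result")    # replaces the default name "value"

names(res)
# "ID" "Test" "Result"

# The new column names can be used directly in data.table queries:
# here, the mean Result for each Test (2 for OB_A, 5 for OB_B)
means <- res[, .(mean_result = mean(Result)), by = Test]
means
```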

Setting types for measure variables in the result

By default, the variable column of the result, which holds the names of the measure.vars, is returned as a factor:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT[, variable])
# "factor"

To return it as a character vector instead, set variable.factor = FALSE:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), variable.factor = FALSE)
class(M_DT[, variable])
# "character"
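As far as we understand, the factor levels follow the order in which the columns are given in measure.vars rather than alphabetical order, which matters for sorting and plotting. A sketch, assuming data.table is installed (`as_factor` and `as_char` are illustrative names):

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)

# Default: factor, with levels in measure.vars order (OB_C before OB_A)
as_factor <- melt(DT, id.vars = "ID", measure.vars = c("OB_C", "OB_A"))
levels(as_factor$variable)
# "OB_C" "OB_A"

# variable.factor = FALSE: a plain character vector without levels
as_char <- melt(DT, id.vars = "ID", measure.vars = c("OB_C", "OB_A"),
                variable.factor = FALSE)
is.character(as_char$variable)
# TRUE
```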

The value column generally inherits the data type of the original measure column:

class(DT[, OB_C])
# "integer"
class(M_DT[, value])
# "integer"
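The same holds for other types. For example, melting a double column gives a double value column. A small sketch, assuming data.table is installed; DT2, Score and M_DT2 are hypothetical names used only for this illustration:

```r
library(data.table)

# Score is a double (numeric) column
DT2 <- data.table(ID = letters[1:3], Score = c(1.5, 2.5, 3.5))
M_DT2 <- melt(DT2, id.vars = "ID", measure.vars = "Score")

class(DT2[, Score])
# "numeric"
class(M_DT2[, value])
# "numeric"
```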

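If the measure columns have different types, the value column is coerced to the highest type in the hierarchy list > character > numeric > integer > logical, and data.table typically issues a warning about the coercion. For example, melting a character column together with an integer column gives a character value column: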

M_DT <- melt(DT,id.vars = c("Age"), measure.vars = c("ID","OB_C"))
class(M_DT[, value])
# "character"
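Following the same hierarchy, mixing an integer column with a double column should give a double value column. A sketch, assuming data.table is installed; DT3, Count, Score and M_DT3 are hypothetical names, and the call is wrapped in suppressWarnings() because data.table may warn that the measure columns are not all of the same type:

```r
library(data.table)

# Count is integer, Score is double
DT3 <- data.table(ID = letters[1:3], Count = 1:3, Score = c(0.5, 1.5, 2.5))

# Silence the possible "not all of the same type" warning for this demo
M_DT3 <- suppressWarnings(
  melt(DT3, id.vars = "ID", measure.vars = c("Count", "Score"))
)

# numeric ranks above integer, so the integers are coerced to double
typeof(M_DT3[, value])
# "double"
```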

By default, factor measure columns are coerced to character when melted:

DT[, OB_C := factor(OB_C)]
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT[, value])
# "character"

To keep the value column as a factor instead, set value.factor = TRUE:

M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), value.factor = TRUE)
class(M_DT[, value])
# "factor"
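The same behaviour with a text-valued factor, side by side. A sketch, assuming data.table is installed; DT_f, Grade and M_DT_f are hypothetical names chosen so the tutorial's DT is not overwritten:

```r
library(data.table)

DT_f <- data.table(ID = letters[1:3],
                   Grade = factor(c("low", "high", "low"), levels = c("low", "high")))

# Default: the factor is melted to a character vector
melt(DT_f, id.vars = "ID", measure.vars = "Grade")[, class(value)]
# "character"

# value.factor = TRUE: the value column stays a factor
M_DT_f <- melt(DT_f, id.vars = "ID", measure.vars = "Grade", value.factor = TRUE)
is.factor(M_DT_f[, value])
# TRUE
```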

Handling missing values

By default, any NA values are preserved in the molten data:

DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = c(7:8,NA))
melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8
3:  c     OB_C    NA

To remove them, set na.rm = TRUE:

melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), na.rm = TRUE)
   ID variable value
1:  a     OB_C     7
2:  b     OB_C     8
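Note that na.rm = TRUE drops only the individual rows whose value is NA; other measure columns for the same ID are kept. A sketch with two measure columns, assuming data.table is installed (`kept` and `dropped` are illustrative names):

```r
library(data.table)

DT <- data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = c(7:8, NA))

kept    <- melt(DT, id.vars = "ID", measure.vars = c("OB_B", "OB_C"))
dropped <- melt(DT, id.vars = "ID", measure.vars = c("OB_B", "OB_C"), na.rm = TRUE)

# 3 IDs x 2 measure columns, including the NA for ID "c" in OB_C
nrow(kept)
# 6

# Only the NA row is removed; ID "c" still has its OB_B row
nrow(dropped)
# 5
anyNA(dropped$value)
# FALSE
```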
