Melting is used to transform data from wide to long format.
Starting with a wide data set:
DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = 7:9)
We can melt our data using the melt
function in data.table. This returns another data.table in long format:
melt(DT, id.vars = c("ID","Age"))
1: a 20 OB_A 1
2: b 21 OB_A 2
3: c 22 OB_A 3
4: a 20 OB_B 4
5: b 21 OB_B 5
6: c 22 OB_B 6
7: a 20 OB_C 7
8: b 21 OB_C 8
9: c 22 OB_C 9
class(melt(DT, id.vars = c("ID","Age")))
# "data.table" "data.frame"
Any columns not set in the id.vars
parameter are assumed to be variables. Alternatively, we can set these explicitly using the measure.vars
argument:
melt(DT, measure.vars = c("OB_A","OB_B","OB_C"))
ID Age variable value
1: a 20 OB_A 1
2: b 21 OB_A 2
3: c 22 OB_A 3
4: a 20 OB_B 4
5: b 21 OB_B 5
6: c 22 OB_B 6
7: a 20 OB_C 7
8: b 21 OB_C 8
9: c 22 OB_C 9
In this case, any columns not set in measure.vars
are assumed to be IDs.
If we set both explicitly, it will only return the columns selected:
melt(DT, id.vars = "ID", measure.vars = c("OB_C"))
ID variable value
1: a OB_C 7
2: b OB_C 8
3: c OB_C 9
We can manipulate the column names of the returned table using variable.name
and value.name
melt(DT,
id.vars = c("ID"),
measure.vars = c("OB_C"),
variable.name = "Test",
value.name = "Result"
)
ID Test Result
1: a OB_C 7
2: b OB_C 8
3: c OB_C 9
By default, melting a data.table converts all measure.vars
to factors:
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT[, variable])
# "factor"
To set as character instead, use the variable.factor
argument:
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), variable.factor = FALSE)
class(M_DT[, variable])
# "character"
Values generally inherit from the data type of the originating column:
class(DT[, value])
# "integer"
class(M_DT[, value])
# "integer"
If there is a conflict, data types will be coerced. For example:
M_DT <- melt(DT,id.vars = c("Age"), measure.vars = c("ID","OB_C"))
class(M_DT[, value])
# "character"
When melting, any factor variables will be coerced to character type:
DT[, OB_C := factor(OB_C)]
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
class(M_DT)
# "character"
To avoid this and preserve the initial typing, use the value.factor
argument:
M_DT <- melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), value.factor = TRUE)
class(M_DT)
# "factor"
By default, any NA
values are preserved in the molten data
DT = data.table(ID = letters[1:3], Age = 20:22, OB_A = 1:3, OB_B = 4:6, OB_C = c(7:8,NA))
melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"))
ID variable value
1: a OB_C 7
2: b OB_C 8
3: c OB_C NA
If these should be removed from your data, set na.rm = TRUE
melt(DT,id.vars = c("ID"), measure.vars = c("OB_C"), na.rm = TRUE)
ID variable value
1: a OB_C 7
2: b OB_C 8