DT[where, select|update|do, by] syntax is used to work with columns of a data.table.
These two arguments are usually passed by position instead of by name.
Our example data below is
mtcars = data.table(mtcars, keep.rownames = TRUE)
:= operator inside
j to assign new columns:
mtcars[, mpg_sq := mpg^2]
Remove columns by setting to
mtcars[, mpg_sq := NULL]
Add multiple columns by using the
:= operator's multivariate format:
mtcars[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))] # or mtcars[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]
If the columns are dependent and must be defined in sequence, one way is:
mtcars[, c("mpg_sq", "mpg2_hp") := .(temp1 <- mpg^2, temp1/hp)]
.() syntax is used when the right-hand side of
LHS := RHS is a list of columns.
For dynamically-determined column names, use parentheses:
vn = "mpg_sq" mtcars[, (vn) := mpg^2]
Columns can also be modified with
set, though this is rarely necessary:
set(mtcars, j = "hp_over_wt", v = mtcars$hp/mtcars$wt)
i argument to subset to rows "where" edits should be made:
mtcars[1:3, newvar := "Hello"] # or set(mtcars, j = "newvar", i = 1:3, v = "Hello")
As in a data.frame, we can subset using row numbers or logical tests. It is also possible to use a "join" in
i, but that more complicated task is covered in another example.
Functions that edit attributes, such as
names<-, actually replace an object with a modified copy. Even if only used on one column in a data.table, the entire object is copied and replaced.
To modify an object without copies, use
setnames to change the column names of a data.table or data.frame and
setattr to change an attribute for any object.
# Print a message to the console whenever the data.table is copied tracemem(mtcars) mtcars[, cyl2 := factor(cyl)] # Neither of these statements copy the data.table setnames(mtcars, old = "cyl2", new = "cyl_fac") setattr(mtcars$cyl_fac, "levels", c("four", "six", "eight")) # Each of these statements copies the data.table names(mtcars)[names(mtcars) == "cyl_fac"] <- "cf" levels(mtcars$cf) <- c("IV", "VI", "VIII")
Be aware that these changes are made by reference, so they are global. Changing them within one environment affects the object in all environments.
# This function also changes the levels in the global environment edit_levels <- function(x) setattr(x, "levels", c("low", "med", "high")) edit_levels(mtcars$cyl_factor)