The DT[where, select|update|do, by] syntax is used to work with columns of a data.table.
The "where" part is the i argument.
The "select|update|do" part is the j argument.
These two arguments are usually passed by position instead of by name.
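As a minimal sketch of the positional and named forms, the snippet below runs the same query both ways. The table DT and its columns (id, grp, val) are made up for this illustration and are not part of the example data.

```r
library(data.table)

# Hypothetical table used only for this illustration
DT <- data.table(
  id  = 1:6,
  grp = c("a", "a", "b", "b", "c", "c"),
  val = c(10, 20, 30, 40, 50, 60)
)

# Positional form: i ("where") comes first, j ("select|update|do") second
by_position <- DT[val > 20, .(total = sum(val)), by = grp]

# The same call with i and j passed by name
by_name <- DT[i = val > 20, j = .(total = sum(val)), by = grp]

# Both forms should give the same result
same_result <- isTRUE(all.equal(by_position, by_name))
```

The positional form is the one used throughout the rest of this example.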
The example data used below is:
library(data.table)
mtcars = data.table(mtcars, keep.rownames = TRUE)
Use the := operator inside j to assign new columns:
mtcars[, mpg_sq := mpg^2]
Remove columns by setting them to NULL:
mtcars[, mpg_sq := NULL]
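Several columns can be removed in one call by putting a character vector of names on the left-hand side. The sketch below works on a separate table, dt, so that it does not interfere with the running example.

```r
library(data.table)

# Separate working table so the running example is left alone
dt <- data.table(mtcars, keep.rownames = TRUE)

# Create two throwaway columns
dt[, mpg_sq := mpg^2]
dt[, wt_sqrt := sqrt(wt)]

# Drop both of them in a single call
dt[, c("mpg_sq", "wt_sqrt") := NULL]
```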
Add multiple columns by using the := operator's multivariate format:
mtcars[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))]
# or
mtcars[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]
If the columns are dependent and must be defined in sequence, one way is:
mtcars[, c("mpg_sq", "mpg2_hp") := .(temp1 <- mpg^2, temp1/hp)]
The .() syntax is used when the right-hand side of LHS := RHS is a list of columns.
For dynamically-determined column names, use parentheses:
vn = "mpg_sq"
mtcars[, (vn) := mpg^2]
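The same parenthesized form accepts a vector of names, which is one way to create several dynamically named columns at once. This sketch assumes the new names are built by appending "_sq" to the source names; src_cols and new_cols are illustrative variable names.

```r
library(data.table)

# Separate working table so the running example is left alone
dt <- data.table(mtcars, keep.rownames = TRUE)

# Hypothetical choice of source columns and naming scheme
src_cols <- c("mpg", "wt")
new_cols <- paste0(src_cols, "_sq")

# (new_cols) is evaluated to obtain the column names;
# .SD holds the columns listed in .SDcols
dt[, (new_cols) := lapply(.SD, function(x) x^2), .SDcols = src_cols]
```

Without the parentheses, data.table would create a single column literally named new_cols.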
Columns can also be modified with set, though this is rarely necessary:
set(mtcars, j = "hp_over_wt", value = mtcars$hp/mtcars$wt)
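One situation where set can be worth using is a loop over many columns, since it avoids the overhead of calling [.data.table on every iteration. The sketch below assumes a hypothetical task (rescaling three columns to the 0-1 range) and works on a separate table.

```r
library(data.table)

# Separate working table so the running example is left alone
dt <- data.table(mtcars, keep.rownames = TRUE)

# Hypothetical task: rescale several numeric columns to the 0-1 range in place
cols <- c("mpg", "hp", "wt")
for (col in cols) {
  x <- dt[[col]]
  set(dt, j = col, value = (x - min(x)) / (max(x) - min(x)))
}
```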
Use the i argument to subset to rows "where" edits should be made:
mtcars[1:3, newvar := "Hello"]
# or
set(mtcars, j = "newvar", i = 1:3, value = "Hello")
As in a data.frame, we can subset using row numbers or logical tests. It is also possible to use a "join" in i, but that more complicated task is covered in another example.
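A logical test in i might look like the following sketch. The column name economy and the thresholds are arbitrary choices for illustration.

```r
library(data.table)

# Separate working table so the running example is left alone
dt <- data.table(mtcars, keep.rownames = TRUE)

# Only rows passing the test are assigned; all other rows get NA in the new column
dt[cyl == 4 & mpg > 30, economy := "high"]

# A second conditional update fills in some of the remaining rows
dt[is.na(economy) & mpg > 20, economy := "medium"]
```

Rows that match neither test keep NA in the new column.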
Functions that edit attributes, such as levels<- or names<-, actually replace an object with a modified copy. Even if only used on one column in a data.table, the entire object is copied and replaced.
To modify an object without copies, use setnames to change the column names of a data.table or data.frame and setattr to change an attribute for any object.
# Print a message to the console whenever the data.table is copied
tracemem(mtcars)
mtcars[, cyl2 := factor(cyl)]
# Neither of these statements copies the data.table
setnames(mtcars, old = "cyl2", new = "cyl_fac")
setattr(mtcars$cyl_fac, "levels", c("four", "six", "eight"))
# Each of these statements copies the data.table
names(mtcars)[names(mtcars) == "cyl_fac"] <- "cf"
levels(mtcars$cf) <- c("IV", "VI", "VIII")
Be aware that these changes are made by reference, so they are global. Changing them within one environment affects the object in all environments.
# This function also changes the levels in the global environment
edit_levels <- function(x) setattr(x, "levels", c("low", "med", "high"))
edit_levels(mtcars$cf)
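When a function should not alter the caller's table, one option is to work on an explicit copy made with data.table's copy(). The sketch below contrasts the two cases; add_flag and the flag column are hypothetical names used only for illustration.

```r
library(data.table)

# Separate working table so the running example is left alone
dt <- data.table(mtcars, keep.rownames = TRUE)

# Hypothetical helper that adds a column by reference
add_flag <- function(x) invisible(x[, flag := TRUE])

# Editing a copy leaves the original table untouched
dt_copy <- copy(dt)
add_flag(dt_copy)
original_changed_after_copy <- "flag" %in% names(dt)

# Editing the original changes it in the calling environment as well
add_flag(dt)
original_changed_after_direct <- "flag" %in% names(dt)
```

The trade-off is memory and time: copy() duplicates the whole table, which is exactly what the by-reference functions are designed to avoid.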