R Language Adding and modifying columns


Example

DT[where, select|update|do, by] syntax is used to work with columns of a data.table.

  • The "where" part is the i argument
  • The "select|update|do" part is the j argument

These two arguments are usually passed by position instead of by name.

Our example data below is

mtcars = data.table(mtcars, keep.rownames = TRUE)

Editing entire columns

Use the := operator inside j to assign new columns:

mtcars[, mpg_sq := mpg^2]

Remove columns by setting to NULL:

mtcars[, mpg_sq := NULL]

Add multiple columns by using the := operator's multivariate format:

mtcars[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))]
# or 
mtcars[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]

If the columns are dependent and must be defined in sequence, one way is:

mtcars[, c("mpg_sq", "mpg2_hp") := .(temp1 <- mpg^2, temp1/hp)]

The .() syntax is used when the right-hand side of LHS := RHS is a list of columns.

For dynamically-determined column names, use parentheses:

vn = "mpg_sq"
mtcars[, (vn) := mpg^2]

Columns can also be modified with set, though this is rarely necessary:

set(mtcars, j = "hp_over_wt", v = mtcars$hp/mtcars$wt)

Editing subsets of columns

Use the i argument to subset to rows "where" edits should be made:

mtcars[1:3, newvar := "Hello"]
# or
set(mtcars, j = "newvar", i = 1:3, v = "Hello")  

As in a data.frame, we can subset using row numbers or logical tests. It is also possible to use a "join" in i, but that more complicated task is covered in another example.

Editing column attributes

Functions that edit attributes, such as levels<- or names<-, actually replace an object with a modified copy. Even if only used on one column in a data.table, the entire object is copied and replaced.

To modify an object without copies, use setnames to change the column names of a data.table or data.frame and setattr to change an attribute for any object.

# Print a message to the console whenever the data.table is copied
tracemem(mtcars)
mtcars[, cyl2 := factor(cyl)]

# Neither of these statements copy the data.table
setnames(mtcars, old = "cyl2", new = "cyl_fac")
setattr(mtcars$cyl_fac, "levels", c("four", "six", "eight"))

# Each of these statements copies the data.table
names(mtcars)[names(mtcars) == "cyl_fac"] <- "cf"
levels(mtcars$cf) <- c("IV", "VI", "VIII")

Be aware that these changes are made by reference, so they are global. Changing them within one environment affects the object in all environments.

# This function also changes the levels in the global environment
edit_levels <- function(x) setattr(x, "levels", c("low", "med", "high"))
edit_levels(mtcars$cyl_factor)