R Language Column wise operation sum of each column

Help us to keep this website almost Ad Free! It takes only 10 seconds of your time:
> Step 1: Go view our video on YouTube: EF Core Bulk Extensions
> Step 2: And Like the video. BONUS: You can also share it!

Example

Suppose we need to do the sum of each column in a dataset

set.seed(20)
df1 <- data.frame(ID = rep(c("A", "B", "C"), each = 3), V1 = rnorm(9), V2 = rnorm(9))
m1 <- as.matrix(df1[-1])

There are many ways to do this. Using base R, the best option would be colSums

colSums(df1[-1], na.rm = TRUE)

Here, we removed the first column as it is non-numeric and did the sum of each column, specifying the na.rm = TRUE (in case there are any NAs in the dataset)

This also works with matrix

colSums(m1, na.rm = TRUE)

This can be done in a loop with lapply/sapply/vapply

 lapply(df1[-1], sum, na.rm = TRUE)

It should be noted that the output is a list. If we need a vector output

 sapply(df1[-1], sum, na.rm = TRUE)

Or

 vapply(df1[-1], sum, na.rm = TRUE, numeric(1))

For matrices, if we want to loop through columns, then use apply with MARGIN = 1

 apply(m1, 2, FUN = sum, na.rm = TRUE)

There are ways to do this with packages like dplyr or data.table

 library(dplyr)
 df1 %>%
     summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

Here, we are passing a regular expression to match the column names that we need to get the sum in summarise_at. The regex will match all columns that start with V followed by one or more numbers (\\d+).

A data.table option is

library(data.table)   
setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), .SDcols = 2:ncol(df1)]

We convert the 'data.frame' to 'data.table' (setDT(df1)), specified the columns to be applied the function in .SDcols and loop through the Subset of Data.table (.SD) and get the sum.


If we need to use a group by operation, we can do this easily by specifying the group by column/columns

 df1 %>%
   group_by(ID) %>%   
   summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

In cases where we need the sum of all the columns, summarise_each can be used instead of summarise_at

df1 %>%
    group_by(ID) %>%
    summarise_each(funs(sum(., na.rm = TRUE)))

The data.table option is

setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), by = ID]   


Got any R Language Question?