The pipe (%>%) operator could be used in combination with dplyr
functions. In this example we use the mtcars
dataset (see help("mtcars")
for more information) to show how to sumarize a data frame, and to add variables to the data with the result of the application of a function.
library(dplyr)
library(magrittr)
df <- mtcars
df$cars <- rownames(df) #just add the cars names to the df
df <- df[,c(ncol(df),1:(ncol(df)-1))] # and place the names in the first column
1. Sumarize the data
To compute statistics we use summarize
and the appropriate functions. In this case n()
is used for counting the number of cases.
df %>%
summarize(count=n(),mean_mpg = mean(mpg, na.rm = TRUE),
min_weight = min(wt),max_weight = max(wt))
# count mean_mpg min_weight max_weight
#1 32 20.09062 1.513 5.424
2. Compute statistics by group
It is possible to compute the statistics by groups of the data. In this case by Number of cylinders and Number of forward gears
df %>%
group_by(cyl, gear) %>%
summarize(count=n(),mean_mpg = mean(mpg, na.rm = TRUE),
min_weight = min(wt),max_weight = max(wt))
# Source: local data frame [8 x 6]
# Groups: cyl [?]
#
# cyl gear count mean_mpg min_weight max_weight
# <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#1 4 3 1 21.500 2.465 2.465
#2 4 4 8 26.925 1.615 3.190
#3 4 5 2 28.200 1.513 2.140
#4 6 3 2 19.750 3.215 3.460
#5 6 4 4 19.750 2.620 3.440
#6 6 5 1 19.700 2.770 2.770
#7 8 3 12 15.050 3.435 5.424
#8 8 5 2 15.400 3.170 3.570