Helper functions are used in conjunction with select
to identify variables to return. Unless otherwise noted, these functions expect a string as the first parameter match
. Passing a vector or another object will generate an error.
library(dplyr) library(nycflights13)
starts_with
allows us to identify variables whose name begins with a string.
Returns all variables that begin with the letter "e".
planes %>% select(starts_with("e"))
## # A tibble: 3,322 × 2
## engines engine
## <int> <chr>
## 1 2 Turbo-fan
## 2 2 Turbo-fan
## 3 2 Turbo-fan
## 4 2 Turbo-fan
## 5 2 Turbo-fan
## 6 2 Turbo-fan
## 7 2 Turbo-fan
## 8 2 Turbo-fan
## 9 2 Turbo-fan
## 10 2 Turbo-fan
## # ... with 3,312 more rows
Set ignore.case
parameter to FALSE for strict casing.
planes %>% select(starts_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
Return all variables that end with the letter "e".
planes %>% select(ends_with("e"))
## # A tibble: 3,322 × 2
## type engine
## <chr> <chr>
## 1 Fixed wing multi engine Turbo-fan
## 2 Fixed wing multi engine Turbo-fan
## 3 Fixed wing multi engine Turbo-fan
## 4 Fixed wing multi engine Turbo-fan
## 5 Fixed wing multi engine Turbo-fan
## 6 Fixed wing multi engine Turbo-fan
## 7 Fixed wing multi engine Turbo-fan
## 8 Fixed wing multi engine Turbo-fan
## 9 Fixed wing multi engine Turbo-fan
## 10 Fixed wing multi engine Turbo-fan
## # ... with 3,312 more rows
Set ignore.case
parameter to FALSE for strict casing.
planes %>% select(ends_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
contains
allows you to find any variables that contain a given string.
planes %>% select(contains("ea"))
## # A tibble: 3,322 × 2
## year seats
## <int> <int>
## 1 2004 55
## 2 1998 182
## 3 1999 182
## 4 1999 182
## 5 2002 55
## 6 1999 182
## 7 1999 182
## 8 1999 182
## 9 1999 182
## 10 1999 182
## # ... with 3,312 more rows
Set ignore.case
parameter to FALSE for strict casing.
planes %>% select(contains("EA", ignore.case = FALSE))
## # A tibble: 3,322 × 0
matches
is the only helper function that allows the use of regular expressions.
Return all variables with a name at least six alpha characters:
planes %>% select(matches("[[:alpha:]]{6,}"))
## # A tibble: 3,322 × 4
## tailnum manufacturer engines engine
## <chr> <chr> <int> <chr>
## 1 N10156 EMBRAER 2 Turbo-fan
## 2 N102UW AIRBUS INDUSTRIE 2 Turbo-fan
## 3 N103US AIRBUS INDUSTRIE 2 Turbo-fan
## 4 N104UW AIRBUS INDUSTRIE 2 Turbo-fan
## 5 N10575 EMBRAER 2 Turbo-fan
## 6 N105UW AIRBUS INDUSTRIE 2 Turbo-fan
## 7 N107US AIRBUS INDUSTRIE 2 Turbo-fan
## 8 N108UW AIRBUS INDUSTRIE 2 Turbo-fan
## 9 N109UW AIRBUS INDUSTRIE 2 Turbo-fan
## 10 N110UW AIRBUS INDUSTRIE 2 Turbo-fan
## # ... with 3,312 more rows
Set ignore.case
parameter to FALSE for strict casing.
For this example I will generate a dummy dataframe with random values and sequential variable names.
set.seed(1) df <- data.frame(x1 = runif(10), x2 = runif(10), x3 = runif(10), x4 = runif(10), x5 = runif(10))
num_range
can be used to select a range of varaibles given a consistent prefix
.
Select the variables 2:4 from df
:
df %>% select(num_range('x', range = 2:4))
## x2 x3 x4
## 1 0.2059746 0.93470523 0.4820801
## 2 0.1765568 0.21214252 0.5995658
## 3 0.6870228 0.65167377 0.4935413
## 4 0.3841037 0.12555510 0.1862176
## 5 0.7698414 0.26722067 0.8273733
## 6 0.4976992 0.38611409 0.6684667
## 7 0.7176185 0.01339033 0.7942399
## 8 0.9919061 0.38238796 0.1079436
## 9 0.3800352 0.86969085 0.7237109
## 10 0.7774452 0.34034900 0.4112744
one_of
can take a vector as the match
parameter and returns each variable.
planes %>% select(one_of(c("tailnum", "model")))
## # A tibble: 3,322 × 2
## tailnum model
## <chr> <chr>
## 1 N10156 EMB-145XR
## 2 N102UW A320-214
## 3 N103US A320-214
## 4 N104UW A320-214
## 5 N10575 EMB-145LR
## 6 N105UW A320-214
## 7 N107US A320-214
## 8 N108UW A320-214
## 9 N109UW A320-214
## 10 N110UW A320-214
## # ... with 3,312 more rows
everything
can be used to reposition variables in the dataframe.
Make manufacturer
the first variable followed by all remaining variables.
planes %>% select(manufacturer, everything())
## # A tibble: 3,322 × 9
## manufacturer tailnum year type model
## <chr> <chr> <int> <chr> <chr>
## 1 EMBRAER N10156 2004 Fixed wing multi engine EMB-145XR
## 2 AIRBUS INDUSTRIE N102UW 1998 Fixed wing multi engine A320-214
## 3 AIRBUS INDUSTRIE N103US 1999 Fixed wing multi engine A320-214
## 4 AIRBUS INDUSTRIE N104UW 1999 Fixed wing multi engine A320-214
## 5 EMBRAER N10575 2002 Fixed wing multi engine EMB-145LR
## 6 AIRBUS INDUSTRIE N105UW 1999 Fixed wing multi engine A320-214
## 7 AIRBUS INDUSTRIE N107US 1999 Fixed wing multi engine A320-214
## 8 AIRBUS INDUSTRIE N108UW 1999 Fixed wing multi engine A320-214
## 9 AIRBUS INDUSTRIE N109UW 1999 Fixed wing multi engine A320-214
## 10 AIRBUS INDUSTRIE N110UW 1999 Fixed wing multi engine A320-214
## # ... with 3,312 more rows, and 4 more variables: engines <int>,
## # seats <int>, speed <int>, engine <chr>
Though the :
and -
operators are not part of the dplyr
package we can still use them to identify variables to return.
Define an inclusive range of variables to return.
Return every variable from year
to manufacturer
:
planes %>% select(year:manufacturer)
## # A tibble: 3,322 × 3
## year type manufacturer
## <int> <chr> <chr>
## 1 2004 Fixed wing multi engine EMBRAER
## 2 1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5 2002 Fixed wing multi engine EMBRAER
## 6 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 7 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 8 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 9 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 10 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## # ... with 3,312 more rows
Return multiple ranges of variables:
planes %>% select(c(year:manufacturer, seats:engine))
## # A tibble: 3,322 × 6
## year type manufacturer seats speed engine
## <int> <chr> <chr> <int> <int> <chr>
## 1 2004 Fixed wing multi engine EMBRAER 55 NA Turbo-fan
## 2 1998 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 3 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 4 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 5 2002 Fixed wing multi engine EMBRAER 55 NA Turbo-fan
## 6 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 7 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 8 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 9 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 10 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## # ... with 3,312 more rows
The -
operator will remove a variable from a result set.
Return all variables with the exception of type
:
planes %>% select(-type)
## # A tibble: 3,322 × 8
## tailnum year manufacturer model engines seats speed engine
## <chr> <int> <chr> <chr> <int> <int> <int> <chr>
## 1 N10156 2004 EMBRAER EMB-145XR 2 55 NA Turbo-fan
## 2 N102UW 1998 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 3 N103US 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 4 N104UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 5 N10575 2002 EMBRAER EMB-145LR 2 55 NA Turbo-fan
## 6 N105UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 7 N107US 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 8 N108UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 9 N109UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 10 N110UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## # ... with 3,312 more rows
You can also pass a vector of variable names to exclude from your result set.
planes %>% select(-c(type, engines:engine))
## # A tibble: 3,322 × 4
## tailnum year manufacturer model
## <chr> <int> <chr> <chr>
## 1 N10156 2004 EMBRAER EMB-145XR
## 2 N102UW 1998 AIRBUS INDUSTRIE A320-214
## 3 N103US 1999 AIRBUS INDUSTRIE A320-214
## 4 N104UW 1999 AIRBUS INDUSTRIE A320-214
## 5 N10575 2002 EMBRAER EMB-145LR
## 6 N105UW 1999 AIRBUS INDUSTRIE A320-214
## 7 N107US 1999 AIRBUS INDUSTRIE A320-214
## 8 N108UW 1999 AIRBUS INDUSTRIE A320-214
## 9 N109UW 1999 AIRBUS INDUSTRIE A320-214
## 10 N110UW 1999 AIRBUS INDUSTRIE A320-214
## # ... with 3,312 more rows
Select all variables between type
and speed
(inclusive) and exclude manufacturer
.
planes %>% select(type:speed, -manufacturer)
## # A tibble: 3,322 × 5
## type model engines seats speed
## <chr> <chr> <int> <int> <int>
## 1 Fixed wing multi engine EMB-145XR 2 55 NA
## 2 Fixed wing multi engine A320-214 2 182 NA
## 3 Fixed wing multi engine A320-214 2 182 NA
## 4 Fixed wing multi engine A320-214 2 182 NA
## 5 Fixed wing multi engine EMB-145LR 2 55 NA
## 6 Fixed wing multi engine A320-214 2 182 NA
## 7 Fixed wing multi engine A320-214 2 182 NA
## 8 Fixed wing multi engine A320-214 2 182 NA
## 9 Fixed wing multi engine A320-214 2 182 NA
## 10 Fixed wing multi engine A320-214 2 182 NA
## # ... with 3,312 more rows
Modify the previous statement to exclude manufacturer
and model
.
planes %>% select(type:speed, -c(manufacturer, model))
## # A tibble: 3,322 × 4
## type engines seats speed
## <chr> <int> <int> <int>
## 1 Fixed wing multi engine 2 55 NA
## 2 Fixed wing multi engine 2 182 NA
## 3 Fixed wing multi engine 2 182 NA
## 4 Fixed wing multi engine 2 182 NA
## 5 Fixed wing multi engine 2 55 NA
## 6 Fixed wing multi engine 2 182 NA
## 7 Fixed wing multi engine 2 182 NA
## 8 Fixed wing multi engine 2 182 NA
## 9 Fixed wing multi engine 2 182 NA
## 10 Fixed wing multi engine 2 182 NA
## # ... with 3,312 more rows
You can use the same helper function more than once.
planes %>% select(starts_with("m"), starts_with("s"))
## # A tibble: 3,322 × 4
## manufacturer model seats speed
## <chr> <chr> <int> <int>
## 1 EMBRAER EMB-145XR 55 NA
## 2 AIRBUS INDUSTRIE A320-214 182 NA
## 3 AIRBUS INDUSTRIE A320-214 182 NA
## 4 AIRBUS INDUSTRIE A320-214 182 NA
## 5 EMBRAER EMB-145LR 55 NA
## 6 AIRBUS INDUSTRIE A320-214 182 NA
## 7 AIRBUS INDUSTRIE A320-214 182 NA
## 8 AIRBUS INDUSTRIE A320-214 182 NA
## 9 AIRBUS INDUSTRIE A320-214 182 NA
## 10 AIRBUS INDUSTRIE A320-214 182 NA
## # ... with 3,312 more rows
You can use multiple helper functions together:
planes %>% select(starts_with("m"), ends_with("l"))
## # A tibble: 3,322 × 2
## manufacturer model
## <chr> <chr>
## 1 EMBRAER EMB-145XR
## 2 AIRBUS INDUSTRIE A320-214
## 3 AIRBUS INDUSTRIE A320-214
## 4 AIRBUS INDUSTRIE A320-214
## 5 EMBRAER EMB-145LR
## 6 AIRBUS INDUSTRIE A320-214
## 7 AIRBUS INDUSTRIE A320-214
## 8 AIRBUS INDUSTRIE A320-214
## 9 AIRBUS INDUSTRIE A320-214
## 10 AIRBUS INDUSTRIE A320-214
## # ... with 3,312 more rows