dplyrdplyr入门


备注

本节概述了dplyr是什么,以及开发人员可能想要使用它的原因。

它还应该提到dplyr中的任何大型主题,并链接到相关主题。由于dplyr的文档是新的,因此您可能需要创建这些相关主题的初始版本。

基本动词

library(dplyr)
library(nycflights13)

dplyr 最常用的几个动词用于修改数据集。

选择

从数据tailnum planes 选择tailnumtypemodel 变量:

select(planes, tailnum, type, model)
## # A tibble: 3,322 × 3
##    tailnum                    type     model
##      <chr>                   <chr>     <chr>
## 1   N10156 Fixed wing multi engine EMB-145XR
## 2   N102UW Fixed wing multi engine  A320-214
## 3   N103US Fixed wing multi engine  A320-214
## 4   N104UW Fixed wing multi engine  A320-214
## 5   N10575 Fixed wing multi engine EMB-145LR
## 6   N105UW Fixed wing multi engine  A320-214
## 7   N107US Fixed wing multi engine  A320-214
## 8   N108UW Fixed wing multi engine  A320-214
## 9   N109UW Fixed wing multi engine  A320-214
## 10  N110UW Fixed wing multi engine  A320-214
## # ... with 3,312 more rows
 

使用magrittr包中的forward-pipe运算符( %>% )重写上面的语句:

planes %>% select(tailnum, type, model)
## # A tibble: 3,322 × 3
##    tailnum                    type     model
##      <chr>                   <chr>     <chr>
## 1   N10156 Fixed wing multi engine EMB-145XR
## 2   N102UW Fixed wing multi engine  A320-214
## 3   N103US Fixed wing multi engine  A320-214
## 4   N104UW Fixed wing multi engine  A320-214
## 5   N10575 Fixed wing multi engine EMB-145LR
## 6   N105UW Fixed wing multi engine  A320-214
## 7   N107US Fixed wing multi engine  A320-214
## 8   N108UW Fixed wing multi engine  A320-214
## 9   N109UW Fixed wing multi engine  A320-214
## 10  N110UW Fixed wing multi engine  A320-214
## # ... with 3,312 more rows
 

过滤

根据crieria filter 行。

返回manufacturer 为“EMBRAER”的数据集:

planes %>% filter(manufacturer == "EMBRAER")
## # A tibble: 299 × 9
##    tailnum  year                    type manufacturer     model engines
##      <chr> <int>                   <chr>        <chr>     <chr>   <int>
## 1   N10156  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 2   N10575  2002 Fixed wing multi engine      EMBRAER EMB-145LR       2
## 3   N11106  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 4   N11107  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 5   N11109  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 6   N11113  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 7   N11119  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 8   N11121  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 9   N11127  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 10  N11137  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## # ... with 289 more rows, and 3 more variables: seats <int>, speed <int>,
## #   engine <chr>
 

返回manufacturer 为“EMBRAER”且model 为“EMB-145XR”的数据集:

planes %>% 
  filter(manufacturer == "EMBRAER", model == "EMB-145XR")
## # A tibble: 104 × 9
##    tailnum  year                    type manufacturer     model engines
##      <chr> <int>                   <chr>        <chr>     <chr>   <int>
## 1   N10156  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 2   N11106  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 3   N11107  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 4   N11109  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 5   N11113  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 6   N11119  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 7   N11121  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 8   N11127  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 9   N11137  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 10  N11140  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## # ... with 94 more rows, and 3 more variables: seats <int>, speed <int>,
## #   engine <chr>
 

上述陈述与写“AND”条件相同。

planes %>% filter(manufacturer == "EMBRAER" & model == "EMB-145XR")
## # A tibble: 104 × 9
##    tailnum  year                    type manufacturer     model engines
##      <chr> <int>                   <chr>        <chr>     <chr>   <int>
## 1   N10156  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 2   N11106  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 3   N11107  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 4   N11109  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 5   N11113  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 6   N11119  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 7   N11121  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 8   N11127  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 9   N11137  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 10  N11140  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## # ... with 94 more rows, and 3 more variables: seats <int>, speed <int>,
## #   engine <chr>
 

对“OR”条件使用管道(|)字符:

planes %>% filter(manufacturer == "EMBRAER" | model == "EMB-145XR")
## # A tibble: 299 × 9
##    tailnum  year                    type manufacturer     model engines
##      <chr> <int>                   <chr>        <chr>     <chr>   <int>
## 1   N10156  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 2   N10575  2002 Fixed wing multi engine      EMBRAER EMB-145LR       2
## 3   N11106  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 4   N11107  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 5   N11109  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 6   N11113  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 7   N11119  2002 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 8   N11121  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 9   N11127  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 10  N11137  2003 Fixed wing multi engine      EMBRAER EMB-145XR       2
## # ... with 289 more rows, and 3 more variables: seats <int>, speed <int>,
## #   engine <chr>
 

greplfilter 结合使用以实现模式匹配条件。

planes %>% filter(grepl("^172.", model))
## # A tibble: 3 × 9
##   tailnum  year                     type manufacturer model engines seats
##     <chr> <int>                    <chr>        <chr> <chr>   <int> <int>
## 1  N378AA  1963 Fixed wing single engine       CESSNA  172E       1     4
## 2  N621AA  1975 Fixed wing single engine       CESSNA  172M       1     4
## 3  N737MQ  1977 Fixed wing single engine       CESSNA  172N       1     4
## # ... with 2 more variables: speed <int>, engine <chr>
 

之间

返回year between 2004年和2005年between 所有行:

planes %>% filter(between(year, 2004, 2005))
## # A tibble: 354 × 9
##    tailnum  year                    type manufacturer     model engines
##      <chr> <int>                   <chr>        <chr>     <chr>   <int>
## 1   N10156  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 2   N11155  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 3   N11164  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 4   N11165  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 5   N11176  2004 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 6   N11181  2005 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 7   N11184  2005 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 8   N11187  2005 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 9   N11189  2005 Fixed wing multi engine      EMBRAER EMB-145XR       2
## 10  N11191  2005 Fixed wing multi engine      EMBRAER EMB-145XR       2
## # ... with 344 more rows, and 3 more variables: seats <int>, speed <int>,
## #   engine <chr>
 

切片

slice 仅返回给定索引的行。

返回前五行数据(与基本head 函数相同):

planes %>% slice(1:5)
## # A tibble: 5 × 9
##   tailnum  year                    type     manufacturer     model engines
##     <chr> <int>                   <chr>            <chr>     <chr>   <int>
## 1  N10156  2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N102UW  1998 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N103US  1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 4  N104UW  1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 5  N10575  2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
## # ... with 3 more variables: seats <int>, speed <int>, engine <chr>
 

返回第1行,第3行和第5行数据:

planes %>% slice(c(1, 3, 5))
## # A tibble: 3 × 9
##   tailnum  year                    type     manufacturer     model engines
##     <chr> <int>                   <chr>            <chr>     <chr>   <int>
## 1  N10156  2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N103US  1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N10575  2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
## # ... with 3 more variables: seats <int>, speed <int>, engine <chr>
 

返回第一行和最后一行:

planes %>% slice(c(1, nrow(planes)))
## # A tibble: 2 × 9
##   tailnum  year                    type                  manufacturer
##     <chr> <int>                   <chr>                         <chr>
## 1  N10156  2004 Fixed wing multi engine                       EMBRAER
## 2  N999DN  1992 Fixed wing multi engine MCDONNELL DOUGLAS CORPORATION
## # ... with 5 more variables: model <chr>, engines <int>, seats <int>,
## #   speed <int>, engine <chr>
 

变异

mutate 可以添加新变量或修改现有变量。

添加一个虚拟变量engine.dummy ,默认值为0:

planes %>% 
  mutate(engine.dummy = 0) %>%
  select(engine, engine.dummy)
## # A tibble: 3,322 × 2
##       engine engine.dummy
##        <chr>        <dbl>
## 1  Turbo-fan            0
## 2  Turbo-fan            0
## 3  Turbo-fan            0
## 4  Turbo-fan            0
## 5  Turbo-fan            0
## 6  Turbo-fan            0
## 7  Turbo-fan            0
## 8  Turbo-fan            0
## 9  Turbo-fan            0
## 10 Turbo-fan            0
## # ... with 3,312 more rows
 

使用dplyr::if_else ,如果engine ==“Turbo-fan”,则将engine.dummy 设置为1,否则将engine.dummy 设置为0:

planes %>% 
  mutate(engine.dummy = if_else(engine == "Turbo-fan", 1, 0)) %>%
  select(engine, engine.dummy)
## # A tibble: 3,322 × 2
##       engine engine.dummy
##        <chr>        <dbl>
## 1  Turbo-fan            1
## 2  Turbo-fan            1
## 3  Turbo-fan            1
## 4  Turbo-fan            1
## 5  Turbo-fan            1
## 6  Turbo-fan            1
## 7  Turbo-fan            1
## 8  Turbo-fan            1
## 9  Turbo-fan            1
## 10 Turbo-fan            1
## # ... with 3,312 more rows
 

planes$engine 转换为一个因子。

planes %>% 
  mutate(engine = as.factor(engine)) %>% 
  select(engine)
## # A tibble: 3,322 × 1
##       engine
##       <fctr>
## 1  Turbo-fan
## 2  Turbo-fan
## 3  Turbo-fan
## 4  Turbo-fan
## 5  Turbo-fan
## 6  Turbo-fan
## 7  Turbo-fan
## 8  Turbo-fan
## 9  Turbo-fan
## 10 Turbo-fan
## # ... with 3,312 more rows
 

安排

使用arrange 对数据框进行排序。

year 安排planes

planes %>% arrange(year)
## # A tibble: 3,322 × 9
##    tailnum  year                     type manufacturer       model engines
##      <chr> <int>                    <chr>        <chr>       <chr>   <int>
## 1   N381AA  1956  Fixed wing multi engine      DOUGLAS      DC-7BF       4
## 2   N201AA  1959 Fixed wing single engine       CESSNA         150       1
## 3   N567AA  1959 Fixed wing single engine  DEHAVILLAND OTTER DHC-3       1
## 4   N378AA  1963 Fixed wing single engine       CESSNA        172E       1
## 5   N575AA  1963 Fixed wing single engine       CESSNA  210-5(205)       1
## 6   N14629  1965  Fixed wing multi engine       BOEING     737-524       2
## 7   N615AA  1967  Fixed wing multi engine        BEECH      65-A90       2
## 8   N425AA  1968 Fixed wing single engine        PIPER   PA-28-180       1
## 9   N383AA  1972  Fixed wing multi engine        BEECH        E-90       2
## 10  N364AA  1973  Fixed wing multi engine       CESSNA        310Q       2
## # ... with 3,312 more rows, and 3 more variables: seats <int>,
## #   speed <int>, engine <chr>
 

arrange planesyear desc

planes %>% arrange(desc(year))
## # A tibble: 3,322 × 9
##    tailnum  year                    type manufacturer    model engines
##      <chr> <int>                   <chr>        <chr>    <chr>   <int>
## 1   N150UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 2   N151UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 3   N152UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 4   N153UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 5   N154UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 6   N155UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 7   N156UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 8   N157UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 9   N198UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## 10  N199UW  2013 Fixed wing multi engine       AIRBUS A321-211       2
## # ... with 3,312 more rows, and 3 more variables: seats <int>,
## #   speed <int>, engine <chr>
 

通过...分组

group_by 允许您通过子集对数据帧执行操作,而无需提取子集。

df <- planes %>% group_by(manufacturer, model)

返回的数据框可能不会显示为分组。但是,数据框的classattributes 将确认它。

class(df)
## [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
 
attributes(df)$vars
## [[1]]
## manufacturer
## 
## [[2]]
## model
 
head(attributes(df)$labels, n = 5L)
##   manufacturer    model
## 1   AGUSTA SPA    A109E
## 2       AIRBUS A319-112
## 3       AIRBUS A319-114
## 4       AIRBUS A319-115
## 5       AIRBUS A319-131
 

如果要在不删除现有分组元素的情况下将分组元素添加到数据框,请使用add 参数设置为TRUE(默认设置为FALSE):

df <- df %>% group_by(type, year, add = TRUE)
class(df)
## [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
 
attributes(df)$vars
## [[1]]
## manufacturer
## 
## [[2]]
## model
## 
## [[3]]
## type
## 
## [[4]]
## year
 
head(attributes(df)$labels, n = 5L)
##   manufacturer    model                    type year
## 1   AGUSTA SPA    A109E              Rotorcraft 2001
## 2       AIRBUS A319-112 Fixed wing multi engine 2002
## 3       AIRBUS A319-112 Fixed wing multi engine 2005
## 4       AIRBUS A319-112 Fixed wing multi engine 2006
## 5       AIRBUS A319-112 Fixed wing multi engine 2007
 

如果要删除分组,请使用ungroup

df <- df %>% ungroup()
class(df)
## [1] "tbl_df"     "tbl"        "data.frame"
 
attributes(df)$vars
## NULL
 
attributes(df)$labels
## NULL
 

总结

summarise 用于对数据集进行整体计算或按组进行计算。

找出每个manufacturermean seats 数?

planes %>% 
  group_by(manufacturer) %>% 
  summarise(Mean = mean(seats))
## # A tibble: 35 × 2
##              manufacturer     Mean
##                     <chr>    <dbl>
## 1              AGUSTA SPA   8.0000
## 2                  AIRBUS 221.2024
## 3        AIRBUS INDUSTRIE 187.4025
## 4   AMERICAN AIRCRAFT INC   2.0000
## 5      AVIAT AIRCRAFT INC   2.0000
## 6  AVIONS MARCEL DASSAULT  12.0000
## 7           BARKER JACK L   2.0000
## 8                   BEECH   9.5000
## 9                    BELL   8.0000
## 10                 BOEING 175.1877
## # ... with 25 more rows
 

summarise 不会返回未明确分组或包含在汇总函数中的变量。如果你想添加另一个变量,您必须把它作为一个谓词group_bysummarise

planes %>% 
  group_by(year, manufacturer) %>% 
  summarise(Mean = mean(seats))
## Source: local data frame [164 x 3]
## Groups: year [?]
## 
##     year manufacturer  Mean
##    <int>        <chr> <dbl>
## 1   1956      DOUGLAS   102
## 2   1959       CESSNA     2
## 3   1959  DEHAVILLAND    16
## 4   1963       CESSNA     5
## 5   1965       BOEING   149
## 6   1967        BEECH     9
## 7   1968        PIPER     4
## 8   1972        BEECH    10
## 9   1973       CESSNA     6
## 10  1974 CANADAIR LTD     2
## # ... with 154 more rows
 

改名

rename 变量:

planes %>% 
  rename(Mfr = manufacturer) %>% 
  names()
## [1] "tailnum" "year"    "type"    "Mfr"     "model"   "engines" "seats"  
## [8] "speed"   "engine"
 

助手功能

辅助函数与select 结合使用以识别要返回的变量。除非另有说明,否则这些函数需要将字符串作为第一个参数match 。传递矢量或其他对象将产生错误。

library(dplyr)
library(nycflights13)

以。。开始

starts_with 允许我们识别名称以字符串开头的变量。

返回以字母“e”开头的所有变量。

planes %>% select(starts_with("e"))
## # A tibble: 3,322 × 2
##    engines    engine
##      <int>     <chr>
## 1        2 Turbo-fan
## 2        2 Turbo-fan
## 3        2 Turbo-fan
## 4        2 Turbo-fan
## 5        2 Turbo-fan
## 6        2 Turbo-fan
## 7        2 Turbo-fan
## 8        2 Turbo-fan
## 9        2 Turbo-fan
## 10       2 Turbo-fan
## # ... with 3,312 more rows
 

对于严格的套管,将ignore.case 参数设置为FALSE。

planes %>% select(starts_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
 

以。。结束

返回以字母“e”结尾的所有变量。

planes %>% select(ends_with("e"))
## # A tibble: 3,322 × 2
##                       type    engine
##                      <chr>     <chr>
## 1  Fixed wing multi engine Turbo-fan
## 2  Fixed wing multi engine Turbo-fan
## 3  Fixed wing multi engine Turbo-fan
## 4  Fixed wing multi engine Turbo-fan
## 5  Fixed wing multi engine Turbo-fan
## 6  Fixed wing multi engine Turbo-fan
## 7  Fixed wing multi engine Turbo-fan
## 8  Fixed wing multi engine Turbo-fan
## 9  Fixed wing multi engine Turbo-fan
## 10 Fixed wing multi engine Turbo-fan
## # ... with 3,312 more rows
 

对于严格的套管,将ignore.case 参数设置为FALSE。

planes %>% select(ends_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
 

包含

contains 允许您查找包含给定字符串的任何变量。

planes %>% select(contains("ea"))
## # A tibble: 3,322 × 2
##     year seats
##    <int> <int>
## 1   2004    55
## 2   1998   182
## 3   1999   182
## 4   1999   182
## 5   2002    55
## 6   1999   182
## 7   1999   182
## 8   1999   182
## 9   1999   182
## 10  1999   182
## # ... with 3,312 more rows
 

对于严格的套管,将ignore.case 参数设置为FALSE。

planes %>% select(contains("EA", ignore.case = FALSE))
## # A tibble: 3,322 × 0
 

火柴

matches 是唯一允许使用正则表达式的辅助函数。

返回名称至少为六个字母字符的所有变量:

planes %>% select(matches("[[:alpha:]]{6,}"))
## # A tibble: 3,322 × 4
##    tailnum     manufacturer engines    engine
##      <chr>            <chr>   <int>     <chr>
## 1   N10156          EMBRAER       2 Turbo-fan
## 2   N102UW AIRBUS INDUSTRIE       2 Turbo-fan
## 3   N103US AIRBUS INDUSTRIE       2 Turbo-fan
## 4   N104UW AIRBUS INDUSTRIE       2 Turbo-fan
## 5   N10575          EMBRAER       2 Turbo-fan
## 6   N105UW AIRBUS INDUSTRIE       2 Turbo-fan
## 7   N107US AIRBUS INDUSTRIE       2 Turbo-fan
## 8   N108UW AIRBUS INDUSTRIE       2 Turbo-fan
## 9   N109UW AIRBUS INDUSTRIE       2 Turbo-fan
## 10  N110UW AIRBUS INDUSTRIE       2 Turbo-fan
## # ... with 3,312 more rows
 

对于严格的套管,将ignore.case 参数设置为FALSE。

num_range

对于此示例,我将生成具有随机值和顺序变量名称的虚拟数据帧。

set.seed(1)
df <- data.frame(x1 = runif(10), 
                 x2 = runif(10), 
                 x3 = runif(10), 
                 x4 = runif(10), 
                 x5 = runif(10))

num_range 可用于在给定一致prefix 选择一系列变量。

df 选择变量2:4:

df %>% select(num_range('x', range = 2:4))
##           x2         x3        x4
## 1  0.2059746 0.93470523 0.4820801
## 2  0.1765568 0.21214252 0.5995658
## 3  0.6870228 0.65167377 0.4935413
## 4  0.3841037 0.12555510 0.1862176
## 5  0.7698414 0.26722067 0.8273733
## 6  0.4976992 0.38611409 0.6684667
## 7  0.7176185 0.01339033 0.7942399
## 8  0.9919061 0.38238796 0.1079436
## 9  0.3800352 0.86969085 0.7237109
## 10 0.7774452 0.34034900 0.4112744
 

one_of

one_of 可以将向量作为match 参数并返回每个变量。

planes %>% select(one_of(c("tailnum", "model")))
## # A tibble: 3,322 × 2
##    tailnum     model
##      <chr>     <chr>
## 1   N10156 EMB-145XR
## 2   N102UW  A320-214
## 3   N103US  A320-214
## 4   N104UW  A320-214
## 5   N10575 EMB-145LR
## 6   N105UW  A320-214
## 7   N107US  A320-214
## 8   N108UW  A320-214
## 9   N109UW  A320-214
## 10  N110UW  A320-214
## # ... with 3,312 more rows
 

一切

everything 都可以用来重新定位数据框中的变量。

使manufacturer 成为第一个变量,然后是所有剩余变量。

planes %>% select(manufacturer, everything())
## # A tibble: 3,322 × 9
##        manufacturer tailnum  year                    type     model
##               <chr>   <chr> <int>                   <chr>     <chr>
## 1           EMBRAER  N10156  2004 Fixed wing multi engine EMB-145XR
## 2  AIRBUS INDUSTRIE  N102UW  1998 Fixed wing multi engine  A320-214
## 3  AIRBUS INDUSTRIE  N103US  1999 Fixed wing multi engine  A320-214
## 4  AIRBUS INDUSTRIE  N104UW  1999 Fixed wing multi engine  A320-214
## 5           EMBRAER  N10575  2002 Fixed wing multi engine EMB-145LR
## 6  AIRBUS INDUSTRIE  N105UW  1999 Fixed wing multi engine  A320-214
## 7  AIRBUS INDUSTRIE  N107US  1999 Fixed wing multi engine  A320-214
## 8  AIRBUS INDUSTRIE  N108UW  1999 Fixed wing multi engine  A320-214
## 9  AIRBUS INDUSTRIE  N109UW  1999 Fixed wing multi engine  A320-214
## 10 AIRBUS INDUSTRIE  N110UW  1999 Fixed wing multi engine  A320-214
## # ... with 3,312 more rows, and 4 more variables: engines <int>,
## #   seats <int>, speed <int>, engine <chr>
 

其他助手

虽然:- 运算符不是dplyr 包的一部分,但我们仍然可以使用它们来识别要返回的变量。

定义要返回的包含范围的变量。

将每个变量从year 返回给manufacturer

planes %>% select(year:manufacturer)
## # A tibble: 3,322 × 3
##     year                    type     manufacturer
##    <int>                   <chr>            <chr>
## 1   2004 Fixed wing multi engine          EMBRAER
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5   2002 Fixed wing multi engine          EMBRAER
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE
## # ... with 3,312 more rows
 

返回多个变量范围:

planes %>% select(c(year:manufacturer, seats:engine))
## # A tibble: 3,322 × 6
##     year                    type     manufacturer seats speed    engine
##    <int>                   <chr>            <chr> <int> <int>     <chr>
## 1   2004 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 5   2002 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## # ... with 3,312 more rows
 

-

- 运算符将从结果集中删除变量。

返回除type 外的所有变量:

planes %>% select(-type)
## # A tibble: 3,322 × 8
##    tailnum  year     manufacturer     model engines seats speed    engine
##      <chr> <int>            <chr>     <chr>   <int> <int> <int>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR       2    55    NA Turbo-fan
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 5   N10575  2002          EMBRAER EMB-145LR       2    55    NA Turbo-fan
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## # ... with 3,312 more rows
 

您还可以传递变量名称向量以从结果集中排除。

planes %>% select(-c(type, engines:engine))
## # A tibble: 3,322 × 4
##    tailnum  year     manufacturer     model
##      <chr> <int>            <chr>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214
## 5   N10575  2002          EMBRAER EMB-145LR
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows
 

辅助函数的任意组合

选择typespeed (包括)之间的所有变量并排除manufacturer

planes %>% select(type:speed, -manufacturer)
## # A tibble: 3,322 × 5
##                       type     model engines seats speed
##                      <chr>     <chr>   <int> <int> <int>
## 1  Fixed wing multi engine EMB-145XR       2    55    NA
## 2  Fixed wing multi engine  A320-214       2   182    NA
## 3  Fixed wing multi engine  A320-214       2   182    NA
## 4  Fixed wing multi engine  A320-214       2   182    NA
## 5  Fixed wing multi engine EMB-145LR       2    55    NA
## 6  Fixed wing multi engine  A320-214       2   182    NA
## 7  Fixed wing multi engine  A320-214       2   182    NA
## 8  Fixed wing multi engine  A320-214       2   182    NA
## 9  Fixed wing multi engine  A320-214       2   182    NA
## 10 Fixed wing multi engine  A320-214       2   182    NA
## # ... with 3,312 more rows
 

修改上一个语句以排除manufacturermodel

planes %>% select(type:speed, -c(manufacturer, model))
## # A tibble: 3,322 × 4
##                       type engines seats speed
##                      <chr>   <int> <int> <int>
## 1  Fixed wing multi engine       2    55    NA
## 2  Fixed wing multi engine       2   182    NA
## 3  Fixed wing multi engine       2   182    NA
## 4  Fixed wing multi engine       2   182    NA
## 5  Fixed wing multi engine       2    55    NA
## 6  Fixed wing multi engine       2   182    NA
## 7  Fixed wing multi engine       2   182    NA
## 8  Fixed wing multi engine       2   182    NA
## 9  Fixed wing multi engine       2   182    NA
## 10 Fixed wing multi engine       2   182    NA
## # ... with 3,312 more rows
 

您可以多次使用相同的辅助函数。

planes %>% select(starts_with("m"), starts_with("s"))
## # A tibble: 3,322 × 4
##        manufacturer     model seats speed
##               <chr>     <chr> <int> <int>
## 1           EMBRAER EMB-145XR    55    NA
## 2  AIRBUS INDUSTRIE  A320-214   182    NA
## 3  AIRBUS INDUSTRIE  A320-214   182    NA
## 4  AIRBUS INDUSTRIE  A320-214   182    NA
## 5           EMBRAER EMB-145LR    55    NA
## 6  AIRBUS INDUSTRIE  A320-214   182    NA
## 7  AIRBUS INDUSTRIE  A320-214   182    NA
## 8  AIRBUS INDUSTRIE  A320-214   182    NA
## 9  AIRBUS INDUSTRIE  A320-214   182    NA
## 10 AIRBUS INDUSTRIE  A320-214   182    NA
## # ... with 3,312 more rows
 

您可以一起使用多个辅助函数:

planes %>% select(starts_with("m"), ends_with("l"))
## # A tibble: 3,322 × 2
##        manufacturer     model
##               <chr>     <chr>
## 1           EMBRAER EMB-145XR
## 2  AIRBUS INDUSTRIE  A320-214
## 3  AIRBUS INDUSTRIE  A320-214
## 4  AIRBUS INDUSTRIE  A320-214
## 5           EMBRAER EMB-145LR
## 6  AIRBUS INDUSTRIE  A320-214
## 7  AIRBUS INDUSTRIE  A320-214
## 8  AIRBUS INDUSTRIE  A320-214
## 9  AIRBUS INDUSTRIE  A320-214
## 10 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows
 

安装或设置

要安装dplyr 只需在R控制台中键入即可。

install.packages("dplyr") 
 

然后加载dplyr ,输入

library("dplyr")
 

也可以从Github安装最新的开发版本:

if (packageVersion("devtools") < 1.6) {
  install.packages("devtools")
}
devtools::install_github("hadley/lazyeval")
devtools::install_github("hadley/dplyr")
 

您可能希望安装大多数示例中使用的数据包: install.packages(c("nycflights13", "Lahman"))