5 Minutes a Day: Learning R the Easy Way (Part 5)

Today we continue with the msleep dataset to explore column selection in dplyr, and finish with a few examples of row filtering.

Selecting ranges of columns

  • Select every column from one column through another
R
library(dplyr)    # select() and the %>% pipe
library(ggplot2)  # provides the msleep data set
msleep %>%
  select(name:order)
#> # A tibble: 83 x 4
#> name genus vore order
#> <chr> <chr> <chr> <chr>
#> 1 Cheetah Acinonyx carni Carnivora
#> 2 Owl monkey Aotus omni Primates
#> 3 Mountain beaver Aplodontia herbi Rodentia
#> 4 Greater short-tailed shrew Blarina omni Soricomorpha
#> 5 Cow Bos herbi Artiodactyla
#> 6 Three-toed sloth Bradypus herbi Pilosa
#> 7 Northern fur seal Callorhinus carni Carnivora
#> 8 Vesper mouse Calomys NA Rodentia
#> 9 Dog Canis carni Carnivora
#> 10 Roe deer Capreolus herbi Artiodactyla
#> # … with 73 more rows
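A column range can also be written with positions instead of names. The sketch below assumes dplyr and ggplot2 are installed (msleep ships with ggplot2) and that msleep keeps the column order shown above. Positional ranges silently pick different columns if that order ever changes, which is why names are usually the safer choice.

```r
library(dplyr)    # select() and the %>% pipe
library(ggplot2)  # the msleep data set

# name:order covers columns 1 through 4 of msleep,
# so selecting by position gives the same columns
by_name     <- msleep %>% select(name:order)
by_position <- msleep %>% select(1:4)

names(by_position)  # "name" "genus" "vore" "order"
```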
  • Drop a range of columns

Drop the columns from sleep_total through awake (the parentheses make the minus sign apply to the whole range):

R
msleep %>% select(-(sleep_total:awake))
#> # A tibble: 83 x 7
#> name genus vore order conservation brainwt bodywt
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Cheetah Acinonyx carni Carnivora lc NA 50
#> 2 Owl monkey Aotus omni Primates NA 0.0155 0.48
#> 3 Mountain beaver Aplodont… herbi Rodentia nt NA 1.35
#> 4 Greater short-tailed… Blarina omni Soricomor… lc 0.00029 0.019
#> 5 Cow Bos herbi Artiodact… domesticated 0.423 600
#> 6 Three-toed sloth Bradypus herbi Pilosa NA NA 3.85
#> 7 Northern fur seal Callorhi… carni Carnivora vu NA 20.5
#> 8 Vesper mouse Calomys NA Rodentia NA NA 0.045
#> 9 Dog Canis carni Carnivora domesticated 0.07 14
#> 10 Roe deer Capreolus herbi Artiodact… lc 0.0982 14.8
#> # … with 73 more rows
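In dplyr 1.0.0 and later (through tidyselect), the ! operator also takes the complement of a selection. A minimal sketch comparing the two spellings, assuming a recent dplyr and ggplot2 for the data:

```r
library(dplyr)    # select()
library(ggplot2)  # the msleep data set

# `-` and `!` both exclude the whole sleep_total:awake range
with_minus <- msleep %>% select(-(sleep_total:awake))
with_bang  <- msleep %>% select(!(sleep_total:awake))

identical(names(with_minus), names(with_bang))  # TRUE
```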
  • Drop sleep_total through awake, but keep sleep_rem (it is added back as the last column).
R
msleep %>% select(-(sleep_total:awake), sleep_rem)
#> # A tibble: 83 x 8
#> name genus vore order conservation brainwt bodywt sleep_rem
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Cheetah Acinon… carni Carnivo… lc NA 50 NA
#> 2 Owl monkey Aotus omni Primates NA 0.0155 0.48 1.8
#> 3 Mountain beaver Aplodo… herbi Rodentia nt NA 1.35 2.4
#> 4 Greater short-… Blarina omni Soricom… lc 0.00029 0.019 2.3
#> 5 Cow Bos herbi Artioda… domesticated 0.423 600 0.7
#> 6 Three-toed slo… Bradyp… herbi Pilosa NA NA 3.85 2.2
#> 7 Northern fur s… Callor… carni Carnivo… vu NA 20.5 1.4
#> 8 Vesper mouse Calomys NA Rodentia NA NA 0.045 NA
#> 9 Dog Canis carni Carnivo… domesticated 0.07 14 2.9
#> 10 Roe deer Capreo… herbi Artioda… lc 0.0982 14.8 NA
#> # … with 73 more rows
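Because sleep_rem is appended at the end, the original column order is not preserved. If that order matters, one alternative is to list only the columns to drop. A sketch, assuming dplyr and ggplot2 are installed:

```r
library(dplyr)    # select()
library(ggplot2)  # the msleep data set

# Dropping just the three unwanted columns leaves sleep_rem
# in its original position instead of moving it to the end
kept_in_place <- msleep %>% select(-c(sleep_total, sleep_cycle, awake))

names(kept_in_place)
# "name" "genus" "vore" "order" "conservation" "sleep_rem" "brainwt" "bodywt"
```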

Selecting by pattern matching

Syntax: select(.data, ...)
.data: a data frame (or tibble)
...: column names, or selection helper functions

So far we have mostly passed column names. Below are a few examples that use helper functions instead.

  • Select the columns whose names start with sleep
R
msleep %>% select(name, starts_with('sleep'))
#> # A tibble: 83 x 4
#> name sleep_total sleep_rem sleep_cycle
#> <chr> <dbl> <dbl> <dbl>
#> 1 Cheetah 12.1 NA NA
#> 2 Owl monkey 17 1.8 NA
#> 3 Mountain beaver 14.4 2.4 NA
#> 4 Greater short-tailed shrew 14.9 2.3 0.133
#> 5 Cow 4 0.7 0.667
#> 6 Three-toed sloth 14.4 2.2 0.767
#> 7 Northern fur seal 8.7 1.4 0.383
#> 8 Vesper mouse 7 NA NA
#> 9 Dog 10.1 2.9 0.333
#> 10 Roe deer 3 NA NA
#> # … with 73 more rows
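ends_with() is the suffix counterpart of starts_with(). As a small sketch (assuming dplyr and ggplot2 are installed), the two weight columns, brainwt and bodywt, both end in wt:

```r
library(dplyr)    # select() and ends_with()
library(ggplot2)  # the msleep data set

# Keep the name column plus every column whose name ends in "wt"
weights <- msleep %>% select(name, ends_with("wt"))

names(weights)  # "name" "brainwt" "bodywt"
```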

Similar helper functions include:

Function        Description
starts_with()   Starts with a prefix
ends_with()     Ends with a suffix
contains()      Contains a literal string
matches()       Matches a regular expression
num_range()     A numerical range such as x01, x02, x03
one_of()        Variables in a character vector
everything()    All variables
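In recent tidyselect versions (used by dplyr 1.0.0 and later), one_of() is superseded by all_of(), which errors if any name is missing, and any_of(), which skips missing names. The sketch below illustrates both; the name not_a_column is hypothetical. Since msleep has no numbered columns, num_range() is shown on a small made-up tibble with hypothetical columns x01, x02, x03 and y.

```r
library(dplyr)    # select(), tibble() and the selection helpers
library(ggplot2)  # the msleep data set

# "not_a_column" is a hypothetical name that does not exist in msleep
wanted <- c("name", "awake", "not_a_column")

# any_of() keeps the names that exist and silently skips the rest
picked <- msleep %>% select(any_of(wanted))
names(picked)  # "name" "awake"

# all_of() requires every name to exist, so this line would raise an error:
# msleep %>% select(all_of(wanted))

# Hypothetical toy data for num_range(); width = 2 zero-pads to x01, x02
toy <- tibble(x01 = 1, x02 = 2, x03 = 3, y = 4)
numbered <- toy %>% select(num_range("x", 1:2, width = 2))
names(numbered)  # "x01" "x02"
```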

Here are a few more examples.

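Select the columns whose names match the regular expression o.+er, where . matches any single character and + means "one or more of the preceding item" (both order and conservation match):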

R
msleep %>% select(matches('o.+er'))
#> # A tibble: 83 x 2
#> order conservation
#> <chr> <chr>
#> 1 Carnivora lc
#> 2 Primates NA
#> 3 Rodentia nt
#> 4 Soricomorpha lc
#> 5 Artiodactyla domesticated
#> 6 Pilosa NA
#> 7 Carnivora vu
#> 8 Rodentia NA
#> 9 Carnivora domesticated
#> 10 Artiodactyla lc
#> # … with 73 more rows
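By default, matches() and the other pattern helpers ignore case (ignore.case = TRUE). A short sketch of the difference, assuming dplyr and ggplot2 are installed:

```r
library(dplyr)    # select() and matches()
library(ggplot2)  # the msleep data set

# With the default ignore.case = TRUE, an upper-case pattern
# still finds sleep_total, sleep_rem and sleep_cycle
loose  <- msleep %>% select(matches("^SLEEP"))

# With ignore.case = FALSE, nothing matches
strict <- msleep %>% select(matches("^SLEEP", ignore.case = FALSE))

ncol(loose)   # 3
ncol(strict)  # 0
```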
  • Select the columns whose names contain the string serv
R
msleep %>% select(contains('serv'))
#> # A tibble: 83 x 1
#> conservation
#> <chr>
#> 1 lc
#> 2 NA
#> 3 nt
#> 4 lc
#> 5 domesticated
#> 6 NA
#> 7 vu
#> 8 NA
#> 9 domesticated
#> 10 lc
#> # … with 73 more rows
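The difference between contains() and matches() shows up clearly with a dot: contains() looks for the literal character, while matches() treats it as a regex wildcard. A sketch, assuming dplyr and ggplot2 are installed:

```r
library(dplyr)    # select(), contains() and matches()
library(ggplot2)  # the msleep data set

# No column name in msleep contains a literal "."
literal_dot <- msleep %>% select(contains("."))

# As a regex, "." matches any character, so every column is selected
regex_dot <- msleep %>% select(matches("."))

ncol(literal_dot)  # 0
ncol(regex_dot)    # 11
```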
  • Select all columns and reorder them

Move the awake column to the front:

R
msleep %>% select(awake, everything())
#> # A tibble: 83 x 11
#> awake name genus vore order conservation sleep_total sleep_rem sleep_cycle
#> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 11.9 Chee… Acin… carni Carn… lc 12.1 NA NA
#> 2 7 Owl … Aotus omni Prim… NA 17 1.8 NA
#> 3 9.6 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA
#> 4 9.1 Grea… Blar… omni Sori… lc 14.9 2.3 0.133
#> 5 20 Cow Bos herbi Arti… domesticated 4 0.7 0.667
#> 6 9.6 Thre… Brad… herbi Pilo… NA 14.4 2.2 0.767
#> 7 15.3 Nort… Call… carni Carn… vu 8.7 1.4 0.383
#> 8 17 Vesp… Calo… NA Rode… NA 7 NA NA
#> 9 13.9 Dog Canis carni Carn… domesticated 10.1 2.9 0.333
#> 10 21 Roe … Capr… herbi Arti… lc 3 NA NA
#> # … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
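dplyr 1.0.0 added relocate(), which moves columns without having to write everything(). A sketch, assuming dplyr 1.0.0 or later and ggplot2 for the data:

```r
library(dplyr)    # relocate()
library(ggplot2)  # the msleep data set

# Same column order as select(awake, everything())
awake_first <- msleep %>% relocate(awake)

# .after places the column right after another one
awake_after_name <- msleep %>% relocate(awake, .after = name)

names(awake_after_name)[1:3]  # "name" "awake" "genus"
```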

Bonus: a few row-filtering examples

  • Randomly sample 5 rows
R
msleep %>% sample_n(5)
#> # A tibble: 5 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Star… Cond… omni Sori… lc 10.3 2.2 NA 13.7
#> 2 Donk… Equus herbi Peri… domesticated 3.1 0.4 NA 20.9
#> 3 Musk… Sunc… NA Sori… NA 12.8 2 0.183 11.2
#> 4 Pig Sus omni Arti… domesticated 9.1 2.4 0.5 14.9
#> 5 Hous… Mus herbi Rode… nt 12.5 1.4 0.183 11.5
#> # … with 2 more variables: brainwt <dbl>, bodywt <dbl>
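The sample above changes on every run. In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), and calling set.seed() first makes the draw reproducible. A sketch; the seed value 2024 is arbitrary:

```r
library(dplyr)    # slice_sample()
library(ggplot2)  # the msleep data set

set.seed(2024)  # fix the random number generator so the draw can be repeated

# Draw 5 rows at random, without replacement by default
five <- msleep %>% slice_sample(n = 5)

nrow(five)  # 5
```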
  • Randomly sample 10% of the rows
R
msleep %>% sample_frac(0.1)
#> # A tibble: 8 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Big … Epte… inse… Chir… lc 19.7 3.9 0.117 4.3
#> 2 East… Tami… herbi Rode… NA 15.8 NA NA 8.2
#> 3 Braz… Tapi… herbi Peri… vu 4.4 1 0.9 19.6
#> 4 Pilo… Glob… carni Ceta… cd 2.7 0.1 NA 21.4
#> 5 Musk… Sunc… NA Sori… NA 12.8 2 0.183 11.2
#> 6 Chim… Pan omni Prim… NA 9.7 1.4 1.42 14.3
#> 7 Slow… Nyct… carni Prim… NA 11 NA NA 13
#> 8 Red … Vulp… carni Carn… NA 9.8 2.4 0.35 14.2
#> # … with 2 more variables: brainwt <dbl>, bodywt <dbl>
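Likewise, sample_frac() is superseded by slice_sample(prop = ...) in dplyr 1.0.0 and later. With 83 rows, 10% is 8.3, and slice_sample() rounds the row count down, which gives the 8 rows seen above. A sketch with an arbitrary seed:

```r
library(dplyr)    # slice_sample()
library(ggplot2)  # the msleep data set

set.seed(2024)  # arbitrary seed for reproducibility

# prop = 0.1 of 83 rows is 8.3, rounded down to 8
tenth <- msleep %>% slice_sample(prop = 0.1)

nrow(tenth)  # 8
```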
  • Remove duplicate rows

msleep contains no completely duplicated rows, so all 83 rows are kept.

R
msleep %>% distinct()
#> # A tibble: 83 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Chee… Acin… carni Carn… lc 12.1 NA NA 11.9
#> 2 Owl … Aotus omni Prim… NA 17 1.8 NA 7
#> 3 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
#> 4 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
#> 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
#> 6 Thre… Brad… herbi Pilo… NA 14.4 2.2 0.767 9.6
#> 7 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3
#> 8 Vesp… Calo… NA Rode… NA 7 NA NA 17
#> 9 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9
#> 10 Roe … Capr… herbi Arti… lc 3 NA NA 21
#> # … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
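When distinct() is given column names, it checks for duplicates only in those columns and, by default, returns only those columns. A sketch using the diet column vore, assuming dplyr and ggplot2 are installed:

```r
library(dplyr)    # distinct()
library(ggplot2)  # the msleep data set

# One row per unique value of vore (NA counts as a value);
# all other columns are dropped
diets <- msleep %>% distinct(vore)
diets
```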
  • Remove rows with a duplicated sleep_total value

Setting .keep_all = TRUE keeps all the other columns; without it, only sleep_total would be returned.

R
msleep %>% distinct(sleep_total, .keep_all = TRUE)
#> # A tibble: 65 x 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Chee… Acin… carni Carn… lc 12.1 NA NA 11.9
#> 2 Owl … Aotus omni Prim… NA 17 1.8 NA 7
#> 3 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
#> 4 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
#> 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
#> 6 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3
#> 7 Vesp… Calo… NA Rode… NA 7 NA NA 17
#> 8 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9
#> 9 Roe … Capr… herbi Arti… lc 3 NA NA 21
#> 10 Goat Capri herbi Arti… lc 5.3 0.6 NA 18.7
#> # … with 55 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
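For each sleep_total value, distinct() keeps the first row in which that value appears, so the result should agree with base R's duplicated(). A sketch that checks this, assuming dplyr and ggplot2 are installed:

```r
library(dplyr)    # distinct() and n_distinct()
library(ggplot2)  # the msleep data set

by_total <- msleep %>% distinct(sleep_total, .keep_all = TRUE)

# Base R equivalent: keep rows whose sleep_total has not appeared earlier
first_rows <- msleep[!duplicated(msleep$sleep_total), ]

identical(by_total$name, first_rows$name)            # TRUE
n_distinct(msleep$sleep_total) == nrow(by_total)     # TRUE
```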