pivot(), group_by() and summarise() with nested data

Mock data:

df <- structure(list(stop1 = c("New York", "Milwaukee", "New York",
                               "Los Angeles", NA, "Milwaukee"),
                     stop2 = c(NA, "New York", "Los Angeles", 
                               "New York", NA, "New York"),
                     stop1_apple = c("apple", "apple", NA, "apple", NA, "apple"),
                     stop1_pear = c("pear", "pear", "pear", NA, NA, "pear"),
                     stop2_apple = c(NA, "apple", "apple", NA, NA, "apple"), 
                     stop2_pear = c(NA, "pear", "pear", "pear", NA, NA)),
                class = "data.frame", row.names = c(NA, -6L))

df
        stop1       stop2 stop1_apple stop1_pear stop2_apple stop2_pear
1    New York        <NA>       apple       pear        <NA>       <NA>
2   Milwaukee    New York       apple       pear       apple       pear
3    New York Los Angeles        <NA>       pear       apple       pear
4 Los Angeles    New York       apple       <NA>        <NA>       pear
5        <NA>        <NA>        <NA>       <NA>        <NA>       <NA>
6   Milwaukee    New York       apple       pear       apple       <NA>

How to read it:

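Each row is a traveller. For example, the third row is a traveller who stopped in New York and Los Angeles. At her first stop, New York, she ate a pear. At her second stop, Los Angeles, she ate an apple and a pear.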

What I want to do:

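First, I want to count the total number of people passing through each city. For example, 5 people stop in New York in total (2 at stop1 and 3 at stop2). Second, I want to count how many apples and pears were eaten in each city. For example, 3 apples and 4 pears were eaten in New York.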
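As a sanity check on the New York figures, the counts can be reproduced directly in base R. This is a minimal sketch, not part of the original post, and the variable names (`n_stops_ny`, `n_apples_ny`, `n_pears_ny`) are illustrative. The point it encodes is that a stop's fruit columns (for example `stop1_apple`) only count toward the city named in that same stop's column (`stop1`); it gives 5 stops, 3 apples and 4 pears.

```r
df <- structure(list(stop1 = c("New York", "Milwaukee", "New York",
                               "Los Angeles", NA, "Milwaukee"),
                     stop2 = c(NA, "New York", "Los Angeles",
                               "New York", NA, "New York"),
                     stop1_apple = c("apple", "apple", NA, "apple", NA, "apple"),
                     stop1_pear = c("pear", "pear", "pear", NA, NA, "pear"),
                     stop2_apple = c(NA, "apple", "apple", NA, NA, "apple"),
                     stop2_pear = c(NA, "pear", "pear", "pear", NA, NA)),
                class = "data.frame", row.names = c(NA, -6L))

# Stops in New York: matches in stop1 plus matches in stop2
n_stops_ny <- sum(df$stop1 == "New York", na.rm = TRUE) +
  sum(df$stop2 == "New York", na.rm = TRUE)

# A fruit counts for New York only when its *own* stop is New York
n_apples_ny <- sum(df$stop1 == "New York" & !is.na(df$stop1_apple), na.rm = TRUE) +
  sum(df$stop2 == "New York" & !is.na(df$stop2_apple), na.rm = TRUE)
n_pears_ny <- sum(df$stop1 == "New York" & !is.na(df$stop1_pear), na.rm = TRUE) +
  sum(df$stop2 == "New York" & !is.na(df$stop2_pear), na.rm = TRUE)

print(c(stops = n_stops_ny, apples = n_apples_ny, pears = n_pears_ny))
```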

Desired output:

Location    NStops NApples NPears
Los Angeles      2       2      1
Milwaukee        2       2      2
New York         5       3      4

What I tried:

library(tidyverse)
df %>% pivot_longer(c(stop1, stop2)) %>% 
  rename(Location = value, Stop = name) %>% 
  add_count(Location, name = "NStops") %>% 
  pivot_longer(c(stop1_apple, stop1_pear, stop2_apple, stop2_pear)) %>% 
  group_by(Location, value, NStops) %>% 
  summarise(NFruits = n()) %>% 
  pivot_wider(names_from = value, values_from = NFruits) %>% 
  rename(NApples = apple, NPears = pear) %>% 
  select(-`NA`) %>% 
  filter(!is.na(Location))
  

Output:
Location    NStops NApples NPears
Los Angeles      2       2      3
Milwaukee        2       4      3
New York         5       7      7
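One way to see where the extra fruit comes from is to run the first step of the attempt on its own. This is a diagnostic sketch, not part of the original post; `step1`, `wrong`, `fruit_col` and `fruit` are illustrative names. After the first `pivot_longer()`, each traveller-stop row still carries the fruit columns of every stop, so a city is credited with fruit eaten at the traveller's other stops. Milwaukee only ever appears as `stop1`, yet rows 2 and 6 also bring in `stop2_apple` and `stop2_pear`, which is how the attempt reaches 4 apples and 3 pears there.

```r
library(dplyr)
library(tidyr)

df <- structure(list(stop1 = c("New York", "Milwaukee", "New York",
                               "Los Angeles", NA, "Milwaukee"),
                     stop2 = c(NA, "New York", "Los Angeles",
                               "New York", NA, "New York"),
                     stop1_apple = c("apple", "apple", NA, "apple", NA, "apple"),
                     stop1_pear = c("pear", "pear", "pear", NA, NA, "pear"),
                     stop2_apple = c(NA, "apple", "apple", NA, NA, "apple"),
                     stop2_pear = c(NA, "pear", "pear", "pear", NA, NA)),
                class = "data.frame", row.names = c(NA, -6L))

# First step of the attempt: one row per traveller and stop,
# but every row still carries the stop1_* AND stop2_* fruit columns
step1 <- df %>%
  pivot_longer(c(stop1, stop2)) %>%
  rename(Location = value, Stop = name)

# Milwaukee only appears as stop1, yet its rows also hold stop2 fruit
print(filter(step1, Location == "Milwaukee"))

# Counting every fruit column per Location reproduces the inflated totals
wrong <- step1 %>%
  pivot_longer(c(stop1_apple, stop1_pear, stop2_apple, stop2_pear),
               names_to = "fruit_col", values_to = "fruit") %>%
  filter(!is.na(Location), !is.na(fruit)) %>%
  count(Location, fruit)

print(wrong)
```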

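The number of stops per city is correct, but the fruit counts do not match the desired output above. Also note that my real data has more than 2 stops, more than 2 fruits, and dozens of cities. Finally, I would prefer a tidyverse solution if possible.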

Tags: r, nested

Answer:
library(dplyr)
library(tidyr)

df <- structure(list(stop1 = c("New York", "Milwaukee", "New York",
                               "Los Angeles", NA, "Milwaukee"),
                     stop2 = c(NA, "New York", "Los Angeles", 
                               "New York", NA, "New York"),
                     stop1_apple = c("apple", "apple", NA, "apple", NA, "apple"),
                     stop1_pear = c("pear", "pear", "pear", NA, NA, "pear"),
                     stop2_apple = c(NA, "apple", "apple", NA, NA, "apple"), 
                     stop2_pear = c(NA, "pear", "pear", "pear", NA, NA)),
                class = "data.frame", row.names = c(NA, -6L))

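df %>% 
  # Give the city columns a suffix so every column follows "stopN_<field>"
  rename(
    stop1_stop = stop1, 
    stop2_stop = stop2
  ) %>% 
  # One row per traveller and stop; ".value" turns the captured field
  # ("stop", "apple", "pear") into column names
  pivot_longer(
    everything(), 
    names_pattern = "stop\\d_(.*)", 
    names_to = ".value"
  ) %>% 
  # Count stops and eaten fruit per city (.by requires dplyr >= 1.1.0)
  summarize(
    n_stop = n(),
    n_apple = sum(apple == "apple", na.rm = TRUE),
    n_pear = sum(pear == "pear", na.rm = TRUE),
    .by = stop
  ) %>% 
  # Drop the group made of stops that never happened (NA city)
  filter(!is.na(stop))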
#> # A tibble: 3 × 4
#>   stop        n_stop n_apple n_pear
#>   <chr>        <int>   <int>  <int>
#> 1 New York         5       3      4
#> 2 Milwaukee        2       2      2
#> 3 Los Angeles      2       2      1

Created on 2024-10-09 with reprex v2.1.1
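The answer above names `apple` and `pear` explicitly in `summarize()`. Since the question mentions more stops and more fruits, here is a sketch that hard-codes neither, assuming every city column is named `stopN`, every fruit column `stopN_<fruit>`, and that any non-missing fruit value means the fruit was eaten. The `_city` suffix and the `long`, `stops`, `fruits` and `result` names are illustrative choices, not from the original answer.

```r
library(dplyr)
library(tidyr)

df <- structure(list(stop1 = c("New York", "Milwaukee", "New York",
                               "Los Angeles", NA, "Milwaukee"),
                     stop2 = c(NA, "New York", "Los Angeles",
                               "New York", NA, "New York"),
                     stop1_apple = c("apple", "apple", NA, "apple", NA, "apple"),
                     stop1_pear = c("pear", "pear", "pear", NA, NA, "pear"),
                     stop2_apple = c(NA, "apple", "apple", NA, NA, "apple"),
                     stop2_pear = c(NA, "pear", "pear", "pear", NA, NA)),
                class = "data.frame", row.names = c(NA, -6L))

# One row per traveller and stop: a `city` column plus one column per fruit.
# Assumes city columns are "stopN" and fruit columns are "stopN_<fruit>".
long <- df %>%
  rename_with(~ paste0(.x, "_city"), matches("^stop\\d+$")) %>%
  pivot_longer(
    everything(),
    names_to = c("stop", ".value"),
    names_pattern = "(stop\\d+)_(.*)"
  ) %>%
  filter(!is.na(city))  # drop stops that never happened

# Number of stops per city
stops <- long %>% count(city, name = "n_stop")

# Number of each fruit eaten per city (any non-missing value counts as one)
fruits <- long %>%
  pivot_longer(-c(stop, city), names_to = "fruit", values_to = "eaten") %>%
  filter(!is.na(eaten)) %>%
  count(city, fruit) %>%
  pivot_wider(names_from = fruit, values_from = n,
              names_prefix = "n_", values_fill = 0L)

# Keep cities that have stops but no fruit, reporting 0 for them
result <- stops %>%
  left_join(fruits, by = "city") %>%
  mutate(across(starts_with("n_"), ~ coalesce(.x, 0L)))

print(result)
```

Counting stops and fruit separately and then joining keeps a city with stops but no fruit in the result; `coalesce()` turns its missing fruit counts into 0. New fruit columns such as `stop3_banana` would be picked up automatically as `n_banana`.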
