Complete and fill missing rows with groups of unequal length


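I have a data frame of county executives and the years they took office.

I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.

I want to expand the df so that it lists who served as county executive in each year from 2000 to 2004.

The catch? Some districts were created during my analysis period.

My starting point looks like this: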

df <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                  executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party= c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                  district= c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

  year executive.name       party district
1 2000        Johnson    PartyRed     1001
2 2001          Smith PartyYellow     1001
3 2003      Alleghany  PartyGreen     1001
4 2000        Roberts PartyYellow     1002
5 2002         Clarke PartyOrange     1002
6 2004        Tollson    PartyRed     1002
7 2003         Roland PartyPurple     1003

I want my df to look like this:

df.neat <- data.frame(year= c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2003, 2004),
                  executive.name= c("Johnson", "Smith", "Smith", "Alleghany", "Alleghany", "Roberts", "Roberts", "Clarke", "Clarke", "Tollson", "Roland", "Roland"),
                  party= c("PartyRed", "PartyYellow", "PartyYellow", "PartyGreen", "PartyGreen", "PartyYellow", "PartyYellow", "PartyOrange", "PartyOrange", "PartyRed", "PartyPurple", "PartyPurple"),
                  district= c(1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1003, 1003))

> df.neat
   year executive.name       party district
1  2000        Johnson    PartyRed     1001
2  2001          Smith PartyYellow     1001
3  2002          Smith PartyYellow     1001
4  2003      Alleghany  PartyGreen     1001
5  2004      Alleghany  PartyGreen     1001
6  2000        Roberts PartyYellow     1002
7  2001        Roberts PartyYellow     1002
8  2002         Clarke PartyOrange     1002
9  2003         Clarke PartyOrange     1002
10 2004        Tollson    PartyRed     1002
11 2003         Roland PartyPurple     1003
12 2004         Roland PartyPurple     1003

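Notice that district 1003 was established in 2003. If I run complete(), the years 2000, 2001, and 2002 for that district are treated as implicitly missing and added as NA rows. fill() then carries the last values from district 1002 down into those new rows. In other words, my df ends up looking like this: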

df |>
  tidyr::complete(district, year) |>
  tidyr::fill(executive.name, party)

# A tibble: 15 × 4
   district  year executive.name party      
      <dbl> <dbl> <chr>          <chr>      
 1     1001  2000 Johnson        PartyRed   
 2     1001  2001 Smith          PartyYellow
 3     1001  2002 Smith          PartyYellow
 4     1001  2003 Alleghany      PartyGreen 
 5     1001  2004 Alleghany      PartyGreen 
 6     1002  2000 Roberts        PartyYellow
 7     1002  2001 Roberts        PartyYellow
 8     1002  2002 Clarke         PartyOrange
 9     1002  2003 Clarke         PartyOrange
10     1002  2004 Tollson        PartyRed   
11     1003  2000 Tollson        PartyRed   
12     1003  2001 Tollson        PartyRed   
13     1003  2002 Tollson        PartyRed   
14     1003  2003 Roland         PartyPurple
15     1003  2004 Roland         PartyPurple
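
The carry-over from district 1002 happens because fill() is applied to the whole table rather than within each district. As a minimal sketch (the names grouped_fill and leading_gaps are illustrative, not from the original post), grouping by district before filling leaves the pre-2003 rows for district 1003 as NA instead of borrowing values from 1002:

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                 district = c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

# Same complete() call as above, but fill() now runs within each district
grouped_fill <- df |>
  complete(district, year) |>
  group_by(district) |>
  fill(executive.name, party) |>
  ungroup()

# Rows that still have no executive after the grouped fill
leading_gaps <- filter(grouped_fill, is.na(executive.name))
leading_gaps
```

The remaining NA rows are the years before a district's first recorded executive, so they still need to be dropped to reach the desired df.neat.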
r database dataframe tidyverse data-wrangling
1 Answer

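This should work. Complete the full district-by-year grid, fill downward within each district, then drop the rows that fall before a district's first recorded executive: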

library(dplyr)
library(tidyr)

df <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                 executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party= c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                 district= c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

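df.neat <- df |>
  # Build every year x district combination for 2000-2004
  complete(year = 2000:2004, district) |>
  # Fill within each district so values never cross district boundaries
  group_by(district) |>
  fill(executive.name, party, .direction = "down") |>
  ungroup() |>
  arrange(district, year) |>
  # Rows still NA come before the district's first recorded executive
  filter(!is.na(executive.name))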

df.neat
# A tibble: 12 × 4
#     year district executive.name party      
#    <dbl>    <dbl> <chr>          <chr>      
#  1  2000     1001 Johnson        PartyRed   
#  2  2001     1001 Smith          PartyYellow
#  3  2002     1001 Smith          PartyYellow
#  4  2003     1001 Alleghany      PartyGreen 
#  5  2004     1001 Alleghany      PartyGreen 
#  6  2000     1002 Roberts        PartyYellow
#  7  2001     1002 Roberts        PartyYellow
#  8  2002     1002 Clarke         PartyOrange
#  9  2003     1002 Clarke         PartyOrange
# 10  2004     1002 Tollson        PartyRed   
# 11  2003     1003 Roland         PartyPurple
# 12  2004     1003 Roland         PartyPurple
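
An alternative sketch, under the assumption that a district's first recorded year is the year it was created: complete each district only from its own first year onward, so the leading NA rows are never generated and no filter step is needed. The name df.alt is illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                 district = c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

df.alt <- df |>
  group_by(district) |>
  # Within each district, add the years from its first recorded year to 2004
  complete(year = min(year):2004) |>
  # Carry the most recent executive and party forward within the district
  fill(executive.name, party, .direction = "down") |>
  ungroup()

df.alt
```

With this version the column order is district, year, executive.name, party, because the grouping column comes first. If a district could exist before its first recorded executive, this approach would not apply as written.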