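I have a data frame of county executives and the year each took office.
I am running a panel study with the county-year as the unit of analysis. The date range is 2000 to 2004.
I want to expand df so that it lists who held the county executive office in every year from 2000 to 2004.
The catch? Some districts were created during my analysis period.
My starting point is this: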
df <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
party= c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
district= c(1001, 1001, 1001, 1002, 1002, 1002, 1003))
year executive.name party district
1 2000 Johnson PartyRed 1001
2 2001 Smith PartyYellow 1001
3 2003 Alleghany PartyGreen 1001
4 2000 Roberts PartyYellow 1002
5 2002 Clarke PartyOrange 1002
6 2004 Tollson PartyRed 1002
7 2003 Roland PartyPurple 1003
I want my df to look like this:
df.neat <- data.frame(year= c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2003, 2004),
executive.name= c("Johnson", "Smith", "Smith", "Alleghany", "Alleghany", "Roberts", "Roberts", "Clarke", "Clarke", "Tollson", "Roland", "Roland"),
party= c("PartyRed", "PartyYellow", "PartyYellow", "PartyGreen", "PartyGreen", "PartyYellow", "PartyYellow", "PartyOrange", "PartyOrange", "PartyRed", "PartyPurple", "PartyPurple"),
district= c(1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1003, 1003))
> df.neat
year executive.name party district
1 2000 Johnson PartyRed 1001
2 2001 Smith PartyYellow 1001
3 2002 Smith PartyYellow 1001
4 2003 Alleghany PartyGreen 1001
5 2004 Alleghany PartyGreen 1001
6 2000 Roberts PartyYellow 1002
7 2001 Roberts PartyYellow 1002
8 2002 Clarke PartyOrange 1002
9 2003 Clarke PartyOrange 1002
10 2004 Tollson PartyRed 1002
11 2003 Roland PartyPurple 1003
12 2004 Roland PartyPurple 1003
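Note that district 1003 was established in 2003. If I run complete(), it treats 2000, 2001 and 2002 as implicit missing values for that district and adds rows of NA for them. fill() then carries the last value from district 1002 down into these new rows.
In other words, my df ends up looking like this: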
df |>
tidyr::complete(district, year) |>
tidyr::fill(executive.name, party)
# A tibble: 15 × 4
district year executive.name party
<dbl> <dbl> <chr> <chr>
1 1001 2000 Johnson PartyRed
2 1001 2001 Smith PartyYellow
3 1001 2002 Smith PartyYellow
4 1001 2003 Alleghany PartyGreen
5 1001 2004 Alleghany PartyGreen
6 1002 2000 Roberts PartyYellow
7 1002 2001 Roberts PartyYellow
8 1002 2002 Clarke PartyOrange
9 1002 2003 Clarke PartyOrange
10 1002 2004 Tollson PartyRed
11 1003 2000 Tollson PartyRed
12 1003 2001 Tollson PartyRed
13 1003 2002 Tollson PartyRed
14 1003 2003 Roland PartyPurple
15 1003 2004 Roland PartyPurple
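One way to confirm which rows are spurious is to compare each filled row against the first year its district appears in the original data. This is only a sketch: it assumes a district's first recorded year is the year it was created, and the names first.seen, first.year and spurious are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                 district = c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

# First year each district shows up in the original data
# (assumed here to be the year the district was created)
first.seen <- df |>
  group_by(district) |>
  summarise(first.year = min(year))

# Rows that complete() plus an ungrouped fill() produced
# for years before the district existed
spurious <- df |>
  complete(district, year) |>
  fill(executive.name, party) |>
  left_join(first.seen, by = "district") |>
  filter(year < first.year)

spurious
```

With the example data this should flag only the district 1003 rows for 2000 to 2002, the ones that inherited Tollson from district 1002.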
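This will work. Group by district before calling fill() so that values are never carried from one district into the next, then drop the rows that are still NA (the years before a district's first recorded executive):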
library(dplyr)
library(tidyr)
df <- data.frame(year= c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
executive.name= c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
party= c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
district= c(1001, 1001, 1001, 1002, 1002, 1002, 1003))
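df.neat <- df |>
  # one row per year-district pair; pairs missing from df get NA
  complete(year = 2000:2004, district) |>
  # fill within each district only, so nothing leaks across districts
  group_by(district) |>
  fill(executive.name, party, .direction = "down") |>
  ungroup() |>
  arrange(district, year) |>
  # rows still NA are years before the district's first executive
  filter(!is.na(executive.name))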
df.neat
# A tibble: 12 × 4
# year district executive.name party
# <dbl> <dbl> <chr> <chr>
# 1 2000 1001 Johnson PartyRed
# 2 2001 1001 Smith PartyYellow
# 3 2002 1001 Smith PartyYellow
# 4 2003 1001 Alleghany PartyGreen
# 5 2004 1001 Alleghany PartyGreen
# 6 2000 1002 Roberts PartyYellow
# 7 2001 1002 Roberts PartyYellow
# 8 2002 1002 Clarke PartyOrange
# 9 2003 1002 Clarke PartyOrange
# 10 2004 1002 Tollson PartyRed
# 11 2003 1003 Roland PartyPurple
# 12 2004 1003 Roland PartyPurple
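A possible variation, offered as a sketch rather than a drop-in replacement: call complete() after grouping, so each district is only expanded from its own first recorded year up to the end of the study window. That way the pre-creation rows are never generated and no filter step is needed. It assumes, as above, that a district's first recorded year is the year it was created, and that the installed tidyr evaluates complete() per group on grouped data. The names last.year and df.alt are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(2000, 2001, 2003, 2000, 2002, 2004, 2003),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson", "Roland"),
                 party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed", "PartyPurple"),
                 district = c(1001, 1001, 1001, 1002, 1002, 1002, 1003))

last.year <- 2004  # end of the study window, taken from the question

df.alt <- df |>
  group_by(district) |>
  # expand each district from its own first year to the end of the window
  complete(year = seq(min(year), last.year)) |>
  # still grouped, so fill() stays inside each district
  fill(executive.name, party, .direction = "down") |>
  ungroup() |>
  arrange(district, year)

df.alt
```

On the example data this should give the same 12 rows as df.neat, with district as the first column. If a district existed before its first recorded executive, both approaches will leave those earlier years out.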