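My data has multiple observations per ID. Within each ID, I want to set every value to the most recent non-missing value. I tried combining mutate, group_by(id), and which.max(year), but could not get it to work.
Data: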
data <- data.frame(
id=c(1,1,2,2,3,3,4,4,5,5),
year=rep(c(2010, 2011), 5),
employ=c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA))
> data
id year employ
1 1 2010 yes
2 1 2011 yes
3 2 2010 no
4 2 2011 yes
5 3 2010 yes
6 3 2011 no
7 4 2010 <NA>
8 4 2011 yes
9 5 2010 no
10 5 2011 <NA>
Desired output:
data2 <- data.frame(
id=c(1,1,2,2,3,3,4,4,5,5),
year=c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2010, 2010),
employ=c("yes", "yes", "yes", "yes", "no", "no","yes", "yes","no", "no"))
> data2
id year employ
1 1 2011 yes
2 1 2011 yes
3 2 2011 yes
4 2 2011 yes
5 3 2011 no
6 3 2011 no
7 4 2011 yes
8 4 2011 yes
9 5 2010 no
10 5 2010 no
A data.table option:
library(data.table)
setDT(data)[, employ := last(na.omit(employ[order(year)])), by = id]
which gives
id year employ
1: 1 2010 yes
2: 1 2011 yes
3: 2 2010 yes
4: 2 2011 yes
5: 3 2010 no
6: 3 2011 no
7: 4 2010 yes
8: 4 2011 yes
9: 5 2010 no
10: 5 2011 no
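The output above fills in employ but leaves year as it was, whereas the desired output in the question also sets year to the year of the latest non-missing employ. A minimal sketch of one way to update both columns follows. It assumes every id has at least one non-missing employ value; the name dt is just an illustrative copy of the question's data, and the replace() masking step is my own choice, not part of the original answer.

```r
library(data.table)

# Illustrative copy of the question's data
dt <- data.table(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  year   = rep(c(2010, 2011), 5),
  employ = c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA)
)

# Per id: blank out the years whose employ is NA, then take the position of
# the largest remaining year (which.max() skips NA). That row's year and
# employ are recycled across the whole group.
# Assumption: each id has at least one non-missing employ.
dt[, c("year", "employ") := {
  idx <- which.max(replace(year, is.na(employ), NA))
  list(year[idx], employ[idx])
}, by = id]

print(dt)
```

For id 5 the 2011 row has a missing employ, so the 2010 row is selected and both rows end up with year 2010 and employ "no".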
A dplyr approach could be:
library(dplyr)
data %>%
  group_by(id) %>%
  mutate(employ = last(na.omit(employ[order(year)])))
which gives
id year employ
<dbl> <dbl> <chr>
1 1 2010 yes
2 1 2011 yes
3 2 2010 yes
4 2 2011 yes
5 3 2010 no
6 3 2011 no
7 4 2010 yes
8 4 2011 yes
9 5 2010 no
10 5 2011 no
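The question does not show the which.max(year) attempt itself, so the following is only a guess at why it failed: which.max(year) picks the row with the latest year even when employ is missing there, as happens for id 5. A hedged sketch of that pitfall, and of a which.max-based variant that first masks years with missing employ, is below. The names attempt and fixed are illustrative, and the sketch assumes every id has at least one non-missing employ.

```r
library(dplyr)

# Same data as in the question
data <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  year   = rep(c(2010, 2011), 5),
  employ = c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA)
)

# Presumed form of the original attempt: takes employ from the latest year,
# which is NA for id 5.
attempt <- data %>%
  group_by(id) %>%
  mutate(employ = employ[which.max(year)]) %>%
  ungroup()

# Variant: set year to NA wherever employ is NA, so which.max() (which skips
# NA) only considers rows that actually carry a value.
fixed <- data %>%
  group_by(id) %>%
  mutate(employ = employ[which.max(replace(year, is.na(employ), NA))]) %>%
  ungroup()

print(attempt)
print(fixed)
```

Like the answers above, this variant changes only employ and leaves year untouched.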