My dataset has a column that looks like this:
cluster_id
1
1
1
1
NA
1
NA
NA
2
NA
2
NA
3
NA
NA
3
cluster_id <- c("1","1","1","1","NA","1","NA","NA","2","NA","2","NA","3","NA","NA","3")
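The row order has already been fixed beforehand using a time column. What I want is to replace the NAs within each cluster ID: if a row contains 2, followed by NA, followed by 2 again, I want that NA to become 2. NAs that sit between two different cluster IDs should stay NA. Example: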
cluster_id cluster_id_new
1 1
1 1
1 1
1 1
NA 1
1 1
NA NA
NA NA
2 2
NA 2
2 2
NA NA
3 3
NA 3
NA 3
3 3
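In this post I found the zoo::na.locf function, which seems close to what I want, but I also need to take into account the value that comes after the NA.
Any ideas?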
cluster_id <- c("1","1","1","1","NA","1","NA","NA","2","NA","2","NA","3","NA","NA","3")
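for (i in seq_along(cluster_id[-1])) {
  # Only positions holding the string "NA" need filling
  if (cluster_id[i + 1] == "NA") {
    # Look ahead for a later occurrence of the value at position i
    for (j in (i + 1):length(cluster_id)) {
      if (cluster_id[i] == cluster_id[j]) {
        cluster_id[i + 1] <- cluster_id[j]
      }
    }
  }
}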
cluster_id
#> [1] "1" "1" "1" "1" "1" "1" "NA" "NA" "2" "2" "2" "NA" "3" "3" "3"
#> [16] "3"
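As an alternative to the nested loop, here is a minimal vectorised sketch of the same idea: fill forward, fill backward, and keep a filled value only where both directions agree. It assumes each cluster ID forms one contiguous block (an ID never reappears after a different ID), and it first converts the "NA" strings to real missing values. The helper `fill_forward` is a hypothetical base R stand-in for `zoo::na.locf(x, na.rm = FALSE)`; with zoo installed, `zoo::na.locf(x, fromLast = TRUE, na.rm = FALSE)` should give the backward fill.

```r
cluster_id <- c("1","1","1","1","NA","1","NA","NA","2","NA","2","NA","3","NA","NA","3")

# Turn the "NA" strings into real missing values
x <- cluster_id
x[x == "NA"] <- NA

# Hypothetical helper: carry the last non-missing value forward
# (leading NAs stay NA)
fill_forward <- function(v) {
  idx <- cumsum(!is.na(v))          # index of the most recent non-NA value
  c(NA, v[!is.na(v)])[idx + 1]      # idx 0 maps to the leading NA
}

fwd <- fill_forward(x)              # last value seen before each position
bwd <- rev(fill_forward(rev(x)))    # next value seen after each position

# Keep a value only where the previous and next known IDs agree
agree <- !is.na(fwd) & !is.na(bwd) & fwd == bwd
cluster_id_new <- ifelse(agree, fwd, NA)

cluster_id_new
# "1" "1" "1" "1" "1" "1" NA NA "2" "2" "2" NA "3" "3" "3" "3"
```

Unlike the loop, this returns real NA values rather than "NA" strings, and it does a fixed number of passes over the vector instead of a nested scan.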