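I want to repeat a value until a new value appears, by group. I have a function that I found online a while ago that does almost what I'm looking for, but not quite. Here is the function: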
repeat.before <- function(x) {
  ind <- which(!is.na(x))
  ind_rep <- ind
  if (is.na(x[1])) {
    ind_rep <- c(min(ind), ind)
    ind <- c(1, ind)
  }
  rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}
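This function successfully repeats a value until a new value appears. The problem is that if the column starts with NA, the rows that come before the first value end up taking on that first value instead of remaining NA. Here is an example to illustrate what I mean: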
group location
A NA
A NA
A New York
A NA
A NA
B Chicago
B NA
B Philly
B NA
The code above will output:
group location
A New York
A New York
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
Again, this is very close to what I'm looking for, but not quite. This is the output I'm after:
group location
A NA
A NA
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
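Basically, I don't want the "repeat" logic to kick in until it finds the first value in a group. Until then, I want the rows to remain NA. The goal is to keep rows from being misclassified: in the example above, the first two A rows should not be labeled New York.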
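One option is fill from tidyr, applied after grouping by 'group'. fill takes a .direction argument, which can be 'up' or 'down' (the default). Here, based on the expected output, we only need the 'down' option: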
library(dplyr)
library(tidyr)
df1 %>%
group_by(group) %>%
fill(location)
# A tibble: 9 x 2
# Groups: group [2]
# group location
# <chr> <chr>
#1 A <NA>
#2 A <NA>
#3 A New York
#4 A New York
#5 A New York
#6 B Chicago
#7 B Chicago
#8 B Philly
#9 B Philly
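To make the effect of .direction concrete, here is a small sketch that runs both directions on the question's data. The names filled_down and filled_up are just illustrative, and the ungroup() call at the end is an optional addition, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the snippet runs on its own
df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# "down": each NA takes the last value seen above it within the same group,
# so NAs at the start of a group have nothing to copy and stay NA
filled_down <- df1 %>%
  group_by(group) %>%
  fill(location, .direction = "down") %>%
  ungroup()

# "up": each NA takes the next value below it within the same group,
# so the leading NAs in group A would be filled, which is not wanted here
filled_up <- df1 %>%
  group_by(group) %>%
  fill(location, .direction = "up") %>%
  ungroup()

filled_down$location
filled_up$location
```

With "up", the first two A rows become New York and the trailing NAs stay NA, which is the opposite of the requested behavior.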
# Data used above
df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"), location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
"Philly", NA)), class = "data.frame", row.names = c(NA, -9L))
You can also use the na.locf function from the zoo package.
library(zoo)
df1 <-
structure(list(
group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"),
location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
"Philly", NA)
),
class = "data.frame",
row.names = c(NA,-9L))
# Carry the last non-NA value forward; na.rm = FALSE keeps the leading NAs
df1$location2 <- na.locf(df1$location, na.rm = FALSE)
df1
group location location2
1 A <NA> <NA>
2 A <NA> <NA>
3 A New York New York
4 A <NA> New York
5 A <NA> New York
6 B Chicago Chicago
7 B <NA> Chicago
8 B Philly Philly
9 B <NA> Philly
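Note that the call above runs over the whole column and never looks at group. It gives the desired result here only because group B happens to start with a value. A minimal sketch of a per-group variant, assuming a hypothetical data frame df2 in which group B starts with NA, could combine na.locf with base R's ave:

```r
library(zoo)

# Hypothetical data (not from the question): group B starts with NA
df2 <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  location = c(NA, "New York", NA, NA, "Chicago", NA),
  stringsAsFactors = FALSE
)

# Ungrouped: the last value of group A leaks into the first row of group B
df2$ungrouped <- na.locf(df2$location, na.rm = FALSE)

# Grouped: ave() applies na.locf separately within each level of group,
# so the leading NA in group B is preserved
df2$grouped <- ave(df2$location, df2$group,
                   FUN = function(x) na.locf(x, na.rm = FALSE))

df2
```

In the ungrouped column, row 4 is filled with New York from group A. In the grouped column it stays NA.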
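Base R. Each non-NA value starts a new run (via cumsum), and ave repeats the first element of each run; leading NAs form their own run and stay NA: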
transform(df1,
loc2 = ave(df1$location,
cumsum(!is.na(df1$location)),
FUN = function(x) x[1]))
# group location loc2
#1 A <NA> <NA>
#2 A <NA> <NA>
#3 A New York New York
#4 A <NA> New York
#5 A <NA> New York
#6 B Chicago Chicago
#7 B <NA> Chicago
#8 B Philly Philly
#9 B <NA> Philly
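Like the zoo version, the call above does not use the group column, so a value from one group could carry into the next if that group started with NA. One possible base R sketch that fills within each group is shown below. The helper name fill_down is hypothetical, and it is meant as a replacement for the repeat.before function from the question, not as part of the answer above.

```r
# Hypothetical helper: forward-fill NAs while leaving leading NAs untouched
fill_down <- function(x) {
  # Run counter: 0 for leading NAs, then 1, 2, ... for each non-NA value seen
  idx <- cumsum(!is.na(x))
  # Prepend NA so that run 0 maps to NA and run k maps to the k-th value
  c(NA, x[!is.na(x)])[idx + 1]
}

# Same data as in the question
df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# ave() applies fill_down separately to the rows of each group
df1$loc2 <- ave(df1$location, df1$group, FUN = fill_down)
df1
```

Because ave splits location by group before calling fill_down, the run counter restarts at zero in every group, so a group that begins with NA keeps those NAs.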