I have a dataset like the one below:
ID. Invoice. Date of Invoice. paid or not.
1 1 09/30/2019 no
1 2 10/30/2019 no
1 3 11/30/2019 yes
2 1 10/31/2019 yes
2 2 11/30/2019 no
2 3 12/31/2019 no
3 1 7/31/2019 no
3 2 9/30/2019 yes
3 3 12/31/2019 no
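I want to know whether a customer is willing to pay. If a customer has paid a newer invoice while an older invoice is still unpaid, I give them a good score. So customers 1 and 3 are rated "good" and customer 2 is rated "bad".
The final data should therefore have one more column, with the values good and bad.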
The logic is not entirely clear. One possibility: after grouping by 'ID', check whether any row other than the first is "yes".
library(dplyr)
library(lubridate)
df1 %>%
mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>%
arrange(ID, Date_of_Invoice) %>%
group_by(ID) %>%
mutate(flag = c('bad', 'good')[1 + any(paid_or_not[-1] == "yes")])
# A tibble: 9 x 5
# Groups: ID [3]
# ID Invoice Date_of_Invoice paid_or_not flag
# <int> <int> <date> <chr> <chr>
#1 1 1 2019-09-30 no good
#2 1 2 2019-10-30 no good
#3 1 3 2019-11-30 yes good
#4 2 1 2019-10-31 yes bad
#5 2 2 2019-11-30 no bad
#6 2 3 2019-12-31 no bad
#7 3 1 2019-07-31 no good
#8 3 2 2019-09-30 yes good
#9 3 3 2019-12-31 no good
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Date_of_Invoice = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), paid_or_not = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
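The rule in the question ("paid a newer invoice while an older one is unpaid") can also be read more literally than "any yes after the first row". The following is only a sketch of that stricter reading in base R, not the approach used above. It assumes the rows are already in date order within each ID, as in the sample. The names inv and paid_after_unpaid are hypothetical and introduced only for this illustration.

```r
# Same sample values as df1, rebuilt so the snippet runs on its own.
# Assumption: rows are already sorted by invoice date within each ID.
inv <- data.frame(
  ID = rep(1:3, each = 3),
  Invoice = rep(1:3, times = 3),
  paid_or_not = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if some paid invoice comes after an unpaid one.
paid_after_unpaid <- function(v) {
  # unpaid_before[i] is TRUE when any earlier row (before i) is "no"
  unpaid_before <- c(FALSE, head(cumsum(v == "no") > 0, -1))
  any(v == "yes" & unpaid_before)
}

# ave() recycles the single label over every row of the group.
inv$flag <- ave(inv$paid_or_not, inv$ID,
                FUN = function(v) if (paid_after_unpaid(v)) "good" else "bad")
```

On the sample data this gives the same flags as the code above. The two readings differ for a customer whose invoices are all paid: "any yes after the first row" rates them good, while this sketch rates them bad because there is no older unpaid invoice. Which one is wanted depends on the intended scoring.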
Assuming the rows are already ordered by "Date of Invoice.", here is a base R solution using ave:
# "good" if any "yes" occurs after the first row of the group, otherwise "bad"
# (also covers groups with no "yes" or with several "yes" rows)
df$`good or bad.` <- ave(df$`paid or not.`, df$ID., FUN = function(v) if (any(which(v == "yes") > 1)) "good" else "bad")
which gives
> df
  ID. Invoice. Date of Invoice. paid or not. good or bad.
1   1        1       09/30/2019           no         good
2   1        2       10/30/2019           no         good
3   1        3       11/30/2019          yes         good
4   2        1       10/31/2019          yes          bad
5   2        2       11/30/2019           no          bad
6   2        3       12/31/2019           no          bad
7   3        1        7/31/2019           no         good
8   3        2        9/30/2019          yes         good
9   3        3       12/31/2019           no         good
DATA
df <- structure(list(ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice. = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), `Date of Invoice.` = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), `paid or not.` = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
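The ave approach depends on the rows being in date order within each ID. If that cannot be taken for granted, one way to sort first in base R is sketched below. Here df2 and inv_date are hypothetical names, and the rows are shuffled on purpose to show the effect. The dates are parsed with as.Date before sorting, because sorting the raw strings would put "12/31/2019" ahead of "7/31/2019".

```r
# Same sample values as df above; check.names = FALSE keeps the spaced names.
df2 <- data.frame(
  ID. = rep(1:3, each = 3),
  Invoice. = rep(1:3, times = 3),
  `Date of Invoice.` = c("09/30/2019", "10/30/2019", "11/30/2019",
                         "10/31/2019", "11/30/2019", "12/31/2019",
                         "7/31/2019", "9/30/2019", "12/31/2019"),
  `paid or not.` = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Shuffle the rows deliberately to simulate unsorted input.
df2 <- df2[c(3, 1, 2, 6, 5, 4, 8, 9, 7), ]

# Parse month/day/year strings into Date values, then sort by ID and date.
inv_date <- as.Date(df2$`Date of Invoice.`, format = "%m/%d/%Y")
df2 <- df2[order(df2$ID., inv_date), ]
rownames(df2) <- NULL  # reset row names after reordering
```

After this step the ave call can be applied to df2 in the same way as to df.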