Here is an example of my data. The column "first_tx" is my desired output:
ID first_date dates txtype first_tx
11 2015-12-23 2015-12-23 A A
11 2015-12-23 2016-12-23 A A
11 2015-12-23 2017-12-23 B A
22 2015-11-01 2015-11-01 B B
22 2015-11-01 2016-11-01 C B
22 2015-11-01 2016-12-01 C B
I am trying to create "first_tx" by group (ID), using the factor level of "txtype" from the row where "first_date" equals "dates".
I tried:
data$first_tx[which(data$first_date==data$dates)] <- as.character(data$txtype)[which(data$first_date==data$dates)]
This gives me the following output:
ID first_date dates txtype first_tx
11 2015-12-23 2015-12-23 A A
11 2015-12-23 2016-12-23 A NA
11 2015-12-23 2017-12-23 B NA
22 2015-11-01 2015-11-01 B B
22 2015-11-01 2016-11-01 C NA
22 2015-11-01 2016-12-01 C NA
However, I want every row of each ID to carry the same corresponding "txtype" level, rather than NA.
Using dplyr and tidyr, I can create your expected output.
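library(dplyr)
library(tidyr)
df %>%
  # keep txtype only on the row where first_date equals dates
  mutate(first_tx = ifelse(first_date == dates, as.character(txtype), NA)) %>%
  # carry that value down to the rows that follow
  fill(first_tx)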
ID first_date dates txtype first_tx
1 11 2015-12-23 2015-12-23 A A
2 11 2015-12-23 2016-12-23 A A
3 11 2015-12-23 2017-12-23 B A
4 22 2015-11-01 2015-11-01 B B
5 22 2015-11-01 2016-11-01 C B
6 22 2015-11-01 2016-12-01 C B
Data:
df <- structure(list(ID = c(11L, 11L, 11L, 22L, 22L, 22L),
first_date = c("2015-12-23", "2015-12-23", "2015-12-23", "2015-11-01", "2015-11-01", "2015-11-01"),
dates = c("2015-12-23", "2016-12-23", "2017-12-23", "2015-11-01", "2016-11-01", "2016-12-01"),
txtype = c("A", "A", "B", "B", "C", "C")),
.Names = c("ID", "first_date", "dates", "txtype"),
row.names = c(NA, -6L),
class = "data.frame")
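The pipeline above relies on the matching row being the first row of each ID and on txtype being a character column. Below is a minimal sketch of a more defensive variant, assuming txtype may be stored as a factor and that the fill should never cross from one ID into the next. The name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data, but with txtype stored as a factor (an assumption)
df <- data.frame(
  ID = c(11L, 11L, 11L, 22L, 22L, 22L),
  first_date = c("2015-12-23", "2015-12-23", "2015-12-23",
                 "2015-11-01", "2015-11-01", "2015-11-01"),
  dates = c("2015-12-23", "2016-12-23", "2017-12-23",
            "2015-11-01", "2016-11-01", "2016-12-01"),
  txtype = factor(c("A", "A", "B", "B", "C", "C"))
)

result <- df %>%
  group_by(ID) %>%
  # as.character() keeps the labels; ifelse() on a factor would return integer codes
  mutate(first_tx = ifelse(first_date == dates, as.character(txtype), NA_character_)) %>%
  # fill within each ID only, downwards first and then upwards
  fill(first_tx, .direction = "downup") %>%
  ungroup()

result
```

Grouping before `fill()` means an ID with no matching row stays NA instead of inheriting the value of the previous ID.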
Are you trying to do something like this?
library(data.table)
data <- data.table(
ID = c('11', '11', '11', '22', '22', '22'),
first_date = c('2015-12-23', '2015-12-23', '2015-12-23', '2015-11-01', '2015-11-01', '2015-11-01'),
dates = c('2015-12-23', '2016-12-23', '2017-12-23', '2015-11-01', '2016-11-01', '2016-12-01'),
txtype = c('A', 'A', 'B', 'B', 'C', 'C')
)
# For each ID, take the txtype from the first row where first_date equals dates
data[, first_tx := txtype[first_date == dates][1], by = ID]
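Another way to spread the value over every row of an ID in data.table is an update join. This is only a sketch under the assumption that each ID has a row where first_date equals dates. The helper table `first_rows` is a name introduced here for illustration.

```r
library(data.table)

data <- data.table(
  ID = c('11', '11', '11', '22', '22', '22'),
  first_date = c('2015-12-23', '2015-12-23', '2015-12-23',
                 '2015-11-01', '2015-11-01', '2015-11-01'),
  dates = c('2015-12-23', '2016-12-23', '2017-12-23',
            '2015-11-01', '2016-11-01', '2016-12-01'),
  txtype = c('A', 'A', 'B', 'B', 'C', 'C')
)

# Rows where the transaction date is the first date (hypothetical helper table)
first_rows <- data[first_date == dates, .(ID, txtype)]

# Update join: every row of `data` whose ID appears in `first_rows`
# receives that row's txtype; i.txtype refers to the column in `first_rows`
data[first_rows, on = "ID", first_tx := i.txtype]

data
```

IDs that do not appear in `first_rows` are left as NA in `first_tx`.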
I played around with it, and this works:
# Within each ID, take txtype from the first row where first_date equals dates
data <- data %>% group_by(ID) %>% mutate(first_tx = as.character(txtype)[which(first_date == dates)[1]])
Thanks for the help, @phiver!
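For completeness, the same lookup can be written in base R without extra packages. This is a minimal sketch, assuming the data frame is called `data` as in the question. The helper name `first_rows` is illustrative only.

```r
data <- data.frame(
  ID = c(11L, 11L, 11L, 22L, 22L, 22L),
  first_date = c("2015-12-23", "2015-12-23", "2015-12-23",
                 "2015-11-01", "2015-11-01", "2015-11-01"),
  dates = c("2015-12-23", "2016-12-23", "2017-12-23",
            "2015-11-01", "2016-11-01", "2016-12-01"),
  txtype = factor(c("A", "A", "B", "B", "C", "C"))
)

# Rows where the transaction date equals the first date
first_rows <- data[data$first_date == data$dates, ]

# match() finds, for every row, the position of its ID among the first rows;
# indexing the character labels by that position copies the value to all rows
data$first_tx <- as.character(first_rows$txtype)[match(data$ID, first_rows$ID)]

data
```

`match()` returns NA for an ID that has no such row, so those rows end up with NA in `first_tx`.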