Here are two tables:
Table1
Date                 OldPrice  NewPrice
2014-06-12 09:32:56  0         10
2014-06-27 16:13:36  10        12
2014-08-12 22:41:47  12        13
Table2
Date                 Qty
2014-06-15 18:09:23  5
2014-06-19 12:04:29  4
2014-06-22 13:21:34  3
2014-06-29 19:01:22  6
2014-07-01 18:02:33  3
2014-09-29 22:41:47  6
I want to display the result like this:
Date                 OldPrice  NewPrice  Qty
2014-06-12 09:32:56  0         10        0
2014-06-27 16:13:36  10        12        12
2014-08-12 22:41:47  12        13        15
I used this code:
for(i in 1:nrow(Table1)){
  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]
  code=aggregate(list(Table2$Qty),
    by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate & Table2$Date <= endDate), FUN=sum)
}
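I want the quantities to be aggregated between the given dates in the first table, i.e. between the first and second dates, between the second and third dates, and so on.
Thanks in advance!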
We can use a rolling join with data.table (this assumes Date is stored as POSIXct in both tables):
library(data.table)
res <- setDT(df1)[df2, roll = -Inf, on = .(Date)][, .(Qty = sum(Qty)),
.(OldPrice, NewPrice)][df1, on = .(OldPrice, NewPrice)][is.na(Qty), Qty := 0]
setcolorder(res, c(names(df1), "Qty"))
res
# Date OldPrice NewPrice Qty
#1: 2014-06-12 09:32:56 0 10 0
#2: 2014-06-27 16:13:36 10 12 12
#3: 2014-08-12 22:41:47 12 13 9
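To see what `roll = -Inf` contributes, it can help to look at the intermediate result of the rolling join on its own. The sketch below is only an illustration: the names `prices` and `sales` are hypothetical stand-ins for df1 and df2, and the `tz = "UTC"` setting is an assumption made so the example is reproducible.

```r
library(data.table)

# Hypothetical copies of the two tables, with Date parsed as POSIXct
prices <- data.table(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
sales <- data.table(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3)
)

# roll = -Inf carries the next observation backward: each sales row is
# matched to the first price change at or after its own Date
matched <- prices[sales, roll = -Inf, on = .(Date)]
print(matched)

# Summing Qty per matched price row gives the per-interval totals
totals <- matched[, .(Qty = sum(Qty)), by = .(OldPrice, NewPrice)]
print(totals)
```

The first price change never appears in `matched` because no sale precedes it, which is why the answer joins back to df1 afterwards and replaces the resulting NA with 0.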
A somewhat verbose idea with dplyr and tidyr:
library(dplyr)
library(tidyr)
full_join(Table1, Table2, by = "Date") %>%
  arrange(Date) %>%
  fill(OldPrice, NewPrice, .direction = "up") %>%
  group_by(OldPrice, NewPrice) %>%
  summarize(Qty = sum(Qty, na.rm = TRUE)) %>%
  ungroup() %>%
  select(Qty) %>%
  bind_cols(Table1, .)
# Date OldPrice NewPrice Qty
# 1 2014-06-12 09:32:56 0 10 0
# 2 2014-06-27 16:13:36 10 12 12
# 3 2014-08-12 22:41:47 12 13 9
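One thing to be aware of: `bind_cols()` pairs rows by position, so the pipeline above depends on the summarised groups coming out in the same order as the rows of Table1. A possible variant, sketched here under the assumption that each (OldPrice, NewPrice) pair is unique, joins the sums back by key instead. The names `prices` and `sales` are hypothetical stand-ins for Table1 and Table2, and `tz = "UTC"` is an assumption for reproducibility.

```r
library(dplyr)
library(tidyr)

# Hypothetical copies of the two tables, with Date parsed as POSIXct
prices <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
sales <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3)
)

# Same idea as above: interleave both tables by Date, fill the prices upward
# so every sale carries the next price change, then sum Qty per price row
qty <- full_join(prices, sales, by = "Date") %>%
  arrange(Date) %>%
  fill(OldPrice, NewPrice, .direction = "up") %>%
  group_by(OldPrice, NewPrice) %>%
  summarise(Qty = sum(Qty, na.rm = TRUE), .groups = "drop")

# Join by key rather than by row position
result <- left_join(prices, qty, by = c("OldPrice", "NewPrice"))
print(result)
```

Sales dated after the last price change end up in a group with NA prices, and the final `left_join()` leaves that group out.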
Since you started with a for loop, here is a way to do it with for loops:
df1 <- read.table(text=
"'Date' 'OldPrice' 'NewPrice'
'2014-06-12 09:32:56' '0' '10'
'2014-06-27 16:13:36' '10' '12'
'2014-08-12 22:41:47' '12' '13'", stringsAsFactors=F,header=T)
df2 <- read.table(text=
"'Date' 'Qty'
'2014-06-15 18:09:23' '5'
'2014-06-19 12:04:29' '4'
'2014-06-22 13:21:34' '3'
'2014-06-29 19:01:22' '6'
'2014-07-01 18:02:33' '3'" , stringsAsFactors=F, header=T)
df1 <- df1[order(df1$Date), ] # order df1 by Date
df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) # convert to date-time
values <- vector("list", length = nrow(df1)) # one slot per row of df1
for(i in seq_len(nrow(df1) - 1)){ # stop before the last row: it has no following date to compare against
  for(j in seq_len(nrow(df2))){
    # collect Qty values that fall in (Date[i], Date[i+1]], the interval described in the question
    if(df2$Date[j] > df1$Date[i] && df2$Date[j] <= df1$Date[i+1]){
      values[[i]] <- append(values[[i]], df2$Qty[j])
    }
  }
}
# shift the sums down one row so each total sits on the date that closes its interval; the first row gets 0 (as per your example)
df1$Quantity <- c(0, sapply(values, sum)[seq_len(nrow(df1) - 1)])
# Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56 0 10 0
#2 2014-06-27 16:13:36 10 12 12
#3 2014-08-12 22:41:47 12 13 9
Obviously this is more work, but it may help if you are stuck with for loops.
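If the nested loops become slow on larger tables, the same interval logic can be sketched in base R with `findInterval()`. This is an alternative to the loops, not something taken from the answers in this thread. It assumes the price dates are sorted in increasing order, and the names `prices` and `sales` as well as `tz = "UTC"` are hypothetical choices for the example.

```r
# Hypothetical copies of the two tables, with Date parsed as POSIXct
prices <- data.frame(
  Date = as.POSIXct(c("2014-06-12 09:32:56", "2014-06-27 16:13:36",
                      "2014-08-12 22:41:47"), tz = "UTC"),
  OldPrice = c(0, 10, 12),
  NewPrice = c(10, 12, 13)
)
sales <- data.frame(
  Date = as.POSIXct(c("2014-06-15 18:09:23", "2014-06-19 12:04:29",
                      "2014-06-22 13:21:34", "2014-06-29 19:01:22",
                      "2014-07-01 18:02:33"), tz = "UTC"),
  Qty = c(5, 4, 3, 6, 3)
)

# With left.open = TRUE, findInterval() returns k when
# prices$Date[k] < sale date <= prices$Date[k + 1], so k + 1 is the row
# of the price change that closes the interval containing the sale
closing_row <- findInterval(as.numeric(sales$Date),
                            as.numeric(prices$Date),
                            left.open = TRUE) + 1L

# Sum Qty per closing row; rows without sales get 0 because sum() of an
# empty vector is 0, and sales after the last price date match no row
prices$Qty <- vapply(seq_len(nrow(prices)),
                     function(i) sum(sales$Qty[closing_row == i]),
                     numeric(1))
print(prices)
```

This avoids the loop over every pair of rows, because each sale is assigned to its interval in a single vectorised call.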