How do I define a calculation in one data frame based on values in another data frame?

Question

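I have to calculate a coefficient from a dataset stored in a data frame (A) of size 4936 obs. x 1025 var.

The first row [1] holds the time in seconds; every following row holds the samples collected at a different place. A sample of data frame A: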

#        V1   V2   V3   V4
# [1,] 26.4 26.5 26.6 26.7
# [2,]  -15   -5    2    3
# [3,]    6   -7    5    8
# [4,]    9    4    4   -2

In another data frame (B) I store the time at which the calculation should start for each row of A. An example of data frame B:

#      time
# [1,] 26.4
# [2,] 26.6
# [3,] 26.5

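To simplify, let the coefficient be the sum of the data collected at one place (data frame A), counting only the samples taken from that place's start time (data frame B) onward. For the example above, the calculation should work like this: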

sum1=-15+(-5)+2+3
sum2=5+8
sum3=4+4+(-2)

I would like to store the results in a new data frame, like this:

#       Sum
# [1,]  -15
# [2,]   13
# [3,]    6

How can I link the calculation between the two data frames based on the values stored in the second one?

Answer 1

A solution that uses sapply to iterate over the rows and select columns by collection time:

# Time from original table
foo <- df1[1, ]
# Time from table B
time <- c(26.4, 26.6, 26.5)

# Remove time row from original table
df1 <- df1[-1, ]

# Iterate over and select columns with foo >= time
sapply(1:length(time), function(x)
    sum(df1[x, which(foo >= time[x])])
)

# [1] -15  13   6
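
The snippet above assumes that df1 already exists. A self-contained sketch of the same idea follows, rebuilding the question's 4 x 4 sample by hand; the names times, obs, start and sums are introduced here for illustration only, and vapply is swapped in for sapply so that the result type is fixed.

```r
# Rebuild the sample from the question: row 1 holds the times,
# rows 2 to 4 hold the samples of three places.
df1 <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)

times <- unlist(df1[1, ])      # time of each column
obs   <- df1[-1, ]             # samples without the time row
start <- c(26.4, 26.6, 26.5)   # start time per place (data frame B)

# For place i, sum the columns whose time is at or after start[i]
sums <- vapply(
  seq_along(start),
  function(i) sum(obs[i, times >= start[i]]),
  numeric(1)
)

result <- data.frame(Sum = sums)
print(result)
```

Wrapping the vector in data.frame(Sum = sums) gives the one-column layout requested in the question.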

Answer 2

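I came across this already answered question and felt tempted to propose another solution, for several reasons:

Reading the title immediately made me think of joining or merging.
The OP claims to work with data frames, but the printed output seems to come from a matrix.
The data are stored transposed: each time series is stored horizontally, and the first row does not contain observations but the time in seconds. This is considered untidy.

None of the other answers questions these oddities, even though they make the proposed solutions more complicated.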

Reshaping the data

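As a wild guess, the data seem to have been collected in an Excel sheet. For efficient processing, however, we need the data stored column-wise, preferably in long format: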

library(data.table)
long <- as.data.table(t(A))[
  , setnames(.SD, "V1", "time")][
    , melt(.SD, id.vars = "time", variable.name = "site_id")][
      , site_id := as.integer(site_id)][]

long

    time site_id value
 1: 26.4       1   -15
 2: 26.5       1    -5
 3: 26.6       1     2
 4: 26.7       1     3
 5: 26.4       2     6
 6: 26.5       2    -7
 7: 26.6       2     5
 8: 26.7       2     8
 9: 26.4       3     9
10: 26.5       3     4
11: 26.6       3     4
12: 26.7       3    -2
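
For readers who do not have data.table at hand, the same long layout can be sketched in base R. This is only an illustration under the assumption that A is a numeric matrix shaped like the question's sample; the intermediate names times and obs are not part of the original answer.

```r
# The question's sample as a numeric matrix: row 1 = times, rows 2 to 4 = sites
A <- rbind(
  c(26.4, 26.5, 26.6, 26.7),
  c(-15,  -5,    2,    3),
  c(  6,  -7,    5,    8),
  c(  9,   4,    4,   -2)
)

times <- A[1, ]                  # one time stamp per column
obs   <- A[-1, , drop = FALSE]   # one row per site

# One row per (site, time) pair; t(obs) is read column by column,
# which walks through each site's series in time order.
long <- data.frame(
  time    = rep(times, times = nrow(obs)),
  site_id = rep(seq_len(nrow(obs)), each = length(times)),
  value   = as.vector(t(obs))
)

print(long)
```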

Aggregating in a non-equi join

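Now, the OP asks to aggregate the observations for each site, but to include only the observations at or after a specific time. A data frame B with the start time of each site is given.

The observations in long can be combined with the start times in B as follows: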

B <- data.table(
  site_id = 1:3,
  time = c(26.4, 26.6, 26.5))

B

   site_id time
1:       1 26.4
2:       2 26.6
3:       3 26.5

# aggregating in a non-equi join grouped by the join conditions
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value)]

   site_id time  V1
1:       1 26.4 -15
2:       2 26.6  13
3:       3 26.5   6
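
To make explicit what the non-equi join computes, here is a base R sketch that takes the longer route of merge, filter and aggregate. It is not the data.table method itself; the column name start and the objects joined, kept and sums are assumptions made for this illustration.

```r
# Long-format observations, typed in from the table shown earlier
long <- data.frame(
  time    = rep(c(26.4, 26.5, 26.6, 26.7), times = 3),
  site_id = rep(1:3, each = 4),
  value   = c(-15, -5, 2, 3, 6, -7, 5, 8, 9, 4, 4, -2)
)

# Start time per site; the column is called "start" here to avoid
# a name clash with long$time after merging
B <- data.frame(site_id = 1:3, start = c(26.4, 26.6, 26.5))

joined <- merge(long, B, by = "site_id")           # equi join on site_id
kept   <- joined[joined$time >= joined$start, ]    # the "non-equi" condition
sums   <- aggregate(value ~ site_id, data = kept, FUN = sum)

print(sums)
```

The merge materializes every site's full series before filtering, which is the intermediate step that the non-equi join with by = .EACHI avoids.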

Edit: Limit the number of observations in the aggregation

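The OP has asked in a comment and in another question how to limit the number of observations that are aggregated after the start time. This can be achieved with a slight modification: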

max_values <- 2L
long[B, on = .(site_id, time >= time), by = .EACHI, sum(value[1:max_values])]

   site_id time  V1
1:       1 26.4 -20
2:       2 26.6  13
3:       3 26.5   8

Note that max_values is set to 2L here for illustration.
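
One caveat worth sketching: indexing with 1:max_values pads the result with NA when a site has fewer than max_values observations left after its start time, so the sum becomes NA. The toy vector below is hypothetical; head() is shown as one possible guard, not as part of the original answer.

```r
value      <- c(5, 8)   # only two observations remain after the start time
max_values <- 3L        # but three are requested

padded <- sum(value[1:max_values])       # value[3] does not exist, so it is NA
capped <- sum(head(value, max_values))   # head() returns at most what is there

print(padded)
print(capped)
```

Under that assumption, replacing value[1:max_values] with head(value, max_values) in the join expression should keep short series from turning into NA.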

Answer 3

A solution using a simple for loop:

# recreate your data
V1 <- c(26.4, -15, 6, 9)
V2 <- c(26.5, -5, -7, 4)
V3 <- c(26.6, 2, 5, 4)
V4 <- c(26.7, 3, 8, -2)

A <- data.frame(V1, V2, V3, V4)
B <- data.frame(time = c(26.4, 26.6, 26.5))

#initialize empty variable to store sums in
sum_frame <- numeric()

# calculating sums
for (i in 1:NROW(B)) {
  sum_frame[i] <- sum(A[(i + 1), (which(A[1, ] == B$time[i])):NCOL(A)])
}

# turning sum-vector into a dataframe
sum_frame <- data.frame(sums = sum_frame)

Output:

> sum_frame
  sums
1       -15
2        13
3         6
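
The loop locates the start column with ==, which works for the sample because the times are typed in identically in A and B. If the times come from a computation or from different sources, exact floating-point equality can fail. A hedged sketch with a tolerance follows; the function sum_from and its tol argument are hypothetical names introduced here.

```r
# Sum each row of A (below the time row) from its start time onward,
# matching the start time within a small tolerance instead of with ==
sum_from <- function(A, starts, tol = 1e-8) {
  times <- unlist(A[1, ])
  vapply(seq_along(starts), function(i) {
    first <- which(abs(times - starts[i]) < tol)[1]
    stopifnot(!is.na(first))   # fail loudly if no column matches
    sum(unlist(A[i + 1, first:ncol(A)]))
  }, numeric(1))
}

A <- data.frame(
  V1 = c(26.4, -15, 6, 9),
  V2 = c(26.5, -5, -7, 4),
  V3 = c(26.6, 2, 5, 4),
  V4 = c(26.7, 3, 8, -2)
)
B <- data.frame(time = c(26.4, 26.6, 26.5))

sums <- sum_from(A, B$time)
print(sums)

# Why == is fragile: these two numbers print alike but are not identical
exact_match <- (0.1 + 0.2) == 0.3
tol_match   <- abs((0.1 + 0.2) - 0.3) < 1e-8
```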
