Lagged cumsum in R

My data looks like this:

library(tibble)

df <- tibble(
  date = seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
  val1 = c(100, 100, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
  val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
  diff = val1 - val2
)

       date        val1  val2  diff
       <date>     <dbl> <dbl> <dbl>
     1 2021-01-01   100   100     0
     2 2021-02-01   100   100     0
     3 2021-03-01   105   100     5
     4 2021-04-01   125   125     0
     5 2021-05-01   125   125     0
     6 2021-06-01   125   125     0
     7 2021-07-01   125   125     0
     8 2021-08-01   132   125     7
     9 2021-09-01   132   125     7
    10 2021-10-01   132   125     7
    11 2021-11-01   135   125    10
    12 2021-12-01   150   150     0
    13 2022-01-01   150   150     0
    14 2022-02-01   150   150     0

I am trying to produce the following output:

output <- tibble(
  date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
  val1 = c(100, 100, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
  val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
  diff = val1-val2,
  diff_calc = c(0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15))

 date        val1  val2  diff diff_calc
 <date>     <dbl> <dbl> <dbl>     <dbl>
 1 2021-01-01   100   100     0         0
 2 2021-02-01   100   100     0         0
 3 2021-03-01   105   100     5         0
 4 2021-04-01   125   125     0         5
 5 2021-05-01   125   125     0         5
 6 2021-06-01   125   125     0         5
 7 2021-07-01   125   125     0         5
 8 2021-08-01   132   125     7         5
 9 2021-09-01   132   125     7         5
10 2021-10-01   132   125     7         5
11 2021-11-01   135   125    10         5
12 2021-12-01   150   150     0        15
13 2022-01-01   150   150     0        15
14 2022-02-01   150   150     0        15

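where diff_calc is the cumulative sum of the previous unique values in diff. If several different diff values occur in a row, only the maximum of that run is added to the cumulative sum of the earlier diff values, following the same logic. For example, the run 7, 7, 7, 10 in rows 8 to 11 contributes its maximum, 10, so diff_calc rises from 5 to 15 at row 12.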

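This is a follow-up to a question I asked earlier, but I realized I had not given the best example or description of what I need here, so I am posting it as a new question. Thanks!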

r date lag cumsum
1 Answer

Try this:

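library(dplyr)

df |>
  # val2 is constant within each block of rows, so it serves as the block key
  group_by(val2) |>
  # keep the block maximum of diff on the last row of each block, 0 elsewhere
  mutate(tmp = max(diff),
         tmp = replace(tmp, seq_len(n() - 1), 0)) |>
  ungroup() |>
  # running total of the block maxima, shifted down by one row
  mutate(diff_calc = lag(cumsum(tmp), default = 0)) |>
  select(-tmp)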
 
# # A tibble: 14 × 5
#    date        val1  val2  diff diff_calc
#    <date>     <dbl> <dbl> <dbl>     <dbl>
#  1 2021-01-01   100   100     0         0
#  2 2021-02-01   100   100     0         0
#  3 2021-03-01   105   100     5         0
#  4 2021-04-01   125   125     0         5
#  5 2021-05-01   125   125     0         5
#  6 2021-06-01   125   125     0         5
#  7 2021-07-01   125   125     0         5
#  8 2021-08-01   132   125     7         5
#  9 2021-09-01   132   125     7         5
# 10 2021-10-01   132   125     7         5
# 11 2021-11-01   135   125    10         5
# 12 2021-12-01   150   150     0        15
# 13 2022-01-01   150   150     0        15
# 14 2022-02-01   150   150     0        15
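
As a cross-check on the pipeline above, here is a minimal base R sketch of the same idea. It is not part of the original answer: lagged_block_cumsum is a hypothetical helper name, and the sketch assumes that a "block" means a consecutive run of identical val2 values, which holds for the example data but may not match every real dataset.

```r
# Hypothetical helper (not from the original answer): for each row, return the
# sum of the per-block maxima of `x` over all blocks that ended before the
# row's own block. Blocks are assumed to be consecutive runs of equal `key`.
lagged_block_cumsum <- function(x, key) {
  # Label each consecutive run of identical key values: 1, 1, 1, 2, 2, ...
  runs <- rle(key)
  block <- rep(seq_along(runs$lengths), runs$lengths)
  # Maximum of x within each block, in block order
  block_max <- as.numeric(tapply(x, block, max))
  # Total of the maxima of all earlier blocks (0 for the first block)
  carried <- c(0, cumsum(block_max))[seq_along(block_max)]
  # Expand back to one value per row
  carried[block]
}

# Example data copied from the question
val1 <- c(100, 100, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150)
val2 <- c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150)
diff_calc <- lagged_block_cumsum(val1 - val2, val2)
print(diff_calc)
# [1]  0  0  0  5  5  5  5  5  5  5  5 15 15 15
```

One design difference to be aware of: group_by(val2) would pool non-adjacent rows that happen to share a val2 value into one group, whereas the rle-based labelling here treats each consecutive run separately. For the example data the two approaches give the same result.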