Suppose I have the following data frame:
library(dplyr)  # also provides tibble()

df <- tibble(
  quarter = seq.Date(as.Date("2022-03-01"), as.Date("2023-12-01"), "quarter"),
  value = c(rnorm(3), 0, rnorm(4))  # random values with a zero in the 4th row
)
# A tibble: 8 × 2
  quarter       value
  <date>        <dbl>
1 2022-03-01 -0.670
2 2022-06-01 -0.00760
3 2022-09-01  1.78
4 2022-12-01  0
5 2023-03-01 -1.14
6 2023-06-01  1.37
7 2023-09-01  1.33
8 2023-12-01  0.336
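I want to find the position of each positive and negative value within its run, where a run starts and ends whenever the sign of the value changes. For the special case of 0, "zero" should be assigned. So the output I ultimately want looks like this: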
# A tibble: 8 × 3
  quarter       value position
  <date>        <dbl> <chr>
1 2022-03-01 -0.670   1st negative
2 2022-06-01 -0.00760 2nd negative
3 2022-09-01  1.78    1st positive
4 2022-12-01  0       zero
5 2023-03-01 -1.14    1st negative
6 2023-06-01  1.37    1st positive
7 2023-09-01  1.33    2nd positive
8 2023-12-01  0.336   3rd positive
After racking my brain, I think I found a solution, though not a very elegant one:
library(purrr)  # provides accumulate()
df %>%
  mutate(position = coalesce(accumulate(
           # 1 starts a new run, NA continues the current one
           ifelse(sign(value) - lag(sign(value)) != 0 | row_number() == 1, 1, NA),
           ~ ifelse(is.na(.y), .x + 1, .y))),
         position = ifelse(sign(value) > 0, paste(position, "pos"),
                    ifelse(sign(value) < 0, paste(position, "neg"), "zero")))
# A tibble: 8 × 3
  quarter       value position
  <date>        <dbl> <chr>
1 2022-03-01 -0.670   1 neg
2 2022-06-01 -0.00760 2 neg
3 2022-09-01  1.78    1 pos
4 2022-12-01  0       zero
5 2023-03-01 -1.14    1 neg
6 2023-06-01  1.37    1 pos
7 2023-09-01  1.33    2 pos
8 2023-12-01  0.336   3 pos
My question: is there a more efficient way to solve this? Thanks in advance!
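Here is a more concise approach:
First, we flag every change of sign in value and take the cumulative sum of those flags to get a run id x. Then we group by x and assign row numbers within each group, and finally paste the labels together with case_when: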
library(dplyr)

df %>%
  # x is a run id that increases each time the sign of value changes
  mutate(x = cumsum(c(1, diff(sign(value)) != 0))) %>%
  mutate(y = row_number(),  # position within the current run
         position = case_when(
           value > 0 ~ paste(y, "positive"),
           value < 0 ~ paste(y, "negative"),
           .default = "zero"),
         .by = x) %>%
  select(-c(x, y))
  quarter     value position
  <date>      <dbl> <chr>
1 2022-03-01 -0.556 1 negative
2 2022-06-01  1.79  1 positive
3 2022-09-01  0.498 2 positive
4 2022-12-01  0     zero
5 2023-03-01 -1.97  1 negative
6 2023-06-01  0.701 1 positive
7 2023-09-01 -0.473 1 negative
8 2023-12-01 -1.07  2 negative
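Both versions above print plain counters ("1 negative") rather than the ordinals shown in the desired output ("1st negative"). If the ordinals matter, one possible way to get them is sketched below. Note that ordinal() is a hypothetical helper written for this example, not a dplyr function, and the values are hard-coded from the question's printed table instead of rnorm() so the result is reproducible. The .by and .default arguments assume dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): 1, 2, 3, ... -> "1st", "2nd", "3rd", ...
ordinal <- function(n) {
  suffix <- case_when(
    n %% 100 %in% 11:13 ~ "th",  # 11th, 12th, 13th are irregular
    n %% 10 == 1 ~ "st",
    n %% 10 == 2 ~ "nd",
    n %% 10 == 3 ~ "rd",
    .default = "th"
  )
  paste0(n, suffix)
}

# Fixed values copied from the question's printed output (assumption: no rnorm())
df <- tibble(
  quarter = seq(as.Date("2022-03-01"), as.Date("2023-12-01"), by = "quarter"),
  value = c(-0.670, -0.00760, 1.78, 0, -1.14, 1.37, 1.33, 0.336)
)

result <- df %>%
  # run id: increases by one each time the sign of value changes
  mutate(run = cumsum(c(TRUE, diff(sign(value)) != 0))) %>%
  mutate(
    position = case_when(
      value > 0 ~ paste(ordinal(row_number()), "positive"),
      value < 0 ~ paste(ordinal(row_number()), "negative"),
      .default = "zero"
    ),
    .by = run  # row_number() restarts at 1 within each run
  ) %>%
  select(-run)

print(result)
```

With the values from the question, the position column should match the desired output, including "3rd positive" in the last row.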