I am working with speech transcripts:
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
2 okay 3403 3728
3 cool thanks everyone um 4221 5880
4 so yes in terms of our projects 5910 11960
5 let's have a look so the 11980 13740
6 LGBTQ plus 13813 16110
and would like to insert a new row after each Utterance that records the time gap between consecutive utterances. The desired output looks something like this:
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
NA 780 3403
2 okay 3403 3728
NA 3728 4221
3 cool thanks everyone um 4221 5880
NA 5880 5910
4 so yes in terms of our projects 5910 11960
NA 11960 11980
5 let's have a look so the 11980 13740
NA 13740 13813
6 LGBTQ plus 13813 16110
I know how to do this with data.table:
library(data.table)
unq <- c(0, sort(unique(setDT(df)[, c(Starttime_ms, Endtime_ms)])))
df <- df[.(unq[-length(unq)], unq[-1]), on=c("Starttime_ms", "Endtime_ms")]
but I am looking for a dplyr solution.
Data:
df <- structure(list(Utterance = c("on this", "okay", "cool thanks everyone um",
"so yes in terms of our projects",
"let's have a look so the", "LGBTQ plus"), Starttime_ms = c(210,
3403, 4221, 5910, 11980, 13813), Endtime_ms = c(780, 3728, 5880,
11960, 13740, 16110)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
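library(dplyr)

# Build the gap rows: each gap starts at the previous utterance's end
# and ends at the current utterance's start.
df |>
  mutate(Utterance = NA,
         # The unnamed data frame is unpacked into columns, so both
         # expressions see the original Starttime_ms and Endtime_ms.
         local({
           data.frame(Starttime_ms = lag(Endtime_ms), Endtime_ms = Starttime_ms)
         })) |>
  filter(!is.na(Starttime_ms)) |>  # drop the first row, which has no preceding utterance
  bind_rows(df) |>                 # add the original utterances back
  arrange(Starttime_ms)            # interleave utterances and gaps by start time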
Output
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
2 NA 780 3403
3 okay 3403 3728
4 NA 3728 4221
5 cool thanks everyone um 4221 5880
6 NA 5880 5910
7 so yes in terms of our projects 5910 11960
8 NA 11960 11980
9 let's have a look so the 11980 13740
10 NA 13740 13813
11 LGBTQ plus 13813 16110
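
The same idea can also be written without the mutate()/local() trick, which some readers may find easier to follow. The following is only a sketch of an alternative, not the approach used above: it builds the gap rows as a separate tibble by pairing each end time with the next start time, then interleaves them with the original rows. The names gaps and result are introduced here for illustration, and the sample data is re-created so the block runs on its own.

```r
library(dplyr)

# Sample data copied from the question.
df <- tibble(
  Utterance = c("on this", "okay", "cool thanks everyone um",
                "so yes in terms of our projects",
                "let's have a look so the", "LGBTQ plus"),
  Starttime_ms = c(210, 3403, 4221, 5910, 11980, 13813),
  Endtime_ms = c(780, 3728, 5880, 11960, 13740, 16110)
)

# One gap row between each pair of consecutive utterances:
# it starts at one utterance's end and ends at the next one's start.
gaps <- tibble(
  Utterance = NA_character_,
  Starttime_ms = head(df$Endtime_ms, -1),
  Endtime_ms = tail(df$Starttime_ms, -1)
)

# Interleave utterances and gaps. Endtime_ms is a secondary sort key so
# that a zero-length gap would sort before the utterance that follows it.
result <- bind_rows(df, gaps) |>
  arrange(Starttime_ms, Endtime_ms)

print(result)
```

This assumes the utterances are already ordered by start time and do not overlap; with overlapping utterances a gap row would have a start time later than its end time.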