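I have a large data frame containing positions, timestamps, trip IDs, and so on.
I want a simple way, avoiding a double loop, to filter it and keep only some of the rows.
So, for all rows that share the same combination of trip_id and stop_id, I want to keep the row where speed first equals zero. That is either the row with the minimum timestamp at which speed is zero, or simply the first row with speed zero, since the frame is sorted by timestamp.
In the example below, I want to find the top three rows (the actual data frame has many more) and keep only the second one, where speed first becomes zero.
Is there a way to do this without any loops?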
trip_id.x stop_id latitude.x longitude.x bearing speed timestamp vehicle id
55700000048910944 9022005000050006 58.416879999999999 15.624510000000001 30 0.2 1541399400 9031005990005424
55700000048910944 9022005000050006 58.416879999999999 15.624510000000001 0 0 1541399401 9031005990005424
55700000048910944 9022005000050006 58.416879999999999 15.624510000000001 0 0 1541399402 9031005990005424
55700000048910300 9022005000050006 58.416879999999999 15.624510000000001 30 0.5 1541400000 9031005990005424
Edit: Here is the dput() of a longer example with a simpler data format:
structure(list(trip_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3), stop_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3), speed = c(5, 0, 0, 5, 2, 0, 0, 2, 4, 0, 0, 4, 5, 0, 0,
5, 2, 0, 0, 2, 4, 0, 0, 4, 5, 0, 0, 5, 2, 0, 0, 2, 4, 0, 0, 4
), timestamp = c(1, 2, 3, 4, 101, 102, 103, 104, 201, 202, 203,
204, 301, 302, 303, 304, 401, 402, 403, 404, 501, 502, 503, 504,
601, 602, 603, 604, 701, 702, 703, 704, 801, 802, 803, 804)), row.names = c(NA,
-36L), class = c("tbl_df", "tbl", "data.frame"))
Desired output:
structure(list(trip_id = c(1, 1, 2, 2, 2, 3, 3, 3), stop_id = c(1,
3, 1, 2, 3, 1, 2, 3), speed = c(0, 0, 0, 0, 0, 0, 0, 0), timestamp = c(2,
202, 302, 402, 502, 602, 702, 802)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
Edit: I tried changing the code to include a condition. I tried using case_when and if, but could not get it to work:
df_arrival_z <- df %>%
group_by(trip_id, stop_id) %>%
filter(speed == 0)
# Check if there is any rows where speed is zero
if (nrow(filter(speed == 0)) > 0){
# Take the first row if there is rows with zero
filter(speed == 0) %>% slice(1)
}
if (nrow(filter(speed == 0)) == 0){
# Take the middle point if there is no rows with speed = 0
slice(nrow%/%2)
}
Without the desired output I can't be sure what you expect, but try this and let me know:
library(dplyr)
df %>%
group_by(trip_id, stop_id) %>%
filter(speed == 0) %>%
slice(1)
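
For the conditional case in the edit (first zero-speed row if one exists, otherwise a middle row), something along these lines might work. This is only a sketch under a few assumptions: the toy `df` below is made up for illustration, `speed` contains no `NA` values, and the "middle" row is taken as `ceiling(n() / 2)` rather than `n() %/% 2`, so that a group with a single row still returns that row.

```r
library(dplyr)

# Hypothetical toy data: trip 1 / stop 1 contains a full stop,
# trip 1 / stop 2 never reaches speed 0.
df <- data.frame(
  trip_id   = c(1, 1, 1, 1, 1, 1),
  stop_id   = c(1, 1, 1, 2, 2, 2),
  speed     = c(5, 0, 0, 3, 2, 4),
  timestamp = c(1, 2, 3, 101, 102, 103)
)

df_arrival <- df %>%
  group_by(trip_id, stop_id) %>%
  # Make sure each group is ordered by time before picking a row
  arrange(timestamp, .by_group = TRUE) %>%
  # Per group: index of the first zero-speed row if any, else the middle row
  filter(row_number() == if (any(speed == 0)) {
    which(speed == 0)[1]
  } else {
    ceiling(n() / 2)
  }) %>%
  ungroup()

print(df_arrival)
```

The `if`/`else` is evaluated once per group inside `filter()`, which is why a plain `if` works here even though it is not vectorised; `case_when()` is not needed because the decision is per group rather than per row.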