My data frame comes from a high-frequency buoy that takes measurements at discrete depths in a lake.
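library(rLakeAnalyzer)
library(plyr)      # load before tidyverse so plyr does not mask dplyr verbs
library(tidyverse)
library(lubridate) # mdy_hm(), floor_date(), round_date(); attached by tidyverse >= 2.0.0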
PFL_counter = c(2,2,2,3,3,3,4,4)
depth = c(0.5,1.0,1.5,0.5,1.0,1.5,0.5,1.0)
temp_C = c(14.27,14.22,14.20,14.23,14.23,14.22,14.23,14.22)
datetime = c("5/11/23 17:01","5/11/23 17:02","5/11/23 17:04",
"5/11/23 18:01","5/11/23 18:22","5/11/23 19:14",
"5/11/23 19:14","5/11/23 19:16")
Carmi_23 = data.frame(PFL_counter, depth ,temp_C, datetime)
Carmi_23$datetime = mdy_hm(Carmi_23$datetime)
Carmi_23
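The real data has 58,522 observations. Depth increases in steps of 0.5 from 0.5 up to a maximum of 9 (sometimes it only reaches 7, for example), and the counter increments by 1 each time a depth profile is completed (that is, each time depth returns to 0.5).
There should be one depth profile per hour. My goal is to round the datetimes of each depth profile (you can think of a profile as its PFL_counter group) to the hour. Like this: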
PFL_counter = c(2,2,2,3,3,3,4,4)
depth = c(0.5,1.0,1.5,0.5,1.0,1.5,0.5,1.0)
temp_C = c(14.27,14.22,14.20,14.23,14.23,14.22,14.23,14.22)
datetime = c("5/11/23 17:00","5/11/23 17:00","5/11/23 17:00",
"5/11/23 18:00","5/11/23 18:00","5/11/23 18:00",
"5/11/23 19:00","5/11/23 19:00")
Carmi_23 = data.frame(PFL_counter, depth ,temp_C, datetime)
Carmi_23$datetime = mdy_hm(Carmi_23$datetime)
Carmi_23
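The problem is that some depth profiles take longer than an hour, so the last few depths do not fall in the same hour as the start of the profile (depth 0.5). I recreated this in row 6 of the first code block above.
My first approach: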
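To see which profiles straddle an hour boundary, here is a minimal base R sketch on the example data. The names hour_label, hours_per_profile, and split_profiles are illustrative only, and UTC is assumed as the time zone.

```r
# Rebuild the example data (time zone assumed to be UTC)
Carmi_23 <- data.frame(
  PFL_counter = c(2, 2, 2, 3, 3, 3, 4, 4),
  depth       = c(0.5, 1.0, 1.5, 0.5, 1.0, 1.5, 0.5, 1.0),
  temp_C      = c(14.27, 14.22, 14.20, 14.23, 14.23, 14.22, 14.23, 14.22),
  datetime    = as.POSIXct(
    c("2023-05-11 17:01", "2023-05-11 17:02", "2023-05-11 17:04",
      "2023-05-11 18:01", "2023-05-11 18:22", "2023-05-11 19:14",
      "2023-05-11 19:14", "2023-05-11 19:16"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# Label each reading with the clock hour it falls in
hour_label <- format(Carmi_23$datetime, "%Y-%m-%d %H")

# Count the distinct clock hours touched by each profile
hours_per_profile <- tapply(
  hour_label, Carmi_23$PFL_counter,
  function(h) length(unique(h))
)

# Profiles that spill into a second hour
split_profiles <- names(hours_per_profile)[hours_per_profile > 1]
split_profiles  # "3"
```

Only profile 3 is flagged here, because its last reading (row 6) lands at 19:14.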
Carmi_23$datetime = floor_date(Carmi_23$datetime, "hour") # tried this
Carmi_23$datetime = round_date(Carmi_23$datetime, "10 mins") # also this
Carmi_23$datetime = floor_date(Carmi_23$datetime, "hour")
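This becomes a problem when I need to pivot_wider, because I end up with half a row in one hour and half a row in another (and I can't feed NAs to the rLakeAnalyzer heat map). With the example data: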
testingpivot = Carmi_23 %>%
  select(datetime=datetime, depth=depth, temp_C=temp_C) %>%
  pivot_wider(id_cols=datetime,
              names_from=depth,
              values_from=temp_C,
              values_fn=function(x) mean(x, na.rm=TRUE)
  ) %>%
  rename_with(~ paste0("wtr_", .), -datetime)
#%>% na.omit() leave until figured out
testingpivot
I also looked into openair:
library(openair)
Carmi_23$date = Carmi_23$datetime
Carmi_openair = timeAverage(Carmi_23, avg.time = "hour", type = "PFL_counter") # datetime needs to be called "date"
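But this averages all column values, so time is collapsed into one averaged row per hour. My next attempt was to group, average, and summarise (take the mean by PFL_counter, then round down). The catch is that I need the datetime repeated for every depth within its depth profile, and the length of each PFL_counter group is unpredictable (in the example data I mimicked this by giving two depth profiles 3 rows and one depth profile 2 rows).
Does anyone have any ideas? I could just drop the rows after pivoting where an hour got split in two, but a very stubborn part of me wants to figure this out.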
Base R solution:
Carmi_23$datetime <- ave(
  Carmi_23$datetime,
  Carmi_23$PFL_counter,
  FUN = function (x)
    min(lubridate::floor_date(x, "hour"))
)
dplyr solution:
library(dplyr)
Carmi_23 |>
  mutate(datetime = min(lubridate::floor_date(datetime, "hour")), .by = PFL_counter)
Result:
> Carmi_23
PFL_counter depth temp_C datetime
1 2 0.5 14.27 2023-05-11 17:00:00
2 2 1.0 14.22 2023-05-11 17:00:00
3 2 1.5 14.20 2023-05-11 17:00:00
4 3 0.5 14.23 2023-05-11 18:00:00
5 3 1.0 14.23 2023-05-11 18:00:00
6 3 1.5 14.22 2023-05-11 18:00:00
7 4 0.5 14.23 2023-05-11 19:00:00
8 4 1.0 14.22 2023-05-11 19:00:00
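As a rough check that this removes the split rows from the question, the sketch below applies the same per-profile floor to the example data and then pivots it. It assumes R >= 4.1 (for |>) and dplyr >= 1.1.0 (for .by). The object names fixed and wide are illustrative, and names_prefix is used in place of the question's rename_with() step.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Example data from the question
Carmi_23 <- data.frame(
  PFL_counter = c(2, 2, 2, 3, 3, 3, 4, 4),
  depth       = c(0.5, 1.0, 1.5, 0.5, 1.0, 1.5, 0.5, 1.0),
  temp_C      = c(14.27, 14.22, 14.20, 14.23, 14.23, 14.22, 14.23, 14.22),
  datetime    = mdy_hm(c("5/11/23 17:01", "5/11/23 17:02", "5/11/23 17:04",
                         "5/11/23 18:01", "5/11/23 18:22", "5/11/23 19:14",
                         "5/11/23 19:14", "5/11/23 19:16"))
)

# Give every row of a profile the floored hour of that profile's earliest reading
fixed <- Carmi_23 |>
  mutate(datetime = min(floor_date(datetime, "hour")), .by = PFL_counter)

# One row per profile, one wtr_<depth> column per depth
wide <- fixed |>
  pivot_wider(id_cols = datetime,
              names_from = depth,
              values_from = temp_C,
              names_prefix = "wtr_")

nrow(wide)  # 3: one row per PFL_counter group
```

Profile 4 has no reading at depth 1.5 in the example, so wtr_1.5 is still NA in its row. That NA reflects a missing measurement rather than a profile split across two hours.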
Data:
> dput(Carmi_23)
structure(list(PFL_counter = c(2, 2, 2, 3, 3, 3, 4, 4), depth = c(0.5,
1, 1.5, 0.5, 1, 1.5, 0.5, 1), temp_C = c(14.27, 14.22, 14.2,
14.23, 14.23, 14.22, 14.23, 14.22), datetime = structure(c(1683824460,
1683824520, 1683824640, 1683828060, 1683829320, 1683832440, 1683832440,
1683832560), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-8L))
Use round():
> round(Carmi_23$datetime, 'hours')
[1] "2005-11-23 17:00:00 UTC" "2005-11-23 17:00:00 UTC" "2005-11-23 17:00:00 UTC"
[4] "2005-11-23 18:00:00 UTC" "2005-11-23 18:00:00 UTC" "2005-11-23 19:00:00 UTC"
[7] "2005-11-23 19:00:00 UTC" "2005-11-23 19:00:00 UTC"