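I'm a beginner in R, but I've been analysing a large GPS dataset made up of unique individuals (name; roughly 100 unique names) and more than 1,000,000 rows of data. Each unique name has a different number of coordinates (latitude and longitude), and each belongs to either group a or group b. So far I have done a points-in-polygon count analysis to compare site use between group a and group b. I'd now like to run a hierarchical cluster analysis within each of group a and group b to analyse interactions within each group, and then analyse interactions between groups a and b.
It was suggested that I write a for loop to get the mean coordinates for each unique "name", which I think I could then use for the hierarchical cluster analysis (in R or QGIS?). My data looks like this: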
structure(list(lat = c(50.39761959, 50.39757382, 50.39760433,
50.39742123, 50.39768063, 50.39740597, 50.39757382, 50.39769589,
50.39763485, 50.39763485), lng = c(-4.888685435, -4.888639658,
-4.888685435, -4.888746471, -4.88860914, -4.888883803, -4.888670176,
-4.88860914, -4.888563363, -4.888181888), time_stamp = c("15/10/2021 00:21",
"15/10/2021 00:50", "15/10/2021 01:51", "15/10/2021 02:21", "15/10/2021 02:51",
"15/10/2021 03:21", "15/10/2021 03:51", "15/10/2021 04:21", "15/10/2021 04:51",
"15/10/2021 05:21"), name = c("300005", "300005", "300005", "300005",
"300005", "300005", "300005", "B100", "B100", "B100"),
breed = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b"
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
))
I'm particularly struggling with the for loop to get the mean coordinates.
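You don't need a loop for aggregation. From the tags used, it's not quite clear whether you'd prefer a data.table-based solution (which might make sense for 1,000,000+ records), but with dplyr you can group by name and summarise with something like this: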
df <- structure(list(lat = c(50.39761959, 50.39757382, 50.39760433,
50.39742123, 50.39768063, 50.39740597, 50.39757382, 50.39769589,
50.39763485, 50.39763485), lng = c(-4.888685435, -4.888639658,
-4.888685435, -4.888746471, -4.88860914, -4.888883803, -4.888670176,
-4.88860914, -4.888563363, -4.888181888), time_stamp = c("15/10/2021 00:21",
"15/10/2021 00:50", "15/10/2021 01:51", "15/10/2021 02:21", "15/10/2021 02:51",
"15/10/2021 03:21", "15/10/2021 03:51", "15/10/2021 04:21", "15/10/2021 04:51",
"15/10/2021 05:21"), name = c("300005", "300005", "300005", "300005",
"300005", "300005", "300005", "B100", "B100", "B100"),
breed = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b"
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
))
df
#> lat lng time_stamp name breed
#> 1 50.39762 -4.888685 15/10/2021 00:21 300005 a
#> 2 50.39757 -4.888640 15/10/2021 00:50 300005 a
#> 3 50.39760 -4.888685 15/10/2021 01:51 300005 a
#> 4 50.39742 -4.888746 15/10/2021 02:21 300005 a
#> 5 50.39768 -4.888609 15/10/2021 02:51 300005 a
#> 6 50.39741 -4.888884 15/10/2021 03:21 300005 a
#> 7 50.39757 -4.888670 15/10/2021 03:51 300005 a
#> 8 50.39770 -4.888609 15/10/2021 04:21 B100 b
#> 9 50.39763 -4.888563 15/10/2021 04:51 B100 b
#> 10 50.39763 -4.888182 15/10/2021 05:21 B100 b
dplyr::summarise(df, lat = mean(lat), lng = mean(lng), .by = name)
#> name lat lng
#> 1 300005 50.39755 -4.888703
#> 2 B100 50.39766 -4.888451
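If data.table is the better fit for a dataset of this size, the same aggregation might look roughly like the sketch below. It rebuilds the sample rows with only the columns the aggregation needs; the object names `dt` and `means` are placeholders, and adding `breed` to `by` is an assumption that each name belongs to exactly one group, as the question describes.

```r
library(data.table)

# Sample rows from the question, trimmed to the columns used here
dt <- data.table(
  lat = c(50.39761959, 50.39757382, 50.39760433, 50.39742123, 50.39768063,
          50.39740597, 50.39757382, 50.39769589, 50.39763485, 50.39763485),
  lng = c(-4.888685435, -4.888639658, -4.888685435, -4.888746471, -4.88860914,
          -4.888883803, -4.888670176, -4.88860914, -4.888563363, -4.888181888),
  name  = c(rep("300005", 7), rep("B100", 3)),
  breed = c(rep("a", 7), rep("b", 3))
)

# One row per name: mean latitude and longitude.
# Grouping by breed as well keeps the group label in the result.
means <- dt[, .(lat = mean(lat), lng = mean(lng)), by = .(name, breed)]
means
```

Keeping `breed` in the result makes it easy to subset the means by group later on.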
Created on 2024-04-29 with reprex v2.1.0
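For the hierarchical clustering step mentioned in the question, one possible starting point in base R is `dist()` followed by `hclust()` on the per-name means, one group at a time. This is only a sketch: the `centroids` table below uses made-up names and coordinates, and the linkage method and `k = 2` are arbitrary choices for illustration. Note also that Euclidean distance on raw degrees treats a degree of longitude like a degree of latitude, which does not hold at around 50°N, so a real analysis would probably want projected coordinates or a great-circle distance.

```r
# Hypothetical per-name mean coordinates (made-up values, illustration only)
centroids <- data.frame(
  name  = c("A1", "A2", "A3", "A4"),
  breed = "a",
  lat   = c(50.3975, 50.3976, 50.4010, 50.4012),
  lng   = c(-4.8887, -4.8885, -4.8800, -4.8802)
)

# Work on one group at a time
grp <- centroids[centroids$breed == "a", ]
rownames(grp) <- grp$name            # so dendrogram leaves carry the names

d  <- dist(grp[, c("lat", "lng")])   # Euclidean distance, in degrees
hc <- hclust(d, method = "average")  # average linkage, an arbitrary choice

# Cut the tree into two clusters and attach the labels
grp$cluster <- cutree(hc, k = 2)
grp
# plot(hc) would draw the dendrogram
```

Repeating the same steps for group b, and then on all names together, would give the within-group and between-group views the question asks about.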