Plotting a choropleth map from a data.frame containing coordinates/zip codes and IDs


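I am analyzing real estate sales for some North American cities and am using k-means clustering on the data. I have seven clusters, and for each observation in a cluster I have `latitude`, `longitude`, `zipcode`, and `cluster_id`. I would like to plot this on a map to better visualize the clusters. I am not sure what such a plot is called: a choropleth? Polygons?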

Most examples use geoJSON files, but I only have a `data.frame` object from the k-means clustering.

Actual data:

https://www.kaggle.com/threnjen/portland-housing-prices-sales-jul-2020-jul-2021

Sample data:

> dput(dt[runif(n = 10,min = 1,max = 25000)])
structure(list(id = c(23126L, 15434L, 5035L, 19573L, NA, 24486L, 
NA, 14507L, 3533L, 20192L), zipcode = c(97224L, 97211L, 97221L, 
97027L, NA, 97078L, NA, 97215L, 97124L, 97045L), latitude = c(45.40525436, 
45.55965805, 45.4983139, 45.39398956, NA, 45.47454071, NA, 45.50736618, 
45.52812958, 45.34381485), longitude = c(-122.7599182, -122.6500015, 
-122.7288742, -122.591217, NA, -122.8898392, NA, -122.6084061, 
-122.91745, -122.5948334), lastSoldPrice = c(469900L, 599000L, 
2280000L, 555000L, NA, 370000L, NA, 605000L, 474900L, 300000L
), lotSize = c(5227L, 4791L, 64904L, 9147L, NA, 2178L, NA, 4356L, 
2613L, 6969L), livingArea = c(1832L, 2935L, 5785L, 2812L, NA, 
1667L, NA, 2862L, 1844L, 742L), cluster_id = c(7, 7, 2, 7, NA, 
4, NA, 7, 7, 4)), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x7faa8000fee0>)

I followed the example at https://gist.github.com/josecarlosgonz/8565908 to try to create a geoJSON file so that I could plot this data, but without success.

I am not using markers because I have about 25,000 observations: plotting them all would be hard to read, and the file would take forever to load.

Edit:

Observations per zip code:

> dput(dat[, .N, by = .(`address/zipcode`)][(order(`address/zipcode`))])
structure(list(`address/zipcode` = c(7123L, 97003L, 97004L, 97005L, 
97006L, 97007L, 97008L, 97009L, 97015L, 97019L, 97023L, 97024L, 
97027L, 97030L, 97034L, 97035L, 97038L, 97045L, 97056L, 97060L, 
97062L, 97068L, 97070L, 97078L, 97080L, 97086L, 97089L, 97113L, 
97123L, 97124L, 97132L, 97140L, 97201L, 97202L, 97203L, 97204L, 
97205L, 97206L, 97209L, 97210L, 97211L, 97212L, 97213L, 97214L, 
97215L, 97216L, 97217L, 97218L, 97219L, 97220L, 97221L, 97222L, 
97223L, 97224L, 97225L, 97227L, 97229L, 97230L, 97231L, 97232L, 
97233L, 97236L, 97239L, 97266L, 97267L), N = c(1L, 352L, 9L, 
252L, 421L, 1077L, 357L, 1L, 31L, 2L, 4L, 159L, 239L, 525L, 640L, 
548L, 1L, 1064L, 5L, 353L, 471L, 736L, 6L, 403L, 866L, 913L, 
8L, 5L, 1113L, 776L, 3L, 543L, 219L, 684L, 463L, 1L, 57L, 809L, 
189L, 216L, 688L, 510L, 504L, 330L, 318L, 177L, 734L, 195L, 832L, 
305L, 276L, 589L, 688L, 716L, 286L, 83L, 1307L, 475L, 77L, 150L, 
382L, 444L, 290L, 423L, 430L)), row.names = c(NA, -65L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x7f904781a6e0>)
r plotly geojson r-leaflet choropleth
1 Answer

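On a modest laptop (8th-gen i3), I used the Kaggle data to build a `ggplot2` object, with the cluster IDs randomly sampled, and converted it with the `ggplotly()` function. The resulting `plotly` object seems to work well enough for analysis, but I do not know your performance requirements: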

library(dplyr)
library(ggplot2)
library(plotly)
library(rnaturalearth) # source of the base map data

# read in data from zip, select minimal number of columns and sample cluster_id
df <- readr::read_csv(unzip("path_to_zip/portland_housing.csv.zip")) %>%
    dplyr::select(az = `address/zipcode`, latitude, longitude) %>%              
    dplyr::mutate(cluster_id = sample(1:7, n(), replace = TRUE))
# get the map data
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
# build the ggplot2 object (rings as point shapes plus the alpha parameter reduce overplotting)
plt <- ggplot2::ggplot(data = world) +
    ggplot2::geom_sf() +
    ggplot2::geom_point(data = df, aes(x = longitude, y = latitude, color = factor(cluster_id)), size = 1, shape = 21, alpha = .7) + 
    ggplot2::coord_sf(xlim = c(-124.5, -122), ylim = c(45, 46), expand = FALSE)
# plot it:
plt


# plotly auto transform from ggplot2 object
plotly::ggplotly(plt)
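If responsiveness with roughly 25,000 points turns out to be a problem, one option worth trying is plotly's WebGL renderer via `toWebGL()`. The following is only a minimal sketch: the `pts` data frame and its values are synthetic stand-ins made up for illustration, not the Kaggle data, and I have not measured how much it helps on this dataset.

```r
library(ggplot2)
library(plotly)

set.seed(1)

# Synthetic stand-in for the housing data (hypothetical values near Portland)
n <- 25000
pts <- data.frame(
  longitude  = runif(n, -123.0, -122.4),
  latitude   = runif(n, 45.3, 45.6),
  cluster_id = factor(sample(1:7, n, replace = TRUE))
)

# Same kind of scatter plot as above, without the base map layer
p <- ggplot(pts, aes(x = longitude, y = latitude, color = cluster_id)) +
  geom_point(size = 1, alpha = 0.7)

# Convert to plotly, then ask plotly to render the traces with WebGL
p_gl <- toWebGL(ggplotly(p))
p_gl
```

WebGL draws the points on a canvas instead of creating one SVG element per point, which is usually the bottleneck with tens of thousands of markers.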


Edit

To include a map background, you can use, for example, the `ggmap` package instead of the map data from `rnaturalearth`. I will only show the `plotly` result:

library(ggmap)

# https://stackoverflow.com/questions/23130604/plot-coordinates-on-map
sbbox <- ggmap::make_bbox(lon = c(-124.5, -122), lat = c(45, 46), f = .1)
myarea <- ggmap::get_map(location=sbbox, zoom=10, maptype="terrain")
myarea <- ggmap::ggmap(myarea)

plt2 <- myarea +
    ggplot2::geom_point(data = df, mapping = aes(x = longitude, y = latitude, color = factor(cluster_id)), shape = 21, alpha = .7) 

plotly::ggplotly(plt2)


There are many other ways to obtain map data, for example the Mapbox API.
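If an actual choropleth (one colour per zip code area) is what you are after, a possible first step is to reduce the observations to one value per zip code, for example the most frequent cluster. This is only a sketch under assumptions: the `obs` data frame below is a made-up sample in the shape of the question's data, and `modal_cluster()` is a hypothetical helper, not part of any package.

```r
# Hypothetical sample in the shape of the question's data
obs <- data.frame(
  zipcode    = c(97224, 97224, 97224, 97211, 97211, 97221),
  cluster_id = c(7, 7, 4, 7, 2, 2)
)

# Most frequent cluster in a vector; ties resolve to the smallest label
modal_cluster <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# One row per zip code with its dominant cluster
by_zip <- aggregate(cluster_id ~ zipcode, data = obs, FUN = modal_cluster)
names(by_zip)[2] <- "dominant_cluster"
by_zip

# Assumed next step (not run here): join `by_zip` to zip code polygons held
# in an sf object, here called `zips` with a zip code column `zcta` of the
# same type as `zipcode` (both names are placeholders), then fill by cluster:
# zips <- merge(zips, by_zip, by.x = "zcta", by.y = "zipcode")
# ggplot2::ggplot(zips) +
#   ggplot2::geom_sf(ggplot2::aes(fill = factor(dominant_cluster)))
```

The polygons themselves have to come from somewhere else (for example a zip code boundary shapefile or geoJSON file); the data.frame from the clustering only supplies the value used to colour each area.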
