How to map ZIP code prefixes to states in R without external files and without zipcodeR


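I am working on a spatial analysis project in R in which I need to map the first three digits of a ZIP code (the ZIP code prefix) to the corresponding US state. Because of restrictions, I cannot use external files or packages that need to download data. Since I have to work on a cloud service, the zipcodeR package cannot be loaded. Rather than trying to fix that, I am looking for an alternative to zipcodeR so that I can eventually run some spatial analysis on count data.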

zipcode=c("014**", "018**", "119**", "165**", "165**", "240**")

get_state_from_zip_prefix <- function(zip_code) {
  # Extract the first three digits
  zip_prefix <- substr(zip_code, 1, 3)
  # Use reverse_zipcode to get state information
  zip_info <- reverse_zipcode(paste0(zip_prefix, "00"))
  # Return the state if available
  if (!is.null(zip_info)) {
    return(zip_info$state)
  } else {
    return(NA)
  }
}

This returns NA.

r spatial zipcoder
1 Answer

I think a simple lookup function will do the job. We can build a minimal lookup table from public ZIP code data (for example, a freely available CSV file):

lookup <- data.frame(
  State = c("NY", "PR", "VI", "PR", "MA", "RI", "NH", "ME", "VT", "MA", "VT", 
            "CT", "NY", "CT", "NJ", "AE", "NY", "PA", "DE", "DC", "VA", "DC",
            "MD", "DC", "VA", "DC", "MD", "VA", "WV", "NC", "SC", "GA", "FL", 
            "AA", "FL", "AL", "TN", "MS", "GA", "KY", "OH", "IN", "MI", "IA", 
            "WI", "MN", "DC", "SD", "ND", "MT", "IL", "MO", "KS", "NE", "LA", 
            "AR", "MO", "AR", "OK", "TX", "OK", "TX", "OK", "TX", "CO", "WY", 
            "ID", "WY", "ID", "UT", "AZ", "NM", "TX", "DC", "NV", "CA", "AP", 
            "HI", "AS", "HI", "GU", "PW", "FM", "MP", "MH", "OR", "WA", "AK"),
  min = c(501L, 601L, 801L, 901L, 1001L, 2801L, 3031L, 3901L, 5001L, 5501L, 
          5601L, 6001L, 6390L, 6401L, 7001L, 9001L, 10001L, 15001L, 19701L, 
          20001L, 20101L, 20201L, 20588L, 20590L, 20598L, 20599L, 20601L, 
          22003L, 24701L, 27006L, 29001L, 30002L, 32003L, 34001L, 34101L, 
          35004L, 37010L, 38601L, 39813L, 40003L, 43001L, 46001L, 48001L, 
          50001L, 53001L, 55001L, 56901L, 57001L, 58001L, 59001L, 60001L, 
          63001L, 66002L, 68001L, 70001L, 71601L, 72643L, 72644L, 73001L,
          73301L, 73401L, 73960L, 74001L, 75001L, 80001L, 82001L, 83201L, 
          83414L, 83415L, 84001L, 85001L, 87001L, 88510L, 88888L, 88901L, 
          90001L, 96201L, 96701L, 96799L, 96801L, 96910L, 96939L, 96941L, 
          96950L, 96960L, 97001L, 98001L, 99501L), 
  max = c(544L, 795L, 851L, 988L, 2791L, 2940L, 3897L, 4992L, 5495L, 5544L, 
          5907L, 6389L, 6390L, 6928L, 8989L, 9978L, 14925L, 19640L, 19980L, 
          20098L, 20199L, 20586L, 20588L, 20597L, 20598L, 20599L, 21930L, 
          24658L, 26886L, 28909L, 29945L, 31999L, 33994L, 34099L, 34997L, 
          36925L, 38589L, 39776L, 39901L, 42788L, 45999L, 47997L, 49971L, 
          52809L, 54990L, 56763L, 56999L, 57799L, 58856L, 59937L, 62999L, 
          65899L, 67954L, 69367L, 71497L, 72642L, 72643L, 72959L, 73199L, 
          73344L, 73951L, 73960L, 74966L, 79999L, 81658L, 83128L, 83406L, 
          83414L, 83877L, 84791L, 86556L, 88439L, 88595L, 88888L, 89883L, 
          96162L, 96698L, 96797L, 96799L, 96898L, 96932L, 96940L, 96944L, 
          96952L, 96970L, 97920L, 99403L, 99950L))

Now we use a small wrapper function to look up several ZIP codes at once:

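state_from_zip <- function(zips, lookup) {
  # Look up each ZIP code in turn; names on `zips` carry over to the result
  unlist(lapply(zips, function(zip) {
    zip <- as.numeric(zip)
    # Row(s) whose [min, max] range contains this ZIP code
    val <- which(lookup$min <= zip & lookup$max >= zip)
    if (length(val) == 0) NA else lookup$State[val]
  }))
}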
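
If the per-element loop becomes slow on long vectors, a vectorized variant is possible. The sketch below is not part of the original answer: it assumes the ranges in the lookup table are sorted by min and do not overlap (which appears to hold for the table above) and relies on base R's findInterval. The name state_from_zip_vec and the four-row lookup_demo table (rows copied from the table above) are only there to keep the example self-contained.

```r
# Demo subset of the lookup table (four rows copied from the full table)
lookup_demo <- data.frame(
  State = c("MA", "NY", "DC", "CA"),
  min   = c(1001L, 10001L, 20201L, 90001L),
  max   = c(2791L, 14925L, 20586L, 96162L),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized variant; assumes lookup$min is sorted ascending
# and the [min, max] ranges do not overlap
state_from_zip_vec <- function(zips, lookup) {
  zip <- suppressWarnings(as.numeric(zips))
  # Index of the last row whose min is <= zip (0 when zip is below every min)
  idx <- findInterval(zip, lookup$min)
  idx[which(idx == 0)] <- NA
  out <- as.character(lookup$State)[idx]
  # Discard candidates whose upper bound is below the ZIP code
  out[is.na(idx) | zip > lookup$max[idx]] <- NA
  names(out) <- names(zips)
  out
}

state_from_zip_vec(c("10118", "20500", "90068", "00100", "99999"), lookup_demo)
```

ZIP codes that fall below the first range or into a gap between ranges come back as NA rather than as the nearest state.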

Testing it on some well-known ZIP codes, we get:

zips <- c(WhiteHouse = "20500", EmpireState = "10118", Hollywood = "90068")

state_from_zip(zips, lookup)
#>  WhiteHouse EmpireState   Hollywood 
#>        "DC"        "NY"        "CA" 
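
The function above expects full five-digit ZIP codes, whereas the question has masked prefixes such as "014**", which as.numeric turns into NA. One possible way to handle those, sketched here under my own assumptions, is to treat each three-digit prefix as the range from prefix00 to prefix99 and return a state only when every overlapping row of the lookup table agrees. The helper name state_from_prefix and the small lookup_prefix_demo table (rows copied from the table above) are hypothetical and only keep the example self-contained.

```r
# Demo subset of the lookup table (only the rows needed for the question's data)
lookup_prefix_demo <- data.frame(
  State = c("MA", "NY", "PA", "VA"),
  min   = c(1001L, 10001L, 15001L, 22003L),
  max   = c(2791L, 14925L, 19640L, 24658L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map masked prefixes such as "014**" to a state
state_from_prefix <- function(zips, lookup) {
  vapply(zips, function(zip) {
    prefix <- suppressWarnings(as.integer(substr(zip, 1, 3)))
    if (is.na(prefix)) return(NA_character_)
    lo <- prefix * 100L   # "014**" -> 1400
    hi <- lo + 99L        # "014**" -> 1499
    # States whose ZIP range overlaps [lo, hi]
    hits <- unique(as.character(
      lookup$State[lookup$min <= hi & lookup$max >= lo]
    ))
    # Ambiguous prefixes (several states) and unknown prefixes give NA
    if (length(hits) == 1) hits else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

zipcode <- c("014**", "018**", "119**", "165**", "165**", "240**")
state_from_prefix(zipcode, lookup_prefix_demo)
#> [1] "MA" "MA" "NY" "PA" "PA" "VA"
```

With the full table, a few prefixes overlap more than one state's range; this sketch returns NA for them, so you may prefer to return all candidates instead.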