R, stringr - Replace multiple substrings in the rows of a data frame


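I have addresses stored in the "address" column of a data frame, and I want to create a new column from the existing addresses with the following abbreviations expanded: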

{"ST": "STREET",
  "RD": "ROAD",
  "AVE": "AVENUE",
  "N": "NORTH",
  "W": "WEST",
  "S": "SOUTH",
  "E": "EAST",
  "STE": "SUITE",
  "HWY": "HIGHWAY",
  "DR": "DRIVE",
  "NW": "NORTH WEST",
  "NE": "NORTH EAST",
  "SW": "SOUTH WEST",
  "SE": "SOUTH EAST",
  "LN": "LANE",
  "WAY": "WAY"}

How should I go about this?

Expected output:

101 ST LN -> 101 STREET LANE

Tags: r, dataframe, str-replace, stringr

Answer 1 (1 vote)

One way to solve this is to use stri_replace_all_regex from the stringi package. It accepts vectors of patterns and replacements.

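We can use the word-boundary anchor \b, which itself has to be escaped as \\b inside an R string. To handle abbreviations that end with a period, such as "ST.", we match either a literal . or a word boundary after the abbreviation: (\\.|\\b).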
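A quick way to see what such a pattern does and does not match. This sketch uses base R's PCRE engine (`perl = TRUE`) instead of stringi's ICU engine, on the assumption that both treat `\b` the same way for plain ASCII addresses like these:

```r
# Pattern for one abbreviation, built the same way as in this answer:
# a word boundary, the abbreviation, then a literal "." or another boundary.
pat <- paste0("\\b", "ST", "(\\.|\\b)")

candidates <- c("101 ST LN", "101 ST. LN", "10 STE 5", "MAIN STREET", "FIRST AVE")
hits <- grepl(pat, candidates, perl = TRUE)
print(setNames(hits, candidates))

# Without the "\\." alternative the period is left behind after replacement
plain  <- gsub("\\bST\\b", "STREET", "101 ST. LN", perl = TRUE)
dotted <- gsub(pat, "STREET", "101 ST. LN", perl = TRUE)
print(c(plain = plain, dotted = dotted))
```

Only the first two candidates match: "STE", "STREET" and "FIRST" contain the letters ST but have no word boundary on both sides of them.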

I built the pattern and replacement vectors from your data; the code for them is at the end of this answer.

library(stringi)
# `terms` is built in the "Data" section at the end of this answer
stri_replace_all_regex("101 ST. LN", pattern = terms[[1]], replacement = terms[[2]], vectorize_all = FALSE)
#[1] "101 STREET LANE"

The same call also works on a whole vector of strings:

data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"))
stri_replace_all_regex(data$address,pattern = terms[[1]], replacement = terms[[2]],vectorize_all = FALSE)
#[1] "1 NORTH STREET"    "2 EAST AVENUE"     "3 SOUTH ROAD"      "4 SOUTH EAST LANE"
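With `vectorize_all = FALSE`, every pattern is applied in turn to each string. For readers without stringi, a base-R sketch of the same sequential replacement folds `gsub()` over the pattern list with `Reduce()`; this is a swapped-in technique, not stringi's implementation. The helper `expand_abbrev()` and the column name `address_full` are hypothetical, and the second one also covers the question's request for a new column:

```r
# Abbreviations and their expansions, in the same order as the question
abbr <- c("ST", "RD", "AVE", "N", "W", "S", "E", "STE", "HWY", "DR",
          "NW", "NE", "SW", "SE", "LN", "WAY")
full <- c("STREET", "ROAD", "AVENUE", "NORTH", "WEST", "SOUTH", "EAST",
          "SUITE", "HIGHWAY", "DRIVE", "NORTH WEST", "NORTH EAST",
          "SOUTH WEST", "SOUTH EAST", "LANE", "WAY")
# Same pattern shape as the answer: \b ABBR followed by "." or \b
patterns <- paste0("\\b", abbr, "(\\.|\\b)")

# Hypothetical helper: apply each (pattern, replacement) pair in order
expand_abbrev <- function(x) {
  Reduce(function(acc, i) gsub(patterns[i], full[i], acc, perl = TRUE),
         seq_along(patterns), init = x)
}

data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"),
                   stringsAsFactors = FALSE)
# Keep the original column and store the expanded text in a new one
data$address_full <- expand_abbrev(data$address)
print(data)
```

Because the passes run in sequence, a replacement produced early could in principle be matched again by a later pattern; with this mapping and the word boundaries no such collision occurs.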

Data

terms <- c("ST", "STREET", "RD", "ROAD", "AVE", "AVENUE", "N", "NORTH", 
"W", "WEST", "S", "SOUTH", "E", "EAST", "STE", "SUITE", "HWY", 
"HIGHWAY", "DR", "DRIVE", "NW", "NORTH WEST", "NE", "NORTH EAST", 
"SW", "SOUTH WEST", "SE", "SOUTH EAST", "LN", "LANE", "WAY", 
"WAY")
terms <- split(terms,rep(1:2,times = length(terms) / 2))
terms[[1]] <- paste0("\\b",terms[[1]],"(\\.|\\b)")
terms[[1]]
# [1] "\\bST(\\.|\\b)"  "\\bRD(\\.|\\b)"  "\\bAVE(\\.|\\b)" "\\bN(\\.|\\b)"   "\\bW(\\.|\\b)"   "\\bS(\\.|\\b)"   "\\bE(\\.|\\b)"  
# [8] "\\bSTE(\\.|\\b)" "\\bHWY(\\.|\\b)" "\\bDR(\\.|\\b)"  "\\bNW(\\.|\\b)"  "\\bNE(\\.|\\b)"  "\\bSW(\\.|\\b)"  "\\bSE(\\.|\\b)" 
#[15] "\\bLN(\\.|\\b)"  "\\bWAY(\\.|\\b)"
terms[[2]]
# [1] "STREET"     "ROAD"       "AVENUE"     "NORTH"      "WEST"       "SOUTH"      "EAST"       "SUITE"      "HIGHWAY"    "DRIVE"     
#[11] "NORTH WEST" "NORTH EAST" "SOUTH WEST" "SOUTH EAST" "LANE"       "WAY"  

Answer 2 (1 vote)

This should work with str_replace_all from the stringr package:

library(stringr)

df <- data.frame(address = c("12 ST W", "333 AVE", "45 RD", "666 STE E"))

# A named vector applies each pattern = replacement pair in turn
str_replace_all(df$address, c("\\bST\\b" = "STREET",
                              "\\bRD\\b" = "ROAD",
                              "\\bAVE\\b" = "AVENUE",
                              "\\bN\\b" = "NORTH",
                              "\\bW\\b" = "WEST",
                              "\\bE\\b" = "EAST",
                              "\\bSTE\\b" = "SUITE"))
#[1] "12 STREET WEST" "333 AVENUE"     "45 ROAD"        "666 SUITE EAST"
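The named vector above covers only part of the mapping. A sketch that builds it from the question's full mapping and stores the result in a new column; the names `mapping`, `replacements` and `address_full` are hypothetical. Like the answer's patterns, these use `\b` on both sides, so unlike the stringi answer they do not absorb a trailing "." after an abbreviation:

```r
library(stringr)

# Full mapping from the question: abbreviation -> expansion
mapping <- c(ST = "STREET", RD = "ROAD", AVE = "AVENUE", N = "NORTH",
             W = "WEST", S = "SOUTH", E = "EAST", STE = "SUITE",
             HWY = "HIGHWAY", DR = "DRIVE", NW = "NORTH WEST",
             NE = "NORTH EAST", SW = "SOUTH WEST", SE = "SOUTH EAST",
             LN = "LANE", WAY = "WAY")
# Turn each abbreviation into a whole-word regex used as the vector's name
replacements <- setNames(mapping, paste0("\\b", names(mapping), "\\b"))

df <- data.frame(address = c("12 ST W", "333 AVE", "45 RD", "666 STE E", "101 ST LN"),
                 stringsAsFactors = FALSE)
# New column with the expanded addresses; the original column is untouched
df$address_full <- str_replace_all(df$address, replacements)
print(df)
```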