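I have addresses stored in the "address" column of a data frame, and I want to create a new column from the existing addresses with the following corrections applied: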
{"ST": "STREET",
"RD": "ROAD",
"AVE": "AVENUE",
"N": "NORTH",
"W": "WEST",
"S": "SOUTH",
"E": "EAST",
"STE": "SUITE",
"HWY": "HIGHWAY",
"DR": "DRIVE",
"NW": "NORTH WEST",
"NE": "NORTH EAST",
"SW": "SOUTH WEST",
"SE": "SOUTH EAST",
"LN": "LANE",
"WAY": "WAY"}
How should I go about this?
Expected output:
101 ST LN -> 101 STREET LANE
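One way to solve this is to use stri_replace_all_regex from stringi, which accepts vectorized patterns and replacements. Setting vectorize_all = FALSE makes it apply every pattern, in order, to each input string.
We can use the word-boundary anchor \b, which itself has to be escaped as \\b in an R string, so that only whole tokens are matched. To also handle abbreviations that end with a ".", we can match either a literal . or \b with (\\.|\\b).
I built the pattern and replacement vectors from your data; they are at the end of this answer.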
library(stringi)
# terms is defined at the end of this answer
stri_replace_all_regex("101 ST. LN", pattern = terms[[1]], replacement = terms[[2]], vectorize_all = FALSE)
#[1] "101 STREET LANE"
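To see why the \b anchors and the optional trailing dot matter, here is a minimal sketch that compares a bare pattern with the anchored one. The sample address is made up for illustration and is not from the question.

```r
library(stringi)

x <- "101 WEST ST."  # hypothetical sample address

# Without word boundaries, the "ST" inside "WEST" is replaced as well
bare <- stri_replace_all_regex(x, "ST", "STREET")
bare
#[1] "101 WESTREET STREET."

# With \\b in front and (\\.|\\b) behind, only the standalone token matches,
# and a trailing dot is consumed together with the abbreviation
anchored <- stri_replace_all_regex(x, "\\bST(\\.|\\b)", "STREET")
anchored
#[1] "101 WEST STREET"
```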
The same call also works on a vector of strings:
data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"))
stri_replace_all_regex(data$address,pattern = terms[[1]], replacement = terms[[2]],vectorize_all = FALSE)
#[1] "1 NORTH STREET" "2 EAST AVENUE" "3 SOUTH ROAD" "4 SOUTH EAST LANE"
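Since the question asks for a new column, the result can be assigned back to the data frame. The sketch below is self-contained and assumes a shortened lookup kept as a named vector; the names abbr, patterns and address_full are hypothetical and chosen only for illustration.

```r
library(stringi)

# Shortened lookup: names are the abbreviations, values the full words
abbr <- c(ST = "STREET", RD = "ROAD", AVE = "AVENUE", N = "NORTH",
          E = "EAST", S = "SOUTH", SE = "SOUTH EAST", LN = "LANE")

# Same pattern shape as in the answer: word boundary, abbreviation, optional dot
patterns <- paste0("\\b", names(abbr), "(\\.|\\b)")

data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"))

# Keep the original column and store the expanded addresses in a new one
data$address_full <- stri_replace_all_regex(data$address,
                                            pattern = patterns,
                                            replacement = unname(abbr),
                                            vectorize_all = FALSE)
data$address_full
#[1] "1 NORTH STREET"    "2 EAST AVENUE"     "3 SOUTH ROAD"      "4 SOUTH EAST LANE"
```

Keeping the lookup as a named vector means a new abbreviation is a one-entry change, and the patterns are rebuilt from the names.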
Data
terms <- c("ST", "STREET", "RD", "ROAD", "AVE", "AVENUE", "N", "NORTH",
"W", "WEST", "S", "SOUTH", "E", "EAST", "STE", "SUITE", "HWY",
"HIGHWAY", "DR", "DRIVE", "NW", "NORTH WEST", "NE", "NORTH EAST",
"SW", "SOUTH WEST", "SE", "SOUTH EAST", "LN", "LANE", "WAY",
"WAY")
terms <- split(terms,rep(1:2,times = length(terms) / 2))
terms[[1]] <- paste0("\\b",terms[[1]],"(\\.|\\b)")
terms[[1]]
# [1] "\\bST(\\.|\\b)" "\\bRD(\\.|\\b)" "\\bAVE(\\.|\\b)" "\\bN(\\.|\\b)" "\\bW(\\.|\\b)" "\\bS(\\.|\\b)" "\\bE(\\.|\\b)"
# [8] "\\bSTE(\\.|\\b)" "\\bHWY(\\.|\\b)" "\\bDR(\\.|\\b)" "\\bNW(\\.|\\b)" "\\bNE(\\.|\\b)" "\\bSW(\\.|\\b)" "\\bSE(\\.|\\b)"
#[15] "\\bLN(\\.|\\b)" "\\bWAY(\\.|\\b)"
terms[[2]]
# [1] "STREET" "ROAD" "AVENUE" "NORTH" "WEST" "SOUTH" "EAST" "SUITE" "HIGHWAY" "DRIVE"
#[11] "NORTH WEST" "NORTH EAST" "SOUTH WEST" "SOUTH EAST" "LANE" "WAY"
This should also work with str_replace_all from the stringr package:
library(stringr)
df <- data.frame(address = c("12 ST W", "333 AVE", "45 RD", "666 STE E"))
# Named vector: names are the regex patterns, values are the replacements
str_replace_all(df$address, c("\\bST\\b" = "STREET",
                              "\\bRD\\b" = "ROAD",
                              "\\bAVE\\b" = "AVENUE",
                              "\\bN\\b" = "NORTH",
                              "\\bW\\b" = "WEST",
                              "\\bE\\b" = "EAST",
                              "\\bSTE\\b" = "SUITE"))
#[1] "12 STREET WEST" "333 AVENUE" "45 ROAD" "666 SUITE EAST"
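The stringr patterns above do not consume a trailing dot, so "ST." would become "STREET.". A possible variation, sketched here with a shortened and purely illustrative lookup (abbr and rules are hypothetical names), builds the named vector programmatically and reuses the (\\.|\\b) ending from the stringi answer.

```r
library(stringr)

# Shortened lookup: names are the abbreviations, values the full words
abbr <- c(ST = "STREET", LN = "LANE", AVE = "AVENUE")

# str_replace_all expects names = patterns, values = replacements
rules <- setNames(unname(abbr), paste0("\\b", names(abbr), "(\\.|\\b)"))

expanded <- str_replace_all(c("101 ST. LN", "7 AVE."), rules)
expanded
#[1] "101 STREET LANE" "7 AVENUE"
```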