Using gsub to extract only uppercase letters of a certain length

Question

I have a string from which I want to extract the country code, which always takes the form of three uppercase letters.

mystring
"Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)"
"Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)"
"Bloggs, Joe London (1)/Bloggs, Joe London (2)"
"Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)"
"Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)"

What I want to get

mystring
GBR/
/GBR
/
GBR/GBR
GBR/GBR

Blanks are fine if there is no country; I can deal with them.

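I have tried a few things I've seen here. One attempt removes all non-uppercase characters, but that leaves other letters I don't want, such as the capitals in the names and locations. I then tried something similar that removes everything that doesn't start and end with an uppercase letter (again no joy, because of the names):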

gsub("[^A-Z$]", "", mystring)

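Keeping only runs of exactly three uppercase letters might work, but I can't get the code quite right. I think it would look something like the following, if anyone knows how to fix it or knows a more robust method: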

gsub("[^A-Z$]{3}", "", mystring)
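To see why the attempt above doesn't isolate the code, here is a small sketch that applies the same pattern to the first half of one example string. Inside a character class `$` is just a literal character, so `[^A-Z$]{3}` deletes runs of three non-uppercase characters at a time; every uppercase letter survives, along with any leftover run shorter than three.

```r
# The pattern from the question, applied to one half of an example string
x <- "Bloggs, Joe GBR London (1)"

# [^A-Z$] matches one character that is NOT an uppercase letter and NOT a
# literal "$" ($ has no special meaning inside a character class).
# [^A-Z$]{3} therefore removes such characters three at a time; shorter
# leftovers (the single spaces before "J" and "L") and all capitals remain.
out <- gsub("[^A-Z$]{3}", "", x)
print(out)
# [1] "B JGBR L"
```

The initials of "Bloggs", "Joe" and "London" are still present, which is why deleting "everything else" is hard to get right and extracting the wanted pattern is easier.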

Answer 1

I like stringr::str_extract for extracting patterns from strings. It lets you write only the pattern you want, rather than trying to replace everything else:

mystring <- c(
  "Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)",
  "Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)"
)

stringr::str_extract(mystring, "[A-Z]{3}")
# [1] "GBR" "GBR" NA    "GBR" "GBR"

## or get all matches with `str_extract_all`
stringr::str_extract_all(mystring, "[A-Z]{3}")
# [[1]]
# [1] "GBR"
# 
# [[2]]
# [1] "GBR"
# 
# [[3]]
# character(0)
# 
# [[4]]
# [1] "GBR" "GBR"
# 
# [[5]]
# [1] "GBR" "GBR"
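If you want the exact "GBR/" layout from the question rather than a list, one possible base R sketch (not part of the original answer) splits each string on "/", extracts a standalone three-letter code from each half, and pastes the halves back together. The `\\b` word boundaries are an added assumption that stops the pattern from matching inside a longer uppercase word, and `code_of` is a hypothetical helper name chosen for illustration.

```r
mystring <- c(
  "Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)",
  "Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)"
)

# Hypothetical helper: return the first standalone run of exactly three
# uppercase letters in one half of a string, or "" when there is none.
code_of <- function(part) {
  m <- regmatches(part, regexpr("\\b[A-Z]{3}\\b", part, perl = TRUE))
  if (length(m) == 0) "" else m
}

# Split each string into its two halves, look up a code in each half,
# then join the codes back together with "/".
halves <- strsplit(mystring, "/", fixed = TRUE)
result <- vapply(
  halves,
  function(h) paste(vapply(h, code_of, character(1)), collapse = "/"),
  character(1)
)
# result is c("GBR/", "/GBR", "/", "GBR/GBR", "GBR/GBR")
```

Splitting first keeps the "/" in the output even when a half has no code, which gives the blanks the question says are acceptable.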
