Extracting "words" from a string

Question

I have some strings in the following format:

"John Smith The Last"
"Jane Smith The Best"

From each string I want to extract the "name" (i.e. "John Smith" and "Jane Smith") and the "honorific" (i.e. "The Last" and "The Best"), but I cannot find a way to do this.

I tried using the str_extract_all() and str_split() functions from the stringr package, as follows:

library(stringr)
name <- str_extract_all(my_str, boundary("word"))

This returns only a one-element list ("Jane Smith The Best").

I also tried:

name <- str_split(my_str, " ", n=3)

This also seems to return a one-element list ("Jane" "Smith" "The Best").

I am looking for a base R or stringr solution.

Answer 1

One way to extract the "name" and the "honorific" is to use gsub() with a pattern built from the first word of the honorific, such as the, The, or der. Here is an example:

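strings <- c("John Smith The Last",
             "Jane Smith The Best",
             "Alfonso the Warrior",
             "Ferdinand the Artist-King",
             "Ludwig der Bayer")

# Honorific: replace everything up to and including the last marker word
# with the marker word itself, so only the honorific remains
honors <- gsub(".*(the|The|der)", "\\1", strings, perl = TRUE)
# Name: delete everything from the space before the marker word onward
# (the lookahead captures nothing, so the replacement is an empty string)
names <- gsub("(?= the| The| der).*", "", strings, perl = TRUE)
data.frame(names, honors)
#       names          honors
#1 John Smith        The Last
#2 Jane Smith        The Best
#3    Alfonso     the Warrior
#4  Ferdinand the Artist-King
#5     Ludwig       der Bayer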
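
Since the question also asks for a stringr solution, the same idea can be sketched with str_match(), which returns one column per capture group. This is only an illustration: the markers vector and the pattern built from it are assumptions rather than part of the answer above, and the approach still relies on knowing in advance which words begin an honorific.

```r
library(stringr)

strings <- c("John Smith The Last",
             "Jane Smith The Best",
             "Alfonso the Warrior",
             "Ferdinand the Artist-King",
             "Ludwig der Bayer")

# Assumed list of words that begin an honorific; extend as needed
markers <- c("the", "The", "der")

# Group 1: the name (lazy, so it stops at the first marker word)
# Group 2: the marker word as a whole word, plus the rest of the string
pattern <- paste0("^(.*?)\\s+((?:", paste(markers, collapse = "|"), ")\\b.*)$")

# str_match() returns a matrix: column 1 is the full match,
# columns 2 and 3 hold the two capture groups
parts <- str_match(strings, pattern)
result <- data.frame(names = parts[, 2], honors = parts[, 3])
result
```

Strings that contain none of the marker words yield NA in both columns, which makes unmatched input easy to spot.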
