RegEx for matching words preceded by a comma, but with exceptions

Question

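The section of text I'm targeting always begins with "Also there is" and ends with a period. The single names between the commas are what I want to target (i.e. "randomperson" in the example below). These names will always be different. It gets tricky because there are other items in the list that are not single-word "names". Perhaps I could match everything between the commas only when it is a single word/name, but I can't seem to figure it out. The list of names can be longer or shorter, so the expression has to be dynamic rather than matching a set number of names.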

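Target text:

Also there is a reinforced stone wall, a wooden wall, a stone wall,
randomperson, a lumbering earth elemental, randomperson, randomperson,
randomperson.

(Split across multiple lines for readability)

How can I solve this?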

Answer 1

Code

sed -r ':a
s/, ([a-zA-Z]*)([,\.])/\n##\1\n\2/
ta
' | sed -n 's/##//gp'

Output

randomperson
randomperson
randomperson
randomperson

Explanation:

Start a loop (define the label a)

sed -r ':a

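Look for each occurrence of ", oneword," or ", oneword." and put the word on its own line as ##oneword, followed by the comma or period on the next line. The ## is a magic marker used to recognize the extracted names later.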

s/, ([a-zA-Z]*)([,\.])/\n##\1\n\2/

End the loop: branch back to a as long as the substitution succeeded

ta

Keep only the lines carrying the ## marker, stripping the marker so that just the single words remain

' | sed -n 's/##//gp'
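
For anyone who wants to try this end to end, here is a self-contained sketch with the sample sentence fed in. As far as I know, the -r flag and \n in the replacement are GNU sed features, so this variant uses -E and a backslash-newline instead, which should also work with BSD sed. The variable names text and names are only for illustration.

```shell
# Sample input from the question, joined onto one line.
text='Also there is a reinforced stone wall, a wooden wall, a stone wall, randomperson, a lumbering earth elemental, randomperson, randomperson, randomperson.'

# Same loop as above: each ", word," or ", word." is rewritten so that
# the word sits on its own line behind the ## marker. The second sed
# prints only the marked lines, with the marker removed.
names=$(printf '%s\n' "$text" | sed -E ':a
s/, ([a-zA-Z]*)([,.])/\
##\1\
\2/
ta' | sed -n 's/##//gp')

printf '%s\n' "$names"
```

With the sample sentence this should print the same four lines as the Output above.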

Answer 2

In a program

my $text = "Also there is a reinforced stone wall, a wooden wall, a stone wall, "
    . "randomperson, a lumbering earth elemental, randomperson, "
    . "randomperson, randomperson.";

my @single_words = 
    grep { split == 1 } 
    split /\s*,|\.|\!|;\s*/, 
        ($text =~ /Also there is (.*)/)[0];

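The regex on $text captures the text after the initial phrase, then split returns the list of strings between commas (or other punctuation), and grep filters out the strings that contain more than one word.†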

On the command line

echo "Also there is a reinforced stone wall, a wooden wall,..., randomperson,..." |
perl -wnE'say for
    grep { split == 1 }
    split /\s*,|\.|\!|;\s*/, (/Also there is (.*)/)[0]'

Same as above.

Please show us what you have tried, for further explanation and comments.

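† A lone split uses its defaults, split ' ', $_, where ' ' is a special pattern that splits on \s+ and discards leading and trailing whitespace. But in the expression split == 1, split is in scalar context (imposed by the == operator, which needs a single value on both sides), so it returns the number of elements in the list, which is then compared with 1.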
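
For comparison, the same count-the-words filter can be sketched with standard shell tools. This is an illustration, not part of the Perl answer: the variable names are made up, and awk's NF (the number of whitespace-separated fields on a line) stands in for split in scalar context.

```shell
# Sample input from the question, joined onto one line.
text='Also there is a reinforced stone wall, a wooden wall, a stone wall, randomperson, a lumbering earth elemental, randomperson, randomperson, randomperson.'

# 1. Drop the fixed lead-in phrase.
# 2. Turn each separator (, . ! ;) into a newline: one piece per line.
# 3. Keep only the pieces that consist of exactly one field.
single_words=$(printf '%s\n' "$text" |
  sed 's/^Also there is //' |
  tr ',.!;' '\n\n\n\n' |
  awk 'NF == 1 { print $1 }')

printf '%s\n' "$single_words"
```

Because awk ignores leading and trailing blanks when it counts fields, the pieces do not need to be trimmed first, much like the special ' ' pattern in Perl.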

Answer 3

The code below assumes that a single word will never be the first item in the list, because it looks for a comma in front of each word.

while (<DATA>) {
  if (/^Also there is\b/) {
    while (/.*?,\s+(?<singleword>\S+)(?=[,.])/g) {
      print "Found [$+{singleword}]\n";
    }
  }
}
__DATA__
Ignore
Also there is a reinforced stone wall, a wooden wall, a stone wall, randomperson1, a lumbering earth elemental, randomperson2, randomperson3, randomperson4.
Also there was nada.


Output:

Found [randomperson1]
Found [randomperson2]
Found [randomperson3]
Found [randomperson4]
