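This is probably very simple for all you regex experts, but I have already spent enough time trying to find the answer on my own.
I use Doc Parser, which lets you create text-parsing rules, and you can search using regular expressions. The documentation says Perl regular expressions are supported and that the Regex 101 site is a good place to test expressions, but I have found in the past that expressions that work in Regex 101 do not always seem to work in Doc Parser.
I am trying to create an expression that finds the last instance of one of three strings. The three strings are: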
i am sitting with after this meeting are
won't be included in your published notes
Single Signal
The input text can look three different ways, which is why I am looking for one of three strings. Here are three examples:
Example 1:
Single Signal
Two things I am sitting with after this meeting are...
- Words words words
Example 2:
Single Signal
- words words words
Example 3:
Single Signal
words words that end in won't be included in your published notes.
- words
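The three phrases I am capturing end up being the starting point for the content I actually want to extract from the text.
I am using this as my core/root expression: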
(?i)(i am sitting with after this meeting are|This is for internal use and won't be included in your published notes|Single Signal)
and I have tried various things at the end of the expression to indicate the last/latest occurrence in the matched text:
(?i)(i am sitting with after this meeting are|This is for internal use and won't be included in your published notes|Single Signal).*?
(?i)(i am sitting with after this meeting are|This is for internal use and won't be included in your published notes|Single Signal)+
(?i)(i am sitting with after this meeting are|This is for internal use and won't be included in your published notes|Single Signal){1}
This works in Regex 101 (PCRE2) but not in Doc Parser (Perl):
(?i)[^(i am sitting with after this meeting are|won't be included in your published notes|Single Signal)]+$
All help is greatly appreciated. Thanks!
Match the pattern globally with a capture group, and you will end up with the last match:
/(one|two|three)/g
The capture will end up holding whichever of the three patterns matched last.
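As a rough illustration of the "match globally, keep the last one" idea, here is a minimal Python sketch. Doc Parser itself is not involved here: `re.finditer` stands in for the `/g` flag, and the names `PATTERN`, `last_match` and `example1` are made up for the example.

```python
import re

# The three phrases from the question; (?i) makes the match case-insensitive.
PATTERN = re.compile(
    r"(?i)(i am sitting with after this meeting are"
    r"|won't be included in your published notes"
    r"|Single Signal)"
)

def last_match(text):
    """Return the last match of PATTERN in text, or None if there is none."""
    last = None
    for m in PATTERN.finditer(text):  # plays the role of a /g global match
        last = m                      # keep overwriting; the final one wins
    return last

example1 = (
    "Single Signal\n"
    "Two things I am sitting with after this meeting are...\n"
    "- Words words words\n"
)

m = last_match(example1)
print(m.group(1))           # the phrase that matched last
print(example1[m.end(1):])  # the text that follows it
```

For Example 1 the first hit is "Single Signal", but the loop ends on the "I am sitting with..." phrase, so that is what the capture holds.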
I don't know how your Doc Parser works, but the regex would be
/(?i)(i am sitting with after this meeting are|won't be included in your published notes|Single Signal)/
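If the tool only hands back a single match rather than all global matches, one alternative that might be worth trying is to put a greedy `.*` in front of the group, so the engine backtracks from the end of the text and the capture lands on the last occurrence. This is only a sketch in Python: whether Doc Parser honors the `(?s)` flag (which lets `.` cross newlines) is an assumption, and `LAST_ONLY` and `last_phrase` are hypothetical names.

```python
import re

# (?s) lets "." match newlines; the greedy ".*" consumes as much as it can,
# so the alternation ends up matching at the rightmost possible position.
LAST_ONLY = re.compile(
    r"(?is).*(i am sitting with after this meeting are"
    r"|won't be included in your published notes"
    r"|Single Signal)"
)

def last_phrase(text):
    """Return (phrase, rest_of_text) for the last occurrence, or None."""
    m = LAST_ONLY.match(text)
    if m is None:
        return None
    return m.group(1), text[m.end(1):]

example3 = (
    "Single Signal\n"
    "words words that end in won't be included in your published notes.\n"
    "- words\n"
)

phrase, rest = last_phrase(example3)
print(phrase)  # the last of the three phrases found in the text
print(rest)    # everything after it
```

Here the whole job is done by one match, so it does not depend on how the tool iterates over global matches.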