Regex: detecting an empty line as the end of a match [duplicate]

This question has been marked as a duplicate of an existing question.

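I want to extract a sequence from some text.

The sequence starts with Diagnostic-Code:, the middle part can be any characters and may span several lines, and the end is marked by an empty line (the text continues after that, but it is not part of the sequence I want).

The following works for the start and the middle part, but it finds the end too late: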

(?s)Diagnostic-Code: (.+)\n\n

The string looks like this:

...
Status: 5.0.0
Diagnostic-Code: X-Postfix; test.com
*this*
*should*
*be included too*

--EA7634814EFB9.1516804532/mail.example.com
Content-Description: Undelivered Message
...
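
A likely reason the end is found too late is that `.+` is greedy: with `(?s)` it first consumes the rest of the text and then backtracks only as far as the last `\n\n`. The sketch below contrasts it with the lazy form `.+?`, which stops at the first empty line. It assumes the text uses `\n`-only line endings; the class name, helper method and shortened sample text are hypothetical and exist only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are assumptions, not from the question.
class GreedyVsLazyDemo {
    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Shortened sample with "\n" line endings and two empty lines.
    static final String SAMPLE =
            "Status: 5.0.0\n"
            + "Diagnostic-Code: X-Postfix; test.com\n"
            + "*this*\n"
            + "\n"
            + "--boundary\n"
            + "Content-Description: Undelivered Message\n"
            + "\n"
            + "body\n";

    public static void main(String[] args) {
        // Greedy: group 1 runs on to the LAST empty line in the text.
        System.out.println(firstGroup("(?s)Diagnostic-Code: (.+)\\n\\n", SAMPLE));
        System.out.println("----");
        // Lazy: group 1 stops at the FIRST empty line.
        System.out.println(firstGroup("(?s)Diagnostic-Code: (.+?)\\n\\n", SAMPLE));
    }
}
```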

---------EDIT---------

Thank you @werur for the answer!

However, java.util.regex behaves differently from regex101.com. Given this input:

Action: failed
Status: 5.1.1
Remote-MTA: dns; gmail-smtp-in.l.google.com
Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
    not exist. Please try 550-5.1.1 double-checking the recipient's email
    address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
    https://support.google.com/mail/?p=NoSuchUser u11si15276978wru.314 - gsmtp

--E8A363093CEC.1520529178/proxy03.hostname.net
Content-Description: Undelivered Message
Content-Type: message/rfc822

Return-Path: <[email protected]>

On regex101 the pattern matches the whole multi-line diagnostic code, but Java captures only the first line in group 1:

smtp; 550-5.1.1 The email account that you tried to reach does

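Java code:

// Note: inside a character class '|' is literal, so [-| ] also accepts '|'; [- ] would suffice.
diagnosticCodePatter = Pattern.compile("(?i)diagnostic[-| ]Code: ([\\s\\S]*?[\\r\\n]{2})");
matcher = diagnosticCodePatter.matcher(message);
if (matcher.find()) {
    // group(0) is the whole match; group(1) is the text after "Diagnostic-Code: "
    diagnosticCode = matcher.group(0);
}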

1 Answer

Try this regex:

Diagnostic-Code[\s\S]*?[\r\n]{2}

In Java, don't forget to escape each \ with another \.
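
As a minimal sketch of that escaping, the regex above could be written as a Java string literal like this. The class and method names are hypothetical, and the example assumes the input uses `\n`-only line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the regex itself comes from the answer.
class EscapedPatternDemo {
    // Regex:        Diagnostic-Code[\s\S]*?[\r\n]{2}
    // Java literal: every backslash is doubled.
    static final Pattern DIAGNOSTIC =
            Pattern.compile("Diagnostic-Code[\\s\\S]*?[\\r\\n]{2}");

    // Returns the whole match (group 0), or null if there is none.
    static String extract(String text) {
        Matcher m = DIAGNOSTIC.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Status: 5.0.0\nDiagnostic-Code: a\nb\n\nrest";
        System.out.println(extract(text));
    }
}
```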

Explanation

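  • Diagnostic-Code - matches the literal text Diagnostic-Code
  • [\s\S]*? - matches any character (including line breaks), as few times as possible
  • [\r\n]{2} - matches exactly two consecutive characters, each of which is a line feed or a carriage return.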
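
One plausible explanation for Java capturing only the first line, assuming the message passed to Java uses CRLF (`\r\n`) line endings as raw mail messages normally do: a single `\r\n` already consists of two characters from `[\r\n]`, so `[\r\n]{2}` is satisfied at the end of the first line. Text pasted into a web tester is typically normalised to `\n`, which would hide the problem there. The sketch below compares that terminator with `(?:\r?\n){2}`, which requires two actual line breaks. The class name, the sample text, the use of `[- ]`, and moving the terminator outside the capture group are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the sample text is a shortened, made-up message.
class CrlfDemo {
    // Terminator from the answer: any two CR/LF characters, so one "\r\n" is enough.
    static final Pattern TWO_CHARS =
            Pattern.compile("(?i)diagnostic[- ]code: ([\\s\\S]*?)[\\r\\n]{2}");
    // Assumed alternative: two consecutive line breaks, each "\n" or "\r\n".
    static final Pattern TWO_BREAKS =
            Pattern.compile("(?i)diagnostic[- ]code: ([\\s\\S]*?)(?:\\r?\\n){2}");

    // Returns group 1 of the first match, or null if there is none.
    static String group1(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    static final String CRLF_SAMPLE =
            "Status: 5.1.1\r\n"
            + "Diagnostic-Code: smtp; 550-5.1.1 first line\r\n"
            + "    second line\r\n"
            + "\r\n"
            + "--boundary\r\n";

    public static void main(String[] args) {
        // Stops after the first line, because "\r\n" counts as two characters.
        System.out.println(group1(TWO_CHARS, CRLF_SAMPLE));
        System.out.println("----");
        // Continues to the empty line.
        System.out.println(group1(TWO_BREAKS, CRLF_SAMPLE));
    }
}
```

If the input can also contain bare `\r` line endings, the terminator would need to be extended accordingly.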