I am trying to extract the From and Cc email addresses from a forwarded email whose body looks like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <[email protected]>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <[email protected]>
Cc: Ralph Johnson <[email protected]>
Hi,
hello, thank you and goodbye!
[email protected]'
Now, when I do the following:
$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I correctly get:
from: sarah johnson <[email protected]>
Now, why doesn't Cc work? I do something very similar, only changing from: to cc:
$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I get:
cc: ralph johnson <[email protected]> hi, hello, thank you and goodbye! [email protected]
If I remove the email address from the footer of the original body (taking out [email protected]), then I correctly get:
cc: ralph johnson <[email protected]>
It looks like that footer address is affecting the regex. But how, and why? And how can I fix this?
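The problem is that \D* matches too much: it also matches newline characters, so it can run across several lines. I would be more restrictive here. Why are you using \D (not a digit) at all?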
With, for example, [^@]* in its place, it works:
cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S
See it on Regexr.
This way you can be sure that the first part does not match beyond the email address.
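As a quick check of the [^@]* variant, here is a minimal sketch. The body and its example.com addresses are placeholders invented for illustration, not the ones from the question. The character class is written as [\w.-] with the hyphen last, on the assumption that newer PHP versions (7.3 and later, which use PCRE2) may reject [\w-\.] as an invalid range.

```php
<?php
// Hypothetical body with placeholder addresses (not the ones from the question).
$body = strtolower(
    "Cc: Ralph Johnson <ralph@example.com>\n"
    . "Hi,\n"
    . "hello, thank you and goodbye!\n"
    . "footer@example.com"
);

// Same shape as the pattern above, with [^@]* in place of \D*.
// [^@]* cannot cross an "@", so the match has to end at the first address.
$pattern = '#cc: [^@]*\S([\w.-]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';

if (preg_match($pattern, $body, $m)) {
    // Only the full match is useful here: the greedy prefix swallows most of
    // the local part, so group 1 holds just its last character.
    echo $m[0], "\n"; // cc: ralph johnson <ralph@example.com>
}
```

The footer address is no longer reachable because the only way past the first "@" is through the address part of the pattern itself.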
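The \D is also the reason it works for the first "From" case: the "Date" line contains digits, so \D* cannot run past the first digit on that line and the match stays on the From address.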
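To see this behaviour in isolation, the sketch below runs only the prefix and \D* against a shortened body. The body and the example.com addresses are made up for illustration; the point is just where the greedy \D* stops in each case.

```php
<?php
// Hypothetical, shortened body with placeholder addresses.
$body = strtolower(
    "From: Sarah Johnson <sarah@example.com>\n"
    . "Date: February 22, 2013 3:48:12 AM\n"
    . "Cc: Ralph Johnson <ralph@example.com>\n"
    . "Hi,\n"
    . "footer@example.com"
);

// \D* is greedy and also matches newlines, so it consumes everything
// up to the first digit or, if there is none, up to the end of the input.
preg_match('#from: \D*#', $body, $from);
preg_match('#cc: \D*#', $body, $cc);

var_dump(strpos($from[0], 'cc:') !== false);  // bool(false): stopped at "22" in the Date line
var_dump(strpos($cc[0], 'footer') !== false); // bool(true): nothing stops it before the footer
```

In the full pattern the engine then backtracks from that point to the last place where an address fits, which is the From address in the first case and the footer address in the second.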
Try it like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <[email protected]>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <[email protected]>
Cc: Ralph Johnson <[email protected]>
Hi,
hello, thank you and goodbye!
[email protected]';
$pattern = '#(?:from|Cc):\s+[^<>]+<([^@]+@[^>\s]+)>#is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';
Output:
Array
(
    [0] => Array
        (
            [0] => From: Sarah Johnson <[email protected]>
            [1] => Cc: Ralph Johnson <[email protected]>
        )

    [1] => Array
        (
            [0] => [email protected]
            [1] => [email protected]
        )

)
$arr_matches[1][0] - "From" email
$arr_matches[1][1] - "Cc" email
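If relying on the order of the matches feels fragile, one possible variation is to capture the header name as well and key the result by it. This is only a sketch: the function name extractForwardedAddresses, the sample body and its example.com addresses are all hypothetical, and the pattern is a slightly stricter variant of the one above (anchored to the start of a line and not allowed to cross line breaks), not the pattern from the answer itself.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns an array such as ['from' => '...', 'cc' => '...'].
function extractForwardedAddresses(string $body): array
{
    // i: case-insensitive header names; m: ^ matches at the start of every line.
    // [^<>\r\n]* keeps the display name on the same line as the header.
    $pattern = '#^(from|cc):[ \t]*[^<>\r\n]*<([^@\s<>]+@[^>\s]+)>#im';

    $result = [];
    if (preg_match_all($pattern, $body, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the header name, $match[2] the address.
            $result[strtolower($match[1])] = $match[2];
        }
    }
    return $result;
}

// Placeholder body for illustration only.
$body = "Begin forwarded message:\n"
      . "From: Sarah Johnson <sarah@example.com>\n"
      . "To: Email Recipient <to@example.com>\n"
      . "Cc: Ralph Johnson <ralph@example.com>\n"
      . "Hi,\n"
      . "footer@example.com";

$addresses = extractForwardedAddresses($body);
echo $addresses['from'], "\n"; // sarah@example.com
echo $addresses['cc'], "\n";   // ralph@example.com
```

Keying by header name means a missing From or Cc line simply leaves that key unset instead of shifting the other address into the wrong index.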