Parse an email string with metadata and get the From and Cc values

Question

I'm trying to extract the From and Cc email addresses from a forwarded email whose body looks like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <[email protected]>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <[email protected]>

Cc: Ralph Johnson <[email protected]>


Hi,


hello, thank you and goodbye!

 [email protected]'

Now, when I do the following:

$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

I correctly get:

from: sarah johnson <[email protected]>

Now, why doesn't Cc work? I do something very similar, only changing from: to cc:

$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
     echo htmlentities($arr_matches[0]);
     die();
}

I get:

cc: ralph johnson <[email protected]> hi, hello, thank you and goodbye! [email protected]

If I remove the email address from the footer of the original body (deleting [email protected]), then I correctly get:

cc: ralph johnson <[email protected]>

It looks like that footer address is affecting the regex. But how, and why doesn't it affect the From case? How can I fix this?

Answer 1

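The problem is that \D* matches too much: it also matches newline characters, so it can run past the end of the Cc line. I would be more restrictive here. Why are you using \D (not a digit) at all?

With [^@]*, for example, it works:

cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S

See it on Regexr.

This way you can be sure that the first part does not match beyond the email address.

The \D is also the reason it works in the first case, From: the Date line contains digits, so the match cannot extend across that line.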
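The difference between the two fillers can be reproduced with a minimal sketch. The addresses below are hypothetical placeholders (the ones in the question are redacted), and the character class is written as [\w.-] rather than [\w-\.], on the assumption that newer PCRE2-based PHP versions can reject a hyphen placed directly after \w inside a class.

```php
<?php
// Hypothetical sample body: placeholder addresses, same layout as the question.
$body = strtolower(
    "From: Sarah Johnson <sarah@example.com>\n\n" .
    "Subject: email subject\n\n" .
    "Date: February 22, 2013 3:48:12 AM\n\n" .
    "To: Email Recipient <recipient@example.com>\n\n" .
    "Cc: Ralph Johnson <ralph@example.com>\n\n\n" .
    "Hi,\n\n\nhello, thank you and goodbye!\n\n footer@example.com"
);

// Shared tail of the pattern; only the filler after "cc: " differs below.
$tail = '\S([\w.-]+)@((?:\w+\.)+)([a-z]{2,4})\S';

// \D* is greedy and matches newlines, and nothing after the Cc line contains
// a digit, so it runs to the end and backtracks to the LAST address.
preg_match('#cc: \D*' . $tail . '#', $body, $greedy);

// [^@]* cannot cross an "@", so it has to stop at the FIRST address.
preg_match('#cc: [^@]*' . $tail . '#', $body, $strict);

echo "greedy: ", $greedy[0], "\n";
echo "strict: ", $strict[0], "\n";
```

With this sample, the greedy match extends through the footer address, while the strict match is just the Cc line.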

Answer 2

Try it like this:

$body = '-------
Begin forwarded message:


From: Sarah Johnson <[email protected]>

Subject: email subject

Date: February 22, 2013 3:48:12 AM

To: Email Recipient <[email protected]>

Cc: Ralph Johnson <[email protected]>


Hi,


hello, thank you and goodbye!

 [email protected]';

$pattern = '#(?:from|Cc):\s+[^<>]+<([^@]+@[^>\s]+)>#is';
preg_match_all($pattern, $body, $arr_matches);
echo '<pre>' . htmlspecialchars(print_r($arr_matches, 1)) . '</pre>';

Output

Array
(
    [0] => Array
        (
            [0] => From: Sarah Johnson <[email protected]>
            [1] => Cc: Ralph Johnson <[email protected]>
        )

    [1] => Array
        (
            [0] => [email protected]
            [1] => [email protected]
        )

)

$arr_matches[1][0] is the "From" address
$arr_matches[1][1] is the "Cc" address
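Relying on positions [0] and [1] assumes both headers are present and appear in that order. As a possible variation (not part of the answer above), the header name can be captured too, so that results are keyed by name. The sample body, the placeholder addresses, and the $addresses variable are assumptions made for illustration.

```php
<?php
// Hypothetical sample body with placeholder addresses.
$body = "From: Sarah Johnson <sarah@example.com>\n\n" .
        "To: Email Recipient <recipient@example.com>\n\n" .
        "Cc: Ralph Johnson <ralph@example.com>\n\n" .
        "Hi,\n\n footer@example.com";

// Group 1 captures the header name, group 2 the address.
// ^ with the m flag anchors each match to the start of a line;
// the i flag makes "from" and "cc" case-insensitive.
$pattern = '#^(from|cc):\s+[^<>]+<([^@\s]+@[^>\s]+)>#im';

$addresses = [];
if (preg_match_all($pattern, $body, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Key by lower-cased header name; a repeated header overwrites the earlier one.
        $addresses[strtolower($m[1])] = $m[2];
    }
}

print_r($addresses);
```

This sketch keeps only one address per header, so a Cc line with several recipients would need a further split. It also still expects the address to be wrapped in angle brackets.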
