How do I extract the text from the text/plain part of a multipart/alternative message?

#!/usr/bin/env python3
# main.py
import email
from email.iterators import _structure
import sys

# Parse the message from standard input and print its MIME structure.
msg = email.message_from_string(sys.stdin.read())
_structure(msg)

./main.py <<EOF
From:  Nathaniel Borenstein <[email protected]>
To: Ned Freed <[email protected]>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42


--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42
Content-Type: text/x-whatever

.... fanciest formatted version of same  message  goes  here
...
--boundary42--
EOF

Output:

multipart/alternative
    text/plain
    text/richtext
    text/x-whatever

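I can use the email module to get the structure of a multipart message like the one above. How can I extract the text/plain part of the email? (In this particular example, it should be "...plain text version of message goes here....".)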

python email multipart
1 Answer

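Call msg.get_payload() to get the message's payload (for a multipart message, a list of its parts), then iterate over the parts until you find the text/plain part: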

# main.py
import email
import sys

msg = email.message_from_string(sys.stdin.read())

# For a multipart message, get_payload() returns a list of sub-messages.
for part in msg.get_payload():
    if part.get_content_type() == 'text/plain':
        print(part.get_payload())

Given your sample input, the code above produces the output:

...plain text version of message goes here....
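
The loop above assumes the message is multipart and that the part uses no transfer encoding. As a minimal sketch, assuming you also want to handle non-multipart messages, nested multiparts, and base64 or quoted-printable bodies, a helper could walk the message tree and decode the payload. The name first_plain_text, the us-ascii fallback, and the sample message are illustrative assumptions, not part of the email package or of the question:

```python
import email


def first_plain_text(msg):
    """Return the decoded text of the first text/plain part, or None.

    Hypothetical helper, not part of the email package.
    """
    # walk() yields the message itself and then every nested part,
    # so this also works for non-multipart and nested messages.
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # decode=True undoes base64 / quoted-printable and returns bytes.
            raw = part.get_payload(decode=True)
            # Assumption: fall back to us-ascii when no charset is declared.
            charset = part.get_content_charset() or 'us-ascii'
            return raw.decode(charset, errors='replace')
    return None


# Illustrative sample with a base64-encoded plain-text part.
sample = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQ=

--boundary42
Content-Type: text/richtext

.... richtext version ...
--boundary42--
"""

text = first_plain_text(email.message_from_string(sample))
print(text)
```

Returning None when no text/plain part exists leaves it to the caller to decide whether to fall back to another part type.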

Alternatively, you can use email.iterators.typed_subpart_iterator, like this:

# main.py
import email
import email.iterators
import sys

msg = email.message_from_string(sys.stdin.read())

# Yield only the parts whose content type is text/plain.
for part in email.iterators.typed_subpart_iterator(msg, maintype='text', subtype='plain'):
    print(part.get_payload())

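This produces the same output as the previous example. Because typed_subpart_iterator walks the whole message tree, it also finds text/plain parts nested inside deeper multipart containers.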
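
On Python 3.6 or later, another option may be the newer EmailMessage API. The following is only a sketch, assuming the message is parsed with policy=email.policy.default so that get_body() and get_content() are available. The sample message is a shortened copy of the one in the question, and the variable names are illustrative:

```python
import email
import email.policy

# Shortened copy of the sample message from the question.
sample = """\
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42--
"""

# policy=email.policy.default makes the parser return EmailMessage objects,
# which provide get_body() and get_content().
msg = email.message_from_string(sample, policy=email.policy.default)

# Ask only for a text/plain body; get_body() returns None if there is none.
body = msg.get_body(preferencelist=('plain',))

# get_content() decodes the transfer encoding and charset and returns a str.
plain_text = body.get_content() if body is not None else None
print(plain_text)
```

Unlike get_payload(), get_content() returns already-decoded text, so no separate charset handling is needed in this variant.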
