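I wrote a Python script to fetch all of my Gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.
After successfully fetching all of the emails, I found that Gmail had marked every fetched email as "read". This is disastrous for me, because I only need to check the emails that were unread.
How can I recover the information about which emails were unread? I dumped each mail object to a file, and the core of my code is shown below: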
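import email
import imaplib

# user, pwd and dumpobj() are defined elsewhere in the script
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
    # fetching (RFC822) is the step after which Gmail showed the mail as read
    resp, data = m.uid('fetch', uid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    dumpobj(uid, mail)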
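I am hoping there is either an option to undo this in Gmail, or a member inside the stored mail objects that reflects the seen-state information.
For anyone who wants to avoid this headache, consider this answer. It doesn't help me, though, since the damage has already been done.
Edit: I wrote the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:
regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(state))"
So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?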
def grep_object (obj, regex , cycle = set(), matched = set()):
    import re
    # skip objects that were already visited (guards against reference cycles)
    if id(obj) in cycle:
        return
    cycle.update([id(obj)])
    if isinstance(obj, basestring):
        if re.search(regex, obj):
            matched.update([obj])
    def grep_dict (adict ):
        try:
            [ [ grep_object(a, regex, cycle, matched ) for a in ab ] for ab in adict.iteritems() ]
        except:pass
    grep_dict(obj)
    try:grep_dict(obj.__dict__)
    except:pass
    try:
        [ grep_object(elm, regex, cycle, matched ) for elm in obj ]
    except: pass
    return matched

grep_object(mail_object, regex)
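I had a similar problem (not with Gmail), and the biggest difficulty for me was producing a reproducible test case; I finally managed to make one (see below).

As far as the Seen flag is concerned, I now think it works like this:

- If a message is new (unseen), an IMAP fetch of its flags returns an empty \Seen flag (that is, the flag is not present on that email message).
- Searching the selected mailbox for UNSEEN gives a list of the IDs (or UIDs) of the new emails in that folder (those without the \Seen flag).
- If you fetch a message's headers with BODY.PEEK, \Seen is not set on the message; if you fetch them with BODY, \Seen is set.
- In my test case, fetching (RFC822) does not set \Seen either (unlike your case with Gmail).

In the test case, I tried running pprint.pprint(inspect.getmembers(mail)) (in place of your dumpobj(uid, mail)), but only after making sure that \Seen had been set. The output I got is posted in mail_object_inspect.txt and, as far as I can see, none of the readable fields mention "new/read/seen" or the like. Furthermore, mail.as_string() prints:

'From: [email protected] To: [email protected] Subject: This is a test message! Hello. I am executive assistant to the director of Bear Stearns, a failed investment Bank. I have access to USD6,000,000. ... '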
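To make matters worse, "field" is not mentioned anywhere in the imaplib code (the command below prints a filename only if the file contains no case-insensitive match for "field"):

$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py

... so I guess that information was not saved with your dump.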
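For future dumps, one way to avoid losing this information might be to request the flags in the same FETCH as the message body, using BODY.PEEK[] so that the fetch itself does not set \Seen. The following is only a sketch under assumptions: fetch_with_flags is a name I made up, it assumes Python 3, and conn is assumed to be an already logged-in imaplib connection with a mailbox selected. It relies on imaplib.ParseFlags to pick the flags out of the response.

```python
import imaplib


def fetch_with_flags(conn, uid):
    """Return (flags, raw_message) for one UID without marking it as seen.

    Hypothetical helper: conn is assumed to be a logged-in imaplib.IMAP4
    (or IMAP4_SSL) object on which a mailbox has already been selected.
    """
    typ, data = conn.uid('fetch', uid, '(FLAGS BODY.PEEK[])')
    if typ != 'OK':
        raise RuntimeError('fetch failed for UID {0!r}'.format(uid))
    flags = set()
    raw_message = None
    for part in data:
        if isinstance(part, tuple):
            # (envelope, literal): the literal is the raw message
            header, raw_message = part[0], part[1]
        else:
            header = part
        if isinstance(header, bytes):
            # FLAGS may arrive before or after the body literal,
            # so look for it in every piece of the response
            flags.update(imaplib.ParseFlags(header))
    return flags, raw_message
```

The returned flags could then be stored next to each dumped message, so that a check such as b'\\Seen' in flags is still possible after the download.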
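Here is some detail on reconstructing the test case. The hardest part was finding a small IMAP server that can quickly be run with some arbitrary users and emails, without having to install a lot on the system. In the end I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.

The test case is pasted in this gist as two files (with many comments), which I'll try to post here in abridged form:

- trivial-serverB.pl: the Net::IMAP::Server server (with a terminal output paste of a telnet client session at the end of the file)
- testimap.py: the imaplib client

First, make sure you have Net::IMAP::Server. Note that it has many dependencies, so the following command may take a while to install:

sudo perl -MCPAN -e 'install Net::IMAP::Server'

Then, in the directory where you saved trivial-serverB.pl, create a subdirectory containing the SSL certificates:

mkdir certs
openssl req \
  -x509 -nodes -days 365 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
  -newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem

Finally, run the server with administrative privileges:

sudo perl trivial-serverB.pl

Note that trivial-serverB.pl contains a hack that lets a client connect without SSL. Here is trivial-serverB.pl: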
#!/usr/bin/perl

use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;

package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;

sub capabilityb {
    my $self = shift;
    print STDERR "Capabilitin'\n";
    my $base = $self->server->capability;
    my @words = split " ", $base;

    @words = grep {$_ ne "STARTTLS"} @words
        if $self->is_encrypted;

    unless ($self->auth) {
        my $auth = $self->auth || $self->server->auth_class->new;
        my @auth = $auth->sasl_provides;
        # hack:
        #unless ($self->is_encrypted) {
        #    # Lack of encryption makes us turn off all plaintext auth
        #    push @words, "LOGINDISABLED";
        #    @auth = grep {$_ ne "PLAIN"} @auth;
        #}
        push @words, map {"AUTH=$_"} @auth;
    }

    return join(" ", @words);
}

package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';

sub auth_plain {
    my ( $self, $user, $pass ) = @_;
    # XXX DO AUTH CHECK
    $self->user($user);
    return 1;
}

package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';

sub init {
    my $self = shift;
    $self->root( Demo::IMAP::Mailbox->new() );
    $self->root->add_child( name => "INBOX" );
}

###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;

use Data::Dumper;

# the blank line separates the headers from the body
my $data = <<'EOF';
From: [email protected]
To: [email protected]
Subject: This is a test message!

Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank. I have
access to USD6,000,000. ...
EOF

my $msg = Net::IMAP::Server::Message->new($data);

sub load_data {
    my $self = shift;
    $self->add_message($msg);
}

my %ports = ( port => 143, ssl_port => 993 );
# when not running as root, use the unprivileged ports 1430 and 9930 instead
$ports{$_} *= 10 for grep {$> > 0} keys %ports;

$myserv = Net::IMAP::Server->new(
    auth_class  => "Demo::IMAP::Auth",
    model_class => "Demo::IMAP::Model",
    user        => 'nobody',
    log_level   => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
    %ports,
);

# apparently, this overload MUST be after the new?! here:
{
    no strict 'refs';
    *Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}

# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;

$myserv->run();
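With the server above running in one terminal, you can run the following in another terminal:

python testimap.py

The code simply reads the fields and content of the one (and only) message served by the server above, and finally restores (removes) the \Seen flag.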
import sys
if sys.version_info[0] < 3: # python 2.7
    def uttc(x):
        return x
else: # python 3+
    def uttc(x):
        return x.decode("utf-8")
import imaplib
import email
import pprint,inspect

imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'

conn = imaplib.IMAP4(imap_server)
conn.debug = 3
try:
    (retcode, capabilities) = conn.login(imap_user, imap_password)
except Exception:
    print(sys.exc_info()[1])
    sys.exit(1)

# not conn.select(readonly=1), else we cannot modify the \Seen flag later
conn.select() # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
    for num in uttc(messages[0]).split(' '):
        if not(num):
            print("No messages available: num is `{0}`!".format(num))
            break
        print('Processing message: {0}'.format(num))
        typ, data = conn.fetch(num,'(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Peeking headers, message: {0} '.format(num))
        typ, data = conn.fetch(num,'(BODY.PEEK[HEADER])')
        pprint.pprint(data)
        typ, data = conn.fetch(num,'(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Get RFC822 body, message: {0} '.format(num))
        typ, data = conn.fetch(num,'(RFC822)')
        mail = email.message_from_string(uttc(data[0][1]))
        #pprint.pprint(inspect.getmembers(mail))
        typ, data = conn.fetch(num,'(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
        print('Get headers, message: {0} '.format(num))
        typ, data = conn.fetch(num,'(BODY[HEADER])') # note, FLAGS (\\Seen) is now in data, even if not explicitly requested!
        pprint.pprint(data)
        print('Get RFC822 body, message: {0} '.format(num))
        typ, data = conn.fetch(num,'(RFC822)')
        mail = email.message_from_string(uttc(data[0][1]))
        pprint.pprint(inspect.getmembers(mail)) # this is in mail_object_inspect.txt
        pprint.pprint(mail.as_string())
        typ, data = conn.fetch(num,'(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ,data, # Seen: OK .. ['1 (FLAGS (\\Seen))']
            "Seen" if isSeen else "NEW"))
        conn.select() # select again, to see flags server side
        # * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)
        print('Restoring flag to unseen/new, message: {0} '.format(num))
        ret, data = conn.store(num,'-FLAGS','\\Seen')
        if ret == 'OK':
            print("Set back to unseen; Got OK: {0}{1}{2}".format(data,'\n',30*'-'))
            print(mail)
        typ, data = conn.fetch(num,'(FLAGS)')
        isSeen = ( "Seen" in uttc(data[0]) )
        print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. [b'1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
conn.close()
conn.logout()
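The UNSEEN search used in the test client also suggests a simple safeguard for a bulk download: record which UIDs are unseen before fetching anything, so that the list can be used afterwards to clear \Seen again. This is a minimal sketch, assuming Python 3. unseen_uids is a hypothetical helper, and conn is assumed to be a logged-in imaplib connection with a mailbox already selected.

```python
def unseen_uids(conn):
    """Return the UIDs (as str) of the unseen messages in the selected mailbox.

    Hypothetical helper: conn is assumed to be a logged-in imaplib connection.
    """
    typ, data = conn.uid('search', None, 'UNSEEN')
    if typ != 'OK':
        raise RuntimeError('UNSEEN search failed')
    # data[0] is a space-separated byte string such as b'3 8 15';
    # it is empty (or None) when nothing is unseen
    raw = data[0] or b''
    return [uid.decode('ascii') for uid in raw.split()]
```

The resulting list could be written to a file (for example with json.dump) before the download starts. Keep in mind that UIDs only have meaning within the mailbox they were obtained from.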
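I'll leave here what worked for me; it is a combination of several answers left in other threads. It seems that the flags in APPEND are not well understood.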
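import email
import imaplib

user, pwd = "user@example.com", "password"  # placeholders, use your own
mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login(user, pwd)
mail.select("Inbox", readonly=False)
status, messages = mail.uid('SEARCH', None, '(UNSEEN)')
message_ids = messages[0].split()
for message_id in message_ids:
    status, message_data = mail.uid('FETCH', message_id, '(RFC822)')
    # Python 3: the fetched body is bytes
    message = email.message_from_bytes(message_data[0][1])
    email_from = message.get("From")
    email_subject = message.get("Subject")
    # the fetch above may have set \Seen, so remove it again
    mail.uid('STORE', message_id, '-FLAGS', '\\Seen')
mail.logout()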
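With around 10,000 messages, one STORE per message means a lot of round trips. The IMAP STORE command accepts a comma-separated set of UIDs, so the flag could be cleared in batches instead. This is only a sketch: uid_batches, mark_unseen and the batch_size parameter are names I made up, Python 3 is assumed, and conn is assumed to be a logged-in imaplib connection with the mailbox selected read-write.

```python
def uid_batches(uids, batch_size=500):
    """Yield comma-separated UID sets, batch_size UIDs at a time."""
    uids = [u.decode('ascii') if isinstance(u, bytes) else str(u) for u in uids]
    for start in range(0, len(uids), batch_size):
        yield ','.join(uids[start:start + batch_size])


def mark_unseen(conn, uids, batch_size=500):
    """Remove the Seen flag from every UID in uids, one STORE per batch.

    Hypothetical helper: conn is assumed to be a logged-in imaplib connection
    with a mailbox selected read-write.
    """
    for uid_set in uid_batches(uids, batch_size):
        typ, data = conn.uid('STORE', uid_set, '-FLAGS', '(\\Seen)')
        if typ != 'OK':
            raise RuntimeError('STORE failed for UID set {0}'.format(uid_set))
```

For example, mark_unseen(mail, message_ids) would clear \Seen on all UIDs returned by the UNSEEN search above, using one command per 500 messages. The batch size is an arbitrary choice, not a server limit I have verified.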