I need to append to a pickle file (because I don't have the whole dictionary at once). To do that, I wrote the following code:
import pickle

p = {}
p[1] = 2
q = {}
q['a'] = p

p = {}
p[2] = 0
with open("save12.p", "ab") as fh:
    pickle.dump(q, fh)

f = {}
f['b'] = p
with open("save12.p", "ab") as fh:
    pickle.dump(f, fh)
However, when I load the pickle file, I can't find the value of the dictionary f.
How should I append to a pickle file?
Also, a database such as "dbm" doesn't meet my needs, because I'm working on Windows.
Pickle streams are entirely self-contained, so each unpickling call unpickles only one object.
So, to unpickle multiple streams, you should call pickle.load on the file repeatedly until you get an EOFError:
>>> import pickle
>>> f=open('a.p', 'wb')
>>> pickle.dump({1:2}, f)
>>> pickle.dump({3:4}, f)
>>> f.close()
>>>
>>> f=open('a.p', 'rb')
>>> pickle.load(f)
{1: 2}
>>> pickle.load(f)
{3: 4}
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError
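The same behaviour can be reproduced without touching the disk. The sketch below is an illustration added here, not part of the original answer: it concatenates two independently pickled byte strings into one io.BytesIO buffer and reads them back one pickle.load call at a time, stopping on EOFError.

```python
import io
import pickle

# Two independently pickled objects, simply concatenated byte-for-byte.
stream = io.BytesIO(pickle.dumps({1: 2}) + pickle.dumps({3: 4}))

loaded = []
while True:
    try:
        # Each call consumes exactly one pickle from the stream.
        loaded.append(pickle.load(stream))
    except EOFError:
        # Raised once the buffer has no more data.
        break

print(loaded)  # [{1: 2}, {3: 4}]
```

Because each pickle carries its own end marker, no separator or length prefix is needed between records.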
So your unpickling code might look like this:
import pickle

objs = []
with open('a.p', 'rb') as f:
    while True:
        try:
            objs.append(pickle.load(f))
        except EOFError:
            break
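Applied to the question's scenario, the loop can also merge the appended dictionaries back into one. This is a minimal sketch: load_merged is a hypothetical helper name, the file lives in a temporary directory instead of the question's working-directory "save12.p", and it assumes every record in the file is a dict (on a key collision, the later record wins because of dict.update).

```python
import os
import pickle
import tempfile

# Hypothetical location; the question writes "save12.p" in the working directory.
filename = os.path.join(tempfile.mkdtemp(), "save12.p")

# Reproduce the question's two separate appends.
with open(filename, "ab") as fh:
    pickle.dump({'a': {1: 2}}, fh)
with open(filename, "ab") as fh:
    pickle.dump({'b': {2: 0}}, fh)


def load_merged(path):
    """Hypothetical helper: read every pickled dict in the file and merge them."""
    merged = {}
    with open(path, "rb") as fh:
        while True:
            try:
                merged.update(pickle.load(fh))
            except EOFError:
                break
    return merged


print(load_merged(filename))  # {'a': {1: 2}, 'b': {2: 0}}
```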
# To append to a pickle file
import pickle

p = {1: 2}
q = {3: 4}
filename = "picklefile"
# Binary append mode: pickle writes bytes, so a text mode such as 'a+' fails in Python 3
with open(filename, 'ab') as fp:
    pickle.dump(p, fp)
    pickle.dump(q, fp)

# To load from the pickle file
data = []
with open(filename, 'rb') as fr:
    try:
        while True:
            data.append(pickle.load(fr))
    except EOFError:
        pass
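Why the file mode matters: in Python 3, pickle.dump writes bytes, so a file opened in text mode (for example 'a+') rejects them with a TypeError, while binary append mode ('ab') works. The sketch below, which uses temporary files whose names are made up for illustration, checks both cases.

```python
import os
import pickle
import tempfile

tmpdir = tempfile.mkdtemp()

# Text mode: the file's write() expects str, but pickle hands it bytes.
try:
    with open(os.path.join(tmpdir, "text_mode"), "a+") as fp:
        pickle.dump({1: 2}, fp)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary append mode works, and the records read back in order.
path = os.path.join(tmpdir, "binary_mode")
with open(path, "ab") as fp:
    pickle.dump({1: 2}, fp)
    pickle.dump({3: 4}, fp)

data = []
with open(path, "rb") as fr:
    while True:
        try:
            data.append(pickle.load(fr))
        except EOFError:
            break

print(text_mode_ok, data)  # False [{1: 2}, {3: 4}]
```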
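If anyone is curious: rewriting the whole pickle on every save (naive_append below) costs O(n^2) in total for n appends, whereas appending each object as its own pickle (smart_append) costs O(n). Load times are about the same. Results for 100 appends of a 500×500 NumPy array: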
naive_append: 8.440 seconds
naive_load: 0.024 seconds
smart_append: 0.342 seconds
smart_load: 0.025 seconds
import os
import pickle
import time

import numpy as np


def naive_append(filename, obj):
    objs = []
    if os.path.exists(filename):
        with open(filename, 'rb') as f:
            objs = pickle.load(f)
    objs.append(obj)
    with open(filename, 'wb') as f:
        pickle.dump(objs, f)


def naive_load(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)


def smart_append(filename, obj):
    with open(filename, 'ab+') as f:
        pickle.dump(obj, f)


def smart_load(filename):
    with open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break
filename = "/home/ubuntu/tmp/test.log.pkl"


def bench(appender, loader):
    # Start each run from an empty file
    if os.path.exists(filename):
        os.remove(filename)
    saved = []
    started = time.time()
    for _ in range(100):
        t = np.random.randn(500, 500)
        appender(filename, t)
        saved.append(t)
    elapsed = time.time() - started
    print(f"{appender.__name__}:\t{elapsed:.3f} seconds")

    started = time.time()
    got_back = list(loader(filename))
    elapsed = time.time() - started
    print(f"{loader.__name__}:\t{elapsed:.3f} seconds")

    # Both approaches must return the same arrays in the same order
    assert len(saved) == len(got_back)
    assert all(a.sum() == b.sum() for a, b in zip(saved, got_back))


bench(naive_append, naive_load)
bench(smart_append, smart_load)
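Where the quadratic cost comes from, assuming every appended object pickles to roughly the same size s bytes: the k-th call to naive_append reads back the k-1 objects already stored and rewrites all k of them, while smart_append writes only the new object.

```latex
W_{\text{naive}}(n) = \sum_{k=1}^{n} k\,s = \frac{n(n+1)}{2}\,s = O(n^2),
\qquad
R_{\text{naive}}(n) = \sum_{k=1}^{n} (k-1)\,s = \frac{n(n-1)}{2}\,s,
\qquad
W_{\text{smart}}(n) = n\,s = O(n).
```

For the benchmark's n = 100 this gives 5050s bytes written by the naive version versus 100s by the appending version. The measured times also include pickling and filesystem overhead, so their ratio need not equal this figure exactly.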