Pickle TypeError inside numpy.save()

Question (votes: 0, answers: 1)

I have a function that computes features, and I then save those features to disk with np.save.

test_knn_feats = NNF.predict(X_test) 
np.save('data/knn_feats_%s_test.npy' % metric , test_knn_feats)

Inside the function, the code below runs when n_jobs is greater than 1.

fest_feats =[]
pool = Pool(processes = self.n_jobs) 
for i in range(X.shape[0]):
    test_feats.append(pool.apply_async(self.get_features_for_one(X[i:i+1])))
pool.close()
pool.join()

return np.vstack(test_feats)

However, I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-96-4f707b7cd533> in <module>()
     12     print(test_knn_feats)
     13     # Dump the features to disk
---> 14     np.save('data/knn_feats_%s_test.npy' % metric , test_knn_feats)

/opt/conda/lib/python3.6/site-packages/numpy/lib/npyio.py in save(file, arr, allow_pickle, fix_imports)
    507         arr = np.asanyarray(arr)
    508         format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 509                            pickle_kwargs=pickle_kwargs)
    510     finally:
    511         if own_fid:

/opt/conda/lib/python3.6/site-packages/numpy/lib/format.py in write_array(fp, array, version, allow_pickle, pickle_kwargs)
    574         if pickle_kwargs is None:
    575             pickle_kwargs = {}
--> 576         pickle.dump(array, fp, protocol=2, **pickle_kwargs)
    577     elif array.flags.f_contiguous and not array.flags.c_contiguous:
    578         if isfileobj(fp):

The function get_features_for_one builds a list and returns it stacked with np.hstack, as shown below.

...
knn_feats = np.hstack(return_list)
assert knn_feats.shape == (239,) or knn_feats.shape == (239, 1)
return knn_feats

Update:

test_feats =[]      
pool = Pool(processes = self.n_jobs) 
for i in range(X.shape[0]):
    test_feats.append(pool.apply_async(self.get_features_for_one, (X[i:i+1],)))
test_feats= [res.get() for res in test_feats]        
pool.close()
pool.join()
return np.vstack(test_feats)
python multithreading numpy
1 answer
0 votes

There are two main errors here:

test_feats =[] # you called it fest_feats, I assume a typo
pool = Pool(processes = self.n_jobs) 
for i in range(X.shape[0]):
    test_feats.append(pool.apply_async(self.get_features_for_one(X[i:i+1])))
    pool.close()
    pool.join()

return np.vstack(test_feats)
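  1. First, you create a pool. Then, for each i, you submit one job and immediately close and join the pool. You should close and join the pool only once, at the end, outside the loop.
  2. test_feats ends up as a list of "futures" (the AsyncResult objects returned by apply_async), not actual data, so calling vstack() on them makes no sense. You need to call get() on each future to obtain the result of get_features_for_one(), and then pass that list to vstack(). For example: np.vstack([res.get() for res in test_feats])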
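The two fixes above can be put together in a minimal sketch. It is not the asker's actual class: get_features_for_one here is a hypothetical stand-in, predict_parallel is an invented name, and ThreadPool is used only so the snippet runs anywhere without a `__main__` guard (multiprocessing.Pool exposes the same apply_async()/get() interface). It also passes the callable and its arguments separately, as the question's update does, instead of calling the function inside apply_async().

```python
import numpy as np
from multiprocessing.pool import ThreadPool  # same apply_async/get API as Pool


def get_features_for_one(x):
    # Hypothetical stand-in: returns a flat feature vector for one row.
    return np.hstack([x.ravel(), x.ravel() ** 2])


def predict_parallel(X, n_jobs=2):
    # Hypothetical helper name; the pool is created and torn down exactly once.
    with ThreadPool(processes=n_jobs) as pool:
        # Submit the callable and its argument tuple; do not call it here.
        pending = [pool.apply_async(get_features_for_one, (X[i:i + 1],))
                   for i in range(X.shape[0])]
        # get() blocks until each job finishes and returns its real result.
        rows = [res.get() for res in pending]
    return np.vstack(rows)


X = np.arange(6, dtype=float).reshape(3, 2)
feats = predict_parallel(X)
print(feats.shape, feats.dtype)
```

Because every result is collected with get() before stacking, the array passed to np.save holds plain numbers rather than result handles.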

In short, your problem is not really about the TypeError you eventually got from numpy.save(): your logic is broken, and your data is not what you think it is.
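One way to see that the data is not what it seems is to inspect the dtype before saving. The sketch below is an illustration under assumptions: square is a hypothetical worker, and ThreadPool stands in for Pool. Stacking the unresolved handles gives an object array, which np.save can only write by pickling; that is presumably where the TypeError in the traceback comes from. Passing allow_pickle=False makes np.save reject such an array with a ValueError instead.

```python
import io
import numpy as np
from multiprocessing.pool import ThreadPool


def square(x):
    # Hypothetical worker used only for this illustration.
    return x * x


with ThreadPool(processes=2) as pool:
    pending = [pool.apply_async(square, (np.ones(3) * i,)) for i in range(4)]
    unresolved = np.vstack(pending)                       # stacks the handles themselves
    resolved = np.vstack([res.get() for res in pending])  # stacks the numeric results

print(unresolved.dtype, resolved.dtype)

# allow_pickle=False makes np.save refuse object arrays instead of pickling them.
try:
    np.save(io.BytesIO(), unresolved, allow_pickle=False)
    rejected = False
except ValueError:
    rejected = True
print("object array rejected:", rejected)
```

A quick check such as `arr.dtype != object` before calling np.save would surface this kind of mistake at the point where the array is built.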
