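The goal is to build a pandas Series in which each element is a variable-length numpy array. The arrays come from the function getContexts, which takes a boolean mask derived from one DataFrame (cnv) and applies it to another DataFrame (exp). This is done twice: once for the True (loss) state and once for the False (no_loss) state. The error I get is ValueError: setting an array element with a sequence, and it is raised on the second line of getContexts.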
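For reference, NumPy raises this ValueError when a sequence is written into a slot that expects a single scalar. The snippet below is only a minimal sketch of that general situation, using a made-up float array named slots; it is not taken from the code in this question and may not be the exact path that triggers the error there.

```python
import numpy as np

# Hypothetical example array: three float slots, each holding one scalar.
slots = np.zeros(3)

caught = False
message = ""
try:
    # A two-element list cannot be stored in a single float slot.
    slots[0] = [1, 2]
except ValueError as err:
    caught = True
    message = str(err)

print(caught, message)
```

An array created with dtype=object does accept a sequence in a single slot, which is why variable-length arrays are usually stored in object-dtype containers.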
Here is some test data to try:
import numpy as np
import pandas as pd

deldf = pd.DataFrame([[0,1,0,1],
                      [1,0,1,0],
                      [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']
d_mask = deldf == 1

expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                      [10,0,12,1,np.array([2,2,2])],
                      [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(dels.index)
results['exp'] = expdf.index
results.columns = ['cnv','exp']
Here is my attempt at a solution (note that d_mask is a global variable):
def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_g]
    # error is thrown at line below
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss
Here is my call to getContexts:
results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])
Your code appears to contain a couple of name errors. After I changed dels to deldf and cnv_g to cnv_gm, it no longer threw an error:
import numpy as np
import pandas as pd

# Deletion calls (1 = loss), indexed by cnv label
deldf = pd.DataFrame([[0,1,0,1],
                      [1,0,1,0],
                      [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']
d_mask = deldf == 1

# Expression values, indexed by exp label
expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                      [10,0,12,1,np.array([2,2,2])],
                      [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(deldf.index)  # was dels.index
results['exp'] = expdf.index
results.columns = ['cnv','exp']

def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_gm]  # was d_mask.loc[cnv_g]
    # keep the values where the mask is True (loss) or False (no_loss)
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss

results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])
print(results)
  cnv exp       loss no_loss
0   k   x     [2, 4]  [0, 1]
1   l   y   [10, 12]  [0, 1]
2   m   Z  [1, 1, 1]     [1]
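If np.vectorize turns out to be fragile for variable-length results, one possible alternative is a plain list comprehension that builds the two columns directly. The following is a minimal sketch under a few assumptions: the helper name split_by_mask is hypothetical, the arr column is left out of expdf for simplicity, and the two frames are paired row by row exactly as in the results table above.

```python
import numpy as np
import pandas as pd

# Same test data as above, minus the 'arr' column (an assumption for brevity).
deldf = pd.DataFrame([[0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 0]],
                     columns=['a', 'b', 'c', 'd'], index=['k', 'l', 'm'])
d_mask = deldf == 1
expdf = pd.DataFrame([[0, 2, 1, 4], [10, 0, 12, 1], [1, 1, 1, 1]],
                     columns=['a', 'b', 'c', 'd'], index=['x', 'y', 'Z'])

def split_by_mask(exp_g, cnv_g):
    """Hypothetical helper: split one exp row by the matching cnv mask."""
    mask = d_mask.loc[cnv_g].to_numpy()
    # Select the mask's columns so values and mask line up by label.
    row = expdf.loc[exp_g, d_mask.columns].to_numpy()
    return row[mask], row[~mask]

results = pd.DataFrame({'cnv': deldf.index, 'exp': expdf.index})
pairs = [split_by_mask(e, c) for e, c in zip(results['exp'], results['cnv'])]
# Each cell holds its own numpy array, so lengths may differ per row.
results['loss'] = [p[0] for p in pairs]
results['no_loss'] = [p[1] for p in pairs]
print(results)
```

Because every array is stored as a separate Python object in the column, the rows do not need to share a length, and no output dtype has to be inferred from the first call.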