I have a DataFrame and I am trying to turn it into a Series. The DataFrame:
      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00
Goal:
col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...
My code so far:
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])
print(pvals2.transpose().join(pvals2, how='outer', lsuffix='_left', rsuffix='_right'))
Output (from my real data, whose labels are vote, ballot1 and ballot1_x):
           vote_left  ballot1_left  ballot1_x_left  vote_right  ballot1_right  \
vote               0        0.0923          0.0521           0         0.0923
ballot1       0.0923             0          0.8213      0.0923              0
ballot1_x     0.0521        0.8213               0      0.0521         0.8213

           ballot1_x_right
vote                0.0521
ballot1             0.8213
ballot1_x                0
First, stack the DataFrame:
st = pvals2.stack()
Create a new index by concatenating the two levels of the MultiIndex:
newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)
Set newdex as the index of the Series:
st = st.set_axis(newdex)
All together:
st = pvals2.stack()
st = st.set_axis(st.index.get_level_values(0) + 'X' + st.index.get_level_values(1))
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
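A slightly more compact variant of the same stack idea, as a minimal sketch: it assumes every row and column label is a string, and it uses Index.map with 'X'.join rather than get_level_values. The frame is the pvals2 from the question.

```python
import pandas as pd

pvals2 = pd.DataFrame(
    {"col1": [1, 0.2, 0.7], "col2": [0.2, 1, 0.01], "col3": [0.7, 0.01, 1]},
    index=["col1", "col2", "col3"],
)

# stack() moves the column labels into a second index level: (row, column).
st = pvals2.stack()

# Each MultiIndex entry is a tuple such as ("col1", "col2"); join it with "X".
# Assumption: all labels are strings, otherwise str.join raises a TypeError.
st.index = st.index.map("X".join)

print(st)
```

Assigning to st.index changes the Series in place, so no reassignment is needed here.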
Consider using melt: build a column for the new index, then select the value column, since a single pandas DataFrame column is a pandas Series.
Data:
from io import StringIO
import pandas as pd
txt = ''' col1 col2 col3
col1 1.0 0.20 0.70
col2 0.2 1.00 0.01
col3 0.7 0.01 1.00'''
df = pd.read_table(StringIO(txt), sep=r"\s+")
Building the Series:
mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']
new_series = mdf.set_index('s').rename_axis(None)['value']
print(new_series)
# col1Xcol1 1.00
# col2Xcol1 0.20
# col3Xcol1 0.70
# col1Xcol2 0.20
# col2Xcol2 1.00
# col3Xcol2 0.01
# col1Xcol3 0.70
# col2Xcol3 0.01
# col3Xcol3 1.00
# Name: value, dtype: float64
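Note that melt walks the frame column by column, so the labels above come out as col1Xcol1, col2Xcol1, ... rather than in the row-by-row order listed in the question. If that order matters, one possible fix (a sketch, assuming the combined labels are unique; the names wanted and row_major are made up for illustration) is to reindex afterwards:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 0.2, 0.7], "col2": [0.2, 1, 0.01], "col3": [0.7, 0.01, 1]},
    index=["col1", "col2", "col3"],
)

# Same melt-based construction as above.
mdf = pd.melt(df.reset_index(), id_vars="index")
mdf["s"] = mdf["index"] + "X" + mdf["variable"]
new_series = mdf.set_index("s").rename_axis(None)["value"]

# Labels in row-by-row order, as listed in the question.
wanted = [f"{r}X{c}" for r in df.index for c in df.columns]

# reindex only reorders here, because every wanted label already exists.
row_major = new_series.reindex(wanted)
print(row_major)
```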
Using concat and then setting a new index also works:
>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns
...              for x in pvals2[col].index]
>>> ser
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
dtype: float64
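One thing to watch with this approach: the label is built as column name, then 'X', then row label. For the symmetric matrix in the question that makes no difference, but for a non-symmetric frame it does. A small sketch with a made-up 2x2 frame (the labels r1, r2, a, b are hypothetical), building the label with the row first:

```python
import pandas as pd

# Hypothetical non-symmetric frame, used only to make the label order visible.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# Concatenating the columns gives the values column by column: 1, 2, 3, 4.
ser = pd.concat([df[col] for col in df.columns])

# Row label first, column label second, in the same column-by-column order.
ser.index = [f"{row}X{col}" for col in df.columns for row in df[col].index]
print(ser)
```

With the labels built this way, ser['r2Xa'] holds the value at row r2, column a.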
The following code:
pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])
values = []
ind = []
for i in range(len(pvals.index)):
    for col in pvals:
        row = pvals.index[i]
        values.append(pvals[col][row])
        ind.append("%sX%s" % (row, col))
newpvals = pd.Series(values, ind)
gives:
>>> newpvals
row1Xcol1 1.00
row1Xcol2 0.20
row1Xcol3 0.70
row2Xcol1 0.20
row2Xcol2 1.00
row2Xcol3 0.01
row3Xcol1 0.70
row3Xcol2 0.01
row3Xcol3 1.00
dtype: float64
Edit: I misread the question, so the result is now a Series.
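The same row-by-row result can probably be had without the explicit loops. A minimal sketch, assuming the frame holds a single numeric dtype; the names labels and flat are made up for illustration:

```python
import pandas as pd

pvals = pd.DataFrame(
    {"col1": [1, 0.2, 0.7], "col2": [0.2, 1, 0.01], "col3": [0.7, 0.01, 1]},
    index=["row1", "row2", "row3"],
)

# Labels in row-major order: every column for row1, then row2, and so on.
labels = [f"{row}X{col}" for row in pvals.index for col in pvals.columns]

# ravel() flattens the underlying array row by row, matching the label order.
flat = pd.Series(pvals.to_numpy().ravel(), index=labels)
print(flat)
```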