Converting a pandas DataFrame to a Series by iterating over columns

Question

I have a DataFrame like the one below, and I'm trying to turn it into a Series:

      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00

Goal:

col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...

My code so far:

pvals2=pd.DataFrame({'col1': [1, .2,.7], 
                     'col2': [.2, 1,.01],
                     'col3': [.7,.01,1]},
                    index = ['col1', 'col2', 'col3'])

print(pvals.transpose().join(pvals, how='outer',lsuffix='_left', rsuffix='_right'))

Output:

          vote_left ballot1_left ballot1_x_left vote_right ballot1_right  \
vote              0       0.0923         0.0521          0        0.0923   
ballot1      0.0923            0         0.8213     0.0923             0   
ballot1_x    0.0521       0.8213              0     0.0521        0.8213   

          ballot1_x_right  
vote               0.0521  
ballot1            0.8213  
ballot1_x               0

Answer 1

First, stack the DataFrame:

st = pvals2.stack()

Create a new index by concatenating the two levels of the MultiIndex:

newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)

Set newdex as the index of the Series:

st.index = newdex

All together:

st = pvals2.stack()
st.index = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)

col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
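
The steps above can be put together as a self-contained sketch. It uses the public `get_level_values` method and assigns the index directly. The `Index.map('X'.join)` line is an alternative added here for illustration and is not part of the original answer; it relies on every MultiIndex entry being a tuple of strings.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# stack() moves the column labels into a second index level, row by row
st = pvals2.stack()

# Build flat labels from the two MultiIndex levels
flat = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)

# Alternative (an addition, not from the answer): join each (row, col) tuple
flat_alt = st.index.map('X'.join)

result = st.copy()
result.index = flat
print(result)
```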

Answer 2

Consider using melt to build the columns for the new index, then select the value column, since a single pandas DataFrame column is a pandas Series:

Data

from io import StringIO
import pandas as pd

txt = '''      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00'''

df = pd.read_table(StringIO(txt), sep=r"\s+")

Building the Series

mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']

new_series = mdf.set_index('s').rename_axis(None)['value']

print(new_series)
# col1Xcol1    1.00
# col2Xcol1    0.20
# col3Xcol1    0.70
# col1Xcol2    0.20
# col2Xcol2    1.00
# col3Xcol2    0.01
# col1Xcol3    0.70
# col2Xcol3    0.01
# col3Xcol3    1.00
# Name: value, dtype: float64
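
Note that melt walks the frame column by column, so the output above is ordered col1Xcol1, col2Xcol1, col3Xcol1, ... rather than row by row as in the question's goal. If row-by-row order matters, one possible sketch is to melt the transpose instead. The name `row_major` is just illustrative, and the frame is built inline here rather than parsed from text.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, .2, .7],
                   'col2': [.2, 1, .01],
                   'col3': [.7, .01, 1]},
                  index=['col1', 'col2', 'col3'])

# After transposing, 'index' holds the original column label and
# 'variable' holds the original row label; melt then emits one row at a time.
mdf = pd.melt(df.T.reset_index(), id_vars='index')
mdf['s'] = mdf['variable'] + 'X' + mdf['index']

row_major = mdf.set_index('s').rename_axis(None)['value']
print(row_major)
```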

Answer 3

Using concat and then setting a new index works:

>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns
...              for x in pvals2[col].index]
>>> ser
col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
dtype: float64
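
One thing to keep in mind: this approach walks the frame column by column, so each label is really column X row. With the symmetric example, where rows and columns share names, that is invisible. The sketch below uses a small hypothetical frame with distinct row labels (`row1`, `row2`) to make the difference visible, and shows a row-by-row variant for when row X column labels are wanted.

```python
import pandas as pd

# Hypothetical, non-symmetric frame with distinct row labels
df = pd.DataFrame({'col1': [1, .2], 'col2': [.3, 1]}, index=['row1', 'row2'])

# Column by column, as in the answer: labels are column X row
by_col = pd.concat([df[col] for col in df.columns])
by_col.index = [col + 'X' + row for col in df.columns for row in df.index]

# Row by row: labels are row X column
by_row = pd.concat([df.loc[row] for row in df.index])
by_row.index = [row + 'X' + col for row in df.index for col in df.columns]

print(by_col)
print(by_row)
```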

Answer 4

The following code:

pvals = pd.DataFrame({'col1': [1, .2,.7], 
                      'col2': [.2, 1,.01],
                      'col3': [.7,.01,1]},
                     index = ['row1', 'row2', 'row3'])

values = []
ind = []
# Walk the frame row by row, collecting each value and its "rowXcol" label
for row in pvals.index:
    for col in pvals.columns:
        values.append(pvals.loc[row, col])
        ind.append("%sX%s" % (row, col))

newpvals = pd.Series(values, index=ind)

gives:

>>> newpvals
row1Xcol1    1.00
row1Xcol2    0.20
row1Xcol3    0.70
row2Xcol1    0.20
row2Xcol2    1.00
row2Xcol3    0.01
row3Xcol1    0.70
row3Xcol2    0.01
row3Xcol3    1.00
dtype: float64

Edit: I misread the question at first, so I changed the result to a Series.
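
As a more compact variant of the same row-by-row traversal, the two lists can be folded into a dict comprehension. This is only a sketch, and it assumes a Python version where dicts preserve insertion order (3.7+), which is what keeps the labels in row order.

```python
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

# Map each "rowXcol" label to its value, visiting rows first, then columns
newpvals = pd.Series({f"{row}X{col}": pvals.loc[row, col]
                      for row in pvals.index
                      for col in pvals.columns})
print(newpvals)
```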
