Improving Python code for reuse (skipping repeated steps)

Question

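I have been working on a project where I need to split 10-12 columns and stack them. The only problem is that I have to do it repeatedly: once I split one column, I stack it, and then I repeat the same steps for each of the other columns. The code runs without errors, but I am looking for a more efficient approach.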

I am currently repeating the same process 10-12 times, and the code takes a while to run because there are more than 50 column names.

df1 = (df1.set_index(['Announced Date', 'Completed Date', 'Target Company',
                      'Target Dominant Sector', 'Target Dominant Country', 'Target State',
                      'Target Financial Advisor', 'Target Legal Advisor', 'Target Broker', 
                      'Target Accountant', 'Target PR', 'Target Consultant',
                      'Bidder Company', 'Bidder Dominant Country', 'Bidder State',
                      'Bidder Financial Advisor', 'Bidder Legal Advisor', 'Bidder Broker', 
                      'Bidder Accountant', 'Bidder PR', 'Bidder Consultant', 
                      'Seller Company', 'Seller Dominant Country', 'Seller State', 
                      'Seller Financial Advisor', 'Seller Legal Advisor', 'Seller Broker', 
                      'Seller Accountant', 'Seller PR', 'Seller Consultant',
                      'Reported Revenue Multiple Y1', 'Reported EBIT Multiple Y1', 'Reported EBITDA Multiple Y1', 
                      'Reported PE Multiple Y1', 'Reported Book Value Multiple Y1', 'Deal Description', 
                      'Deal Type', 'Deal Nature', "Deal Value USD(m)", 
                      'Deal ID', 'Deal within regular criteria','Target companies', 
                      'Target FAs', 'Taget LAs', "Taget Brokers", 
                      "Target Accountants", 'Target PRs','Target Consultants',
                      'Bidder Companies', 'Bidder FAs', 'Bidder LAs', 
                      "Bidder Brokers", "Bidder Accountants","Bidder PRs",
                      "Bidder Consultants",'Seller Companies']).stack()
        .reset_index(level=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
                          29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55], name='Seller FAs')
        .reset_index(drop=True))

I know that, instead of typing out all the column names, I can use

df1.columns

and that, instead of typing the numbers 0-55 one by one, I can use

np.arange(56)

But I have not been able to work these into my code. Could someone please help me make it more efficient?

python pandas numpy
1 Answer

You can use:

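# Assumes the 56 key columns come first in df1 and the rest hold the values to
# stack; setting every column as the index would leave nothing to stack.
key_cols = df1.columns[:56].tolist()
df1 = (df1.set_index(key_cols)
          .stack()
          # a plain list of level positions is the documented form for `level`
          .reset_index(level=list(range(len(key_cols))), name='Seller FAs')
          .reset_index(drop=True))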
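
To see what the set_index / stack / reset_index pattern does, it can help to try it on a small frame first. The sketch below uses made-up data: the column names `fa_0` and `fa_1` (standing in for the pieces of an already split column) and the two key columns are assumptions for illustration, not the asker's real layout. The key columns are computed from the frame rather than typed out.

```python
import pandas as pd

# Hypothetical toy data: two key columns plus two columns holding split pieces.
df = pd.DataFrame({
    'Deal ID': [1, 2],
    'Target Company': ['X', 'Y'],
    'fa_0': ['a', 'c'],
    'fa_1': ['b', None],
})

value_cols = ['fa_0', 'fa_1']  # columns to stack (assumed names)
key_cols = [c for c in df.columns if c not in value_cols]

stacked = (df.set_index(key_cols)[value_cols]
             .stack()
             .dropna()  # drop missing pieces explicitly; stack's default differs across pandas versions
             .reset_index(level=list(range(len(key_cols))), name='Seller FAs')
             .reset_index(drop=True))
print(stacked)
```

Each key row is repeated once per non-missing piece, and the pieces end up in a single 'Seller FAs' column.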

But DataFrame.melt may be a better fit:

df1 = pd.DataFrame({
         'A':[4,5,4],
         'B':[7,2,3],
         'C':[1,3,1],
})

print (df1)
   A  B  C
0  4  7  1
1  5  2  3
2  4  3  1

df1 = df1.rename_axis('a').reset_index().melt('a',var_name='b', value_name='c')
print (df1)
   a  b  c
0  0  A  4
1  1  A  5
2  2  A  4
3  0  B  7
4  1  B  2
5  2  B  3
6  0  C  1
7  1  C  3
8  2  C  1

Sort if necessary:

df2 = df1.sort_values(['a','b'])
print (df2)
   a  b  c
0  0  A  4
3  0  B  7
6  0  C  1
1  1  A  5
4  1  B  2
7  1  C  3
2  2  A  4
5  2  B  3
8  2  C  1
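
If the step being repeated is "split a delimited column, then stack the pieces", a loop over the column names might remove the copy-pasted blocks. The following is only a sketch under assumptions: the toy data, the `;` delimiter, and the helper name `split_and_explode` are all hypothetical, and `DataFrame.explode` with `ignore_index` needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical toy data: each advisor column holds a delimited string.
deals = pd.DataFrame({
    'Deal ID': [1, 2],
    'Seller FAs': ['a; b', 'c'],
    'Bidder FAs': ['x', 'y; z'],
})

def split_and_explode(frame, columns, sep=';'):
    """Split each listed column on `sep` and give every piece its own row."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].str.split(sep)          # string -> list of pieces
        out = out.explode(col, ignore_index=True)   # one row per list element
        out[col] = out[col].str.strip()             # trim spaces around pieces
    return out

long_deals = split_and_explode(deals, ['Seller FAs', 'Bidder FAs'])
print(long_deals)
```

Note that exploding several columns in turn multiplies the rows per deal (every seller piece is paired with every bidder piece), which is also what splitting and stacking one column after another produces.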