Combine two DataFrames element-wise into long format

Question (votes: 0, answers: 2)

I have two DataFrames.

df1:

       ARHGEF10L     HIF3A     RNF17     RNF10     RNF11
NCBP1        NaN -0.432931       NaN -0.233554  0.165081
NCBP2   0.184332 -0.077655  0.331873 -0.449421  0.153836
RPL37        NaN       NaN  0.192629       NaN -0.089123
DHX9   -0.115242 -0.133209 -0.207657 -0.267636  0.363868
TCOF1        NaN       NaN  0.084838  0.140575 -0.122832

df2:

       ARHGEF10L     HIF3A     RNF17     RNF10     RNF11
NCBP1        NaN  0.000067       NaN  0.038310       NaN
NCBP2        NaN       NaN  0.002809  0.000033       NaN
RPL37        NaN       NaN       NaN       NaN       NaN
DHX9         NaN       NaN       NaN  0.017100  0.000979
TCOF1        NaN       NaN       NaN       NaN       NaN

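Now I want to create a new DataFrame with four columns: gene1 (the row index of df1 and df2), gene2 (the column index of df1 and df2), value1 (the df1 value), and value2 (the df2 value).

So the result would have 25 rows × 4 columns: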

gene1   gene2       value1      value2
NCBP1   ARHGEF10L   NaN         NaN
NCBP1   HIF3A       -0.432931   0.000067
NCBP1   RNF17       NaN         NaN
NCBP1   RNF10       -0.233554   0.038310
NCBP1   RNF11       0.165081    NaN
NCBP2   ARHGEF10L   0.184332    NaN
NCBP2   HIF3A       -0.077655   NaN
NCBP2   RNF17       0.331873    0.002809
....
TCOF1   ARHGEF10L   NaN         NaN
TCOF1   HIF3A       NaN         NaN
...

I have this pseudocode, but I'm not sure how to fill in the middle part:

def coef_fdr_table(df1,df2):
    column_names = ['gene1','gene2', 'value1', 'value2']
    df = pd.DataFrame(columns = column_names)
    for i in range(25):
            df.iloc[i,2] = df1[...]
            df.iloc[i,3] = df2[...]
    df.set_index(['gene1'],inplace = True)
    return(df)

Any suggestions would be appreciated!

python pandas dataframe
2 Answers

2 votes

You can concat the two DataFrames with MultiIndex columns and then reshape:

(pd.concat({'value1': df1,
            'value2': df2,
           }, axis=1)
   .stack(1, dropna=False)           # value identifiers to rows
   .rename_axis(['gene1', 'gene2'])  # set future column names
   .reset_index()                    # index to columns
)

Output:

    gene1      gene2    value1    value2
0   NCBP1  ARHGEF10L       NaN       NaN
1   NCBP1      HIF3A -0.432931  0.000067
2   NCBP1      RNF10 -0.233554  0.038310
3   NCBP1      RNF11  0.165081       NaN
4   NCBP1      RNF17       NaN       NaN
5   NCBP2  ARHGEF10L  0.184332       NaN
...
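
The output above lists gene2 in sorted order (RNF10, RNF11, RNF17) rather than in the original column order, and the `dropna` argument of `stack` is being phased out in recent pandas releases (see the `future_stack` parameter). As a minimal sketch that avoids `stack` altogether and keeps the original row and column order, the long table can be built directly from the underlying arrays. The helper name `to_long_format` is hypothetical, and the sketch assumes both frames share the same row and column labels.

```python
import numpy as np
import pandas as pd

genes1 = ['NCBP1', 'NCBP2', 'RPL37', 'DHX9', 'TCOF1']
nan = np.nan

df1 = pd.DataFrame({
    'ARHGEF10L': [nan, 0.184332, nan, -0.115242, nan],
    'HIF3A': [-0.432931, -0.077655, nan, -0.133209, nan],
    'RNF17': [nan, 0.331873, 0.192629, -0.207657, 0.084838],
    'RNF10': [-0.233554, -0.449421, nan, -0.267636, 0.140575],
    'RNF11': [0.165081, 0.153836, -0.089123, 0.363868, -0.122832],
}, index=genes1)

df2 = pd.DataFrame({
    'ARHGEF10L': [nan, nan, nan, nan, nan],
    'HIF3A': [0.000067, nan, nan, nan, nan],
    'RNF17': [nan, 0.002809, nan, nan, nan],
    'RNF10': [0.038310, 0.000033, nan, 0.017100, nan],
    'RNF11': [nan, nan, nan, 0.000979, nan],
}, index=genes1)


def to_long_format(df1, df2):
    """Hypothetical helper: one row per (row label, column label) pair."""
    # Align df2 to df1 so both value arrays are in the same cell order.
    df2 = df2.reindex(index=df1.index, columns=df1.columns)
    n_rows, n_cols = df1.shape
    return pd.DataFrame({
        # Each row label is repeated once per column ...
        'gene1': np.repeat(df1.index.to_numpy(), n_cols),
        # ... while the full list of column labels is cycled once per row.
        'gene2': np.tile(df1.columns.to_numpy(), n_rows),
        # ravel() flattens row by row, matching the label order above.
        'value1': df1.to_numpy().ravel(),
        'value2': df2.to_numpy().ravel(),
    })


result = to_long_format(df1, df2)
print(result.shape)
print(result.head(8))
```

Because nothing is dropped or sorted, the result has one row for every cell of the input (25 here), including the pairs where both values are NaN.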

0 votes

import pandas as pd

df1 = pd.DataFrame({
    'ARHGEF10L': [None, 0.184332, None, -0.115242, None],
    'HIF3A': [-0.432931, -0.077655, None, -0.133209, None],
    'RNF17': [None, 0.331873, 0.192629, -0.207657, 0.084838],
    'RNF10': [-0.233554, -0.449421, None, -0.267636, 0.140575],
    'RNF11': [0.165081, 0.153836, -0.089123, 0.363868, -0.122832]
}, index=['NCBP1', 'NCBP2', 'RPL37', 'DHX9', 'TCOF1'])

df2 = pd.DataFrame({
    'ARHGEF10L': [None, None, None, None, None],
    'HIF3A': [0.000067, None, None, None, None],
    'RNF17': [None, 0.002809, None, None, None],
    'RNF10': [0.038310, 0.000033, None, 0.017100, None],
    'RNF11': [None, None, None, 0.000979, None]
}, index=['NCBP1', 'NCBP2', 'RPL37', 'DHX9', 'TCOF1'])

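# Stack each frame into a Series indexed by (row label, column label).
# dropna=False keeps the pairs whose value is NaN, so all 25 rows survive;
# newer pandas versions (2.1+) offer stack(future_stack=True) for the same purpose.
dfs_stacked = [df1.stack(dropna=False), df2.stack(dropna=False)]

# Name the index levels and the two value columns as requested in the question.
res1 = pd.concat(dfs_stacked, axis=1).reset_index()\
.rename(columns={'level_0':'gene1','level_1':'gene2', 0:'value1', 1:'value2'})\
.sort_values(['gene1','gene2'])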
print(res1)

Or:

res2 = pd.concat(dfs_stacked,axis=1).reset_index()
# Assign all four column names at once, using the names from the question.
res2.columns = ['gene1', 'gene2', 'value1', 'value2']

print(res2)
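
Whether `stack` keeps or drops NaN cells depends on its arguments and on the pandas version, so it can help to have a variant that does not rely on it. The following is a minimal sketch using `melt` and `merge`, shown on a 2 × 2 subset of the question's data; the helper names `to_long` and `melt_and_merge` are hypothetical. Note that an outer merge may return the rows in sorted key order rather than the original order.

```python
import numpy as np
import pandas as pd

# 2 x 2 subset of the question's data, to keep the example short.
df1 = pd.DataFrame({
    'ARHGEF10L': [np.nan, 0.184332],
    'HIF3A': [-0.432931, -0.077655],
}, index=['NCBP1', 'NCBP2'])

df2 = pd.DataFrame({
    'ARHGEF10L': [np.nan, np.nan],
    'HIF3A': [0.000067, np.nan],
}, index=['NCBP1', 'NCBP2'])


def to_long(df, value_name):
    """Hypothetical helper: melt a wide frame into gene1/gene2/value rows."""
    return (df.rename_axis('gene1')   # name the row index
              .reset_index()          # turn it into a regular column
              .melt(id_vars='gene1', var_name='gene2', value_name=value_name))


def melt_and_merge(df1, df2):
    """Hypothetical helper: join the two long tables on the gene pair."""
    # melt keeps NaN cells, and how='outer' keeps pairs present in only one frame.
    return to_long(df1, 'value1').merge(
        to_long(df2, 'value2'), on=['gene1', 'gene2'], how='outer')


res3 = melt_and_merge(df1, df2)
print(res3)
```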