I have two DataFrames:
df1:
ARHGEF10L HIF3A RNF17 RNF10 RNF11
NCBP1 NaN -0.432931 NaN -0.233554 0.165081
NCBP2 0.184332 -0.077655 0.331873 -0.449421 0.153836
RPL37 NaN NaN 0.192629 NaN -0.089123
DHX9 -0.115242 -0.133209 -0.207657 -0.267636 0.363868
TCOF1 NaN NaN 0.084838 0.140575 -0.122832
df2:
ARHGEF10L HIF3A RNF17 RNF10 RNF11
NCBP1 NaN 0.000067 NaN 0.038310 NaN
NCBP2 NaN NaN 0.002809 0.000033 NaN
RPL37 NaN NaN NaN NaN NaN
DHX9 NaN NaN NaN 0.017100 0.000979
TCOF1 NaN NaN NaN NaN NaN
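Now I want to create a new DataFrame with 4 columns: gene1 (the row index of df1 and df2), gene2 (the column index of df1 and df2), value1 (the df1 value) and value2 (the df2 value).
So the result should be 25 rows by 4 columns: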
gene1 gene2 value1 value2
NCBP1 ARHGEF10L NaN NaN
NCBP1 HIF3A -0.432931 0.000067
NCBP1 RNF17 NaN NaN
NCBP1 RNF10 -0.233554 0.038310
NCBP1 RNF11 0.165081 NaN
NCBP2 ARHGEF10L 0.184332 NaN
NCBP2 HIF3A -0.077655 NaN
NCBP2 RNF17 0.331873 0.002809
....
TCOF1 ARHGEF10L NaN NaN
TCOF1 HIF3A NaN NaN
...
I have this pseudocode, but I'm not sure about the middle part:
def coef_fdr_table(df1, df2):
    column_names = ['gene1', 'gene2', 'value1', 'value2']
    df = pd.DataFrame(columns=column_names)
    for i in range(25):
        df.iloc[i, 2] = df1[...]
        df.iloc[i, 3] = df2[...]
    df.set_index(['gene1'], inplace=True)
    return df
Any suggestions would be appreciated!
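For comparison, here is one way the loop in the pseudocode could be completed. It is a sketch that assumes df1 and df2 share the same index and column labels. Instead of writing into an empty frame with .iloc (which fails, because .iloc cannot enlarge a DataFrame), it collects one dict per (row label, column label) pair and builds the frame at the end. The 2 x 2 frames are only a subset of the question's data, used for illustration.

```python
import numpy as np
import pandas as pd

def coef_fdr_table(df1, df2):
    """Loop-based sketch: one output row per (row label, column label) pair.

    Assumes df1 and df2 share the same index and columns.
    """
    rows = []
    for gene1 in df1.index:          # row labels, e.g. NCBP1
        for gene2 in df1.columns:    # column labels, e.g. HIF3A
            rows.append({
                'gene1': gene1,
                'gene2': gene2,
                'value1': df1.at[gene1, gene2],
                'value2': df2.at[gene1, gene2],
            })
    df = pd.DataFrame(rows, columns=['gene1', 'gene2', 'value1', 'value2'])
    return df.set_index('gene1')

# Small demo with a 2 x 2 subset of the question's data
df1 = pd.DataFrame({'ARHGEF10L': [np.nan, 0.184332], 'HIF3A': [-0.432931, -0.077655]},
                   index=['NCBP1', 'NCBP2'])
df2 = pd.DataFrame({'ARHGEF10L': [np.nan, np.nan], 'HIF3A': [0.000067, np.nan]},
                   index=['NCBP1', 'NCBP2'])
out = coef_fdr_table(df1, df2)
print(out)
```

This is easy to follow but runs in Python-level loops, so the vectorized answers are preferable for large matrices.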
You can concat the two DataFrames into MultiIndex columns (by passing a dict) and reshape:
(pd.concat({'value1': df1,
            'value2': df2,
           }, axis=1)
   .stack(1, dropna=False)           # move the original column labels (gene2) to rows, keeping NaN
   .rename_axis(['gene1', 'gene2'])  # set future column names
   .reset_index()                    # index to columns
)
Output:
gene1 gene2 value1 value2
0 NCBP1 ARHGEF10L NaN NaN
1 NCBP1 HIF3A -0.432931 0.000067
2 NCBP1 RNF10 -0.233554 0.038310
3 NCBP1 RNF11 0.165081 NaN
4 NCBP1 RNF17 NaN NaN
5 NCBP2 ARHGEF10L 0.184332 NaN
...
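A different technique (not the one used in the answer above) that also keeps every NaN cell is to melt each matrix into long form and merge on the two gene columns. melt never drops NaN values, so this avoids relying on the dropna argument of stack, whose NaN handling differs across pandas versions. This is a sketch; to_long is a hypothetical helper name, and the 2 x 2 frames are a subset of the question's data.

```python
import numpy as np
import pandas as pd

def to_long(df, value_name):
    # Hypothetical helper: wide matrix -> long table with columns gene1, gene2, <value_name>
    return (df.rename_axis('gene1')          # name the row index so reset_index creates 'gene1'
              .reset_index()
              .melt(id_vars='gene1', var_name='gene2', value_name=value_name))

df1 = pd.DataFrame({'ARHGEF10L': [np.nan, 0.184332], 'HIF3A': [-0.432931, -0.077655]},
                   index=['NCBP1', 'NCBP2'])
df2 = pd.DataFrame({'ARHGEF10L': [np.nan, np.nan], 'HIF3A': [0.000067, np.nan]},
                   index=['NCBP1', 'NCBP2'])

# Outer merge keeps every (gene1, gene2) pair present in either frame
res = (to_long(df1, 'value1')
       .merge(to_long(df2, 'value2'), on=['gene1', 'gene2'], how='outer')
       .sort_values(['gene1', 'gene2'], ignore_index=True))
print(res)
```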
Alternatively, stack each DataFrame separately and concatenate the results:
import pandas as pd
df1 = pd.DataFrame({
    'ARHGEF10L': [None, 0.184332, None, -0.115242, None],
    'HIF3A': [-0.432931, -0.077655, None, -0.133209, None],
    'RNF17': [None, 0.331873, 0.192629, -0.207657, 0.084838],
    'RNF10': [-0.233554, -0.449421, None, -0.267636, 0.140575],
    'RNF11': [0.165081, 0.153836, -0.089123, 0.363868, -0.122832]
}, index=['NCBP1', 'NCBP2', 'RPL37', 'DHX9', 'TCOF1'], dtype=float)
# dtype=float turns None into NaN; without it the all-None column would stay object dtype
df2 = pd.DataFrame({
    'ARHGEF10L': [None, None, None, None, None],
    'HIF3A': [0.000067, None, None, None, None],
    'RNF17': [None, 0.002809, None, None, None],
    'RNF10': [0.038310, 0.000033, None, 0.017100, None],
    'RNF11': [None, None, None, 0.000979, None]
}, index=['NCBP1', 'NCBP2', 'RPL37', 'DHX9', 'TCOF1'], dtype=float)
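# dropna=False keeps the NaN cells, so all 25 (gene1, gene2) pairs survive
dfs_stacked = [df1.stack(dropna=False), df2.stack(dropna=False)]
res1 = pd.concat(dfs_stacked, axis=1).reset_index()\
    .rename(columns={'level_0': 'gene1', 'level_1': 'gene2', 0: 'value1', 1: 'value2'})\
    .sort_values(['gene1', 'gene2'])
print(res1)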
Or:
res2 = pd.concat(dfs_stacked, axis=1).reset_index()
res2.columns = ['gene1', 'gene2', 'value1', 'value2']
print(res2)
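If both matrices have the same row and column labels, another option (a different technique from stacking) is to flatten them with NumPy and rebuild the label columns with repeat and tile. This is a sketch; pairs_table is a hypothetical name, and df2 is reindexed to df1's labels first so both frames flatten in the same cell order even if their columns are ordered differently (the demo deliberately swaps df2's column order).

```python
import numpy as np
import pandas as pd

def pairs_table(df1, df2):
    # Align df2 to df1's labels so both flatten in the same cell order
    df2 = df2.reindex(index=df1.index, columns=df1.columns)
    n_rows, n_cols = df1.shape
    return pd.DataFrame({
        'gene1': np.repeat(df1.index.to_numpy(), n_cols),  # each row label repeated once per column
        'gene2': np.tile(df1.columns.to_numpy(), n_rows),  # column labels cycled once per row
        'value1': df1.to_numpy().ravel(),                  # row-major (C-order) flattening
        'value2': df2.to_numpy().ravel(),
    })

df1 = pd.DataFrame({'ARHGEF10L': [np.nan, 0.184332], 'HIF3A': [-0.432931, -0.077655]},
                   index=['NCBP1', 'NCBP2'])
# Columns in a different order on purpose, to show the reindex step
df2 = pd.DataFrame({'HIF3A': [0.000067, np.nan], 'ARHGEF10L': [np.nan, np.nan]},
                   index=['NCBP1', 'NCBP2'])
table = pairs_table(df1, df2)
print(table)
```

The row order follows df1's index and columns as given; add sort_values(['gene1', 'gene2']) if alphabetical order is wanted.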