Here is an example pandas DataFrame:
import pandas as pd
import numpy as np
dict1 = {'file': ['filename2', 'filename2', 'filename3', 'filename4',
'filename4', 'filename3'], 'amount': [3, 4, 5, 1, 2, 1],
'front': [21889611, 36357723, 196312, 11, 42, 1992],
'back':[21973805, 36403870, 277500, 19, 120, 3210],
'type':['A', 'A', 'A', 'B', 'B', 'C']}
df1 = pd.DataFrame(dict1)
print(df1)
file amount front back type
0 filename2 3 21889611 21973805 A
1 filename2 4 36357723 36403870 A
2 filename3 5 196312 277500 A
3 filename4 1 11 19 B
4 filename4 2 42 120 B
5 filename3 1 1992 3210 C
I define a new column, end, using numpy.where():
df1['end'] = np.where(df1['type']=='B', df1['front'], df1['front'] + df1['back'])
print(df1)
amount back file front type end
0 3 21973805 filename2 21889611 A 43863416
1 4 36403870 filename2 36357723 A 72761593
2 5 277500 filename3 196312 A 473812
3 1 19 filename4 11 B 11
4 2 120 filename4 42 B 42
5 1 3210 filename3 1992 C 5202
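If the end column already partially exists, I would like to fill its NaN values using the same method. For example, here is a DataFrame where end does exist as a column but contains many NaN values. (Edit: the non-NA values may be completely unique.)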
new_df
amount back file front type end
0 3 21973805 filename2 21889611 A NaN
1 4 36403870 filename2 36357723 A NaN
2 5 277500 filename3 196312 A 12
3 1 19 filename4 11 B NaN
4 2 120 filename4 42 B 49
5 1 3210 filename3 1992 C NaN
I thought I could do this with pandas.DataFrame.fillna(), but that raises an error:
df1['end'].fillna(np.where(df1['type']=='B', df1['front'], df1['front'] + df1['back']), inplace=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/pandas/core/series.py", line 2434, in fillna
**kwargs)
File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3631, in fillna
type(value))
ValueError: invalid fill value with a <class 'numpy.ndarray'>
Question: how can I efficiently apply np.where() only to the NaN values in a column?
fillna aligns on the index, so compute the values into a helper column first and fill from that:
df['New'] = np.where(df['type'] == 'B', df['front'], df['front'] + df['back'])
df
Out[125]:
amount back file front type end New
0 3 21973805 filename2 21889611 A NaN 43863416
1 4 36403870 filename2 36357723 A NaN 72761593
2 5 277500 filename3 196312 A 473812.0 473812
3 1 19 filename4 11 B NaN 11
4 2 120 filename4 42 B 42.0 42
5 1 3210 filename3 1992 C NaN 5202
df.end.fillna(df.New)
Out[126]:
0 43863416.0
1 72761593.0
2 473812.0
3 11.0
4 42.0
5 5202.0
Name: end, dtype: float64
df.end=df.end.fillna(df.New)
df
Out[128]:
amount back file front type end New
0 3 21973805 filename2 21889611 A 43863416.0 43863416
1 4 36403870 filename2 36357723 A 72761593.0 72761593
2 5 277500 filename3 196312 A 473812.0 473812
3 1 19 filename4 11 B 11.0 11
4 2 120 filename4 42 B 42.0 42
5 1 3210 filename3 1992 C 5202.0 5202
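The ValueError in the question shows that, at least in that pandas version, fillna did not accept a raw numpy.ndarray as the fill value, while a Series works because it carries an index to align on. A minimal sketch of the same idea without a helper column, assuming the sample data from the question: wrap the np.where() result in a Series that shares the DataFrame's index. The names computed and fill_values are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; `end` has NaN gaps.
df = pd.DataFrame({
    'type': ['A', 'A', 'A', 'B', 'B', 'C'],
    'front': [21889611, 36357723, 196312, 11, 42, 1992],
    'back': [21973805, 36403870, 277500, 19, 120, 3210],
    'end': [np.nan, np.nan, 12, np.nan, 49, np.nan],
})

# np.where returns a plain ndarray, which carries no index.
computed = np.where(df['type'] == 'B', df['front'], df['front'] + df['back'])

# Wrapping it in a Series with df's index lets fillna align row by row.
# Existing non-NaN values (12 and 49) are left untouched.
fill_values = pd.Series(computed, index=df.index)
df['end'] = df['end'].fillna(fill_values)
print(df)
```

This still computes the fallback for every row and only uses it where end is missing, which is usually fine for columns of this size.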
Update, with the data from the edited question:
df['New'] = np.where(df['type'] == 'B', df['front'], df['front'] + df['back'])
df.end=df.end.fillna(df.New)
df
Out[133]:
amount back file front type end New
0 3 21973805 filename2 21889611 A 43863416.0 43863416
1 4 36403870 filename2 36357723 A 72761593.0 72761593
2 5 277500 filename3 196312 A 12.0 473812
3 1 19 filename4 11 B 11.0 11
4 2 120 filename4 42 B 49.0 42
5 1 3210 filename3 1992 C 5202.0 5202
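If the goal is to evaluate np.where() only on the rows where end is missing, one possible variant is to select those rows with a boolean mask and assign back through .loc. This is a sketch under the assumption that the data matches the question's edited example; the names missing and sub are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's edited example.
df = pd.DataFrame({
    'type': ['A', 'A', 'A', 'B', 'B', 'C'],
    'front': [21889611, 36357723, 196312, 11, 42, 1992],
    'back': [21973805, 36403870, 277500, 19, 120, 3210],
    'end': [np.nan, np.nan, 12, np.nan, 49, np.nan],
})

# Boolean mask of the rows whose `end` is NaN.
missing = df['end'].isna()

# Restrict the computation to those rows only.
sub = df.loc[missing]
df.loc[missing, 'end'] = np.where(
    sub['type'] == 'B', sub['front'], sub['front'] + sub['back']
)
print(df)
```

The array returned by np.where() has one element per masked row, so it lines up with the rows selected by df.loc[missing, 'end'].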