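I'm new to data science and am trying to do some data wrangling in an IPython notebook using Python 2.7. The tutorial I'm following for my first project asks me to replace every NaN entry with 0 or 1. However, I'd like to try a different approach first: for all rows where Credit_History is NaN, look at the counts of each non-numeric value...
Ideal output when Credit_History is NaN: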
Self_Employed
Yes 532
No 32
Married
No 398
Yes 213
For numeric columns, I'd like the mean of each column over the rows where Credit_History is NaN.
Ideal output for numeric values when Credit_History is NaN:
Mean Applicant Income: 54003.1232
LoanAmount: 35435.12
Loan_Amount_Term: 360
Thanks in advance!
For the value counts, you can use pd.Series.value_counts:
import pandas as pd
# Keep only rows where Credit_History is missing, then count each category
df.loc[pd.isnull(df['Credit_History']), 'Self_Employed'].value_counts()
df.loc[pd.isnull(df['Credit_History']), 'Married'].value_counts()
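If you want the counts for every non-numeric column at once rather than naming each one, a minimal sketch is to build the NaN mask once and loop over the columns that `select_dtypes(exclude=[np.number])` returns. The toy DataFrame below is hypothetical and only mimics the column names from the question; with your real data you would skip its construction and use your own `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real loan dataset.
df = pd.DataFrame({
    'Credit_History': [1.0, np.nan, 0.0, np.nan, np.nan],
    'Self_Employed': ['No', 'Yes', 'No', 'Yes', 'No'],
    'Married': ['Yes', 'No', 'Yes', 'No', 'No'],
    'LoanAmount': [120.0, 100.0, 150.0, 90.0, 200.0],
})

# Boolean mask: True for rows whose Credit_History is missing.
missing = df['Credit_History'].isnull()

# Every column that is not numeric (here: Self_Employed and Married).
cat_cols = df.select_dtypes(exclude=[np.number]).columns

# One value_counts Series per non-numeric column, restricted to the masked rows.
counts = {col: df.loc[missing, col].value_counts() for col in cat_cols}

for col in cat_cols:
    print(col)
    print(counts[col])
```

Excluding numeric dtypes, rather than including `object`, keeps the selection aligned with the question's "non-numeric" wording.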
To compute the means, you can use pd.DataFrame.mean:
# Numeric columns to average; mean() skips NaN values by default
cols = ['Applicant_Income', 'LoanAmount', 'Loan_Amount_Term']
df.loc[pd.isnull(df['Credit_History']), cols].mean()
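A possible extension, sketched under assumptions: instead of listing the numeric columns by hand, select them with `select_dtypes(include=[np.number])` and drop `Credit_History` itself. Grouping by the same NaN mask then puts the "missing" and "present" means side by side. The DataFrame and its column names are hypothetical placeholders; adjust them to whatever your dataset actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions modeled on the question.
df = pd.DataFrame({
    'Credit_History': [1.0, np.nan, 0.0, np.nan],
    'ApplicantIncome': [5000, 3000, 4000, 6000],
    'LoanAmount': [120.0, 100.0, np.nan, 140.0],
    'Loan_Amount_Term': [360.0, 360.0, 180.0, 360.0],
    'Married': ['Yes', 'No', 'Yes', 'No'],
})

# True for rows whose Credit_History is missing.
missing = df['Credit_History'].isnull()

# All numeric columns except Credit_History (it is all-NaN in the masked rows).
num_cols = df.select_dtypes(include=[np.number]).columns.drop('Credit_History')

# Column means over rows with missing Credit_History; NaN values are skipped.
means_missing = df.loc[missing, num_cols].mean()

# One row per mask value: False = Credit_History present, True = missing.
by_group = df.groupby(missing)[list(num_cols)].mean()

print(means_missing)
print(by_group)
```

The `groupby` version is convenient for checking whether rows with a missing Credit_History differ noticeably from the rest before deciding how to fill them.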