How can I solve this problem?
Code:
# import libraries
import pandas as pd
data = pd.read_csv('file.csv', engine='python', delimiter=';')
# change object columns into numeric columns
for i in data.columns:
    data[i] = pd.to_numeric(data[i], errors='coerce')
DataFrame:
t0 (actual) t0 t0,lower t0,upper
0 11861,6318726842 0 0 0
1 4761,43316 5709,1728515625 3776,725188260803 7939,908970830105
2 36,22841951973635 0 0 0
3 583,3716479196096 0 0 0
4 25087,16436661841 26040,7890625 21825,20941707611 31905,394350044822
....
Result:
t0 (actual) t0 t0,lower t0,upper
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
I can't reproduce your error, but I think you can add decimal="," to the read_csv call. From the documentation:
decimal : str, default '.'. Character to recognize as decimal point (e.g. use ',' for European data).
Change to:
# import libraries
import pandas as pd
data = pd.read_csv('file.csv', engine='python', delimiter=';', decimal=",")
# change object columns into numeric columns
for i in data.columns:
    data[i] = pd.to_numeric(data[i], errors='coerce')
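A minimal, self-contained sketch of this fix, assuming the file is laid out like the sample in the question. The CSV text is inlined through io.StringIO as a stand-in for 'file.csv', and the numbers are shortened copies of the first rows of the sample, so treat both as illustrative assumptions.

```python
import io

import pandas as pd

# Inline stand-in for 'file.csv' (assumption: ';' delimiter, ',' as decimal mark)
csv_text = (
    "t0 (actual);t0;t0,lower;t0,upper\n"
    "11861,63;0;0;0\n"
    "4761,43;5709,17;3776,73;7939,91\n"
)

# decimal=',' tells the parser to read '11861,63' as 11861.63
data = pd.read_csv(io.StringIO(csv_text), delimiter=';', decimal=',')

print(data.dtypes)
print(data)
```

Because the columns are already parsed as floats here, the pd.to_numeric loop should no longer be necessary.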
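With errors='coerce', pd.to_numeric cannot convert these values to a numeric data type. This is because pandas does not treat ',' as the decimal separator (it expects '.'). This means that errors = 'coerce' replaces these values with NaN.
If, as it seems, you are converting these values right after reading them from a .csv file, you should specify the decimal separator in the read_csv call. Then you will not need to convert the values manually:
data = pd.read_csv('file.csv', engine='python', delimiter=';', decimal = ',')
However, if the values come from your script, you need to replace ',' with '.' in the string values.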
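A rough sketch of that string replacement, assuming the values are held as strings in a DataFrame. The column names are borrowed from the question and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical values built in a script, stored as strings with comma decimals
data = pd.DataFrame({
    "t0 (actual)": ["11861,63", "4761,43"],
    "t0": ["0", "5709,17"],
})

# Without the replacement, every value containing ',' is coerced to NaN
coerced = pd.to_numeric(data["t0 (actual)"], errors="coerce")

# Swap ',' for '.' first, then convert each column
for col in data.columns:
    data[col] = pd.to_numeric(
        data[col].str.replace(",", ".", regex=False), errors="coerce"
    )

print(data)
```

This assumes '.' is not also used as a thousands separator in the strings; if it is, those dots would have to be removed before the replacement.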
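# Import numpy, since it was missing earlier
import numpy as np
import statsmodels.formula.api as smf
# Re-run the log transformation and regression
# Convert 'hourpay' to numeric, coercing errors to NaN
df['hourpay'] = pd.to_numeric(df['hourpay'], errors='coerce')
# Step 1 (revised): drop missing, zero, or negative wages
df = df[df['hourpay'] > 0]
# Create log(wage)
df['log_wage'] = np.log(df['hourpay'])
# Step 2: motherhood dummy: 1 if there are dependent children under 19, 0 otherwise
df['motherhood'] = df['FDPCH19'].apply(lambda x: 1 if x > 0 else 0)
# Step 3: convert categorical variables
df['education'] = df['degcls7']
df['occupation'] = df['uction_group'].astype('category')
df['worktype'] = df['ftpt'].astype('category')
# Step 4: experience approximation (proxied by age)
df['experience'] = df['age']
# Step 5: regression formula
formula = 'log_wage ~ motherhood + C(education) + experience + C(occupation) + C(worktype)'
# Step 6: run the OLS regression with robust standard errors
model = smf.ols(formula, data=df).fit(cov_type='HC1')
# Display the regression results
model.summary()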
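The data preparation in steps 1 and 2 can be tried on a toy frame before running the regression. This is only a sketch: the rows below are hypothetical, and the comparison-based dummy is a vectorized alternative to the apply call, not necessarily what the original script used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; column names follow the snippet above
df = pd.DataFrame({
    "hourpay": ["12.5", "0", "n/a", "20"],
    "FDPCH19": [2, 0, 1, 0],
})

# Unparseable entries such as 'n/a' become NaN
df["hourpay"] = pd.to_numeric(df["hourpay"], errors="coerce")

# NaN > 0 is False, so missing wages are dropped along with zero or negative ones
df = df[df["hourpay"] > 0].copy()
df["log_wage"] = np.log(df["hourpay"])

# Vectorized equivalent of apply(lambda x: 1 if x > 0 else 0)
df["motherhood"] = (df["FDPCH19"] > 0).astype(int)

print(df)
```

Taking .copy() after filtering avoids pandas' SettingWithCopyWarning when the new columns are assigned.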