Number of rows in use has changed: remove missing values?

I have been trying to run stepwise selection on my variables in R. This is my code:

library(lattice)#to get the matrix plot, assuming this package is already installed
library(ftsa) #to get the out-of sample performance metrics, assuming this package is already installed
library(car) 

mydata=read.csv("C:/Users/jgozal1/Desktop/Multivariate Project/Raw data/FINAL_alldata_norowsunder90_subgroups.csv")

names(mydata)
str(mydata)

mydata$country_name=NULL
mydata$country_code=NULL
mydata$year=NULL
mydata$Unemployment.female....of.female.labor.force...modeled.ILO.estimate.=NULL
mydata$Unemployment.male....of.male.labor.force...modeled.ILO.estimate.=NULL
mydata$Life.expectancy.at.birth.male..years.= NULL
mydata$Life.expectancy.at.birth.female..years. = NULL

str(mydata)

Full_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~ mydata$Primary.completion.rate.female....of.relevant.age.group. + mydata$School.enrollment.primary....gross. + mydata$School.enrollment.secondary....gross. + mydata$School.enrollment.tertiary....gross. + mydata$Internet.users..per.100.people. + mydata$Primary.completion.rate.male....of.relevant.age.group. + mydata$Mobile.cellular.subscriptions..per.100.people. + mydata$Foreign.direct.investment.net.inflows..BoP.current.US.. + mydata$Unemployment.total....of.total.labor.force...modeled.ILO.estimate., data= mydata)

summary(Full_model) #this provides the summary of the model

Reduced_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~1,data= mydata)

step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="forward", data=mydata)

step(Full_model, direction="backward", data=mydata)

step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="both", data=mydata)

Here is a link to the dataset I am using: http://speedy.sh/YNXxj/FINAL-alldata-norowsunder90-subgroups.csv

After setting the scope for my stepwise selection, I get this error:

Error in step(Reduced_model, scope = list(lower = Reduced_model, upper = Full_model),  :
  number of rows in use has changed: remove missing values?
In addition: Warning messages:
1: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k,  :
  using the 548/734 rows from a combined fit
2: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k,  :
  using the 548/734 rows from a combined fit

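I have seen other posts with the same error, and the usual solution is to omit the NAs from the data being used, but that did not solve my problem: I still get exactly the same error.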

r linear-regression
1 Answer

What worked for me was applying MrFlick's suggestion in the data argument of the reduced model, i.e.:

model_reduced <- lm(y ~., data = na.omit(data_subset))

It also left me with more observations than I got when I wrapped the whole model in na.omit().
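To illustrate why this helps, here is a minimal sketch on synthetic data. The data frame `toy`, its columns, and `toy_subset` are hypothetical stand-ins, not the dataset from the question. By default lm() drops incomplete rows separately for each model, so an intercept-only model and a model with predictors can end up using different numbers of rows, which is what step() complains about. Keeping only the modelled columns and calling na.omit() once gives every candidate model the same rows, and it discards fewer rows than calling na.omit() on the full data frame.

```r
# Synthetic data: a hypothetical stand-in for the indicators in the question
set.seed(1)
n <- 100
toy <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), unused = rnorm(n))
toy$x1[1:10] <- NA        # missing values in a predictor
toy$unused[50:80] <- NA   # missing values in a column the model never uses

# Cause of the error: each lm() call drops incomplete rows on its own
rows_null <- nobs(lm(y ~ 1, data = toy))        # 100 rows (y has no NAs)
rows_full <- nobs(lm(y ~ x1 + x2, data = toy))  # 90 rows (10 lost to NAs in x1)

# Fix: keep only the modelled columns, then drop incomplete rows once
toy_subset <- na.omit(toy[, c("y", "x1", "x2")])

null_model <- lm(y ~ 1, data = toy_subset)
full_model <- lm(y ~ ., data = toy_subset)

# Both ends of the scope now come from the same rows
selected <- step(null_model,
                 scope = list(lower = formula(null_model),
                              upper = formula(full_model)),
                 direction = "forward", trace = 0)

nrow(toy_subset)     # 90: rows kept when subsetting before na.omit()
nrow(na.omit(toy))   # 59: rows kept when na.omit() sees every column
```

The last two lines show the difference in retained observations: subsetting first only penalises rows with missing values in columns the model actually uses.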
