I have data of this type:
iso3year UHC cata10
AFG 2010 0.3551409 NA
AFG 2011 0.3496452 NA
AFG 2012 0.3468012 NA
AFG 2013 0.3567721 14.631331
AFG 2014 0.3647436 NA
AFG 2015 0.3717983 NA
AFG 2016 0.3855273 4.837534
AFG 2017 0.3948606 NA
AGO 2011 0.3250651 12.379809
AGO 2012 0.3400455 NA
AGO 2013 0.3397722 NA
AGO 2014 0.3385741 NA
AGO 2015 0.3521086 16.902584
AGO 2016 0.3636765 NA
AGO 2017 0.3764945 NA
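For the years 2012 and 2017, I would like to find the closest available value of the cata10 variable (within + or - 2 years, i.e. for 2012 it could be the value from 2010, 2011, 2013 or 2014). The output should be: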
iso3year_UHC UHC year_cata cata10
AFG 2012 0.3468012 2013 14.631331
AFG 2017 0.3948606 2016 4.837534
AGO 2012 0.3400455 2011 12.379809
AGO 2017 0.3764945 2015 16.902584
I have been trying different commands for two days but cannot find a solution. Could you suggest what kind of command to try?
Thanks a lot,
N.
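Using DF, defined reproducibly in the Note at the end, perform a self join on the same iso, keeping only rows that have a cata10 value no more than 2 years away, and then take the row with the smallest absolute difference in years.
library(sqldf)

# In SQLite, the bare columns (year_cata, b.cata10) are taken from the row
# that attains min(), i.e. the candidate year closest to the target year.
# The distance is computed on the years because a.cata10 is NA in the target rows.
sqldf("select
  substr(a.iso3year, 5, 8) year,
  a.iso3year,
  a.UHC,
  substr(b.iso3year, 5, 8) year_cata,
  b.cata10,
  min(abs(substr(a.iso3year, 5, 8) - substr(b.iso3year, 5, 8))) year_diff
from DF a
left join DF b on year != year_cata and
  abs(year - year_cata) <= 2 and
  substr(a.iso3year, 1, 3) = substr(b.iso3year, 1, 3) and
  b.cata10 is not null
group by a.iso3year
having year in ('2012', '2017')")[2:5]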
giving:
iso3year UHC year_cata cata10
1 AFG 2012 0.3468012 2013 14.631331
2 AFG 2017 0.3948606 2016 4.837534
3 AGO 2012 0.3400455 2011 12.379809
4 AGO 2017 0.3764945 2015 16.902584
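The same matching rule can also be sketched in base R without SQL. This is only an illustration, not the query above: the names DF2, nearest_cata, targets and res are made up here, the country code and year are stored as separate columns, and a cata10 value observed in the target year itself would count as distance 0 (the SQL above excludes that case).

```r
# Same values as the question, with iso3 and year kept as separate columns.
DF2 <- data.frame(
  iso3 = rep(c("AFG", "AGO"), c(8, 7)),
  year = c(2010:2017, 2011:2017),
  UHC = c(0.3551409, 0.3496452, 0.3468012, 0.3567721,
          0.3647436, 0.3717983, 0.3855273, 0.3948606,
          0.3250651, 0.3400455, 0.3397722, 0.3385741,
          0.3521086, 0.3636765, 0.3764945),
  cata10 = c(NA, NA, NA, 14.631331, NA, NA, 4.837534, NA,
             12.379809, NA, NA, NA, 16.902584, NA, NA),
  stringsAsFactors = FALSE
)

# For one country's rows `d`, pick the non-missing cata10 whose year is
# closest to `target`, looking at most `window` years away.
nearest_cata <- function(d, target, window = 2) {
  tgt <- d[d$year == target, ]
  cand <- d[!is.na(d$cata10) & abs(d$year - target) <= window, ]
  if (nrow(tgt) == 0 || nrow(cand) == 0) return(NULL)
  best <- cand[which.min(abs(cand$year - target)), ]  # ties: first match wins
  data.frame(iso3 = tgt$iso3, year_UHC = target, UHC = tgt$UHC,
             year_cata = best$year, cata10 = best$cata10)
}

targets <- c(2012, 2017)
res <- do.call(rbind, lapply(split(DF2, DF2$iso3), function(d) {
  do.call(rbind, lapply(targets, function(t) nearest_cata(d, t)))
}))
rownames(res) <- NULL
res
```

On this data each target year has exactly one candidate inside the window, so the sketch returns the same four matches as the output above.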
Note

# Columns are separated by two or more spaces, so that "AFG 2010" stays a
# single iso3year field when runs of spaces are turned into commas.
Lines <- "iso3year  UHC  cata10
AFG 2010  0.3551409  NA
AFG 2011  0.3496452  NA
AFG 2012  0.3468012  NA
AFG 2013  0.3567721  14.631331
AFG 2014  0.3647436  NA
AFG 2015  0.3717983  NA
AFG 2016  0.3855273  4.837534
AFG 2017  0.3948606  NA
AGO 2011  0.3250651  12.379809
AGO 2012  0.3400455  NA
AGO 2013  0.3397722  NA
AGO 2014  0.3385741  NA
AGO 2015  0.3521086  16.902584
AGO 2016  0.3636765  NA
AGO 2017  0.3764945  NA"
DF <- read.csv(text = gsub("  +", ",", Lines), as.is = TRUE)
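The substr() calls in the query assume that iso3year holds strings such as "AFG 2010". As a small sketch of the same split done directly in R (the data frame toy and the column names iso3 and year are hypothetical, chosen only for this illustration):

```r
# Toy data in the assumed layout: a 3-letter code, a space, a 4-digit year.
toy <- data.frame(iso3year = c("AFG 2012", "AGO 2017"),
                  stringsAsFactors = FALSE)

# Same character positions as in the SQL: 1-3 for the code, 5-8 for the year.
toy$iso3 <- substr(toy$iso3year, 1, 3)
toy$year <- as.integer(substr(toy$iso3year, 5, 8))
toy
```

R's substr(x, start, stop) takes a stop position, whereas SQLite's substr(x, start, length) takes a length; both return the four-digit year here because the string ends at position 8.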