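I'm working with panel data. We assessed children in 2019 and again in 2020, so I have two datasets (2019 and 2020). I'd like to create a third dataset made up of records from the second dataset (2020) that match the characteristics of the first dataset (2019). This third dataset will have fewer participants, but they will share the characteristics of their 2019 "peers": the proportion of boys and girls will be roughly the same as in 2019, the mothers' ages will be roughly the same, and so on.
Code: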
df_2019 = structure(list(asqse_quest = c(24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24), year_completed_cat = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), levels = c("18", "19", "20", "21", "22", "23", "24"), class = "factor"),
sex_male = c(1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0,
1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0), momage = c(36,
39, 22, 20, 29, 40, 31, 37, 29, 38, 24, 35, 32, 30, 32, 31,
29, 21, 28, 29, 40, 21, 38, 29, 28, 33, 25, 25, 30, 29, 25,
27, 28, 31, 24, 28, 35, 29, 17, 35, 32, 29, 27, 24, 29, 25,
28, 24, 21, 26), momed = c(4, 4, 2, 2, 4, 3, 2, 3, 2, 4,
3, 4, 4, 4, 4, 4, 3, 4, 3, 4, 4, 2, 2, 4, 4, 4, 4, 4, 4,
4, 2, 4, 3, 3, 3, 3, 4, 4, 2, 4, 4, 3, 2, 2, 3, 4, 4, 3,
2, 4), income = c(4, 4, 2, 3, 4, 1, 2, 5, 4, 4, 5, 4, 4,
4, 4, 4, 4, 2, 3, 3, 4, 2, 3, 4, 4, 4, 5, 4, 3, 3, 4, 4,
3, 4, 1, 4, 2, 4, 3, 4, 4, 3, 4, 3, 4, 4, 4, 3, 4, 4)), class = "data.frame", row.names = c(NA,
-50L))
df_2020 = structure(list(asqse_quest = c(24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24), year_completed_cat = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), levels = c("18", "19", "20", "21", "22", "23", "24"), class = "factor"),
sex_male = c(1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1,
0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1), momage = c(23,
26, 33, 34, 29, 26, 23, 29, 40, 36, 33, 18, 31, 31, 31, 32,
34, 35, 29, 37, 19, 30, 33, 25, 32, 35, 37, 27, 23, 29, 28,
26, 30, 27, 38, 28, 29, 39, 26, 25, 29, 39, 35, 32, 20, 38,
31, 27, 28, 23), momed = c(2, 4, 4, 3, 4, 3, 2, 2, 3, 4,
1, 2, 2, 4, 4, 4, 4, 2, 4, 4, 2, 4, 4, 4, 2, 4, 4, 2, 4,
2, 1, 4, 3, 2, 4, 4, 4, 2, 4, 2, 4, 4, 4, 4, 2, 4, 4, 4,
4, 1), income = c(2, 4, 4, 4, 4, 5, 3, 2, 2, 4, 1, 3, 4,
5, 1, 4, 3, 1, 4, 5, 5, 4, 4, 4, 3, 4, 4, 2, 4, 5, 1, 4,
4, 1, 4, 4, 4, 4, 3, 4, 4, 4, 5, 4, 2, 4, 4, 4, 4, 4)), class = "data.frame", row.names = c(NA,
-50L))
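Created on 2024-07-12 with reprex v2.1.0
You can try the MatchIt package, which has a function that performs propensity score matching.
We first combine the two datasets with bind_rows(), assigning an id to tell them apart: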
library(dplyr)

data <- bind_rows(df_2019, df_2020, .id = "year") |>
  mutate(year = +(year == 1)) # 1 = 2019 (cases), 0 = 2020 (controls)
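Rows with year == 1 are your cases (the 2019 data) and rows with year == 0 are your controls (the 2020 data).
To find the controls that match the cases as closely as possible, we can use the matchit() function. It has many arguments; for brevity we will mostly rely on the defaults.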
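To make the coding step concrete, here is a minimal sketch on two tiny made-up data frames (the names a, b and stacked and their values are hypothetical, not the real data). It shows that .id adds a character column holding "1" and "2", and that comparing it with 1 and converting to integer gives the 1/0 indicator used above.

```r
library(dplyr)

# Two tiny stand-in data frames (hypothetical values, not the real data)
a <- data.frame(sex_male = c(1, 0), momage = c(30, 25))
b <- data.frame(sex_male = c(1, 1), momage = c(28, 33))

# .id = "year" adds a character column: "1" for rows from the first
# data frame, "2" for rows from the second
stacked <- bind_rows(a, b, .id = "year")

# year == 1 gives TRUE/FALSE; as.integer() (like the unary +) turns it into 1/0
stacked <- mutate(stacked, year = as.integer(year == 1))

# Count rows per group to confirm the coding
table(stacked$year)
```

Checking the group counts right after stacking is a cheap way to catch a swapped case/control coding before any matching is done.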
library(MatchIt)
We first try exact matching on year completed, sex, and mother's age to see if we get lucky.
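For reference, this is roughly what "the defaults" mean. The sketch below assumes MatchIt 4.x and uses lalonde, an example data set shipped with the package, rather than the data from the question; the object names m_default and m_explicit are only illustrative.

```r
library(MatchIt)

# lalonde is an example data set bundled with MatchIt; treat is the 0/1 group
data("lalonde", package = "MatchIt")

# Relying on the defaults
m_default <- matchit(treat ~ age + educ + married, data = lalonde)

# The same call with the main defaults written out (assumed for MatchIt 4.x)
m_explicit <- matchit(treat ~ age + educ + married, data = lalonde,
                      method = "nearest", # nearest-neighbour matching
                      distance = "glm",   # propensity score from a logistic regression
                      ratio = 1,          # one control per treated unit
                      replace = FALSE)    # each control is used at most once

# Check whether both calls selected the same units
identical(m_default$weights, m_explicit$weights)
```

Writing the defaults out makes it clearer which argument to change later, for example ratio for more controls per case.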
match_obj <- matchit(year ~ asqse_quest + year_completed_cat + sex_male + momage + momed + income,
                     data = data,
                     exact = ~ year_completed_cat + sex_male + momage,
                     replace = FALSE)
#Error in `matchit()`:
#! No matches were found.
This is not surprising, since the two datasets do not overlap at all on year completed. Let's make the matching criteria less strict:
match_obj <- matchit(year ~ asqse_quest + year_completed_cat + sex_male + momage + momed + income,
                     data = data,
                     exact = ~ sex_male + momage,
                     replace = FALSE)
No error this time, but we do get a warning:
#Warning message:
#Fewer control units than treated units in some `exact` strata; not all treated units will get a match.
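Both the earlier error and this warning come down to how many cases and controls share each exact-matching stratum. A rough way to see this with base R, on a small made-up data frame (toy and its values are hypothetical stand-ins for the stacked data):

```r
# Hypothetical stand-in for the stacked data: year is 1 for cases, 0 for controls
toy <- data.frame(
  year = c(1, 1, 1, 1, 0, 0, 0),
  year_completed_cat = c("19", "19", "19", "19", "20", "20", "20"),
  sex_male = c(1, 1, 0, 1, 1, 0, 0),
  momage = c(29, 29, 31, 40, 29, 31, 31)
)

# If no level of year_completed_cat contains both groups, exact matching
# on it cannot produce a single pair
tab_completed <- table(year = toy$year, completed = toy$year_completed_cat)
any(colSums(tab_completed > 0) == 2)

# Count cases and controls in each sex_male x momage stratum
stratum <- interaction(toy$sex_male, toy$momage, drop = TRUE)
counts <- table(stratum, year = toy$year)
counts

# Without replacement, a stratum can yield at most min(cases, controls) pairs
max_pairs <- pmin(counts[, "1"], counts[, "0"])
sum(max_pairs)
```

Strata with more cases than controls are the ones the warning refers to: some cases there have to stay unmatched.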
That's fine. Now summarize the results.
summary(match_obj)
...
Sample Sizes:
          Control Treated
All            50      50
Matched        25      25
Unmatched      25      25
Discarded       0       0
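The output shows that 25 of the original 50 controls were matched. It also gives other useful information, which I have omitted here for simplicity. Now use match.data() to get the matched controls along with their cases.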
matched_data <- match.data(match_obj)
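As a sketch of what match.data() returns, the example below again uses the lalonde data bundled with MatchIt rather than the question's data (m, md and controls are illustrative names). With the default nearest-neighbour matching, the result keeps only matched rows and typically adds the columns distance, weights and subclass, where subclass identifies each matched pair.

```r
library(MatchIt)

data("lalonde", package = "MatchIt")
m <- matchit(treat ~ age + educ + married, data = lalonde)

# Only matched units are returned, with extra bookkeeping columns
md <- match.data(m)
setdiff(names(md), names(lalonde))

# Keep the matched controls and drop the bookkeeping columns again
controls <- md[md$treat == 0, names(lalonde)]
nrow(controls)
```

Dropping the added columns is optional; it simply returns the matched subset in the same shape as the original data.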
Now we just filter out the cases, leaving the matched controls:
df_2020_new <- filter(matched_data, year == 0)
head(df_2020_new, 10)
   asqse_quest year_completed_cat sex_male momage momed income
1           24                 20        1     23     2      2
2           24                 20        1     26     4      4
3           24                 20        1     33     4      4
4           24                 20        1     34     3      4
5           24                 20        0     29     4      4
6           24                 20        1     26     3      5
7           24                 20        0     23     2      3
8           24                 20        1     29     2      2
9           24                 20        0     40     3      2
10          24                 20        1     36     4      4
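Since the goal is a 2020 subset that resembles the 2019 sample, it is worth comparing the two directly. A minimal base R sketch, using made-up stand-ins for df_2019 and the matched 2020 subset (ref, sub and compare_means are hypothetical names):

```r
# Hypothetical stand-ins for df_2019 and the matched 2020 subset
ref <- data.frame(sex_male = c(1, 0, 1, 1), momage = c(29, 31, 25, 35))
sub <- data.frame(sex_male = c(1, 1, 0, 1), momage = c(29, 31, 25, 36))

# Mean of each variable in both data frames, side by side
compare_means <- function(x, y, vars) {
  data.frame(
    variable = vars,
    mean_x = unname(vapply(vars, function(v) mean(x[[v]]), numeric(1))),
    mean_y = unname(vapply(vars, function(v) mean(y[[v]]), numeric(1)))
  )
}

balance <- compare_means(ref, sub, c("sex_male", "momage"))
balance$diff <- balance$mean_y - balance$mean_x
balance
```

For a 0/1 variable such as sex_male the mean is the proportion of boys, so a diff near zero means the two samples have a similar sex ratio.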
See the help page for matchit() to learn how to modify the matching method. There are too many details to cover here, but this is the basic idea.
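As a sketch of two such modifications, assuming MatchIt 4.x and again using the bundled lalonde data instead of the question's data (m_mahal, m_cal and age_gap are illustrative names): the first call matches on Mahalanobis distance within exact strata, the second keeps propensity-score matching but requires ages to be within 2 years, which is looser than exact matching on age.

```r
library(MatchIt)

data("lalonde", package = "MatchIt")

# Exact on married, nearest neighbour on the Mahalanobis distance of age and educ
m_mahal <- matchit(treat ~ age + educ, data = lalonde,
                   method = "nearest", distance = "mahalanobis",
                   exact = ~ married)

# Propensity-score matching, but pairs must be within 2 years of age
# (std.caliper = FALSE means the caliper is in raw units, not standard deviations)
m_cal <- matchit(treat ~ age + educ + married, data = lalonde,
                 caliper = c(age = 2), std.caliper = FALSE)

# Sample sizes after matching
summary(m_cal)$nn

# Largest age gap within any matched pair
md_cal <- match.data(m_cal)
age_gap <- tapply(md_cal$age, md_cal$subclass, function(a) diff(range(a)))
max(age_gap)
```

In the question's setting, the analogous change would be a caliper on momage in place of exact matching on it, which should leave fewer cases unmatched.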