我正在运行双向方差分析来比较两个变量如何影响服务价格。然而,当我运行事后成对比较时,我只想看到可比较的行(即当一个自变量匹配时)。这是一个显示问题的示例:
变量 1 - 种类:金枪鱼、大比目鱼、鲑鱼、比目鱼
变量 2 - 性别:男、女
因变量:质量
这是我用于双向方差分析和成对比较的内容:
> dat <- read.csv(<file location>)
> aov2 <- aov(mass ~ species * sex, data = dat)
> TukeyHSD(aov2, which = "species:sex")
当我使用 TukeyHSD 进行质量的成对比较时,我只想看到金枪鱼:雄性 - 金枪鱼:雌性或大比目鱼:雄性 - 比目鱼:雄性,但我不想看到比目鱼:雌性 - 大比目鱼:男性。基本上,我只是想看看其中一个变量具有相同值的比较。
老实说我什至不确定要尝试什么,或者这是否可能。我有一些 NA 值用于一些比较,我尝试使用:
TukeyHSD(aov2, which = "species:sex") %>% filter(is.na(diff))
但我得到以下信息:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "c('TukeyHSD', 'multicomp')"
如果这是不可能的,是否有办法在数据帧进入单向方差分析之前过滤数据帧中的数据,例如通过其中一列中的某个值?然后我可以尝试运行多个单向方差分析来实现类似的目标。我宁愿不必创建单独的数据文件,因为我正在使用的实际数据集非常大,并且手动分离会非常耗时。
当前的
TukeyHSD
功能无法实现这一点。但是,通过对此函数进行一些修改,您可以获得所需的输出。下面是修改后的函数,它添加了一个名为“match”的新参数,默认值为 FALSE:
TukeyHSD2 <- function (x, which = seq_along(tabs), ordered = FALSE, conf.level = 0.95,
match=FALSE, ...)
{
mm <- model.tables(x, "means")
if (is.null(mm$n))
stop("no factors in the fitted model")
tabs <- mm$tables
if (names(tabs)[1L] == "Grand mean")
tabs <- tabs[-1L]
tabs <- tabs[which]
nn <- mm$n[names(tabs)]
nn_na <- is.na(nn)
if (all(nn_na))
stop("'which' specified no factors")
if (any(nn_na)) {
warning("'which' specified some non-factors which will be dropped")
tabs <- tabs[!nn_na]
nn <- nn[!nn_na]
}
out <- setNames(vector("list", length(tabs)), names(tabs))
MSE <- sum(x$residuals^2)/x$df.residual
for (nm in names(tabs)) {
tab <- tabs[[nm]]
means <- as.vector(tab)
nms <- if (length(dim(tab)) > 1L) {
dn <- dimnames(tab)
apply(do.call("expand.grid", dn), 1L, paste, collapse = ":")
}
else names(tab)
n <- nn[[nm]]
if (length(n) < length(means))
n <- rep.int(n, length(means))
if (as.logical(ordered)) {
ord <- order(means)
means <- means[ord]
n <- n[ord]
if (!is.null(nms))
nms <- nms[ord]
}
center <- outer(means, means, `-`)
keep <- lower.tri(center)
dnames <- list(outer(nms, nms, paste, sep = "-"), c("diff", "lwr", "upr", "p adj"))
dk1 <- dnames[[1L]][keep]
if(match) {
keep2 <- sapply(1:length(dk), function(x) {
pattern <- '(.+):(.+)-(.+):(.+)'
d1 <- sub(pattern, '\\1', dk1[x])
d2 <- sub(pattern, '\\2', dk1[x])
d3 <- sub(pattern, '\\3', dk1[x])
d4 <- sub(pattern, '\\4', dk1[x])
d1==d3 | d2==d4 } )
dnames[[1L]] <- dk1[keep2]
} else dnames[[1L]] <- dk1
center <- center[keep]
width <- qtukey(conf.level, length(means), x$df.residual) *
sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep]
est <- center/(sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep])
if(match) {
center <- center[keep2]
width <- width[keep2]
est <- est[keep2]
}
pvals <- ptukey(abs(est), length(means), x$df.residual,
lower.tail = FALSE)
out[[nm]] <- array(c(center, center - width, center +
width, pvals), c(length(width), 4L), dnames)
}
class(out) <- c("TukeyHSD", "multicomp")
attr(out, "orig.call") <- x$call
attr(out, "conf.level") <- conf.level
attr(out, "ordered") <- ordered
out
}
将上述函数发送到您的 R 控制台,然后在您的数据上进行测试。 这是使用“warpbreaks”示例数据集的输出。
fm1 <- aov(breaks ~ wool * tension, data = warpbreaks)
> TukeyHSD(fm1, which="wool:tension")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)
$`wool:tension`
diff lwr upr p adj
B:L-A:L -16.3333333 -31.63966 -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188 -5.249234 0.0029580
B:M-A:L -15.7777778 -31.08410 -0.471456 0.0398172
A:H-A:L -20.0000000 -35.30632 -4.693678 0.0040955
B:H-A:L -25.7777778 -41.08410 -10.471456 0.0001136
A:M-B:L -4.2222222 -19.52854 11.084100 0.9626541
B:M-B:L 0.5555556 -14.75077 15.861877 0.9999978
A:H-B:L -3.6666667 -18.97299 11.639655 0.9797123
B:H-B:L -9.4444444 -24.75077 5.861877 0.4560950
B:M-A:M 4.7777778 -10.52854 20.084100 0.9377205
A:H-A:M 0.5555556 -14.75077 15.861877 0.9999978
B:H-A:M -5.2222222 -20.52854 10.084100 0.9114780
A:H-B:M -4.2222222 -19.52854 11.084100 0.9626541
B:H-B:M -10.0000000 -25.30632 5.306322 0.3918767
B:H-A:H -5.7777778 -21.08410 9.528544 0.8705572
这是修改版本的输出,其中“match”设置为 TRUE。
> TukeyHSD2(fm1, which="wool:tension", match=TRUE)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)
$`wool:tension`
diff lwr upr p adj
B:L-A:L -16.3333333 -31.63966 -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188 -5.249234 0.0029580
A:H-A:L -20.0000000 -35.30632 -4.693678 0.0040955
B:M-B:L 0.5555556 -14.75077 15.861877 0.9999978
B:H-B:L -9.4444444 -24.75077 5.861877 0.4560950
B:M-A:M 4.7777778 -10.52854 20.084100 0.9377205
A:H-A:M 0.5555556 -14.75077 15.861877 0.9999978
B:H-B:M -10.0000000 -25.30632 5.306322 0.3918767
B:H-A:H -5.7777778 -21.08410 9.528544 0.8705572