My data look like this:
Tab4 <- read.table(text = "
nodepair `++` `--` `+-` `-+` `0+` `+0` `0-` `-0` `00` ES
1 A1_A1 0 4 0 0 0 0 0 0 16 3
2 A1_A1 0 5 0 0 0 0 0 0 16 4
3 A1_A1 0 5 0 0 0 0 0 0 15 5
", header = TRUE)
I have written this code so that every pair of "ES" groups is compared within each nodepair:
library(dplyr)
library(purrr)

ES_combs <- combn(unique(Tab4$ES), 2, simplify = FALSE)  # every pair of ES values
Tab5 <- Tab4 %>%  # compare every pair to each other
  group_split(nodepair) %>%
  map(.f = function(df) df %>%
        map(.x = 1:length(ES_combs),
            .f = ~df %>%
              filter(ES %in% ES_combs[[.x]]) %>%
              summarize(nodepair = first(nodepair),
                        ES_1 = ES[1],
                        ES_2 = ES[2],
                        across(2:10, ~as.numeric(.))))) %>%
  bind_rows()
The result is:
Tab5 <- read.table(text = "
nodepair ES_1 ES_2 `++` `--` `+-` `-+` `0+` `+0` `0-` `-0` `00`
1 A1_A1 3 4 0 4 0 0 0 0 0 0 16
2 A1_A1 3 4 0 5 0 0 0 0 0 0 16
3 A1_A1 3 5 0 4 0 0 0 0 0 0 16
4 A1_A1 3 5 0 5 0 0 0 0 0 0 15
5 A1_A1 4 5 0 5 0 0 0 0 0 0 16
6 A1_A1 4 5 0 5 0 0 0 0 0 0 15
", header = TRUE)
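This works, but it takes a very long time when I run it on the full dataset. Is there a more efficient way to do this? I suspect the warning I get points to part of the problem: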
Warning messages:
1: Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
But I don't know where to go from here.
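The warning appears because each `summarize()` call returns two rows (one per ES in the pair). Since every filtered row is kept anyway, one way to avoid a multi-row summary altogether is to label the pair with `mutate()`. This is a minimal sketch on a hypothetical cut-down `Tab4` (the columns `mm` and `zz` stand in for `--` and `00`); it silences the warning but keeps the same per-pair loop, so it is not expected to be much faster. It also drops the `as.numeric()` conversion from the original code.

```r
library(dplyr)
library(purrr)

# Hypothetical cut-down copy of Tab4 (two count columns, renamed mm and zz)
Tab4 <- data.frame(
  nodepair = "A1_A1",
  mm = c(4, 5, 5),
  zz = c(16, 16, 15),
  ES = c(3, 4, 5)
)

ES_combs <- combn(unique(Tab4$ES), 2, simplify = FALSE)

Tab5 <- Tab4 %>%
  group_split(nodepair) %>%
  map(function(df) {
    # One small data frame per ES pair: keep both rows and label the pair
    map(ES_combs, function(pair) {
      df %>%
        filter(ES %in% pair) %>%
        mutate(ES_1 = pair[1], ES_2 = pair[2], .after = nodepair) %>%
        select(-ES)
    }) %>%
      bind_rows()
  }) %>%
  bind_rows()

Tab5
```

This labels each pair with the values from `combn()` rather than with `ES[1]`/`ES[2]` from the filtered rows, so it assumes the rows within a nodepair are ordered by ES, as in the example data.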
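We can join the table to its own `nodepair`/`ES` columns (a self-join on `nodepair`) and then drop the rows where an ES is paired with itself: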
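# Self-join on nodepair: every row is paired with every ES in the same nodepair
out <- merge(Tab4, Tab4[, c('nodepair', 'ES')], by = 'nodepair', suffixes = c("1", "2"), all = TRUE)
# Drop the rows where an ES is paired with itself
out[out$ES1 != out$ES2, ]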
nodepair X.... X.....1 X.....2 X.....3 X.0.. X..0. X.0...1 X..0..1 X.00. ES1 ES2
2 A1_A1 0 4 0 0 0 0 0 0 16 3 4
3 A1_A1 0 4 0 0 0 0 0 0 16 3 5
4 A1_A1 0 5 0 0 0 0 0 0 16 4 3
6 A1_A1 0 5 0 0 0 0 0 0 16 4 5
7 A1_A1 0 5 0 0 0 0 0 0 15 5 3
8 A1_A1 0 5 0 0 0 0 0 0 15 5 4
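The same self-join can be written in dplyr, which may fit better into an existing pipeline. This is a sketch on a hypothetical cut-down `Tab4` (`mm` and `zz` stand in for `--` and `00`); the `relationship = "many-to-many"` argument assumes dplyr 1.1.0 or later and only silences the many-to-many join warning.

```r
library(dplyr)

# Hypothetical cut-down copy of Tab4 (two count columns, renamed mm and zz)
Tab4 <- data.frame(
  nodepair = "A1_A1",
  mm = c(4, 5, 5),
  zz = c(16, 16, 15),
  ES = c(3, 4, 5)
)

out <- Tab4 %>%
  # Pair every row with every ES value in the same nodepair
  inner_join(select(Tab4, nodepair, ES),
             by = "nodepair",
             suffix = c("1", "2"),
             relationship = "many-to-many") %>%
  # Drop the rows where an ES is paired with itself
  filter(ES1 != ES2)

out
```

Each row corresponds to one ordered pair (ES1, ES2) and carries the counts of ES1, which is the same information as the two-rows-per-pair layout of `Tab5`, produced by a single vectorised join instead of one `summarize()` call per pair.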