Generating a list of combinations conditional on a second column of data

Question

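I have a simple data frame with two columns: participant ID (a character variable) and participant score. I need to generate all possible sets of five participants whose scores sum to no more than 15. I know how to use the combinations and permutations functions from gtools to combine five participant IDs (without repeating an ID within a group), but I am stuck on how to keep those IDs linked to their scores so that I can compute the total score for each combination and then keep only the combinations that meet the condition. Any ideas? I need the output to look like this:

Group 1: A, B, C, I, K (total = 14.5)
Group 2: A, C, I, L, M (total = 14)
...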

id <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M")
score <- c(4.5, 5, 3, 3.5, 4, 4, 3.5, 5, 2, 1, 0, 1.5, 3)
dat1 <- data.frame(id, score)

# permutations 
library(gtools)
N <- length(dat1$id) #size of sampling vector
n <- 5 #size of samples

x = permutations(n=N, r=n, v=id, repeats.allowed=F)
# n = size of sampling vector 
# r = size of samples 
# v = vector to sample from

Answer 1

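One solution is to convert the matrix to a data frame, use {tidyr} to pivot it to long format, join it to your data frame of scores, group by permutation, sum the scores, filter, and then pivot it back to wide format: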

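library(dplyr)
library(tidyr)

# name the five positions so they survive the round trip through long format
colnames(x) <- 1:5
x |> 
  as_tibble() |> 
  mutate(id = row_number()) |>                  # one id per permutation (row of x)
  pivot_longer(-id) |>                          # one row per (permutation, position)
  left_join(dat1, by = c("value" = "id")) |>    # attach each participant's score
  group_by(id) |> 
  mutate(score = sum(score)) |>                 # total score per permutation
  filter(score <= 15) |> 
  pivot_wider(values_from = value, names_from = name)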

This returns:

# A tibble: 73,440 × 7
# Groups:   id [73,440]
      id score `1`   `2`   `3`   `4`   `5`  
   <int> <dbl> <chr> <chr> <chr> <chr> <chr>
 1    52  14.5 A     B     C     I     K    
 2    61  13.5 A     B     C     J     K    
 3    62  15   A     B     C     J     L    
 4    69  14.5 A     B     C     K     I    
 5    70  13.5 A     B     C     K     J

Answer 2

You can use combn() to get the unique groups, and then use apply() to get the group scores row by row, leveraging a named version of the score vector.

# name the elements of the score vector so it can be indexed by id
names(score) <- id

# generate a data frame with the unique combinations (one group of five per row)
grp_scores <- data.frame(t(combn(id, 5)))

# compute each group's score by indexing the named score vector with the row's ids
grp_scores['score'] <- apply(grp_scores, 1, \(x) sum(score[x]))

# reduce to the rows where the sum is less than or equal to 15
grp_scores <- grp_scores[grp_scores$score <= 15, ]
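
If the goal is the labelled text from the question ("Group 1: A, B, C, I, K (total = 14.5)"), the same combn() idea can be carried one step further. The following is only a minimal base R sketch under that assumption; the names groups, totals, keep and labels are illustrative and do not appear in the answers above.

```r
id <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M")
score <- c(4.5, 5, 3, 3.5, 4, 4, 3.5, 5, 2, 1, 0, 1.5, 3)
names(score) <- id

# each column of the combn() result is one unordered group of five ids
groups <- combn(id, 5)

# total score per group, looked up by name in the score vector
totals <- apply(groups, 2, function(g) sum(score[g]))

# keep only the groups that satisfy the condition
keep <- totals <= 15

# build one label per remaining group (illustrative format taken from the question)
labels <- sprintf(
  "Group %d: %s (total = %g)",
  seq_len(sum(keep)),
  apply(groups[, keep, drop = FALSE], 2, paste, collapse = ", "),
  totals[keep]
)

head(labels, 3)
```

Because combn() ignores order, each set of five participants appears only once here, whereas the permutation-based approach lists the same set in every possible order.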
