Generating permutations of a specific length with repetition in R?

Question (1 vote, 3 answers)

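I have a list of words and need to generate all permutations with repetition. The length of the permutations has to be specified. The word list is fairly large (around 30 words), so the function I use also needs to be efficient. For example: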

wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

I need to generate all permutations in which each permutation contains exactly 3 words, for example ["alice", "moon", "walks"], ["alice", "walks", "moon"], ["moon", "alice", "walks"], and so on.
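Because "permutations" and "with repetition" can be read in several ways, the size of the result differs a lot depending on which is meant. The following base-R sketch (an editorial addition, not part of the original question) counts the 3-word selections from the 7-word example under the four usual interpretations:

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
n <- length(wordsList)  # 7 words
k <- 3                  # words per selection

# Ordered, no word repeated: n * (n - 1) * ... * (n - k + 1)
perm_no_rep <- prod(n:(n - k + 1))
# Ordered, words may repeat: n^k
perm_rep <- n^k
# Unordered, no word repeated: n choose k
comb_no_rep <- choose(n, k)
# Unordered, words may repeat: (n + k - 1) choose k
comb_rep <- choose(n + k - 1, k)

c(perm_no_rep = perm_no_rep, perm_rep = perm_rep,
  comb_no_rep = comb_no_rep, comb_rep = comb_rep)
# 210, 343, 35 and 84 respectively
```

With the 30 words mentioned in the question, the two ordered counts become 24,360 and 27,000.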

Tags: r, permutation
3 Answers

Answer (2 votes)

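Several packages do exactly what you need. Let's start with the classic gtools. Also, judging from the OP's example, we are looking for permutations without repetition, not combinations with repetition.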

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

library(gtools)
attempt1 <- permutations(length(wordsList), 3, wordsList)
head(attempt1)
        [,1]    [,2]     [,3]    
[1,] "alice" "bravo"  "guitar"
[2,] "alice" "bravo"  "mars"  
[3,] "alice" "bravo"  "moon"  
[4,] "alice" "bravo"  "sings" 
[5,] "alice" "bravo"  "walks" 
[6,] "alice" "guitar" "bravo"

Then there is iterpc:

library(iterpc)
attempt2 <- getall(iterpc(length(wordsList), 3, labels = wordsList, ordered = TRUE))
head(attempt2)
        [,1]    [,2]    [,3]    
[1,] "alice" "moon"  "walks" 
[2,] "alice" "moon"  "mars"  
[3,] "alice" "moon"  "sings" 
[4,] "alice" "moon"  "guitar"
[5,] "alice" "moon"  "bravo" 
[6,] "alice" "walks" "moon"

Finally, there is RcppAlgos (of which I am the author):

library(RcppAlgos)
attempt3 <- permuteGeneral(wordsList, 3)
head(attempt3)
        [,1]     [,2]     [,3]    
[1,] "alice"  "bravo"  "guitar"
[2,] "bravo"  "alice"  "guitar"
[3,] "guitar" "alice"  "bravo" 
[4,] "alice"  "guitar" "bravo" 
[5,] "bravo"  "guitar" "alice" 
[6,] "guitar" "bravo"  "alice"

All of them are quite efficient and produce the same results, only in different orders:

identical(attempt1[do.call(order,as.data.frame(attempt1)),],
          attempt3[do.call(order,as.data.frame(attempt3)),])
[1] TRUE

identical(attempt1[do.call(order,as.data.frame(attempt1)),],
          attempt2[do.call(order,as.data.frame(attempt2)),])
[1] TRUE

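If you really do want permutations with repetition, each of these functions has an argument for that: repeats.allowed in gtools, replace in iterpc, and repetition in RcppAlgos.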
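A short sketch of those arguments, assuming the same three packages are installed. For 7 words taken 3 at a time with repetition, each call should return 7^3 = 343 rows:

```r
library(gtools)
library(iterpc)
library(RcppAlgos)

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

# gtools: allow a word to appear more than once in a permutation
rep1 <- permutations(length(wordsList), 3, wordsList, repeats.allowed = TRUE)

# iterpc: ordered = TRUE for permutations, replace = TRUE for repetition
rep2 <- getall(iterpc(length(wordsList), 3, labels = wordsList,
                      ordered = TRUE, replace = TRUE))

# RcppAlgos: repetition = TRUE
rep3 <- permuteGeneral(wordsList, 3, repetition = TRUE)

sapply(list(rep1, rep2, rep3), nrow)  # 343 rows each
```

As with the identical() checks above, the three matrices hold the same rows, only in different orders.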

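Since the OP is working with a wordsList of more than 3000 words and is looking for all permutations taken 15 at a time, the methods above will fail. There are some alternatives in both iterpc and RcppAlgos.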
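To see why, here is a rough base-R estimate (an editorial addition) of how many 15-word permutations a 3000-word list has, and how much memory a full matrix of them would take, assuming 8 bytes per cell as the answer does. The work is done on a log10 scale because factorial(3000) overflows to Inf:

```r
n <- 3000  # size of the word list (from the answer)
k <- 15    # words per permutation

# log10 of n * (n - 1) * ... * (n - k + 1), i.e. n! / (n - k)!
log10_perms <- sum(log10(n:(n - k + 1)))

# Memory for a k-column matrix with 8 bytes per cell, in yobibytes (2^80 bytes)
log10_bytes <- log10_perms + log10(k * 8)
log10_yib <- log10_bytes - 80 * log10(2)

round(c(log10_perms = log10_perms, log10_yib = log10_yib), 1)
# about 52.1 and 30.1: roughly 1.4e52 permutations and 1.4e30 YiB
```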

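With iterpc, you can use the function getnext to generate the permutations one after another. I doubt you will be able to generate all of them in a reasonable amount of time, or to store them all in one place (assuming each cell takes 8 bytes, 10^52 * 15 * 8 / 2^80 > 10^29 YiB. Yes... those are yobibytes. Translation: "a lot of data").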
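A minimal sketch of that streaming pattern on the small 7-word example, assuming iterpc's getnext(I, d), which returns the next d results, and getlength(I), which returns the total count. The chunk size of 30 is arbitrary and was picked only because it divides the 210 permutations evenly:

```r
library(iterpc)

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

# Iterator over ordered 3-word selections; nothing is generated up front
it <- iterpc(length(wordsList), 3, labels = wordsList, ordered = TRUE)

total <- getlength(it)   # 210 permutations in total
chunk_size <- 30         # arbitrary; divides 210 evenly, so every chunk is a matrix
n_seen <- 0
while (n_seen < total) {
  chunk <- getnext(it, chunk_size)   # next chunk_size permutations, one per row
  # ... process `chunk` here and let it be discarded ...
  n_seen <- n_seen + nrow(chunk)
}
n_seen  # 210
```

For the 3000-word case the loop has the same shape: memory stays bounded by the chunk size, but, as noted above, the total run time does not.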

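With RcppAlgos, you can use the upper argument (called rowCap in older versions) to output a specific number of permutations, up to 2^31 - 1. For example: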

permuteGeneral(wordsList, 3, upper = 5)
        [,1]     [,2]     [,3]    
[1,] "alice"  "bravo"  "guitar"
[2,] "bravo"  "alice"  "guitar"
[3,] "guitar" "alice"  "bravo" 
[4,] "alice"  "guitar" "bravo" 
[5,] "bravo"  "guitar" "alice"
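The same idea extends to slices from the middle of the result. A sketch, assuming a recent RcppAlgos release in which permuteCount() and the lower argument are available: permuteCount() reports the total number of results, and lower together with upper selects a range of (lexicographic) indices without returning the rows before it:

```r
library(RcppAlgos)

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

permuteCount(wordsList, 3)   # 210 permutations in total

# Only results 6 to 10 of the full ordering
slice <- permuteGeneral(wordsList, 3, lower = 6, upper = 10)
slice
```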

Answer (0 votes)

You can use the combn function from the utils package.

wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
combn(wordsList, 3)

This gives a long output, which I won't reproduce here. You can also pass the input as a factor, which may help with speed.
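Note that combn() returns combinations, i.e. unordered subsets: each of its 35 columns holds one 3-word set, so ["alice", "moon", "walks"] and ["moon", "alice", "walks"] are not listed separately. If ordered selections are needed using base R only, one simple (memory-hungry) sketch is to build every 3-tuple with expand.grid() and drop the ones that repeat a word. It materializes all n^3 tuples first, so it only suits small lists:

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

combos <- combn(wordsList, 3)
dim(combos)   # 3 rows, 35 columns: one column per unordered 3-word subset

# All 7^3 ordered 3-tuples (words may repeat), one per row
grid <- expand.grid(rep(list(wordsList), 3), stringsAsFactors = FALSE)

# Keep only rows in which no word appears twice: the ordered permutations
keep <- apply(grid, 1, function(row) anyDuplicated(row) == 0)
perms <- as.matrix(grid[keep, ])
nrow(perms)   # 210
```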


Answer (0 votes)

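This is for actually generating combinations with repetition; Joseph Wood's solution is about permutations without repetition. (Edit: although the OP wrote "combinations with repetition", they may well mean permutations. See the comments.)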

library(iterpc)
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
getall(iterpc(length(wordsList), 3, labels = wordsList, replace = TRUE))
#>       [,1]     [,2]     [,3]    
#>  [1,] "alice"  "alice"  "alice" 
#>  [2,] "alice"  "alice"  "moon"  
#>  [3,] "alice"  "alice"  "walks" 
#>  [4,] "alice"  "alice"  "mars"  
#>  [5,] "alice"  "alice"  "sings" 
#>  [6,] "alice"  "alice"  "guitar"
#>  [7,] "alice"  "alice"  "bravo" 
#>  [8,] "alice"  "moon"   "moon"  
#>  [9,] "alice"  "moon"   "walks" 
#> ...
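To make the difference from the first answer explicit, here is a sketch (assuming iterpc is installed) that compares the two settings: replace = TRUE on its own gives unordered combinations with repetition, while adding ordered = TRUE gives permutations with repetition:

```r
library(iterpc)

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
n <- length(wordsList)

# Unordered, repetition allowed: choose(7 + 3 - 1, 3) = 84 rows
comb_rep <- getall(iterpc(n, 3, labels = wordsList, replace = TRUE))

# Ordered, repetition allowed: 7^3 = 343 rows
perm_rep <- getall(iterpc(n, 3, labels = wordsList, ordered = TRUE, replace = TRUE))

c(nrow(comb_rep), nrow(perm_rep))
```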