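I have a list of words and I need to generate all permutations with repetition. The permutation length must be specified. The word list is fairly large (about 30 words), so the function also needs to be efficient. For example: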
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
I need to generate all permutations such that each one has exactly 3 words. That would be ["alice", "moon", "walks"], ["alice", "walks", "moon"], ["moon", "alice", "walks"], and so on.
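There are several packages that do exactly what you need. Let's start with the classic gtools. Also, judging by the example the OP provided, we are looking for permutations without repetition, not combinations with repetition.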
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
library(gtools)
attempt1 <- permutations(length(wordsList), 3, wordsList)
head(attempt1)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "alice" "bravo" "mars"
[3,] "alice" "bravo" "moon"
[4,] "alice" "bravo" "sings"
[5,] "alice" "bravo" "walks"
[6,] "alice" "guitar" "bravo"
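As a quick sanity check on the size of the result, the number of permutations of n items taken r at a time without repetition is n!/(n-r)!. The following is a minimal base R sketch; the names `n`, `r`, and `n_perms` are introduced here for illustration only.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

n <- length(wordsList)  # 7 words
r <- 3                  # words per permutation

# n! / (n - r)! = 7 * 6 * 5
n_perms <- factorial(n) / factorial(n - r)
n_perms  # 210
```

If the gtools call above behaves as expected, nrow(attempt1) should match this count. For the 30-word list mentioned in the question, the same formula gives 30 * 29 * 28 = 24360 rows, which is still small.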
Then there is iterpc.
library(iterpc)
attempt2 <- getall(iterpc(length(wordsList), 3, labels = wordsList, ordered = TRUE))
head(attempt2)
[,1] [,2] [,3]
[1,] "alice" "moon" "walks"
[2,] "alice" "moon" "mars"
[3,] "alice" "moon" "sings"
[4,] "alice" "moon" "guitar"
[5,] "alice" "moon" "bravo"
[6,] "alice" "walks" "moon"
And finally, RcppAlgos (I am the author):
library(RcppAlgos)
attempt3 <- permuteGeneral(wordsList, 3)
head(attempt3)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "bravo" "alice" "guitar"
[3,] "guitar" "alice" "bravo"
[4,] "alice" "guitar" "bravo"
[5,] "bravo" "guitar" "alice"
[6,] "guitar" "bravo" "alice"
All of them are quite efficient and produce the same set of permutations, just in a different order:
identical(attempt1[do.call(order,as.data.frame(attempt1)),],
attempt3[do.call(order,as.data.frame(attempt3)),])
[1] TRUE
identical(attempt1[do.call(order,as.data.frame(attempt1)),],
attempt2[do.call(order,as.data.frame(attempt2)),])
[1] TRUE
If you really want permutations with repetition, each of these functions provides an argument for doing so.
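For reference, permutations with repetition can also be sketched in base R with expand.grid, without any extra package. The names `grid` and `perms_rep` are introduced here for illustration. The package argument names listed in the comments are given from memory and should be checked against the installed versions.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

# Every ordered triple, with words allowed to repeat: 7^3 = 343 rows.
grid <- expand.grid(rep(list(wordsList), 3), stringsAsFactors = FALSE)
perms_rep <- unname(as.matrix(grid))
dim(perms_rep)

# Package equivalents (argument names are assumptions, verify before use):
# gtools::permutations(length(wordsList), 3, wordsList, repeats.allowed = TRUE)
# RcppAlgos::permuteGeneral(wordsList, 3, repetition = TRUE)
# iterpc::getall(iterpc::iterpc(length(wordsList), 3, labels = wordsList,
#                               ordered = TRUE, replace = TRUE))
```

Note that expand.grid varies the first column fastest, so the row order will differ from what the packages return.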
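Since the OP is actually working with a wordsList of more than 3000 words and is looking for all permutations taken 15 at a time, the methods above will fail. There are some alternatives, from iterpc as well as RcppAlgos.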
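With iterpc, you can use the function getnext to generate successive permutations one at a time. I doubt you would be able to generate them all in a reasonable amount of time, or store them in one place (i.e., assuming each cell occupies 8 bytes, 10^52 * 15 * 8/(2^80) > 10^29 YiB. Yes... those are yobibytes... interpretation: "a lot of data").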
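The storage estimate can be reproduced with a few lines of base R. This is only a sketch of the arithmetic under the same assumption as above (8 bytes per cell); `n_perms` and `size_yib` are names introduced here for illustration.

```r
n <- 3000  # words in the list
r <- 15    # words per permutation

# Number of permutations without repetition, n! / (n - r)!, computed on the
# log scale because factorial(3000) overflows to Inf.
n_perms <- exp(sum(log((n - r + 1):n)))

# Storage in yobibytes (2^80 bytes), assuming 8 bytes per cell.
size_yib <- n_perms * r * 8 / 2^80

n_perms   # on the order of 10^52
size_yib  # well above 10^29
```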
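With RcppAlgos, you can use the upper argument (called rowCap in older versions of the package) to output a specific number of permutations, up to 2^31 - 1. For example: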
permuteGeneral(wordsList, 3, upper = 5)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "bravo" "alice" "guitar"
[3,] "guitar" "alice" "bravo"
[4,] "alice" "guitar" "bravo"
[5,] "bravo" "guitar" "alice"
You can use the combn function from the utils package.
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
combn(wordsList, 3)
This gives a very long output, which I don't want to reproduce here. You could also supply the input as a factor, which might help with speed.
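One thing worth checking before relying on this: combn enumerates combinations, so each set of three words appears only once regardless of order, and the result has one combination per column rather than per row. A small base R sketch (`combos` is a name introduced here for illustration):

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

combos <- combn(wordsList, 3)
dim(combos)         # 3 rows, choose(7, 3) = 35 columns
head(t(combos), 3)  # transpose to get one combination per row
```

If order matters, as in the example in the question, each of these 35 combinations would still need to be expanded into its 3! = 6 orderings.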
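To actually produce combinations with repetition (Joseph Wood's solution covers permutations without repetition), you can do the following. (Edit: although the OP wrote combinations with repetition, they may have meant permutations!? See the comments.)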
library(iterpc)
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
getall(iterpc(length(wordsList), 3, labels = wordsList, replace = TRUE))
#> [,1] [,2] [,3]
#> [1,] "alice" "alice" "alice"
#> [2,] "alice" "alice" "moon"
#> [3,] "alice" "alice" "walks"
#> [4,] "alice" "alice" "mars"
#> [5,] "alice" "alice" "sings"
#> [6,] "alice" "alice" "guitar"
#> [7,] "alice" "alice" "bravo"
#> [8,] "alice" "moon" "moon"
#> [9,] "alice" "moon" "walks"
..
..
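The expected number of rows can be checked with the standard formula for combinations with repetition, choose(n + r - 1, r). This is a base R sketch of that count only; `n`, `r`, and `n_combos_rep` are names introduced here, and the match with the iterpc result is an expectation based on the output shown above rather than something verified here.

```r
n <- 7  # number of words
r <- 3  # words per combination

# Combinations with repetition: choose(n + r - 1, r)
n_combos_rep <- choose(n + r - 1, r)
n_combos_rep  # 84
```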