从头开始快速成对最长公共子串

问题描述 投票:0回答:1

假设有很多二进制字符串

x <- c("0100100010101010", "0100110010101010","0111001000","010111")

我正在 R 中寻找一种快速方法来输出一个矩阵,该矩阵包含从一开始的成对(不包括自匹配)最长公共子串。例如,解决方案可能如下所示

> mySolution(x)
    [,1]     [,2]      [,3]      [,4]
[1,] ""       "01001"   "01"     "010"
[2,] "01001"  ""        "01"     "010"
[3,] "01"     "01"      ""       "01"
[4,] "010"    "010"     "01"     ""

例如,位置 [1,2] 处的矩阵是从

x[1]
x[2]
开始的最长公共子串。请注意,所得矩阵是对称的。所以,我们只需要计算一半即可。

我知道这可以与像

substr,sub,grepl
这样的函数一起使用,但由于我有很多字符串,我正在寻找一种非常有效的解决方案,需要很少的计算时间。也许可以选择转换为二进制数以提高性能?

r string performance
1个回答
0
投票
f <- function(x) {
    helper <- function(a, b) {
        l <- min(nchar(a), nchar(b))
        aa <- utf8ToInt(a)[1:l]
        bb <- utf8ToInt(b)[1:l]
        intToUtf8(aa[cumsum(aa != bb) == 0])
    }
    `diag<-`(outer(x, x, Vectorize(helper)), "")
}

这样

> f(x)
     [,1]    [,2]    [,3] [,4] 
[1,] ""      "01001" "01" "010"
[2,] "01001" ""      "01" "010"
[3,] "01"    "01"    ""   "01"
[4,] "010"   "010"   "01" ""
© www.soinside.com 2019 - 2024. All rights reserved.