How to efficiently search a large binary string for any of an array of strings in JS?

Question

I've loaded some binary data from a file. It contains strings mixed in with the binary data, and I've created a DataView of it.

I want to find out which strings from an array (if any) occur within a specific range of that data, and add them to an array.

function findStrings(data, startIdx, endIdx, matchStrings) {
  let returnStrings = [];

  ...

  return returnStrings;
}
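`DataView` has no `slice` method. One way to reach the bytes of `data` between `startIdx` and `endIdx` without copying them is a `Uint8Array` over the same underlying `ArrayBuffer`. This is a minimal sketch assuming `startIdx` and `endIdx` are byte offsets relative to the `DataView`; `rangeView` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: zero-copy byte view of dataView[startIdx, endIdx).
// Assumes startIdx/endIdx are byte offsets relative to the DataView.
function rangeView(dataView, startIdx, endIdx) {
  // Clamp the requested range to the DataView's bounds.
  const start = Math.max(0, Math.min(startIdx, dataView.byteLength));
  const end = Math.max(start, Math.min(endIdx, dataView.byteLength));
  // Shares the DataView's ArrayBuffer, so no bytes are copied.
  return new Uint8Array(dataView.buffer, dataView.byteOffset + start, end - start);
}

const dv = new DataView(new Uint8Array([0x00, 0x53, 0x74, 0x72, 0x00]).buffer);
const view = rangeView(dv, 1, 4);
console.log(view.length, view.buffer === dv.buffer); // 3 true
```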

The strings can be surrounded by arbitrary 8-bit byte values.
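Because the strings sit among arbitrary bytes, one option is to compare bytes rather than decoded text. A minimal sketch, assuming every match string uses one byte per character (ASCII or Latin-1); `toBytes` is a hypothetical helper that throws for any character above 0xFF:

```javascript
// Hypothetical helper: the bytes a match string would occupy in the file,
// assuming one byte per character (ASCII / Latin-1).
function toBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`Character at index ${i} does not fit in one byte`);
    }
    bytes[i] = code;
  }
  return bytes;
}

console.log(toBytes('String\x00').join(',')); // 83,116,114,105,110,103,0
```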

The data may be up to 1 MB. matchStrings may contain 10 to 100 strings.

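Using a regular expression or running multiple searches seems inefficient, so I'm not sure how to approach this, especially since the non-ASCII bytes need to be ignored.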

Example input:

matchStrings = [
    'String1\x00',
    'String2\x00',
    'String\x00',
]

data = '\x00String\x00\x1b\x0c\x00String\x00String2\x00\x04'

Example output:

[1, 2] (indices into matchStrings) or ['String', 'String2']
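As a check of this example, the "multiple searches" approach (one `includes` call per match string, so roughly O(n·k) for n bytes and k strings) finds the second and third entries of `matchStrings`, and not the first. This is only a reference sketch; it assumes `data` is already a JS string, as in the example.

```javascript
// Reference check using one search per match string.
const matchStrings = ['String1\x00', 'String2\x00', 'String\x00'];
const data = '\x00String\x00\x1b\x0c\x00String\x00String2\x00\x04';

const foundIndices = matchStrings
  .map((s, i) => (data.includes(s) ? i : -1))
  .filter((i) => i !== -1);
// Strip the trailing NUL terminator for display.
const foundNames = foundIndices.map((i) => matchStrings[i].replace(/\x00$/, ''));

console.log(foundIndices.join(','), foundNames.join(',')); // 1,2 String2,String
```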

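By efficient I mean both time and memory. Ideally it would make no unnecessary copies of the data and traverse the buffer only once, so that additional memory is minimal and time grows linearly with the size of the buffer rather than with the number of match strings.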
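One standard technique that fits these requirements is the Aho–Corasick algorithm: it builds an automaton from the match strings once, then reads each byte of the range exactly once, so scanning time is linear in the range size plus the number of reported occurrences, and the data itself is never copied. This is a swapped-in technique, not code from the question or the answer. `buildAutomaton` and this `findStrings` variant (which takes a `Uint8Array` and a prebuilt automaton instead of a `DataView` and the raw strings) are hypothetical, and the match strings are assumed to be single-byte. In practice `bytes` could be a zero-copy view such as `new Uint8Array(dataView.buffer, dataView.byteOffset, dataView.byteLength)`.

```javascript
// Sketch of an Aho-Corasick automaton over bytes (swapped-in technique).
// Assumes every match string uses one byte per character (ASCII / Latin-1).
function buildAutomaton(matchStrings) {
  const next = [new Map()]; // next[state]: byte -> child state in the trie
  const fail = [0];         // fail[state]: state of the longest proper suffix
  const out = [[]];         // out[state]: indices of match strings ending here
  matchStrings.forEach((str, idx) => {
    let state = 0;
    for (let i = 0; i < str.length; i++) {
      const byte = str.charCodeAt(i);
      if (byte > 0xff) throw new RangeError('match strings must be single-byte');
      let child = next[state].get(byte);
      if (child === undefined) {
        child = next.length;
        next.push(new Map());
        fail.push(0);
        out.push([]);
        next[state].set(byte, child);
      }
      state = child;
    }
    out[state].push(idx);
  });
  // Breadth-first pass: compute failure links and merge outputs.
  const queue = [...next[0].values()]; // depth-1 states fail to the root
  for (let head = 0; head < queue.length; head++) {
    const s = queue[head];
    for (const [byte, t] of next[s]) {
      let f = fail[s];
      while (f !== 0 && !next[f].has(byte)) f = fail[f];
      fail[t] = next[f].get(byte) ?? 0;
      out[t] = out[t].concat(out[fail[t]]);
      queue.push(t);
    }
  }
  return { next, fail, out };
}

// Scans bytes[startIdx, endIdx) once; returns sorted indices of match strings found.
function findStrings(bytes, startIdx, endIdx, automaton) {
  const { next, fail, out } = automaton;
  const found = new Set();
  let state = 0;
  for (let i = startIdx; i < endIdx; i++) {
    const byte = bytes[i];
    while (state !== 0 && !next[state].has(byte)) state = fail[state];
    state = next[state].get(byte) ?? 0;
    for (const idx of out[state]) found.add(idx);
  }
  return [...found].sort((a, b) => a - b);
}

// Usage with the example data.
const matchStrings = ['String1\x00', 'String2\x00', 'String\x00'];
const automaton = buildAutomaton(matchStrings);
const bytes = Uint8Array.from('\x00String\x00\x1b\x0c\x00String\x00String2\x00\x04', (c) => c.charCodeAt(0));
console.log(findStrings(bytes, 0, bytes.length, automaton).join(',')); // 1,2
```

Building the automaton costs time proportional to the total length of the match strings, so for repeated searches it can be built once and reused. Unlike a global regex, it also reports overlapping occurrences.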

One way to do this is:

const matchStrings = [
    'String1\x00',
    'String2\x00',
    'String\x00',
];

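function findStrings(buffer, startIdx, endIdx, matchStrings) {
    let returnStrings = [];

    // Decode the byte range as a single-byte string (buffer.slice copies the range).
    const data = new TextDecoder("latin1").decode(buffer.slice(startIdx, endIdx));
    // One alternation of all match strings (assumes they contain no RegExp metacharacters).
    const re = new RegExp('('+matchStrings.join(')|(?:')+')', 'g');
    const matches = data.match(re);
    if(matches){
        // Keep each distinct match once, in order of first occurrence.
        for(let m of matches){
            if(!returnStrings.includes(m)){
                returnStrings.push(m);
            }
        }
    }

    return returnStrings;
}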
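If staying with the regex approach, a few adjustments may help: decoding a `Uint8Array` view instead of `buffer.slice` avoids one copy of the range, escaping the match strings keeps regex metacharacters literal, and sorting longest first stops a shorter string from shadowing a longer one that starts the same way. This is a sketch assuming `buffer` is an `ArrayBuffer` and the match strings are ASCII; `escapeRegExp` and `findStringIndices` are hypothetical names. Note that the WHATWG Encoding Standard maps the `"latin1"` label to windows-1252, so bytes 0x80 to 0x9F decode to other characters; this does not affect ASCII match strings.

```javascript
// Hypothetical helper: escape characters that are special in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sketch of the regex approach returning indices into matchStrings.
// Assumes `buffer` is an ArrayBuffer and the match strings are ASCII.
function findStringIndices(buffer, startIdx, endIdx, matchStrings) {
  if (matchStrings.length === 0) return [];
  // A view instead of buffer.slice: TextDecoder reads it without copying the range first.
  const view = new Uint8Array(buffer, startIdx, endIdx - startIdx);
  // The "latin1" label decodes as windows-1252 (identical to ASCII below 0x80).
  const text = new TextDecoder('latin1').decode(view);
  // Longest first, so a shorter string cannot shadow a longer one with the same prefix.
  const alternatives = [...matchStrings].sort((a, b) => b.length - a.length);
  const re = new RegExp(alternatives.map(escapeRegExp).join('|'), 'g');
  // Map each matched text back to its index in matchStrings.
  const indexOf = new Map(matchStrings.map((s, i) => [s, i]));
  const found = new Set();
  for (const m of text.matchAll(re)) found.add(indexOf.get(m[0]));
  return [...found].sort((a, b) => a - b);
}

const matchStrings = ['String1\x00', 'String2\x00', 'String\x00'];
const bytes = Uint8Array.from('\x00String\x00\x1b\x0c\x00String\x00String2\x00\x04', (c) => c.charCodeAt(0));
console.log(findStringIndices(bytes.buffer, 0, bytes.length, matchStrings).join(',')); // 1,2
```

A global regex still skips overlapping occurrences, and the decoded string is itself a copy of the range.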

Answer 1

This is the code I'm using now (from my question):

const matchStrings = [
    'String1\x00',
    'String2\x00',
    'String\x00',
];

function findStrings(buffer, startIdx, endIdx, matchStrings) {
    let returnStrings = [];

    const data = new TextDecoder("latin1").decode(buffer.slice(startIdx, endIdx));
    const re = new RegExp('('+matchStrings.join(')|(?:')+')', 'g');
    const matches = data.match(re);
    if(matches){
        for(let m of matches){
            if(!returnStrings.includes(m)){
                returnStrings.push(m);
            }
        }
    }

    return returnStrings;
}
