How do I find repeated characters in a string?

Question (votes: 0, answers: 1)

I am trying to check, in ClickHouse Cloud, whether any character in a string is repeated five or more times in a row. Examples:

'12344444156'
'abcrrrrrggds'

I know a regular expression that works in general:

.*(.)\1{4,}.*

However, ClickHouse uses the RE2 engine, which does not support backreferences. What else can I do?

I tried:

WITH '12344444156' as str
SELECT str, extract(str, '.*(.)\\1{4,}.*');

Expected output:

12344444156

Actual output:

SQL Error [427] [07000]: Code: 427. DB::Exception: OptimizedRegularExpression: cannot compile re2: .*(.)\1{4,}.*, error: invalid escape sequence: \1. Look at https://github.com/google/re2/wiki/Syntax for reference. Please note that if you specify regex as an SQL string literal, the slashes have to be additionally escaped. For example, to match an opening brace, write '\(' -- the first slash is for SQL and the second one is for regex: While processing '12344444156' AS str, extract(str, '.*(.)\\1{4,}.*'). (CANNOT_COMPILE_REGEXP) (version 24.6.1.4410 (official build))
, server ClickHouseNode [uri=https://w2z74jyoma.ap-southeast-2.aws.clickhouse.cloud:8443/default, options={use_server_time_zone=false,use_time_zone=false}]@248459710
sql regex clickhouse
1 answer
0 votes

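ClickHouse's RE2 engine does not support backreferences, so a pattern such as '(.)\\1*' passed to extractAll fails to compile with the same CANNOT_COMPILE_REGEXP error. One way around this is to skip regular expressions and compare every 5-character window of the string with its first character repeated five times (a sketch, not verified against version 24.6):

WITH '12344444156' AS str
SELECT
    str,
    arrayExists(
        -- a window starting at i is a run if it equals its first character repeated 5 times
        i -> substring(str, i, 5) = repeat(substring(str, i, 1), 5),
        range(1, length(str) + 1)
    ) AS has_repeated_char;

This returns 1 if any character occurs five or more times in a row and 0 otherwise. Windows near the end of the string are shorter than five characters and therefore never match. substring and length count bytes, so for non-ASCII text substringUTF8 and lengthUTF8 are the safer choice.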
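To cross-check expected results outside the database, the same test (a character repeated five or more times in a row) can be sketched in Python. This is only an illustration: the function names `has_run_regex` and `has_run_windows` are made up here, and Python is used because its `re` module is a backtracking engine that accepts backreferences, unlike RE2. The second function shows a backreference-free formulation of the kind that an RE2-based system would need.

```python
import re


def has_run_regex(s: str, n: int = 5) -> bool:
    """Backreference version: one character followed by at least n-1 copies of itself.

    Works in Python's `re`; the same pattern is rejected by RE2.
    """
    pattern = r"(.)\1{%d,}" % (n - 1)
    return re.search(pattern, s, flags=re.DOTALL) is not None


def has_run_windows(s: str, n: int = 5) -> bool:
    """Backreference-free version: compare each n-character window
    with its first character repeated n times."""
    # Slices near the end are shorter than n, so they can never match.
    return any(s[i:i + n] == s[i] * n for i in range(len(s)))


if __name__ == "__main__":
    for sample in ("12344444156", "abcrrrrrggds", "1234444156"):
        print(sample, has_run_regex(sample), has_run_windows(sample))
```

Both functions agree on the two sample strings from the question and reject a string in which the longest run is only four characters.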
