I have a table tbl1 with two columns, col1 and col2, that contain strings:
col1 | col2
--------+--------
bar | foo
foo | foobar
bar1foo | bar2foo
The corresponding SQL dump:
CREATE TABLE `tbl1` (
`col1` varchar(20) COLLATE latin1_general_ci NOT NULL,
`col2` varchar(20) COLLATE latin1_general_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
INSERT INTO `tbl1` (`col1`, `col2`) VALUES
('bar', 'foo'),
('foo', 'foobar'),
('bar1foo', 'bar2foo');
In most cases, the two strings in a row share a common prefix. I need a query that removes those common prefixes. Expected result:
col1 | col2
-----+-----
bar  | foo
     | bar
1foo | 2foo
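To pin down the expected behaviour independently of any SQL dialect, here is a minimal reference sketch in Python. It is not part of the original question: the helper name strip_common_prefix is made up for illustration, and it compares characters exactly (case-sensitively), whereas the latin1_general_ci collation in the dump is case-insensitive.

```python
import os.path


def strip_common_prefix(a, b):
    """Return both strings with their longest common prefix removed."""
    # os.path.commonprefix compares character by character, not path components.
    n = len(os.path.commonprefix([a, b]))
    return a[n:], b[n:]


# Same three rows as in the SQL dump.
rows = [("bar", "foo"), ("foo", "foobar"), ("bar1foo", "bar2foo")]
results = [strip_common_prefix(a, b) for a, b in rows]
for col1, col2 in results:
    print(f"{col1} | {col2}")
```

Any candidate query can be checked against this helper row by row.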
My approach so far:
SELECT
SUBSTR(`col1`, 1+GREATEST(LENGTH(`col1`), LENGTH(`col2`)) - CEIL(LENGTH(TRIM(TRAILING '0' FROM HEX(ABS(CONV(HEX(REVERSE(`col1`)),16,10) - CONV(HEX(REVERSE(`col2`)),16,10)))))/2)),
SUBSTR(`col2`, 1+GREATEST(LENGTH(`col1`), LENGTH(`col2`)) - CEIL(LENGTH(TRIM(TRAILING '0' FROM HEX(ABS(CONV(HEX(REVERSE(`col1`)),16,10) - CONV(HEX(REVERSE(`col2`)),16,10)))))/2))
FROM tbl1
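Short explanation: the strings are reversed (REVERSE), converted to integers (HEX and CONV), subtracted from each other (- and ABS), converted back to a hexadecimal representation (HEX), trailing 0s are trimmed (TRIM), the length of this result is subtracted from the length of the longest string (-, LENGTH and GREATEST), and the outcome is then used by SUBSTR to get the result.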
Problems with my approach:

This code works, although it is verbose, ugly, and (perhaps) not performant:
select
substring(t.col1, g.maxlen + 1) col1,
substring(t.col2, g.maxlen + 1) col2
from tbl1 t inner join (
select t.col1, t.col2,
max(case when left(col1, tt.n) = left(col2, tt.n) then tt.n else 0 end) maxlen
from tbl1 t inner join (
select 1 n union all select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7 union all select 8 union all
select 9 union all select 10 union all select 11 union all select 12 union all
select 13 union all select 14 union all select 15 union all select 16 union all
select 17 union all select 18 union all select 19 union all select 20
) tt on least(length(t.col1), length(t.col2)) >= tt.n
group by t.col1, t.col2
) g on g.col1 = t.col1 and g.col2 = t.col2
See the demo. Results:
| col1 | col2 |
| ---- | ---- |
|      | bar  |
| bar  | foo  |
| 1foo | 2foo |

Sadly, the most general and efficient approach is probably a giant case expression. However, this only works up to a certain length:
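As a rough illustration of what such a case expression could look like (a sketch under stated assumptions, not the answerer's own code), the Python script below generates one branch per candidate prefix length and runs the query. It assumes a maximum length of 20, taken from the varchar(20) columns, and uses Python's built-in sqlite3 module as a stand-in for MySQL. The three-argument substr form is accepted by both engines, but SQLite compares strings case-sensitively by default, unlike latin1_general_ci. The names MAX_LEN, p and x are arbitrary.

```python
import sqlite3

MAX_LEN = 20  # assumption: matches varchar(20); longer strings need more branches

# One branch per candidate prefix length, tested from longest to shortest,
# so the first matching branch is the longest common prefix.
branches = "\n".join(
    f"      when substr(col1, 1, {n}) = substr(col2, 1, {n}) then {n}"
    for n in range(MAX_LEN, 0, -1)
)

query = f"""
select substr(col1, p + 1) as col1, substr(col2, p + 1) as col2
from (
  select col1, col2,
    case
{branches}
      else 0
    end as p
  from tbl1
) x
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl1 (col1 varchar(20) not null, col2 varchar(20) not null)"
)
conn.executemany(
    "insert into tbl1 (col1, col2) values (?, ?)",
    [("bar", "foo"), ("foo", "foobar"), ("bar1foo", "bar2foo")],
)

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Printing the query variable shows the generated SQL. The branch count grows linearly with MAX_LEN, which is why the approach only works up to a fixed length.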