Here is the record I am searching for:
A = {
field1: value1,
field2: value2,
...
fieldN: valueN
}
I have many records like this in my database.
Another record (B) nearly matches record A if at least N-M of their fields are equal. Here is an example with M = 2:
B = {
field1: OTHER_value1,
field2: OTHER_value2,
field3: value3,
...
fieldN: valueN
}
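The differing fields can be any of the fields, not only the first ones.
I could build one very large combined SQL query, but there may be a more elegant solution.
P.S.: My database is PostgreSQL.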
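To make the matching rule concrete, here is a minimal Python sketch of the predicate I have in mind. The function name nearly_matches and the sample values are made up for illustration; note that Python treats None == None as equal, which may or may not be the NULL behaviour wanted in SQL.

```python
def nearly_matches(a, b, m):
    """Return True if records a and b differ in at most m fields.

    Hypothetical helper for illustration: records are plain dicts
    that must share the same set of field names.
    """
    if a.keys() != b.keys():
        raise ValueError("records must have the same fields")
    # Count the fields whose values differ, wherever they appear.
    differing = sum(1 for key in a if a[key] != b[key])
    return differing <= m


# Sample records with N = 4 fields (values are invented).
A = {"field1": "v1", "field2": "v2", "field3": "v3", "field4": "v4"}
B = {"field1": "other1", "field2": "other2", "field3": "v3", "field4": "v4"}
C = {"field1": "other1", "field2": "other2", "field3": "other3", "field4": "v4"}

b_matches = nearly_matches(A, B, 2)  # B differs from A in 2 fields
c_matches = nearly_matches(A, C, 2)  # C differs from A in 3 fields
```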
A search condition like this cannot use any index, but it can be done...
SELECT
*
FROM
yourTable
WHERE
N-M <= CASE WHEN yourTable.field1 = searchValue1 THEN 1 ELSE 0 END
+ CASE WHEN yourTable.field2 = searchValue2 THEN 1 ELSE 0 END
+ CASE WHEN yourTable.field3 = searchValue3 THEN 1 ELSE 0 END
...
+ CASE WHEN yourTable.fieldN = searchValueN THEN 1 ELSE 0 END
Similarly, if your search criteria are in another table...
SELECT
*
FROM
yourTable
INNER JOIN
search
ON N-M <= CASE WHEN yourTable.field1 = search.field1 THEN 1 ELSE 0 END
+ CASE WHEN yourTable.field2 = search.field2 THEN 1 ELSE 0 END
+ CASE WHEN yourTable.field3 = search.field3 THEN 1 ELSE 0 END
...
+ CASE WHEN yourTable.fieldN = search.fieldN THEN 1 ELSE 0 END
(You need to fill in the value of N-M yourself.)
EDIT:
A longer, more verbose approach that can use indexes...
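Since the CASE terms are purely repetitive, the query text could also be generated rather than typed by hand. The following is only a sketch under assumptions: build_near_match_query is a hypothetical helper, and the %s placeholders assume a driver that uses that parameter style (psycopg does). It also computes N-M for you.

```python
def build_near_match_query(table, fields, m):
    """Build the CASE-sum query for the given fields (hypothetical helper).

    Returns SQL text with one %s placeholder per field; the search
    values are meant to be passed separately to the database driver.
    """
    # Identifiers cannot be bound as parameters, so accept only plain names.
    for name in [table, *fields]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    if not 0 <= m <= len(fields):
        raise ValueError("m must be between 0 and the number of fields")

    terms = [
        f"CASE WHEN {table}.{field} = %s THEN 1 ELSE 0 END"
        for field in fields
    ]
    threshold = len(fields) - m  # this is N-M
    return (
        f"SELECT * FROM {table} WHERE {threshold} <= "
        + "\n  + ".join(terms)
    )


sql = build_near_match_query("yourTable", ["field1", "field2", "field3"], 1)
```

The resulting string would then be executed with the search values in the same order as the field list.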
SELECT
id, -- your table would need to have a primary key / identity column
MAX(field1) AS field1,
MAX(field2) AS field2,
MAX(field3) AS field3,
...
MAX(fieldN) AS fieldN
FROM
(
SELECT * FROM yourTable WHERE field1 = searchValue1
UNION ALL
SELECT * FROM yourTable WHERE field2 = searchValue2
UNION ALL
SELECT * FROM yourTable WHERE field3 = searchValue3
...
UNION ALL
SELECT * FROM yourTable WHERE fieldN = searchValueN
)
AS unioned_seeks
GROUP BY
id
HAVING
COUNT(*) >= N-M
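If every field has an index and you expect relatively few matches per field, this may outperform the first option, at the cost of very repetitive code.
I would use IS NOT DISTINCT FROM to handle NULL values.
You can also use a Postgres shorthand to simplify the logic. One approach is: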
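As a rough, runnable illustration of the UNION ALL / GROUP BY / HAVING idea, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table contents and search values are invented, N is 3 and M is 1, and only the id column is selected to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, field3 TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),  # matches all 3 search values
        (2, "a", "b", "x"),  # matches 2
        (3, "a", "x", "y"),  # matches 1
        (4, "x", "y", "z"),  # matches 0
    ],
)

search = ("a", "b", "c")
n, m = 3, 1

# Each branch emits one row per matching field, so COUNT(*) per id
# is the number of fields that matched the search values.
query = """
SELECT id
FROM (
    SELECT id FROM yourTable WHERE field1 = ?
    UNION ALL
    SELECT id FROM yourTable WHERE field2 = ?
    UNION ALL
    SELECT id FROM yourTable WHERE field3 = ?
) AS unioned_seeks
GROUP BY id
HAVING COUNT(*) >= ?
ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query, (*search, n - m))]
```

With this sample data, only the rows that match at least two of the three search values survive the HAVING filter.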
where ( (a.field1 is not distinct from b.field1)::int +
(a.field2 is not distinct from b.field2)::int +
. . .
(a.fieldn is not distinct from b.fieldn)::int
) >= N - M
I think this is easier to express in terms of M. So, just count the fields that differ:
where ( (a.field1 is distinct from b.field1)::int +
(a.field2 is distinct from b.field2)::int +
. . .
(a.fieldn is distinct from b.fieldn)::int
) <= M
Running this across the data requires a CROSS JOIN, which is very expensive.
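For a feel of what the pairwise comparison looks like, here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL. Two assumptions to flag: the records table and its rows are invented, and SQLite's NULL-safe IS NOT operator (which already yields 0 or 1) is used in place of Postgres's (... is distinct from ...)::int. Every pair of rows is examined, which is where the cost comes from.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, field3 TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "a", "b", None),
        (3, "a", None, None),
        (4, "x", "y", "z"),
    ],
)

m = 1  # maximum number of differing fields

# SQLite's "IS NOT" treats NULL as an ordinary value, so two NULLs
# compare as equal and a NULL versus a value counts as a difference.
pair_query = """
SELECT a.id, b.id
FROM records AS a
CROSS JOIN records AS b
WHERE a.id < b.id
  AND ( (a.field1 IS NOT b.field1)
      + (a.field2 IS NOT b.field2)
      + (a.field3 IS NOT b.field3) ) <= ?
ORDER BY a.id, b.id
"""
near_pairs = conn.execute(pair_query, (m,)).fetchall()
```

The a.id < b.id condition only avoids reporting each pair twice; the database still has to consider on the order of the square of the row count.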