I'm using MySQL and I've run into a problem. I need to intersect the output of 3 queries that run against the same table. This is the table:
CREATE TABLE posting (
doc integer NOT NULL,
word varchar(30) NOT NULL,
freq integer NOT NULL,
primary key (doc, word));
Then I inserted some values.
SELECT * FROM posting;
+-----+----------------+------+
| doc | word           | freq |
+-----+----------------+------+
|   1 | app            |   40 |
|   1 | classification |   20 |
|   1 | context        |   30 |
|   1 | information    |   15 |
|   1 | mobile         |   20 |
|   2 | app            |   40 |
|   2 | context        |   30 |
|   2 | discovery      |   15 |
|   2 | mobile         |   20 |
|   2 | recommandation |   30 |
|   2 | wall           |   15 |
|   3 | app            |   40 |
|   3 | discovery      |   10 |
|   3 | ideal          |   10 |
|   3 | mobile         |   20 |
|   3 | search         |   30 |
|   3 | server         |   25 |
|   4 | app            |   40 |
|   4 | killer         |   25 |
|   4 | mobile         |   20 |
|   4 | recommandation |   10 |
|   4 | search         |   30 |
|   5 | app            |   40 |
|   5 | beyond         |   20 |
|   5 | mobile         |   20 |
|   5 | model          |   15 |
|   5 | service        |   20 |
|   5 | share          |   30 |
|   5 | store          |   15 |
+-----+----------------+------+
I ran some basic queries.
SELECT DISTINCT doc FROM posting WHERE word LIKE 'mobile' AND freq >= 20;
+-----+
| doc |
+-----+
|   1 |
|   2 |
|   3 |
|   4 |
|   5 |
+-----+
So the set is M = {1, 2, 3, 4, 5}
SELECT DISTINCT doc FROM posting WHERE word LIKE 'context' AND freq >= 20;
+-----+
| doc |
+-----+
|   1 |
|   2 |
+-----+
So the set is C = {1, 2}
SELECT DISTINCT doc FROM posting WHERE word LIKE 'search' AND freq >= 20;
+-----+
| doc |
+-----+
|   3 |
|   4 |
+-----+
So the set is S = {3, 4}
SELECT DISTINCT doc FROM posting WHERE word LIKE 'app' AND freq >= 20;
+-----+
| doc |
+-----+
|   1 |
|   2 |
|   3 |
|   4 |
|   5 |
+-----+
So the set is A = {1, 2, 3, 4, 5}
Now I need to write 2 queries.
Intersect the first, second, and third sets. --> M ∩ C ∩ S = {1,2,3,4,5} ∩ {1,2} ∩ {3,4} = {}
(SELECT DISTINCT doc FROM posting WHERE word LIKE 'mobile' AND freq >= 20
INNER JOIN
posting WHERE word LIKE 'context' AND freq >= 20)
INNER JOIN
posting WHERE word LIKE 'search' AND freq >= 20;
Intersect the first set, the fourth set, and the complement of the second set. --> M ∩ A ∩ complement of C = {1,2,3,4,5} ∩ {1,2,3,4,5} ∩ {3,4,5} = {3,4,5}
(SELECT DISTINCT doc FROM posting WHERE word LIKE 'mobile' AND freq >= 20
INNER JOIN
posting WHERE word LIKE 'app' AND freq >= 20)
LEFT JOIN
posting WHERE word LIKE 'context' AND freq >= 20;
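To intersect the results of several conditions on a single table, use INNER JOIN to join the table to itself, matching on the doc ID and putting each set's specific condition in its own join: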
SELECT DISTINCT m.doc
FROM posting m
INNER JOIN posting c ON m.doc = c.doc AND c.word = 'context' AND c.freq >= 20
INNER JOIN posting s ON m.doc = s.doc AND s.word = 'search' AND s.freq >= 20
WHERE m.word = 'mobile' AND m.freq >= 20;
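If the list of words grows, one self-join per word gets unwieldy. A possible alternative, offered only as a sketch, is to filter on all the words at once and count the matches per doc. It assumes the same freq >= 20 threshold applies to every word, and it relies on the (doc, word) primary key so that each word can match at most once per doc:

```sql
-- Sketch: intersection via GROUP BY / HAVING instead of self-joins.
-- Assumes every word uses the same threshold (freq >= 20).
SELECT doc
FROM posting
WHERE word IN ('mobile', 'context', 'search')
  AND freq >= 20
GROUP BY doc
-- A doc qualifies only if all 3 words matched; the primary key
-- (doc, word) guarantees no word is counted twice for one doc.
HAVING COUNT(*) = 3;
```

On the sample data no doc contains both 'context' and 'search', so this should come back empty, which agrees with M ∩ C ∩ S = {}.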
To get the complement of a condition, combine a LEFT JOIN with a WHERE ... IS NULL check, which keeps only the rows that found no match for that condition:
SELECT DISTINCT m.doc
FROM posting m
INNER JOIN posting a ON m.doc = a.doc AND a.word = 'app' AND a.freq >= 20
LEFT JOIN posting c ON m.doc = c.doc AND c.word = 'context' AND c.freq >= 20
WHERE m.word = 'mobile' AND m.freq >= 20 AND c.doc IS NULL;
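The same anti-join can also be written with NOT EXISTS, which some people find easier to read because the "must not match" condition sits in one place. This is a sketch of an equivalent formulation, not a claim that it performs better; the aliases m, a, and c are the ones used above:

```sql
-- Sketch: M ∩ A ∩ complement of C, using NOT EXISTS for the complement.
SELECT m.doc
FROM posting m
INNER JOIN posting a
        ON a.doc = m.doc AND a.word = 'app' AND a.freq >= 20
WHERE m.word = 'mobile' AND m.freq >= 20
  -- Keep the doc only if it has no qualifying 'context' row.
  AND NOT EXISTS (
        SELECT 1
        FROM posting c
        WHERE c.doc = m.doc
          AND c.word = 'context'
          AND c.freq >= 20
      );
```

On the sample data this should return docs 3, 4, and 5, matching the expected set {3,4,5}. DISTINCT is not needed here because the (doc, word) primary key means each doc has at most one 'mobile' row and one 'app' row.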