Best way to search MySQL with Google-like intelligence

Question (votes: 0, answers: 2)

I am building a web crawler that collects its crawl results into a MySQL table.

The table has five main columns:

URL, TITLE, DESCRIPTION, KEYWORDS, BODY

Currently I am using MySQL's FULLTEXT search, like this:

SELECT URL, title, description,
       MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS score
FROM record
WHERE MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
ORDER BY score DESC;

But this does not give me good results. Consider the image below. [image]

Here, Facebook is ranked 23rd for the search "Facebook" (?)

Can I prioritize the search by column name? For example, I want the query to give the highest priority to URL, then description, then title, keywords, and finally body.

Any suggestions?

php mysql search full-text-search
2 Answers
0
votes
SELECT URL, title, description,
       MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS score
FROM record
WHERE URL LIKE '%$keyword%'
   OR MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
ORDER BY score DESC;

Just use the LIKE operator to match against the URL. See the code above. Thanks!
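
One thing to watch: adding LIKE to the WHERE clause brings URL matches into the result set, but the rows are still sorted by the full-text score alone. Here is a rough sketch of sorting URL matches first. It assumes the `record` table from the question, uses the literal 'facebook' in place of `$keyword`, and the alias `url_hit` is a name made up for illustration:

```sql
-- Sketch only: 'facebook' stands in for the user-supplied keyword,
-- and url_hit is a hypothetical alias.
-- In MySQL a comparison evaluates to 1 or 0, so it can serve as a sort key.
SELECT URL, title, description,
       (URL LIKE '%facebook%') AS url_hit,
       MATCH (description, keywords, title, URL)
           AGAINST ('facebook' IN BOOLEAN MODE) AS score
FROM record
WHERE URL LIKE '%facebook%'
   OR MATCH (description, keywords, title, URL)
          AGAINST ('facebook' IN BOOLEAN MODE)
ORDER BY url_hit DESC, score DESC;
```

Keep in mind that a LIKE pattern with a leading % cannot use an ordinary index, so this may be slow on a large table.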


-1
votes

Take a look at something like SoundEx:

See: http://www.madirish.net/?article=85
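
For what it's worth, MySQL ships a SOUNDEX() function and a SOUNDS LIKE operator, so a phonetic match could look roughly like the sketch below. It assumes the `record` table from the question, and 'Facebok' is a made-up misspelled search term:

```sql
-- Sketch only: 'Facebok' is a hypothetical misspelled search term.
-- a SOUNDS LIKE b is shorthand for SOUNDEX(a) = SOUNDEX(b).
SELECT URL, title
FROM record
WHERE title SOUNDS LIKE 'Facebok';
```

This compares the whole column value against the term, so it probably only suits short fields such as a one-word title, and SOUNDEX() is designed around English pronunciation.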

Also, have you considered doing the weighting yourself? (I don't have MySQL available locally, so apologies for the semi-pseudocode.)

SELECT
    URL,
    title,
    description,
    MATCH (URL) AGAINST ('$keyword' IN BOOLEAN MODE) AS urlscore,
    MATCH (description) AGAINST ('$keyword' IN BOOLEAN MODE) AS descscore,
    MATCH (title) AGAINST ('$keyword' IN BOOLEAN MODE) AS titlescore,
    MATCH (body) AGAINST ('$keyword' IN BOOLEAN MODE) AS bodyscore,
    -- Weighted sum: URL counts 4x, description 3x, title 2x, body 1x.
    (MATCH (URL) AGAINST ('$keyword' IN BOOLEAN MODE) * 4)
    + (MATCH (description) AGAINST ('$keyword' IN BOOLEAN MODE) * 3)
    + (MATCH (title) AGAINST ('$keyword' IN BOOLEAN MODE) * 2)
    + (MATCH (body) AGAINST ('$keyword' IN BOOLEAN MODE) * 1) AS weightedscore
FROM
    record
WHERE
    MATCH (description, keywords, title, URL) AGAINST ('$keyword' IN BOOLEAN MODE)
ORDER BY
    -- Sort by the alias instead of repeating the whole expression.
    weightedscore DESC;
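
The weighted query calls MATCH() on single columns. On InnoDB tables, MATCH() needs a FULLTEXT index whose column list is exactly the columns named, so each of those columns would need its own index in addition to the combined one used in the WHERE clause. A sketch, assuming the `record` table from the question. The index names are made up:

```sql
-- Sketch only: the index names (ft_url, ft_description, ...) are hypothetical.
-- One statement per index: InnoDB has historically rejected creating
-- several FULLTEXT indexes in a single ALTER TABLE.
ALTER TABLE record ADD FULLTEXT INDEX ft_url (URL);
ALTER TABLE record ADD FULLTEXT INDEX ft_description (description);
ALTER TABLE record ADD FULLTEXT INDEX ft_title (title);
ALTER TABLE record ADD FULLTEXT INDEX ft_body (body);
```

Each extra index adds storage and slows down inserts from the crawler, so it is probably worth indexing only the columns that actually carry a weight.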