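I have a library of questions and answers, and I've built an API in NodeJS that lets users search for answers by passing a question as input. Here are my goals: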
I have implemented goals 1 to 3 with the following code:
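let question = req.query.question;
// split on runs of whitespace so repeated spaces don't produce empty tokens
let arrQuestions = question.trim().split(/\s+/);
let tokenizedQuestion = stopwords.removeStopwords(arrQuestions);
// one "answer LIKE ?" placeholder per token; the driver binds the values, which avoids SQL injection
// ("FALSE" keeps the SQL valid and returns no rows if every token was a stopword)
let whereClause = tokenizedQuestion.map(() => "answer LIKE ?").join(" OR ") || "FALSE";
let params = tokenizedQuestion.map((term) => "%" + term + "%");
let query = "SELECT * FROM tbl_libraries WHERE " + whereClause;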
I don't know how to implement goal 4. Can anyone give me some pointers?
Thanks!
Are you sure you don't want to use MySQL full-text search for this?
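If full-text search is an option, MySQL can do both the matching and the relevance ranking itself. A minimal sketch, assuming a FULLTEXT index on the answer column of the asker's tbl_libraries table; buildFulltextQuery is a hypothetical helper name:

```javascript
// Hypothetical helper; assumes a FULLTEXT index on tbl_libraries.answer, e.g.
//   ALTER TABLE tbl_libraries ADD FULLTEXT(answer);
function buildFulltextQuery(question) {
  const sql =
    "SELECT *, MATCH(answer) AGAINST(? IN NATURAL LANGUAGE MODE) AS score " +
    "FROM tbl_libraries " +
    "WHERE MATCH(answer) AGAINST(? IN NATURAL LANGUAGE MODE) " +
    "ORDER BY score DESC";
  // The same search string is bound to both placeholders.
  return { sql, params: [question, question] };
}

const { sql, params } = buildFulltextQuery("how do I reset my password");
console.log(sql);
console.log(params);
```

The result can be run with the mysql or mysql2 driver as connection.query(sql, params, callback). In natural language mode MySQL also applies its own full-text stopword list, so removing stopwords by hand becomes optional.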
If the answer is "no", read on...
I'm implementing something like this in one of my projects. The query looks roughly like this (simplified):
SELECT
name
FROM
table
WHERE
name REGEXP 'term1|term2|term3' -- you can use your OR + LIKE way
ORDER BY
SP_TermsWeight(name, 'term1 term2 term3') DESC
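Applied to the asker's schema, a minimal Node sketch of how this query could be built with placeholders instead of string concatenation. It assumes the tbl_libraries.answer column from the question, that SP_TermsWeight is installed, and MySQL's default backslash LIKE escape character; buildWeightedQuery and escapeLike are hypothetical names. It uses the OR + LIKE variant mentioned in the comment above rather than REGEXP, so regex metacharacters in user input need no special handling.

```javascript
// Escape LIKE wildcards (and the escape char itself) so terms match literally.
// Assumes MySQL's default "\" LIKE escape character.
function escapeLike(term) {
  return term.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Hypothetical builder: OR + LIKE filter, ordered by the SP_TermsWeight function.
function buildWeightedQuery(terms) {
  // Drop empty strings left over from splitting.
  const cleaned = terms.map((t) => t.trim()).filter((t) => t.length > 0);
  if (cleaned.length === 0) return null; // nothing to search for
  const where = cleaned.map(() => "answer LIKE ?").join(" OR ");
  const sql =
    "SELECT * FROM tbl_libraries WHERE " + where +
    " ORDER BY SP_TermsWeight(answer, ?) DESC";
  const params = cleaned.map((t) => "%" + escapeLike(t) + "%");
  params.push(cleaned.join(" ")); // space-separated terms for SP_TermsWeight
  return { sql, params };
}

const q = buildWeightedQuery(["reset", "password"]);
console.log(q.sql);
console.log(q.params); // → [ '%reset%', '%password%', 'reset password' ]
```

The pair can be passed to the driver as connection.query(q.sql, q.params, callback). Because SP_TermsWeight declares sTerms as VARCHAR(127), very long questions may need to be truncated before the last parameter is bound.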
All the magic is in my SP_TermsWeight function, which returns a "weight" (a number); I pass it the list of terms (already cleaned and normalized).
The function:
CREATE FUNCTION `SP_TermsWeight`(
`sValue` TEXT,
`sTerms` VARCHAR(127)
)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE p INT DEFAULT 1;
DECLARE w INT DEFAULT 0;
DECLARE l INT;
DECLARE c CHAR(1);
DECLARE s VARCHAR(63);
DECLARE delimiters VARCHAR(15) DEFAULT ' ,';
SET sTerms = TRIM(sTerms);
SET l = LENGTH(sTerms);
IF (l > 0) THEN
-- check whether the value matches the terms exactly
IF (sTerms = sValue) THEN
SET w = 50000;
ELSE
-- treat the whole terms string as one single term; if it matches in full, the weight is high
IF (l <= 63) THEN
SET w = w + SP_TermWeight(sValue, sTerms, 5000, 1000, 100);
END IF;
-- skip term-by-term processing if the full string has already matched
IF (w = 0) THEN
-- processing term by term using space or comma as delimiter
WHILE i <= l DO
BEGIN
SET c = SUBSTRING(sTerms, i, 1);
IF (LOCATE(c, delimiters) > 0) THEN
SET s = SUBSTRING(sTerms, p, i - p);
SET w = w + SP_TermWeight(sValue, s, 50, 10, 0);
SET p = i + 1;
END IF;
SET i = i + 1;
END;
END WHILE;
IF (p > 1 AND p < i) THEN
-- last term: from position p to the end of the string (length i - p)
SET s = SUBSTRING(sTerms, p, i - p);
SET w = w + SP_TermWeight(sValue, s, 50, 10, 0);
END IF;
END IF;
END IF;
END IF;
RETURN w;
END
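Technically, it "splits" the terms on the delimiters and checks whether the value "contains" each term. Explaining everything it does is a bit involved (I've added some comments to the code). If anything is unclear, feel free to ask.
In your case it can be simplified considerably, since you don't need to distinguish between beginning, end, and middle matches.
Another helper function used internally: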
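Without the beginning/end/middle distinction, one simple option is to give each answer one point per distinct query term it contains and sort by that score. A minimal sketch in Node, applied to the rows returned by the LIKE query; scoreAnswer and rankAnswers are hypothetical names, and each row is assumed to have an answer field. Matching here is case-insensitive substring matching, like LIKE '%term%', so "reset" would also match "preset".

```javascript
// One point per distinct query term found anywhere in the answer text.
function scoreAnswer(answer, terms) {
  const text = answer.toLowerCase();
  const unique = new Set(terms.map((t) => t.toLowerCase()).filter((t) => t.length > 0));
  let score = 0;
  for (const term of unique) {
    if (text.includes(term)) score += 1;
  }
  return score;
}

// Attach a score to each row, drop non-matches, highest score first.
// Array.prototype.sort is stable (ES2019+), so ties keep the database order.
function rankAnswers(rows, terms) {
  return rows
    .map((row) => ({ ...row, score: scoreAnswer(row.answer, terms) }))
    .filter((row) => row.score > 0)
    .sort((a, b) => b.score - a.score);
}

const rows = [
  { id: 1, answer: "Restart the server" },
  { id: 2, answer: "Reset your password from the login page" },
  { id: 3, answer: "Password reset emails may take a minute" },
];
console.log(rankAnswers(rows, ["reset", "password", "login"]).map((r) => r.id)); // → [ 2, 3 ]
```

The same scoring can also be pushed into SQL: MySQL evaluates a comparison such as answer LIKE ? to 1 or 0, so ORDER BY (answer LIKE ?) + (answer LIKE ?) DESC sorts by the number of matched terms.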
CREATE FUNCTION `SP_TermWeight`(
`sValue` TEXT,
`sTerm` VARCHAR(63),
`iWeightBegin` INT,
`iWeightEnd` INT,
`iWeightMiddle` INT
)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE r INT DEFAULT 0;
SET sTerm = TRIM(sTerm);
IF (LENGTH(sTerm) > 1) THEN
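-- [[:<:]] / [[:>:]] mark word start / word end; MySQL 8.0.4+ (ICU regex library) does not support them, use '\\b' there instead
-- note: sTerm is used as a regex pattern, so metacharacters in it are not matched literally
IF (iWeightBegin != 0 AND sValue REGEXP CONCAT('[[:<:]]', sTerm)) THEN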
SET r = r + iWeightBegin;
END IF;
IF (iWeightEnd != 0 AND sValue REGEXP CONCAT(sTerm, '[[:>:]]')) THEN
SET r = r + iWeightEnd;
END IF;
IF (r = 0 AND iWeightMiddle != 0 AND sValue REGEXP sTerm) THEN
SET r = r + iWeightMiddle;
END IF;
END IF;
RETURN r;
END
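This function assigns different weights depending on whether the term matches the value at the beginning of a word, at the end of a word, or only in the middle. In my case that distinction matters; yours can probably be much simpler.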
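To experiment with the weights outside MySQL, here is a rough JavaScript port of the two functions above (termsWeight, termWeight and escapeRegExp are illustrative names, not part of the original). It assumes a case-insensitive collation (hence the i flag and the lowercased equality check), approximates [[:<:]] / [[:>:]] with JavaScript's ASCII \b, and, unlike the SQL version, escapes regex metacharacters in the terms.

```javascript
// Escape regex metacharacters so a term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Port of SP_TermWeight: word-start and word-end weights add up;
// the middle weight applies only if neither boundary matched.
function termWeight(value, term, wBegin, wEnd, wMiddle) {
  term = term.trim();
  if (term.length <= 1) return 0; // same LENGTH(sTerm) > 1 guard
  const t = escapeRegExp(term);
  let r = 0;
  if (wBegin !== 0 && new RegExp("\\b" + t, "i").test(value)) r += wBegin;
  if (wEnd !== 0 && new RegExp(t + "\\b", "i").test(value)) r += wEnd;
  if (r === 0 && wMiddle !== 0 && new RegExp(t, "i").test(value)) r += wMiddle;
  return r;
}

// Port of SP_TermsWeight.
function termsWeight(value, terms) {
  terms = terms.trim();
  const l = terms.length;
  if (l === 0) return 0;
  // Exact match (case-insensitive, assuming a _ci collation).
  if (terms.toLowerCase() === value.toLowerCase()) return 50000;
  let w = 0;
  // Whole string treated as one term.
  if (l <= 63) w += termWeight(value, terms, 5000, 1000, 100);
  if (w === 0) {
    // Split on space or comma; like the SQL loop, only when a delimiter exists.
    const parts = terms.split(/[ ,]/);
    if (parts.length > 1) {
      for (const s of parts) w += termWeight(value, s, 50, 10, 0);
    }
  }
  return w;
}

console.log(termsWeight("How to reset a password", "reset password")); // → 120
```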