How to fine-tune relevance scores using ranking features in Vespa.ai

Question

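Background

I am trying to improve the relevance scores of my search results. I have a set of candidate profiles, and I search for the best candidates based on their skills and the roles they have held in the industry.

I have come up with a rank profile and use it to find the most relevant candidates. I use both lexical and semantic matching.

The challenge is that the relevance scores Vespa produces are not very good, and I would like to fine-tune the ranking and the relevance scores.

Any tips on this would be greatly appreciated!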

I want to:

a. Improve the relevance score for this profile.
b. Find out why the bm25(skills) values in matchfeatures and summaryfeatures are 0.0 when the profile actually has both java and python.

Output:

{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 143
    },
    "coverage": {
      "coverage": 100,
      "documents": 143,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:candidate_profile:candidate_profile::a866fa7f-7e13-48fe-bdca-5a60a3198fd9",
        "relevance": 0.01639344262295082,
        "source": "candidate_profile",
        "fields": {
          "matchfeatures": {
            "bm25(profile_summary)": 5.470910610067547,
            "bm25(skills)": 0,
            "firstPhase": 0.8789145673605757,
            "nativeRank(profile_summary)": 0.08308099301928237,
            "semantic": 0.8789145673605757
          },
          "skills": [
            "HTML",
            "CSS",
            "Java Script",
            "React Js",
            "Python",
            "Web Designing",
            "Leadership",
            "Teamwork",
            "Observation",
            "Time management",
            "Communication",
            "Avid fitness enthusiast",
            "Volunteering",
            "Sports",
            "English",
            "Hindi"
          ],
          "summaryfeatures": {
            "bm25(latest_industry)": 0,
            "bm25(latest_job_title)": 0,
            "bm25(latest_role)": 0,
            "bm25(profile_summary)": 5.470910610067547,
            "bm25(skills)": 0,
            "embedding_sum": 55.06214759836439,
            "latest_industry_sum": 40.86598728704121,
            "latest_role_sum": 0,
            "skill_sum": 52.88688380786334,
            "vespa.summaryFeatures.cached": 0
          }
        }
      }
    ]
  }
}

The query I am running in Vespa:

"yql" : " select * from candidate_profile WHERE userQuery() or (all_role_title matches 'Software Developer') AND (skills matches 'python' OR skills matches 'java') AND (latest_role_title matches 'Senior Developer') or ({scoreThreshold:0.032 ,targetHits: 4}nearestNeighbor(embedding, e))",
"input.query(e)" : 'embed(e5, "query: Candidate who is working as Software Developer, Senior Developer has the following skills python, java.")',
"query": " Candidate who is working as Software Developer, Senior Developer has the following skills python, java.",
"ranking" : "common"

The rank profile I created:

rank-profile common {
        weight skills : 500
        weight latest_role : 500
        weight latest_industry : 500
        weight latest_job_title : 400

        inputs {
            query(e) tensor<float>(x[384])
        }
        function semantic() {
            expression: max(0, cos(distance(field, embedding)))
        }
        function semantic_skills() {
            expression: max(0, cos(distance(field, skills_embedding)))
        }
        function semantic_latest_role() {
            expression: max(0, cos(distance(field, latest_role_embedding)))
        }
        function semantic_latest_job_title() {
            expression: max(0, cos(distance(field, latest_job_title_embedding)))
        }
        function semantic_latest_industry() {
            expression: max(0, cos(distance(field, latest_industry_embedding)))
        }
        function keyword_match(){
            expression: bm25(skills) + bm25(latest_role) + bm25(latest_industry) + bm25(latest_job_title)
        }
        first-phase {
            expression:  sum(keyword_match + semantic)
        }

        rank-properties {
            fieldMatch(skills).occurrenceImportance: 0.5
            fieldMatch(skills).proximityCompletenessImportance: 0.9
            bm25(skills).k1: 1.5
            bm25(skills).b: 0.85
            fieldMatch(profile_summary).occurrenceImportance: 0.5
            fieldMatch(profile_summary).proximityCompletenessImportance: 0.9
            bm25(profile_summary).k1: 1.5
            bm25(profile_summary).b: 0.85
        }

        summary-features: embedding_sum skill_sum latest_role_sum latest_industry_sum bm25(profile_summary) bm25(skills) bm25(latest_role) bm25(latest_industry) bm25(latest_job_title)

        function embedding_score() {
            expression: attribute(embedding) * query(e)
        }
        function embedding_sum() {
            expression: sum(embedding_score)
        }
        function skill_score(){
            expression : attribute(skills_embedding) * query(e)
        }
        function skill_sum(){
            expression : sum(skill_score)
        }
        function latest_role_score(){
            expression : attribute(latest_role_embedding)  * query(e)
        }
        function latest_role_sum(){
            expression : sum(latest_role_score)
        }
        function latest_industry_score(){
            expression : attribute(latest_industry_embedding) * query(e)
        }
        function latest_industry_sum(){
            expression : sum(latest_industry_score)
        }

        match-features {
            bm25(skills)
            bm25(profile_summary)
            nativeRank(profile_summary)
            semantic
            firstPhase
            
        }
        global-phase {
            expression {
            reciprocal_rank(semantic)
            }
        }
}
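
A note on the relevance value in the output above: the profile's global-phase uses reciprocal_rank(semantic), so the final relevance depends on a hit's rank position rather than on the bm25 or semantic values themselves. The sketch below assumes Vespa's documented formula 1 / (k + rank) with the default k = 60; the Python function is only an illustration of that arithmetic, not Vespa code. Under that assumption the top hit gets 1/61, which matches the 0.01639344262295082 reported.

```python
def reciprocal_rank(rank: int, k: float = 60.0) -> float:
    """Score for the hit at 1-based position `rank`.

    k = 60 is assumed here to be the default used by reciprocal_rank().
    """
    return 1.0 / (k + rank)


# Top-ranked hit: 1 / (60 + 1)
top_hit_relevance = reciprocal_rank(1)
print(top_hit_relevance)

# Lower positions get slightly smaller scores
print(reciprocal_rank(2), reciprocal_rank(3))
```

Because the score depends only on position, these relevance values look small and close together however strong or weak the underlying match is.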

Answer 1

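I was able to get a bm25(skills) score, and scores are now generated for the other matched fields as well.

My findings:

1. bm25 is a pure text ranking feature that operates on indexed string fields. In our case skills is an indexed field, but its type is array. So we either changed the values to a comma-separated string or changed the type to string. Reference: bm25
2. After the first step, you must use the rank() operator in the query.

Example query: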

"yql" : " select * from candidate_profile WHERE rank((all_role_title matches 'senior') AND (skills matches 'python' OR skills matches 'java') AND (latest_role_title matches 'developer') or ({targetHits: 40}nearestNeighbor(embedding, e)),userQuery())",
"input.query(e)" : 'embed(e5, "query: Candidate who is working as senior, developer has the following skills python, java.")',
"query": " Candidate who is working as senior, developer has the following skills python, java.",
"ranking" : "common"

Search output:

{
  "root": {
    "id": "toplevel",
    "relevance": 1,
    "fields": {
      "totalCount": 40
    },
    "coverage": {
      "coverage": 100,
      "documents": 144,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:candidate_profile:candidate_profile::85715181-73f9-4f61-9398-4e350e41e989",
        "relevance": 22.045853545320217,
        "source": "candidate_profile",
        "fields": {
          "matchfeatures": {
            "bm25(profile_summary)": 35.56386309994314,
            "bm25(skills)": 11.98552149345962,
            "firstPhase": 22.045853545320217,
            "semantic": 0.9177947832357726
          },
          "sddocname": "candidate_profile",
          "documentid": "id:candidate_profile:candidate_profile::85715181-73f9-4f61-9398-4e350e41e989",
          "first_name": "DEEPAK",
          "middle_name": "SINGH",
          "city": "Bengaluru",
          "gender": "Male",
          "skills": [
            "Java,Springboot,J2EE,Hibernet,AWS,C/C++,Core Java,Python Programming"
          ],
          "total_months_of_experience": 98,
          "candidate_type": "New_candidate",
          "languages": [
            "English",
            "Kannada",
            "Hindi",
            "Telugu"
          ],
          "has_own_vehicle": false,
          "profile_summary": "The candidate has an experience of 8.2 years and is working as Developer, Senior Developer and has the following skills Java,Springboot,J2EE,Hibernet,AWS,C/C++,Core Java,Python Programming in industries like Software.",
          "latest_organisation_name": "Flipkart",
          "latest_job_title": "Senior Developer",
          "latest_role": "Developer",
          "latest_industry": "Software",
          "latest_employment_type": "Permanent",
          "employment_history": [
            {
              "role": "Developer",
              "job_title": "Senior Developer",
              "employment_type": "Permanent",
              "is_current_job": 1,
              "industry": "Software",
              "organisation_name": "Flipkart"
            }
          ],
          "highest_education_level": "Not mentioned",
          "highest_course_is_full_time": false,
          "highest_course_is_highest_qualification": false,
          "financials": [
            {}
          ],
          "summaryfeatures": {
            "bm25(candidate_type)": 0,
            "bm25(highest_course_name)": 0,
            "bm25(highest_education_level)": 0,
            "bm25(highest_specialization)": 0,
            "bm25(languages)": 0,
            "bm25(latest_employment_type)": 0,
            "bm25(latest_industry)": 0,
            "bm25(latest_job_title)": 4.5712686343124105,
            "bm25(latest_role)": 4.5712686343124105,
            "bm25(profile_summary)": 35.56386309994314,
            "bm25(skills)": 11.98552149345962,
            "embedding_sum": 58.45279276280053,
            "latest_industry_sum": 48.31929411615329,
            "latest_role_sum": 51.93824407275679,
            "skill_sum": 52.75094562502136,
            "vespa.summaryFeatures.cached": 0
          }
        }
      }
    ]
  }
}
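
As a sanity check on the output above, firstPhase can be reproduced by hand from the returned features. The sketch below assumes the first-phase expression is still keyword_match + semantic, as in the question's profile, and that the global-phase is no longer rewriting the score (relevance equals firstPhase in this response). The function name first_phase_score is made up for illustration.

```python
def first_phase_score(bm25_features: dict[str, float], semantic: float) -> float:
    """Recompute keyword_match + semantic from the features in one hit."""
    keyword_match = sum(bm25_features.values())
    return keyword_match + semantic


# Values copied from the summaryfeatures / matchfeatures of the hit above
bm25_features = {
    "bm25(skills)": 11.98552149345962,
    "bm25(latest_role)": 4.5712686343124105,
    "bm25(latest_industry)": 0.0,
    "bm25(latest_job_title)": 4.5712686343124105,
}
semantic = 0.9177947832357726

score = first_phase_score(bm25_features, semantic)
semantic_share = semantic / score

print(score)           # compare with "firstPhase" in the response
print(semantic_share)  # fraction of the score that comes from the embedding
```

The semantic term contributes only about 4% of the total here, so the bm25 terms dominate the ordering. Rescaling or weighting the two parts is one possible starting point when tuning the profile.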

Answer 2

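The reason bm25(skills) is 0 is that the query does not search the skills field: match features are only populated for fields that are searched.

You can search it by using a RANK term in the query, without impacting recall.

The rest (how do I get great relevance for my use case) is better suited to a consultation.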
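
A minimal sketch of that idea, assuming the request is assembled in Python before being sent to Vespa. In rank(), the first operand decides which documents are recalled, and the remaining operands only contribute rank features. The helper name build_rank_query and the shortened query text are hypothetical, and the sketch assumes skills is part of the fieldset that userQuery() searches.

```python
import json


def build_rank_query(recall_clause: str, rank_only_clauses: list[str]) -> str:
    """Build a YQL string of the form rank(recall, extra1, extra2, ...).

    Only `recall_clause` affects which documents match; the other
    clauses are evaluated for ranking features only.
    """
    operands = ", ".join([recall_clause, *rank_only_clauses])
    return f"select * from candidate_profile where rank({operands})"


yql = build_rank_query(
    "{targetHits: 40}nearestNeighbor(embedding, e)",  # drives recall
    ["userQuery()"],                                  # ranking only
)

# Request body with the same keys as the queries shown in this thread
body = {
    "yql": yql,
    "query": "python java",
    "input.query(e)": 'embed(e5, "query: python java")',
    "ranking": "common",
}
print(json.dumps(body, indent=2))
```

With this shape, the nearest-neighbor clause alone determines the hit set, while the text terms from userQuery() can still produce non-zero bm25 values for the fields they match.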
