Average precision - python


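I am computing the average precision over the top k retrieved objects. Here is my code; at this stage I am computing R@K. The code reads two lists from csv files, then takes each sample from one list, computes its Euclidean distance to every sample in the other list, sorts the distances, and finally takes the top k objects to check whether the matching object is among the retrieved samples.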

import csv

from scipy.spatial import distance


def parse_features_from_csv(csv_file):
    feat_lst = []
    id_lst = []
    row_lst = []
    with open(csv_file) as fr:
        reader = csv.reader(fr, delimiter=',')
        for row in reader:
            s_feat = row[:-1]
            identifier = row[-1]
            s_feat = [float(i) for i in s_feat]
            feat_lst.append(s_feat)
            id_lst.append(identifier)
            row_lst.append(row[-1])
    return feat_lst, id_lst,row_lst


def compute_distances(et_item, feat_lst, id_lst):
    dist_list = []
    for id_img_item, img_item in enumerate(feat_lst):
        dist = distance.euclidean(img_item,et_item)
        #print (dist)
        dist_list.append((id_lst[id_img_item], dist))
    return dist_list


def main():
    top_k = 10
    feat_file = "list_1.csv"
    test_file = "list_2.csv"
    et_feat_lst, et_id_list, _ = parse_features_from_csv(test_file)
    feat_list, id_list,row_lst_et = parse_features_from_csv(feat_file)

    print(len(feat_list))
    print(len(et_feat_lst))

    correct = 0
    for id_et_item, et_item in enumerate(et_feat_lst):
        distances = compute_distances(et_item, feat_list, row_lst_et)
        sort_dst = sorted(distances, key=lambda x: x[1])

        #print("Target: " + et_id_list[id_et_item] + ", Distances: " + str(sort_dst[:top_k]))

        eucl_dist = sort_dst[:top_k]
        gt = et_id_list[id_et_item]
        for idx in eucl_dist:
            tar = idx[0]        
            if gt == tar:
                correct+= 1
                break

        # id_et_item is zero-based, so add 1 to report the number of queries processed so far.
        print("correct", str(correct) + '/' + str(id_et_item + 1))


if __name__ == '__main__':
    main()

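Can someone tell me how to use the sklearn.metrics.average_precision_score function to compute the average precision over the top K retrieved objects? I am confused by (y_true, y_scores). I would appreciate it if someone could explain these two parameters of the function.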

python scikit-learn information-retrieval
1 Answer

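y_true - the ground-truth values (e.g. class labels)
y_scores - the values predicted by your algorithm (scikit-learn names this parameter y_score)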

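I think this comes down to preparing the inputs. Take the ground-truth answers for the top K objects as y_true, compute your model's outputs for those same K objects, and use them as y_score. That should work.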
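
For the retrieval setup in the question, the suggestion above could be sketched roughly as follows. This is only a sketch under some assumptions: the helper name average_precision_at_k and the toy ids and distances are hypothetical, relevance is taken to mean "the retrieved id equals the query id", and distances are negated because average_precision_score expects higher scores to mean more relevant.

```python
import numpy as np
from sklearn.metrics import average_precision_score


def average_precision_at_k(query_id, ranked_ids, ranked_dists, k=10):
    """AP over the top-k retrieved items for one query (hypothetical helper).

    Returns None when no relevant item appears in the top k, because AP is
    not well defined without at least one positive label.
    """
    top_ids = np.asarray(ranked_ids[:k])
    top_dists = np.asarray(ranked_dists[:k], dtype=float)

    # y_true: 1 where the retrieved item's id matches the query's id, else 0.
    y_true = (top_ids == query_id).astype(int)
    if y_true.sum() == 0:
        return None

    # y_score: higher must mean "more relevant", so negate the distances.
    y_score = -top_dists
    return average_precision_score(y_true, y_score)


# Hypothetical toy data: ids and distances already sorted by ascending distance.
ranked_ids = ["cat", "dog", "cat", "bird"]
ranked_dists = [0.2, 0.5, 0.9, 1.3]
ap = average_precision_at_k("cat", ranked_ids, ranked_dists, k=4)
print(ap)
```

Averaging the non-None values over all queries would give a mean average precision at K. Note that this variant normalizes by the number of relevant items found inside the top K, not by the number of relevant items in the whole database, so it may differ from other definitions of AP@K.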

Here is the example of sklearn.metrics.average_precision_score from the official documentation:

  import numpy as np
  from sklearn.metrics import average_precision_score

  y_true = np.array([0, 0, 1, 1])
  # Predicted probabilities that each object belongs to class 1
  y_scores = np.array([0.1, 0.4, 0.35, 0.8])

  average_precision_score(y_true, y_scores)  # 0.8333...

By the way, I had a hard time understanding the concept of average precision / mean average precision as an ML metric; this page helped me a lot.
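
To make the concept more concrete, average precision can also be computed by hand: rank the items by descending score, take the precision at every rank where a relevant item appears, and average those precisions. The function below is a minimal sketch of that idea, not scikit-learn's implementation; the name average_precision_manual is made up for illustration, and it assumes there are no tied scores and at least one positive label.

```python
import numpy as np


def average_precision_manual(y_true, y_score):
    """Average precision by hand (assumes no tied scores, >= 1 positive)."""
    y_true = np.asarray(y_true)
    # Rank items from highest to lowest score.
    order = np.argsort(-np.asarray(y_score, dtype=float))
    hits = y_true[order]

    # Precision at each rank = relevant items seen so far / rank.
    ranks = np.arange(1, len(hits) + 1)
    precision_at_rank = np.cumsum(hits) / ranks

    # Average the precision values at the ranks that hold a relevant item.
    return float((precision_at_rank * hits).sum() / hits.sum())


y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
ap = average_precision_manual(y_true, y_scores)
print(ap)
```

For this input the ranking by score is relevant, irrelevant, relevant, irrelevant, so the precisions at the two relevant ranks are 1 and 2/3, and their mean is 5/6, about 0.83.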
