Try/except and if statement combination: missing results

Question  Votes: -1  Answers: 1

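I am comparing one list of universities against 12 other lists of universities, looking for fuzzy string matches and writing all the results to a CSV. I am not fuzzy matching against one big combined list because I need to know which list each match came from. Examples of the lists: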

data = [[1-00000, "MIT"], [1-00001, "Stanford"] ,...]

Data1 = ['MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)'], ['STANFORD UNIVERSITY'],...

With help from Stack Overflow, I got this far:

for uni in data:
    hit = process.extractOne(str(uni[1]), data10, scorer = fuzz.token_set_ratio, score_cutoff = 90)
    try:
        if float(hit[1]) >= 94:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 10})

    except:
        hit1 = process.extractOne(str(uni[1]), data11, scorer = fuzz.token_set_ratio, score_cutoff = 90)
        try:
            if float(hit1[1]) >= 94:
                with open(filename, mode='a', newline="") as csv_file:
                      fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                      writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                      writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 5})

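This continues down through the 12 lists until the final section, where I include the entries scoring lower than 94 as "not found", ending with: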

    except:
        hit12 = process.extractOne(str(uni[1]), data9, scorer = fuzz.token_set_ratio)
        try:
            if float(hit12[1]) < 94:
                with open(filename, mode='a', newline="") as csv_file:
                       fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                       writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                       writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 3})
        except:
            with open(filename, mode='a', newline="") as csv_file:
                  fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                  writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                  writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 3})

However, I only get 2854 results back, against 3175 entries in my original list (all of which need to be checked and written to the new CSV).

When I combine all my lists into one and run extractOne against it, I get 3175 results:

scored_testdata = []
for uni in data:
     hit = process.extractOne(str(uni[1]), big_list, scorer = fuzz.token_set_ratio, score_cutoff = 90)
     scored_testdata.append(hit)
print(len(scored_testdata))

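What am I missing here? I have a feeling that the results for which process.extractOne returns None are being dropped for some reason. Any help would be greatly appreciated.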

The full code can be found here.

python python-3.x csv exception-handling fuzzywuzzy
1 Answer
Votes: 0

The final try/except should check against all the lists combined and run extractOne without score_cutoff:

except:
    hit12 = process.extractOne(str(uni[1]), big_list, scorer = fuzz.token_set_ratio)
    with open(filename, mode='a', newline="") as csv_file:
           fieldnames = ['bwbnr', 'uni_name', 'match', 'confidence', 'points']
           writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
           writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': "CHECK AGAIN " + str(hit12[0]), 'confidence': str(hit12[1]), 'points': 3})
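A likely explanation for the missing rows, going only by the code in the question: extractOne returns None when nothing reaches score_cutoff, and indexing None raises the exception that moves on to the next list. But when the best score lands between 90 and 93, hit is not None, the `>= 94` test is simply False, no exception is raised, and nothing is written for that university. The combined-list test still counts 3175 because it appends every result, None included. The sketch below shows one way to restructure the chain so that every input produces exactly one row. It is an illustration, not the asker's code: `extract_one` is a difflib-based stand-in assumed to follow the same (match, score) or None contract as process.extractOne, and `classify`, `write_rows`, and `ranked_lists` are hypothetical names.

```python
import csv
import io
from difflib import SequenceMatcher


def extract_one(query, choices, score_cutoff=0):
    """Stand-in for fuzzywuzzy's process.extractOne (assumption: same
    contract, returning (match, score) or None below the cutoff).
    Uses difflib so the sketch runs without third-party packages."""
    best = None
    for choice in choices:
        ratio = SequenceMatcher(None, query.lower(), choice.lower()).ratio()
        score = round(100 * ratio)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


def classify(name, ranked_lists, threshold=94):
    """Return (match, score, points) for the first list with a hit at or
    above the threshold; otherwise fall back to the best match overall."""
    for choices, points in ranked_lists:
        hit = extract_one(name, choices, score_cutoff=threshold)
        if hit is not None:  # explicit None check instead of try/except
            return hit[0], hit[1], points
    # Fallback: best match across all lists, no cutoff, flagged for review.
    all_choices = [c for choices, _ in ranked_lists for c in choices]
    hit = extract_one(name, all_choices)
    match, score = hit if hit is not None else ("", 0)
    return "CHECK AGAIN " + match, score, 3


def write_rows(data, ranked_lists, csv_file):
    """Write exactly one row per input entry and return the row count."""
    fieldnames = ['bwbnr', 'uni_name', 'match', 'confidence', 'points']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
    rows = 0
    for bwbnr, name in data:
        match, score, points = classify(name, ranked_lists)
        writer.writerow({'bwbnr': bwbnr, 'uni_name': name, 'match': match,
                         'confidence': score, 'points': points})
        rows += 1
    return rows


# Usage with made-up sample data; the real lists would replace these.
sample = [["1-00000", "Stanford University"], ["1-00001", "Qwerty"]]
lists_with_points = [(["STANFORD UNIVERSITY"], 10), (["HARVARD UNIVERSITY"], 5)]
out = io.StringIO()
print(write_rows(sample, lists_with_points, out), "rows for", len(sample), "inputs")
```

Because the fallback sits after the loop rather than inside an except branch, a score that falls just short of the threshold can no longer skip every write. Opening the file once and passing the handle in also avoids reopening it for each row.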