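I have a dataset with 3 features: a car's version, model, and brand. How can I apply fuzzy search (in Python) to check whether what the user typed matches the version, the model, the brand, or some/all of these features? With multiple features/columns, I don't know how to do the scoring... Can you help me?
I want to extract the features the user entered (e.g. model; model and brand; model or brand and version). I'm using the fuzzywuzzy library.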
You didn't provide any sample data, so I'm just guessing here.
import pandas as pd

data = {
    'Version': [2019, 2020, 2018, 2020, 2019, 2018, 2017, 2018],
    'Model': ['Accord', 'Civic', 'Corolla', 'Camry', 'Altima', 'Maxima', 'Fusion', 'Focus'],
    'Brand': ['Honda', 'Honda', 'Toyota', 'Toyota', 'Nissan', 'Nissan', 'Ford', 'Ford']
}
df = pd.DataFrame(data)
print(df)
from fuzzywuzzy import process
# rapidfuzz provides a similar process.extractOne; to try it instead:
# from rapidfuzz import process

def fuzzy_match(input_text, column_values):
    """
    Perform fuzzy matching on input_text against column_values.

    Args:
    - input_text: The text input by the user.
    - column_values: The values to match against (here a pandas Series).

    Returns:
    - The result of process.extractOne for the best match. For a Series
      this is a (match, score, index) tuple, as in the output shown.
    """
    best_match = process.extractOne(input_text, column_values)
    return best_match

user_input = "user's input"
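# Fuzzy match for Version (this column holds integers; if the matcher
# rejects non-string values, pass df['Version'].astype(str) instead)
version_match = fuzzy_match(user_input, df['Version'])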
# Fuzzy match for Model
model_match = fuzzy_match(user_input, df['Model'])
# Fuzzy match for Brand
brand_match = fuzzy_match(user_input, df['Brand'])
print("Version match:", version_match)
print("Model match:", model_match)
print("Brand match:", brand_match)
Result:
Version match: (2019, 0, 0)
Model match: ('Fusion', 45, 6)
Brand match: ('Nissan', 33, 4)
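
The per-column scores above still leave the original question open: which features did the user actually type? One possible approach, sketched here under assumptions, is to split the input into tokens, score every token against every column, and keep a column only when its best score clears a cut-off. To keep the sketch runnable without extra packages it uses difflib.SequenceMatcher from the standard library as a stand-in for the fuzzywuzzy scorers, so the scores will not be identical to fuzzywuzzy's. The names COLUMNS, similarity, extract_features and the threshold of 80 are all hypothetical choices, and Version is stored as strings here.

```python
from difflib import SequenceMatcher

# Hypothetical data mirroring the guessed table, with Version as strings.
COLUMNS = {
    "Version": ["2019", "2020", "2018", "2020", "2019", "2018", "2017", "2018"],
    "Model": ["Accord", "Civic", "Corolla", "Camry", "Altima", "Maxima", "Fusion", "Focus"],
    "Brand": ["Honda", "Honda", "Toyota", "Toyota", "Nissan", "Nissan", "Ford", "Ford"],
}


def similarity(a, b):
    """Return a case-insensitive similarity score between 0 and 100."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract_features(user_input, columns=COLUMNS, threshold=80):
    """Guess which columns the user's input refers to.

    Each whitespace-separated token is scored against every value of
    every column. A column is reported only if some token reaches the
    threshold (an arbitrary cut-off that needs tuning on real data).
    Returns {column: (best_value, score)}.
    """
    found = {}
    for token in user_input.split():
        for column, values in columns.items():
            # Best candidate in this column for this token.
            best = max(values, key=lambda v: similarity(token, v))
            score = similarity(token, best)
            # Keep the highest-scoring match seen so far for the column.
            if score >= threshold and score > found.get(column, (None, 0))[1]:
                found[column] = (best, score)
    return found


print(extract_features("hunda civic 2020"))
# {'Brand': ('Honda', 80), 'Model': ('Civic', 100), 'Version': ('2020', 100)}
```

The keys of the returned dict tell you which features were entered (brand only, model and brand, and so on). Splitting on whitespace means multi-word values would need extra handling, and the same idea should carry over to fuzzywuzzy by replacing similarity with one of its scorers.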