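I'm working with a CSV file that contains records like the ones in the sample data further down.
I'm using the following Python code to find and recommend similar items. It works fine when the CSV file has only the customer_id and title fields, but it fails when the file has many fields, as in the sample.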
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics.pairwise import cosine_similarity
ds = pd.read_csv("sample-data2.csv")
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(ds['title'])
#cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_similarities = cosine_similarity(tfidf_matrix,tfidf_matrix)
results = {}
for idx, row in ds.iterrows():
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    similar_items = [(cosine_similarities[idx][i], ds['customer_id'][i]) for i in similar_indices]
    results[row['customer_id']] = similar_items[1:]
def item(id):
    return ds.loc[ds['customer_id'] == id]['title'].tolist()[0]
def recommend(user_id, num):
    if (num == 0):
        print("Please select user_id")
    elif (num==1):
        print("Recommending " + str(num) + " items similar to " + item(id))
    else:
        print("Recommending " + str(num) + " items similar to " + item(id))
        print("----------------------------------------------------------")
        recs = results[id][:num]
    for rec in recs:
        print("You may also like to buy: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
recommend(user_id="0d1b7397-7d3c-44c0-9efc-d38bf197828b", num=5)
Please tell me why this happens.
This is the error I get every time:
Traceback (most recent call last):
  File "recommender_system.py", line 10, in <module>
    tfidf_matrix = tf.fit_transform(ds['title'])
  File "C:\Users\aaa\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1652, in fit_transform
    X = super().fit_transform(raw_documents)
  File "C:\Users\aaa\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\aaa\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 970, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\aaa\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 352, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\aaa\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 143, in decode
    raise ValueError("np.nan is an invalid document, expected byte or "
ValueError: np.nan is an invalid document, expected byte or unicode string.
CSV data that can be used for testing:
customer_id time product_id title
0d1b7397-7d3c-44c0-9efc-d38bf197828b 2019-07-01 00:05:54.308966 UTC 954f9f2c-d3ca-4236-ac9a-4ea7bcf09305 Tata notebook for children.
81ccae7e-e496-4997-a289-4669bf53f33e 2019-07-01 00:20:03.404186 UTC 75b281e5-8a16-42cb-9ae0-9a98db7a2c40 The boss glasses made from clay.
50777c55-8dd6-4309-a5ca-26e66c8a8279 2019-07-01 00:34:35.989935 UTC 0112dec8-47f5-4c2c-9109-571e2dbb6345 Taiwanee Multi data processors.
50777c55-8dd6-4309-a5ca-26e66c8a8279 2019-07-01 00:34:35.991935 UTC 0fa25a2d-2aa1-4397-82f1-5a64f3b1272d Highly effiecient used energy.
647d269f-b18f-4558-8653-93369d862ec9 2019-07-01 00:52:53.083698 UTC 81c01216-55a9-4588-a722-bccf0bf35fd5 Women dress punjabi used and new
Thanks.
def recommend(user_id, number):
    if (num == 0):
        print("Please select user_id")
    elif (num==1):
        print("Recommending " + str(num) + " items similar to " + item(id))
    else:
        print("Recommending " + str(num) + " items similar to " + item(id))
        print("----------------------------------------------------------")
        recs = results[id][:num]
    for rec in recs:
        print("You may also like to buy: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
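I don't see where you define num. You may also want to add
try:
    for rec in recs:
        print("You may also like to buy: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")
except NameError:
    # recs was never assigned, so there is nothing to print
    pass
for the case where the condition that creates recs is never met.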
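Along the same lines, here is a rough sketch of recommend that uses one parameter name throughout and returns early when there is nothing to show, rather than catching the failure afterwards. Note that inside the question's recommend, id is not a parameter, so item(id) receives Python's built-in id function rather than user_id. The titles and results values below are invented stand-ins for the structures built in the question, and the scores are illustrative, not real output.

```python
# Invented stand-ins for the structures built in the question (scores are illustrative).
titles = {
    "u1": "Tata notebook for children.",
    "u2": "The boss glasses made from clay.",
    "u3": "Women dress punjabi used and new",
}
results = {"u1": [(0.42, "u2"), (0.10, "u3")]}


def item(customer_id):
    """Look up the title stored for a customer_id."""
    return titles[customer_id]


def recommend(user_id, num):
    """Print and return up to `num` (title, score) pairs for `user_id`."""
    # Return early so `recs` is never used before it has been assigned.
    if num <= 0 or user_id not in results:
        print("Nothing to recommend for this user_id / num")
        return []
    print("Recommending " + str(num) + " items similar to " + item(user_id))
    print("-" * 58)
    recs = results[user_id][:num]
    for score, other_id in recs:
        print("You may also like to buy: " + item(other_id) + " (score:" + str(score) + ")")
    return [(item(other_id), score) for score, other_id in recs]


recommend(user_id="u1", num=2)
```

Returning the list as well as printing it is only a convenience for checking the result; the question's version prints and returns nothing.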
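Could you attach the error you are getting? I think "num" should be "number", since you never define "num". Sharing the table so that we can reproduce it for testing would also be very helpful. Thanks!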
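For what it's worth, the traceback in the question says that tf.fit_transform(ds['title']) was handed np.nan, which suggests that at least one row has no value in the title column once the CSV is loaded. A minimal sketch of how one might check for and handle that with pandas before vectorizing is below. The small DataFrame is a made-up stand-in for the real file, and which of the two options fits depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("sample-data2.csv"): one title is missing.
ds = pd.DataFrame({
    "customer_id": ["a", "b", "c"],
    "title": ["Tata notebook for children.", np.nan, "Women dress punjabi used and new"],
})

# Count the rows whose title is NaN; these are what fit_transform rejects.
n_missing = int(ds["title"].isna().sum())
print("rows with a missing title:", n_missing)

# Option 1: drop those rows, then renumber the index from 0 so that
# index labels match row positions again.
clean = ds.dropna(subset=["title"]).reset_index(drop=True)

# Option 2: keep every row and treat a missing title as an empty string.
filled = ds.assign(title=ds["title"].fillna("").astype(str))

print(clean)
print(filled)
```

Resetting the index after dropping rows matters for the question's code, because it looks up ds['customer_id'][i] with row positions taken from the similarity matrix, and those only line up with the index labels when the index runs 0, 1, 2, and so on.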