I'm trying to compute the cosine similarity between two columns of a DataFrame. Here is the code snippet:
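import numpy as np
# Assumed import: the call signature matches scikit-learn's cosine_similarity.
from sklearn.metrics.pairwise import cosine_similarity

def cal_cosine_similarity(row):
    vec1 = np.array(row['sup_vec'])
    vec2 = np.array(row['vector'])
    return cosine_similarity([vec1], [vec2])[0][0]

cross_join_df['cos_sim'] = cross_join_df.apply(cal_cosine_similarity, axis=1)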
This works fine most of the time, but occasionally I get an error like this:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cos_sim'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 3576, in _set_item
loc = self._info_axis.get_loc(key)
File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
raise KeyError(key) from err
KeyError: 'cos_sim'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/prism/src/main.py", line 79, in <module>
res = job.run()
File "/opt/prism/src/jobs/v2/SparkJob.py", line 45, in run
self.start()
File "/opt/prism/src/jobs/v2/SparkJob.py", line 71, in start
raise e
File "/opt/prism/src/jobs/v2/SparkJob.py", line 68, in start
self.execute(self.input_data, 1)
File "/opt/prism/src/jobs/v2/DprmMappingInference.py", line 289, in execute
cross_join_df['cos_sim'] = cross_join_df.apply(cal_cosine_similarity,axis = 1)
File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3044, in __setitem__
self._set_item(key, value)
File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3121, in _set_item
NDFrame._set_item(self, key, value)
File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 3579, in _set_item
self._mgr.insert(len(self._info_axis), key, value)
File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1198, in insert
block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2744, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2400, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 130, in __init__
raise ValueError(
ValueError: Wrong number of items passed 9, placement implies 1
I can't track down the cause of this error. Is it caused by something in the cosine similarity function?
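You said the code fails sometimes, but not always. The most obvious explanation is that cross_join_df somehow has no 'cos_sim' key and does not allow a new key to be used as a way of creating a new entry. I'm not sure what type of object cross_join_df is, but in general you can use a function like the following to check whether a cos_sim entry exists: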
import pandas

def check_have_entry(obj, key):
    # isinstance, not issubclass: obj is an instance, not a class.
    if isinstance(obj, dict):
        return key in obj  # same as `key in obj.keys()`
    elif isinstance(obj, pandas.DataFrame):
        # 'cos_sim' would be a column label, so look in the columns.
        return key in obj.columns
    return False
However, if cross_join_df is some other type of object, the function above will not work.
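Another thing worth ruling out, going by the last line of the traceback (ValueError: Wrong number of items passed 9, placement implies 1): the right-hand side of the assignment appears to have 9 columns, which suggests apply handed back a whole DataFrame rather than a Series. As far as I know, pandas can do that with axis=1 when the frame has no rows, which would fit the "only sometimes" pattern. That is an assumption, though, since the data isn't shown. Below is a minimal sketch that avoids apply altogether and always produces a one-dimensional result. It uses plain NumPy in place of cosine_similarity, and the names cosine and add_cos_sim are made up for illustration:

```python
import numpy as np
import pandas as pd


def cosine(u, v):
    """Cosine similarity of two 1-D vectors (plain NumPy, not scikit-learn)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Return NaN instead of dividing by zero when either vector has zero norm.
    return float(np.dot(u, v) / denom) if denom else float("nan")


def add_cos_sim(df):
    """Hypothetical helper: return a copy of df with a 'cos_sim' column.

    Assumes df has the 'sup_vec' and 'vector' columns from the question.
    The list comprehension yields one value per row, so an empty frame
    simply gets an empty float column.
    """
    values = [cosine(a, b) for a, b in zip(df["sup_vec"], df["vector"])]
    out = df.copy()
    out["cos_sim"] = pd.Series(values, index=df.index, dtype=float)
    return out
```

Calling add_cos_sim(cross_join_df) should then behave the same whether or not the frame has rows. Printing cross_join_df.shape just before the failing line would confirm or rule out the empty-frame theory.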