I have the following data:
object | l2a | l2b | l4 | l5 |
---|---|---|---|---|
a | 0.6649 | 0.5916 | 0.033569 | 0.557373 |
b | 0.8421 | 0.5132 | 0.000000 | 0.697193 |
c | 0.6140 | 0.2807 | 0.084217 | 0.650313 |
d | 0.7619 | 0.3810 | 0.000000 | 0.662306 |
e | 0.6957 | 0.3043 | 0.000000 | 0.645135 |
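Can RMSE be used to measure the similarity between each pair of objects: (a, b), (a, c), (a, d), (a, e), (b, c), ..., (d, e)?
For example:
Similarity between object a (_a) and object b (_b):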
diff_l2a = l2a_a - l2a_b
diff_l2b = l2b_a - l2b_b
diff_l4 = l4_a - l4_b
diff_l5 = l5_a - l5_b
Then compute the RMSE:
RMSEs = [RMSE(diff_l2a, diff_l2b), RMSE(diff_l2a, diff_l4), RMSE(diff_l2a, diff_l5), ..., RMSE(diff_l4, diff_l5)]
Similarity:
average(RMSEs)
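For comparison, here is a minimal sketch of the more common reading, where the RMSE is taken directly over the four attribute differences of one pair rather than between pairs of differences. The names `a`, `b`, `diff`, and `rmse_ab` are illustrative only; the values come from the table above.

```python
import numpy as np

# Attribute vectors (l2a, l2b, l4, l5) for objects a and b, taken from the table
a = np.array([0.6649, 0.5916, 0.033569, 0.557373])
b = np.array([0.8421, 0.5132, 0.000000, 0.697193])

diff = a - b                            # per-attribute differences
rmse_ab = np.sqrt(np.mean(diff ** 2))   # root of the mean squared difference

print(f"RMSE(a, b) = {rmse_ab:.4f}")
```

Note that a smaller value means the two objects are more alike, so this quantity behaves as a distance rather than a similarity score.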
RMSE similarity, DataFrame code excerpt:
num_objects = len(df)
sim_matrix = np.zeros((num_objects, num_objects))
for i in range(num_objects):
    for j in range(i + 1, num_objects):
        # RMSE between the attribute vectors of objects i and j
        rmse = np.sqrt(mean_squared_error(attributes[i], attributes[j]))
        sim_matrix[i, j] = rmse
        sim_matrix[j, i] = rmse
Code (with DataFrame):
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error

data = {
    'object': ['a', 'b', 'c', 'd', 'e'],
    'l2a': [0.6649, 0.8421, 0.6140, 0.7619, 0.6957],
    'l2b': [0.5916, 0.5132, 0.2807, 0.3810, 0.3043],
    'l4': [0.033569, 0.0, 0.084217, 0.0, 0.0],
    'l5': [0.557373, 0.697193, 0.650313, 0.662306, 0.645135]
}
df = pd.DataFrame(data)

# One row of attribute values per object (the 'object' column is dropped)
attributes = df.iloc[:, 1:].values
num_objects = len(df)

# Symmetric matrix of pairwise RMSE values; the diagonal stays 0
sim_matrix = np.zeros((num_objects, num_objects))
for i in range(num_objects):
    for j in range(i + 1, num_objects):
        rmse = np.sqrt(mean_squared_error(attributes[i], attributes[j]))
        sim_matrix[i, j] = rmse
        sim_matrix[j, i] = rmse

sim_df = pd.DataFrame(sim_matrix, columns=df['object'], index=df['object'])
print("Similarity Matrix:")
print(sim_df)

# Average over distinct pairs only (upper triangle), so the diagonal is
# skipped and a genuine RMSE of 0.0 between two objects is not discarded
sim = sim_matrix[np.triu_indices(num_objects, k=1)]
average_sim = sim.mean()
print(f"Average Similarity (excluding the diagonal): {average_sim:.3f}")
Output:
Addendum:
If you want to compute the similarity from pairwise RMSE:
from scipy.spatial.distance import pdist, squareform
# RMSE = Euclidean distance / sqrt(number of attributes)
sim_matrix = squareform(pdist(attributes, 'euclidean')) / np.sqrt(attributes.shape[1])
See also: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
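As a sanity check on the vectorised approach, the sketch below compares a plain double loop against `pdist` on the same five objects. The names `loop_rmse`, `fast_rmse`, `n_objects`, and `n_attrs` are introduced here for illustration. The Euclidean distance is the square root of the sum of squared differences, so dividing it by the square root of the number of attributes should give the RMSE.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Rows are objects a..e, columns are l2a, l2b, l4, l5 (values from the table)
attributes = np.array([
    [0.6649, 0.5916, 0.033569, 0.557373],
    [0.8421, 0.5132, 0.000000, 0.697193],
    [0.6140, 0.2807, 0.084217, 0.650313],
    [0.7619, 0.3810, 0.000000, 0.662306],
    [0.6957, 0.3043, 0.000000, 0.645135],
])
n_objects, n_attrs = attributes.shape

# Reference: explicit loop over every distinct pair
loop_rmse = np.zeros((n_objects, n_objects))
for i in range(n_objects):
    for j in range(i + 1, n_objects):
        d = attributes[i] - attributes[j]
        loop_rmse[i, j] = loop_rmse[j, i] = np.sqrt(np.mean(d ** 2))

# Vectorised: Euclidean distance rescaled by sqrt(number of attributes)
fast_rmse = squareform(pdist(attributes, "euclidean")) / np.sqrt(n_attrs)

print(np.allclose(loop_rmse, fast_rmse))
```

With only four attributes the rescaling is a constant factor of 2, so the ranking of pairs is the same whether Euclidean distance or RMSE is used.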