When I try to compute SHAP values or run LIME with a PyTorch model, I get the error "mat1 and mat2 must have the same dtype". It works fine with other machine-learning algorithms from the sklearn package.
My goal is to extract information from the model to explain the predictions made by the deep-learning algorithm, so I would also appreciate any advice on that.
My guess is that the problem is the dtype PyTorch uses for the weights. My data are float32, so I tried changing the model with nn.Linear(n_feat, n_feat).float(), but it didn't work.
Here is a sample dataset (data), and below I paste the code for the model (which works). After that, I paste the code for the SHAP values and for LIME, both of which produce the same error.
import torch, torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
var = 'target'
data = pd.read_csv('data800.csv', index_col=0)
train_dataset = data.sample(frac=0.8, random_state=1)
test_dataset = data.drop(train_dataset.index)
train_labels = train_dataset.pop(var)
test_labels = test_dataset.pop(var)
#Flatten distribution by replacing each value with its percentile
train_dataset_transformed = train_dataset.copy()
test_dataset_transformed = test_dataset.copy()
for feature in train_dataset.columns:
    # Percentiles estimated from train data
    bin_res = 0.2
    eval_percentiles = np.arange(bin_res, 100, bin_res)
    percentiles = [
        np.percentile(train_dataset[feature], p)
        for p in eval_percentiles
    ]
    # Apply to both train and test data
    train_dataset_transformed[feature] = pd.cut(
        train_dataset[feature],
        bins=[-np.inf] + percentiles + [np.inf],
        labels=False
    ).astype(np.float32)
    test_dataset_transformed[feature] = pd.cut(
        test_dataset[feature],
        bins=[-np.inf] + percentiles + [np.inf],
        labels=False
    ).astype(np.float32)
n_feat = train_dataset.shape[1]
model = nn.Sequential(
nn.Linear(n_feat, n_feat), nn.ReLU(), nn.BatchNorm1d(n_feat),
nn.Linear(n_feat, n_feat // 2), nn.ReLU(), nn.BatchNorm1d(n_feat // 2),
# nn.Linear(n_feat // 2, n_feat // 2), nn.ReLU(), nn.BatchNorm1d(n_feat // 2),
nn.Linear(n_feat // 2, n_feat // 4), nn.ReLU(), nn.BatchNorm1d(n_feat // 4),
# nn.Linear(n_feat // 4, n_feat // 4), nn.ReLU(), nn.BatchNorm1d(n_feat // 4),
nn.Linear(n_feat // 4, 1)
)
optim = torch.optim.Adam(model.parameters(), 0.01)
#Scale
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(train_dataset_transformed)
X_train = scaler.transform(train_dataset_transformed)
X_test = scaler.transform(test_dataset_transformed)
#Convert to tensors
X_train = torch.tensor(X_train).float()
y_train = torch.tensor(train_labels.values).float()
X_test = torch.tensor(X_test).float()
y_test = torch.tensor(test_labels.values).float()
torch.manual_seed(0)
for epoch in range(1500):
    yhat = model(X_train)
    loss = nn.MSELoss()(yhat.ravel(), y_train)
    optim.zero_grad()
    loss.backward()
    optim.step()
    with torch.no_grad():
        yhatt = model(X_test)
        score = np.corrcoef(y_test, yhatt.ravel())
    if epoch % 100 == 0:
        print('epoch', epoch, '| loss:', loss.item(), '| R:', score[0, 1])
yhat = model(X_test)
yhat = yhat.detach().numpy()
plt.scatter(test_labels, yhat)
ax_lims = plt.gca().axis()
plt.plot([0, 100], [0, 100], 'k:', label='y=x')
plt.gca().axis(ax_lims)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.legend()
SHAP:
import shap
def model2(x):
    return model(torch.tensor(x)).detach().numpy()
explainer = shap.Explainer(model2, X_test.detach().numpy())
shap_values = explainer(X_test.detach().numpy(), max_evals=10000)
LIME:
from lime import lime_tabular
features = train_dataset.columns
explainer_lime = lime_tabular.LimeTabularExplainer(X_train.detach().numpy(), feature_names=features, verbose=True, mode='regression')
#test vector
i = 10
#top features
k = 10
def model2(x):
    return model(torch.tensor(x)).detach().numpy()
exp_lime = explainer_lime.explain_instance(X_test[i].detach().numpy(), model2, num_features=k)
exp_lime.show_in_notebook()
You might consider looking at the captum library, which is designed specifically for PyTorch model interpretability. It provides a range of tools similar to SHAP and LIME, but it may integrate more directly with PyTorch models.
If you are still facing the dtype mismatch, you may want to set the data types more explicitly throughout your code, not only when creating the initial tensors. In particular, SHAP and LIME pass float64 NumPy arrays to your wrapper function, and torch.tensor(x) preserves that dtype, which clashes with the model's float32 weights.
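As a minimal standalone sketch of that fix (using a tiny stand-in model rather than your full network), casting to float32 inside the wrapper resolves the mismatch:

```python
import numpy as np
import torch
import torch.nn as nn

# A tiny model; float32 is PyTorch's default parameter dtype
model = nn.Linear(3, 1)

# NumPy arrays default to float64, and torch.tensor() preserves that
# dtype, which reproduces "mat1 and mat2 must have the same dtype"
x64 = np.ones((2, 3))
try:
    model(torch.tensor(x64))
except RuntimeError as err:
    print(err)

# Fix: cast explicitly inside the wrapper that SHAP/LIME call
def model2(x):
    xt = torch.tensor(x, dtype=torch.float32)
    with torch.no_grad():
        return model(xt).numpy()

out = model2(x64)
print(out.shape)  # (2, 1)
```

The same one-line change to model2 in your SHAP and LIME snippets should make both explainers run.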