Dtype error when trying to compute SHAP values from a PyTorch DL model

Question · votes: 0 · answers: 1

When I try to compute SHAP values, or run LIME, with my PyTorch model, I get the error "mat1 and mat2 must have the same dtype". Both work fine with other machine-learning algorithms from the sklearn package.

My goal is to extract information from the model to explain the predictions the deep-learning algorithm makes, so I would also appreciate any advice on that.

My guess is that the problem is the dtype PyTorch uses for the weights. My data are float32, so I tried changing the model with nn.Linear(n_feat, n_feat).float(), but it didn't work.
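For reference, the mismatch itself can be reproduced minimally; the layer and tensor names below are just illustrative:

```python
import numpy as np
import torch
import torch.nn as nn

lin = nn.Linear(3, 1)                 # nn.Linear weights are float32 by default
x64 = torch.tensor(np.ones((2, 3)))  # NumPy arrays default to float64

try:
    lin(x64)                          # float64 input vs float32 weights
except RuntimeError as e:
    print(e)                          # "mat1 and mat2 must have the same dtype..."

out = lin(x64.float())                # casting the input to float32 fixes it
print(out.shape)                      # torch.Size([2, 1])
```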

Here is a sample dataset (data), and below I paste the code for the model (which works). After that, I paste the code for the SHAP values and for LIME, both of which raise the same error.

import torch, torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

var = 'target'
data = pd.read_csv('data800.csv', index_col=0)

train_dataset = data.sample(frac=0.8, random_state=1)
test_dataset = data.drop(train_dataset.index)

train_labels = train_dataset.pop(var)
test_labels = test_dataset.pop(var)

#Flatten distribution by replacing each value with its percentile
train_dataset_transformed = train_dataset.copy()
test_dataset_transformed = test_dataset.copy()
for feature in train_dataset.columns:
    #Percentiles estimated from train data
    bin_res = 0.2
    eval_percentiles = np.arange(bin_res, 100, bin_res)
    percentiles = [
        np.percentile(train_dataset[feature], p)
        for p in eval_percentiles
    ]

    #Apply to both train and test data
    train_dataset_transformed[feature] = pd.cut(
        train_dataset[feature],
        bins=[-np.inf] + percentiles + [np.inf],
        labels=False
    ).astype(np.float32)
    
    test_dataset_transformed[feature] = pd.cut(
        test_dataset[feature],
        bins=[-np.inf] + percentiles + [np.inf],
        labels=False
    ).astype(np.float32)

n_feat = train_dataset.shape[1]

model = nn.Sequential(
    nn.Linear(n_feat, n_feat), nn.ReLU(), nn.BatchNorm1d(n_feat),                   
    nn.Linear(n_feat, n_feat // 2), nn.ReLU(), nn.BatchNorm1d(n_feat // 2),                   
    # nn.Linear(n_feat // 2, n_feat // 2), nn.ReLU(),  nn.BatchNorm1d(n_feat // 2),
    nn.Linear(n_feat // 2, n_feat // 4), nn.ReLU(),  nn.BatchNorm1d(n_feat // 4),
    # nn.Linear(n_feat // 4, n_feat // 4), nn.ReLU(),  nn.BatchNorm1d(n_feat // 4),
    nn.Linear(n_feat // 4, 1)
)

optim = torch.optim.Adam(model.parameters(), 0.01)

#Scale
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(train_dataset_transformed)

X_train = scaler.transform(train_dataset_transformed)
X_test = scaler.transform(test_dataset_transformed)

#Convert to tensors
X_train = torch.tensor(X_train).float()
y_train = torch.tensor(train_labels.values).float()

X_test = torch.tensor(X_test).float()
y_test = torch.tensor(test_labels.values).float()

torch.manual_seed(0)
for epoch in range(1500):
    yhat = model(X_train)

    loss = nn.MSELoss()(yhat.ravel(), y_train)
    optim.zero_grad()
    loss.backward()
    optim.step()

    with torch.no_grad():
        yhatt = model(X_test)
        score = np.corrcoef(y_test, yhatt.ravel())
        if epoch % 100 == 0:
            print('epoch', epoch, '| loss:', loss.item(), '| R:', score[0, 1])

yhat = model(X_test)
yhat = yhat.detach().numpy()
plt.scatter(test_labels, yhat)
ax_lims = plt.gca().axis()
plt.plot([0, 100], [0, 100], 'k:', label='y=x')
plt.gca().axis(ax_lims)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.legend()

SHAP:

import shap

def model2(x):
    return model(torch.tensor(x)).detach().numpy()

explainer = shap.Explainer(model2, X_test.detach().numpy())
shap_values = explainer(X_test.detach().numpy(), max_evals=10000)

LIME:

from lime import lime_tabular

features = train_dataset.columns

explainer_lime = lime_tabular.LimeTabularExplainer(X_train.detach().numpy(), feature_names=features, verbose=True, mode='regression')

#test vector
i = 10
#top features
k = 10

def model2(x):
    return model(torch.tensor(x)).detach().numpy()

exp_lime = explainer_lime.explain_instance(X_test[i].detach().numpy(), model2, num_features=k)
 
exp_lime.show_in_notebook()
deep-learning pytorch shap lime
1 Answer
0 votes

You might consider looking at the captum library, which is designed specifically for PyTorch model interpretability. It provides a range of tools similar to SHAP and LIME, but its integration with PyTorch models may be more straightforward.

If you are still running into the dtype mismatch, you may want to set dtypes explicitly throughout your code, not just when creating the initial tensors.
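Concretely, the likely culprit in your code is that shap.Explainer and LIME pass float64 NumPy arrays into model2, so torch.tensor(x) produces a float64 tensor while the weights are float32. A sketch of an explicit-dtype wrapper (the small stand-in model here replaces your trained network):

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for the trained network from the question (float32 weights).
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
model.eval()

def model2(x: np.ndarray) -> np.ndarray:
    # SHAP/LIME hand over float64 arrays; cast explicitly to match the weights.
    t = torch.as_tensor(x, dtype=torch.float32)
    with torch.no_grad():
        return model(t).numpy()

print(model2(np.random.rand(5, 4)).shape)  # (5, 1)
```

The same wrapper can then be passed unchanged to both shap.Explainer and explain_instance.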
