高斯过程二元分类：为什么 GPy 的方差比 scikit-learn 小得多？

Question

我正在学习使用高斯过程的二元分类，并且我正在将 GPy 与 scikit-learn 在一个受 Martin Krasser 的博客文章启发的玩具一维问题上进行比较。两种实现（GPy 和 scikit-learn）似乎都使用类似的 RBF 内核设置。优化内核超参数后，长度尺度相似，但方差相差很大。 GPy 内核方差似乎太小。

我如何修改我的 GPy 实现并获得与 scikit-learn 类似的结果？我怀疑这与每个算法的内部实现有关，但我无法说出是什么导致了这种巨大的差异。我在下面进一步解释为什么我认为我的 GPy 实现需要修复。

实现细节：Python 3.9 与 GPy 1.13.2 和 scikit-learn 1.5.1。可重现的示例：

import numpy as np
from scipy.stats import bernoulli
from scipy.special import expit as sigmoid

##############################
# Part 1: toy dataset creation
##############################

np.random.seed(0)
X = np.arange(0, 5, 0.05).reshape(-1, 1)
X_test = np.arange(-2, 7, 0.1).reshape(-1, 1)

a = np.sin(X * np.pi * 0.5) * 2  # latent function
t = bernoulli.rvs(sigmoid(a))    # Bernoulli training data (0s and 1s)

#####################################
# Part 2: scikit-learn implementation
#####################################

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rbf = ConstantKernel(1.0, constant_value_bounds=(1e-3, 10)) \
        * RBF(length_scale=1.0, length_scale_bounds=(1e-3, 10))
gpc = GaussianProcessClassifier(
    kernel=rbf,
    optimizer='fmin_l_bfgs_b',
    n_restarts_optimizer=10)

gpc.fit(X_scaled, t.ravel())

print(gpc.kernel_)
# 1.5**2 * RBF(length_scale=0.858)

############################
# Part 3: GPy implementation
############################

import GPy

kern = GPy.kern.RBF(
    input_dim=1,
    variance=1.,
    lengthscale=1.)
kern.lengthscale.unconstrain()
kern.variance.unconstrain()
kern.lengthscale.constrain_bounded(1e-3, 10)
kern.variance.constrain_bounded(1e-3, 10)

m = GPy.core.GP(
    X=X,Y=t, kernel=kern, 
    inference_method=GPy.inference.latent_function_inference.laplace.Laplace(),    
    likelihood=GPy.likelihoods.Bernoulli())

m.optimize_restarts(
    num_restarts=10, optimizer='lbfgs',
    verbose=True, robust=True)

print(m.kern)
#  rbf.         |               value  |  constraints  |  priors
#  variance     |  0.8067562453940487  |  0.001,10.0   |        
#  lengthscale  |  0.8365668826459536  |  0.001,10.0   |

lenghtscale 值大致相似（0.858 与 0.836），但方差值非常不同（scikit-learn 为 1.5**2 = 2.25，GPy 仅 0.806）。

我认为我的 GPy 实现需要调整的原因是，真正的潜在函数（参见上面代码第 1 部分中的“a”）与预测函数并不紧密匹配，即使具有 +/- 2 标准差范围。另一方面，scikit-learn 实现与其匹配得相当好（可以使用 scikit-learn 检索潜在函数平均值和标准偏差，如此处所示）。

^{左：两个模型的预测概率相似（这是有道理的，因为它们共享相似的 lenghtscale 值）。右图：GPy 的预测潜在函数并不像 scikit-learn 模型那样拟合真实的潜在函数。}

到目前为止我已经尝试过，结果没有显着变化：

输入特征（X）归一化
使用 GPy.inference.latent_function_inference.expectation_propagation.EP() 作为 GPy 推理方法而不是 Laplace 方法
按照建议，将 WhiteKernel 组件添加到 scikit-learn 实现中此处

Answer 1

事实证明，GPy 的伯努利似然类使用标准正态 CDF 将潜在预测映射到概率（源代码显示默认为概率链接函数）。我的示例中的数据集使用 sigmoid 函数，这意味着需要 logit 链接函数。尽管它没有在 GPy 库中实现，但我们可以创建自己的链接函数对象，并在初始化时将其传递给 GPy 模型。这是我的代码的更新版本（请参阅第 3 部分中的更改）：

import numpy as np
from scipy.stats import bernoulli
from scipy.special import expit as sigmoid

##############################
# Part 1: toy dataset creation
##############################

np.random.seed(0)
X = np.arange(0, 5, 0.05).reshape(-1, 1)
X_test = np.arange(-2, 7, 0.1).reshape(-1, 1)

a = np.sin(X * np.pi * 0.5) * 2  # latent function
t = bernoulli.rvs(sigmoid(a))    # Bernoulli training data (0s and 1s)

#####################################
# Part 2: scikit-learn implementation
#####################################

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rbf = ConstantKernel(1.0, constant_value_bounds=(1e-3, 10)) \
        * RBF(length_scale=1.0, length_scale_bounds=(1e-3, 10))
gpc = GaussianProcessClassifier(
    kernel=rbf,
    optimizer='fmin_l_bfgs_b',
    n_restarts_optimizer=10)

gpc.fit(X_scaled, t.ravel())

print(gpc.kernel_)
# 1.95**2 * RBF(length_scale=0.898)

############################
# Part 3: GPy implementation
############################

import GPy
from GPy.likelihoods.link_functions import GPTransformation

kern = GPy.kern.RBF(
    input_dim=1,
    variance=1.,
    lengthscale=1.)
kern.lengthscale.unconstrain()
kern.variance.unconstrain()
kern.lengthscale.constrain_bounded(1e-3, 10)
kern.variance.constrain_bounded(1e-3, 10)

# Create custom link function
class LogitLink(GPTransformation):
    def transf(self, f):
        """Sigmoid function"""
        return 1 / (1 + np.exp(-f))

    def dtransf_df(self, f):
        """First derivative of sigmoid"""
        p = self.transf(f)
        return p * (1 - p)

    def d2transf_df2(self, f):
        """Second derivative of sigmoid"""
        p = self.transf(f)
        return p * (1 - p) * (1 - 2 * p)

    def d3transf_df3(self, f):
        """Third derivative of sigmoid"""
        p = self.transf(f)
        return p * (1 - p) * (1 - 6 * p + 6 * p**2)

logit_link = LogitLink()

m = GPy.core.GP(
    X=X,Y=t, kernel=kern, 
    inference_method=GPy.inference.latent_function_inference.laplace.Laplace(),    
    likelihood=GPy.likelihoods.Bernoulli(gp_link=logit_link))

m.optimize_restarts(
    num_restarts=10, optimizer='lbfgs',
    verbose=True, robust=True)

print(m.kern)
#  rbf.         |               value  |  constraints  |  priors
#  variance     |  3.7922168280178328  |  0.001,10.0   |        
#  lengthscale  |  0.8977846144774547  |  0.001,10.0   |

scikit-learn 实现（代码的第 2 部分）使用 sigmoid 映射（logit 链接函数），这就是为什么它的结果更有意义。现在，两种实现（scikit-learn 和 GPy）在优化后返回相同的长度和方差值。

高斯过程二元分类：为什么 GPy 的方差比 scikit-learn 小得多？

问题描述投票：0回答：1

1个回答

最新问题

高斯过程二元分类：为什么 GPy 的方差比 scikit-learn 小得多？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1