Extremely high deviance explained but no significant predictors in mgcv::gam()


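I have a small dataset (n = 28) of occurrence counts for three seabirds, and I ran hurdle GAMs with mgcv::gam(): first a binomial model on presence/absence, then a negative binomial model on the presence-only records. The presence-only models leave sample sizes of 12, 12 and 22 for the three seabirds. The seabird data are also overdispersed, with a high proportion of zeros and generally low counts (< 10) at presence points. The same model structure is used for each of the three seabirds (prion, storm petrel, sooty shearwater); the prion models are: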

prion_binary <- mgcv::gam(prion_binary ~ s(avg_SST) + 
                            s(avg_SSS)+ 
                            s(delta_SST)+
                            s(delta_SSS)+
                            s(distance, k=8)+ # 9 different distances
                            s(total_zp)+ #total zooplankton
                            s(trip_factor,bs = "re"),
                          method = "ML",
                          family = binomial(link = "logit"), 
                          data = seabird)
prion_count <- mgcv::gam(prion ~ s(avg_SST) + 
                           s(avg_SSS)+ 
                           s(delta_SST)+
                           s(delta_SSS)+
                           s(distance, k=5)+ # 6 different distances
                           s(total_zp)+ #total zooplankton
                           s(trip_factor,bs = "re"), 
                         method = "ML",
                         family = nb(), # negative binomial, as described above and reported in the output
                         data = seabird[seabird$prion >0,])

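My problem is that the model output shows very high deviance explained but almost no significant predictors. In one case the adjusted R-squared is also negative. I thought I might have too many predictors, but when I run single-predictor models, every predictor explains some deviance and has p < 0.05, so I am not sure which to remove. The residual plots also don't align with such high deviance explained.

I'm not sure where to go next, so any help would be appreciated.

Here is the output for the three seabird models: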

prion_binary

Family: binomial 
Link function: logit 

Formula:
prion_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.557      4.021  -0.636    0.525

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value
s(avg_SST)     1.000  1.000  0.687   0.407
s(avg_SSS)     1.000  1.000  0.324   0.569
s(delta_SST)   1.000  1.000  0.282   0.596
s(delta_SSS)   1.000  1.000  0.440   0.507
s(distance)    1.000  1.000  0.963   0.326
s(total_zp)    1.742  2.051  0.782   0.736
s(trip_factor) 1.349  3.000  3.615   0.120

R-sq.(adj) =  0.995   Deviance explained = 97.5%
-ML = 6.8535  Scale est. = 1         n = 28

[Figure: prion binary model residuals]

prion count:

Family: Negative Binomial(2277108.965) 
Link function: log 

Formula:
prion ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8218     0.1979   4.153 3.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00      1  0.017   0.896
s(avg_SSS)     1.000e+00      1  0.675   0.411
s(delta_SST)   1.000e+00      1  0.085   0.771
s(delta_SSS)   1.000e+00      1  0.148   0.700
s(distance)    1.000e+00      1  0.727   0.394
s(total_zp)    1.000e+00      1  0.059   0.809
s(trip_factor) 1.016e-07      2  0.000   0.508

R-sq.(adj) =  -0.177   Deviance explained = 55.1%
-ML = 17.514  Scale est. = 1         n = 12

[Figure: prion count model residuals]

sooty shearwater binary

Family: binomial 
Link function: logit 

Formula:
shearwater_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS, k = 15) + s(distance, k = 8) + s(total_zp) + 
    s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.611      2.656    1.36    0.174

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value   
s(avg_SST)     1.000      1  0.917 0.33829   
s(avg_SSS)     1.000      1  0.914 0.33915   
s(delta_SST)   1.000      1  0.000 0.99210   
s(delta_SSS)   1.000      1  0.017 0.89504   
s(distance)    1.000      1  0.004 0.94848   
s(total_zp)    1.000      1  0.113 0.73652   
s(trip_factor) 1.141      3 11.683 0.00514 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.472   Deviance explained = 59.4%
-ML = 9.4597  Scale est. = 1         n = 28

[Figure: sooty shearwater binary model residuals]

sooty shearwater count

Family: Negative Binomial(6642419.022) 
Link function: log 

Formula:
shearwater ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.9002     0.1211    15.7   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df  Chi.sq  p-value    
s(avg_SST)     1.000e+00  1.000   1.841   0.1748    
s(avg_SSS)     3.606e+00  4.272 125.264  < 2e-16 ***
s(delta_SST)   1.000e+00  1.000   5.619   0.0178 *  
s(delta_SSS)   1.000e+00  1.000  18.094 2.11e-05 ***
s(distance)    4.657e+00  5.328 277.393  < 2e-16 ***
s(total_zp)    1.000e+00  1.000  10.490   0.0012 ** 
s(trip_factor) 9.002e-07  3.000   0.000   0.4375    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.999   Deviance explained = 99.5%
-ML = 67.138  Scale est. = 1         n = 22

[Figure: sooty shearwater count model residuals]

storm petrel binary

Family: binomial 
Link function: logit 

Formula:
storm_petrel_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS) + s(distance, k = 8) + s(total_zp) + s(trip_factor, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.7884     0.9928  -0.794    0.427

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00   1.00  0.174   0.676
s(avg_SSS)     1.000e+00   1.00  0.190   0.663
s(delta_SST)   1.000e+00   1.00  1.038   0.308
s(delta_SSS)   1.000e+00   1.00  0.000   0.996
s(distance)    3.003e+00   3.69  5.302   0.213
s(total_zp)    1.000e+00   1.00  0.039   0.844
s(trip_factor) 5.115e-07   3.00  0.000   0.369

R-sq.(adj) =  0.595   Deviance explained = 64.9%
-ML = 12.629  Scale est. = 1         n = 28

[Figure: storm petrel binary model residuals]

storm petrel count

Family: Negative Binomial(1572380.699) 
Link function: log 

Formula:
storm_petrel ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8340     0.2065   4.039 5.36e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value  
s(avg_SST)     1.000e+00      1  2.654  0.1033  
s(avg_SSS)     1.000e+00      1  0.861  0.3535  
s(delta_SST)   1.000e+00      1  1.389  0.2386  
s(delta_SSS)   1.000e+00      1  4.626  0.0315 *
s(distance)    1.000e+00      1  0.562  0.4534  
s(total_zp)    1.000e+00      1  0.580  0.4463  
s(trip_factor) 1.018e-07      2  0.000  0.2196  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.647   Deviance explained = 73.8%
-ML = 19.901  Scale est. = 1         n = 12

[Figure: storm petrel count model residuals]

1 Answer

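My guess at what is happening (though it is impossible to know without seeing the data) is that your explanatory variables are highly correlated with one another. The significance of each variable is based on how much additional variance is explained when that variable is added to a reduced model containing all the other variables. So if your explanatory variables are collinear, adding one more of them explains no variance that the others cannot already explain.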
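One way to check this guess is to look at pairwise correlations between the covariates and at concurvity, the smooth-term analogue of collinearity. The sketch below is only an illustration: the data frame `sim` is simulated (the real `seabird` data are not available here), `avg_SSS` is deliberately built to track `avg_SST`, and the reduced three-term model is an assumption made to keep the example small.

```r
library(mgcv)

set.seed(1)
n <- 28

# Simulated stand-in for the `seabird` data frame (NOT the real data).
# avg_SSS is constructed to be strongly correlated with avg_SST.
avg_SST <- rnorm(n, mean = 15, sd = 1)
sim <- data.frame(
  avg_SST  = avg_SST,
  avg_SSS  = 34 + 0.5 * (avg_SST - 15) + rnorm(n, 0, 0.1),
  total_zp = rexp(n, rate = 1 / 100)
)
sim$present <- rbinom(n, 1, plogis(sim$avg_SST - 15))

# Pairwise linear correlations between the covariates
cor_mat <- cor(sim[, c("avg_SST", "avg_SSS", "total_zp")])
print(round(cor_mat, 2))

# Concurvity of each smooth with the rest of the model;
# values near 1 mean a term can be reproduced from the other terms.
m <- gam(present ~ s(avg_SST, k = 4) + s(avg_SSS, k = 4) + s(total_zp, k = 4),
         family = binomial, method = "ML", data = sim)
conc <- concurvity(m, full = TRUE)
print(round(conc, 2))
```

With the real data, the same two calls on the fitted `prion_binary` model and its covariates would show whether terms such as `s(avg_SST)` and `s(delta_SST)` overlap in what they explain.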

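Also, you definitely have too many predictors for the amount of data you have. With only 12 data points, you probably don't want more than one or two predictors (though do read other opinions elsewhere).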
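To make the size argument concrete, it can help to compare the number of model coefficients with the number of observations. This is a minimal sketch under stated assumptions: the data are simulated, the choice of `avg_SST` and `distance` as the two retained predictors is arbitrary, and `family = poisson` stands in for the count family purely to keep the toy fit stable.

```r
library(mgcv)

set.seed(2)
n <- 12  # same size as the smaller presence-only subsets

# Simulated stand-in data; covariate names mirror the question,
# the values do not.
sim <- data.frame(
  avg_SST  = runif(n, 13, 17),
  distance = rep(c(5, 10, 20, 40, 60, 80), each = 2)
)
sim$count <- rpois(n, exp(0.5 + 0.3 * (sim$avg_SST - 15)))

# Two smooths with a small basis (k = 3) instead of seven terms
m_small <- gam(count ~ s(avg_SST, k = 3) + s(distance, k = 3),
               family = poisson, method = "ML", data = sim)

# Intercept plus (k - 1) coefficients per smooth
n_coef <- length(coef(m_small))
cat("coefficients:", n_coef, " observations:", nrow(sim), "\n")
cat("effective df:", round(sum(m_small$edf), 2), "\n")
```

Even this reduced model spends 5 coefficients on 12 observations, which gives a sense of how little room a seven-term model leaves.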

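One possible way forward is to run a principal component analysis on the explanatory variables, or on naturally grouped subsets of them. If one or two principal components account for a large share of the variance in the explanatory variables, use those components as predictors.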
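A rough sketch of the PCA route, using base R's `prcomp()` and then a single smooth of the first component. Everything here is illustrative: the covariates are simulated from one shared latent gradient (so PC1 dominates by construction), and grouping the four temperature and salinity variables together is an assumption, not something taken from the question.

```r
library(mgcv)

set.seed(3)
n <- 28

# Simulated stand-in covariates driven by one shared latent gradient
latent <- rnorm(n)
env <- data.frame(
  avg_SST   = 15 + latent + rnorm(n, 0, 0.2),
  avg_SSS   = 34 + 0.3 * latent + rnorm(n, 0, 0.05),
  delta_SST = 0.5 * latent + rnorm(n, 0, 0.1),
  delta_SSS = -0.2 * latent + rnorm(n, 0, 0.05)
)
present <- rbinom(n, 1, plogis(latent))

# PCA on centred, scaled covariates
pca <- prcomp(env, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Use the first component as a single predictor in place of four
sim <- data.frame(present = present, PC1 = pca$x[, 1])
m_pc <- gam(present ~ s(PC1, k = 4), family = binomial,
            method = "ML", data = sim)
print(summary(m_pc))
```

The trade-off is interpretability: a component is a weighted mix of the original variables, so `pca$rotation` is needed to say what PC1 represents.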

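Another possibility is to drop any predictors that seem less important a priori (and not post hoc, unless you are only doing exploratory data analysis).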
