Extremely high deviance explained but no significant predictors in mgcv::gam()


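I have a small dataset (n = 28) of occurrence counts for three seabird species, and I fitted hurdle GAMs using mgcv::gam(): first a binomial model for presence/absence, then a negative binomial model fitted only to the points where the species was present. The presence-only models leave sample sizes of 12, 12 and 22 for the three seabirds. The seabird data are also overdispersed, with many zeros and generally low counts (< 10) at presence points. These are the models for each seabird:

# three seabirds: prion, storm petrel, sooty shearwater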

prion_binary <- mgcv::gam(prion_binary ~ s(avg_SST) + 
                            s(distance, k=8)+ # 9 different distances
                            s(total_zp)+ #total zooplankton
                            s(trip_factor,bs = "re"),
                          method = "ML",
                          family = binomial(link = "logit"), 
                          data = seabird)
prion_count <- mgcv::gam(prion ~ s(avg_SST) + 
                           s(distance, k=5)+ # 6 different distances
                           s(total_zp)+ #total zooplankton
                           s(trip_factor,bs = "re"), 
                         method = "ML",
                         family = "ziP", 
                         data = seabird[seabird$prion >0,])

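My problem is that the model output shows very high deviance explained and yet almost no significant predictors. In one case the adjusted R-squared is also negative. I think I may have too many predictors, but when I run single-predictor models, every predictor explains some deviance and has p < 0.05, so I am not sure which to remove. The residual plots also don't align with such high deviance explained.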




Family: binomial 
Link function: logit 

prion_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.557      4.021  -0.636    0.525

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value
s(avg_SST)     1.000  1.000  0.687   0.407
s(avg_SSS)     1.000  1.000  0.324   0.569
s(delta_SST)   1.000  1.000  0.282   0.596
s(delta_SSS)   1.000  1.000  0.440   0.507
s(distance)    1.000  1.000  0.963   0.326
s(total_zp)    1.742  2.051  0.782   0.736
s(trip_factor) 1.349  3.000  3.615   0.120

R-sq.(adj) =  0.995   Deviance explained = 97.5%
-ML = 6.8535  Scale est. = 1         n = 28



Family: Negative Binomial(2277108.965) 
Link function: log 

prion ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8218     0.1979   4.153 3.29e-05 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00      1  0.017   0.896
s(avg_SSS)     1.000e+00      1  0.675   0.411
s(delta_SST)   1.000e+00      1  0.085   0.771
s(delta_SSS)   1.000e+00      1  0.148   0.700
s(distance)    1.000e+00      1  0.727   0.394
s(total_zp)    1.000e+00      1  0.059   0.809
s(trip_factor) 1.016e-07      2  0.000   0.508

R-sq.(adj) =  -0.177   Deviance explained = 55.1%
-ML = 17.514  Scale est. = 1         n = 12



Family: binomial 
Link function: logit 

shearwater_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS, k = 15) + s(distance, k = 8) + s(total_zp) + 
    s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.611      2.656    1.36    0.174

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value   
s(avg_SST)     1.000      1  0.917 0.33829   
s(avg_SSS)     1.000      1  0.914 0.33915   
s(delta_SST)   1.000      1  0.000 0.99210   
s(delta_SSS)   1.000      1  0.017 0.89504   
s(distance)    1.000      1  0.004 0.94848   
s(total_zp)    1.000      1  0.113 0.73652   
s(trip_factor) 1.141      3 11.683 0.00514 **
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.472   Deviance explained = 59.4%
-ML = 9.4597  Scale est. = 1         n = 28



Family: Negative Binomial(6642419.022) 
Link function: log 

shearwater ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.9002     0.1211    15.7   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df  Chi.sq  p-value    
s(avg_SST)     1.000e+00  1.000   1.841   0.1748    
s(avg_SSS)     3.606e+00  4.272 125.264  < 2e-16 ***
s(delta_SST)   1.000e+00  1.000   5.619   0.0178 *  
s(delta_SSS)   1.000e+00  1.000  18.094 2.11e-05 ***
s(distance)    4.657e+00  5.328 277.393  < 2e-16 ***
s(total_zp)    1.000e+00  1.000  10.490   0.0012 ** 
s(trip_factor) 9.002e-07  3.000   0.000   0.4375    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.999   Deviance explained = 99.5%
-ML = 67.138  Scale est. = 1         n = 22



Family: binomial 
Link function: logit 

storm_petrel_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS) + s(distance, k = 8) + s(total_zp) + s(trip_factor, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.7884     0.9928  -0.794    0.427

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00   1.00  0.174   0.676
s(avg_SSS)     1.000e+00   1.00  0.190   0.663
s(delta_SST)   1.000e+00   1.00  1.038   0.308
s(delta_SSS)   1.000e+00   1.00  0.000   0.996
s(distance)    3.003e+00   3.69  5.302   0.213
s(total_zp)    1.000e+00   1.00  0.039   0.844
s(trip_factor) 5.115e-07   3.00  0.000   0.369

R-sq.(adj) =  0.595   Deviance explained = 64.9%
-ML = 12.629  Scale est. = 1         n = 28



Family: Negative Binomial(1572380.699) 
Link function: log 

storm_petrel ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8340     0.2065   4.039 5.36e-05 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value  
s(avg_SST)     1.000e+00      1  2.654  0.1033  
s(avg_SSS)     1.000e+00      1  0.861  0.3535  
s(delta_SST)   1.000e+00      1  1.389  0.2386  
s(delta_SSS)   1.000e+00      1  4.626  0.0315 *
s(distance)    1.000e+00      1  0.562  0.4534  
s(total_zp)    1.000e+00      1  0.580  0.4463  
s(trip_factor) 1.018e-07      2  0.000  0.2196  
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.647   Deviance explained = 73.8%
-ML = 19.901  Scale est. = 1         n = 12




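Also, you definitely have too many predictors for the amount of data you have. With only 12 observations, you probably don't want more than one or two predictors (though do read other opinions on this elsewhere).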
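
To illustrate why so many terms on so few observations can give a high "Deviance explained" even when no term is significant, here is a minimal simulation sketch. It is not your model: it assumes Poisson counts with mean 3 and six independent standard-normal predictors that are unrelated to the response, and it uses a plain glm() with linear terms as a stand-in for smooths that were penalised down to edf = 1, as most of yours were. The function name simulate_dev_expl and all simulation settings are hypothetical.

```r
# Sketch: how much deviance do pure-noise predictors "explain"?
# Assumptions (not from the question): Poisson counts with mean 3,
# six independent N(0, 1) predictors, linear terms fitted with glm().
set.seed(1)

simulate_dev_expl <- function(n, p, lambda = 3) {
  # The response is generated independently of every predictor
  y <- rpois(n, lambda)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  dat <- data.frame(y = y, x)  # predictor columns are named X1..Xp
  fit <- glm(y ~ ., family = poisson(link = "log"), data = dat)
  # Same quantity that summary.gam() prints as "Deviance explained"
  1 - fit$deviance / fit$null.deviance
}

# Six noise predictors on 12 observations versus on 200 observations
dev_small <- replicate(200, simulate_dev_expl(n = 12, p = 6))
dev_large <- replicate(200, simulate_dev_expl(n = 200, p = 6))

mean_small <- mean(dev_small)
mean_large <- mean(dev_large)

cat("Mean deviance explained, n = 12: ", round(mean_small, 2), "\n")
cat("Mean deviance explained, n = 200:", round(mean_large, 2), "\n")
```

In a setup like this, the noise predictors should account for a large share of the deviance at n = 12 and very little at n = 200, simply because six coefficients use up most of the information in 12 points. That is one reason to treat the deviance explained in the 12-observation count models with caution.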


