我正在努力将替代计数模型融入我的数据中。我想我的问题是太多的零。
这是我的数据
> summary(smpl)
response predict1 predict2
Min. :0.000 Min. : 1.00 Min. : 22005
1st Qu.:0.000 1st Qu.: 3.00 1st Qu.: 4669705
Median :0.000 Median : 8.00 Median : 12540318
Mean :0.017 Mean : 23.27 Mean : 20382574
3rd Qu.:0.000 3rd Qu.: 20.00 3rd Qu.: 25468156
Max. :3.000 Max. :1584.00 Max. :145348049
> table(smpl$response)
0 1 2 3
987 10 2 1
我尝试了三种回归:基本泊松、负二项式和零膨胀,但唯一没有警告的返回系数的公式是泊松:
> summary(glm(response ~ ., data = smpl, family = poisson))
Call:
glm(formula = response ~ ., family = poisson, data = smpl)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3871 -0.2214 -0.1722 -0.1148 4.7861
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.472e+00 3.521e-01 -9.862 < 2e-16 ***
predict1 3.229e-03 7.271e-04 4.442 8.93e-06 ***
predict2 -6.258e-08 3.060e-08 -2.045 0.0409 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 150.67 on 999 degrees of freedom
Residual deviance: 135.84 on 997 degrees of freedom
AIC: 170.06
Number of Fisher Scoring iterations: 8
负二项式返回关于收敛和交替限制的警告
summary(glm.nb(response ~ ., data = smpl))
Call:
glm.nb(formula = response ~ ., data = smpl, init.theta = 0.04901296596,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.28844 -0.17677 -0.14542 -0.09808 2.38314
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.899e+00 4.587e-01 -8.499 < 2e-16 ***
predict1 1.226e-02 2.144e-03 5.720 1.06e-08 ***
predict2 -5.982e-08 3.407e-08 -1.756 0.0791 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.049) family taken to be 1)
Null deviance: 69.927 on 999 degrees of freedom
Residual deviance: 55.940 on 997 degrees of freedom
AIC: 152.37
Number of Fisher Scoring iterations: 1
Theta: 0.0490
Std. Err.: 0.0251
Warning while fitting theta: alternation limit reached
2 x log-likelihood: -144.3700
Warning messages:
1: glm.fit: algorithm did not converge
2: In glm.nb(response ~ ., data = smpl) : alternation limit reached
并且零膨胀(来自pscl包装)根本不返回任何东西
> summary(zeroinfl(response ~ ., data = smpl, dist = "negbin"))
Call:
zeroinfl(formula = response ~ ., data = smpl, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.45252 -0.08817 -0.05515 -0.04210 19.56118
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.477e+00 NA NA NA
predict1 2.678e-03 NA NA NA
predict2 -1.160e-07 NA NA NA
Log(theta) -1.241e+00 NA NA NA
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.869e+00 NA NA NA
predict1 -1.329e-01 NA NA NA
predict2 -1.346e-07 NA NA NA
Error in if (getOption("show.signif.stars") & any(rbind(x$coefficients$count, :
missing value where TRUE/FALSE needed
然后我的问题是:
- 在使用负二项式(以避免警告)和零膨胀(以获得系数)方面,我能做些什么?
- 只看上面的结果(因此包括收敛和交替限制的问题)我应该选择负二项式模型,因为从 AIC 来看,在我的数据中似乎比泊松更适合?