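I have a question about standardized coefficients (betas) in linear models. I have already asked a related question here. From the answers, I gathered that I should apply the R function scale() to the dependent variable and to all independent variables (IVs) in order to estimate the model's standardized coefficients. But when I apply scale() to an IV of class factor, I get the following error message: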
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
To illustrate my problem, here is an MWE.
First, the linear model with unstandardized coefficients:
> data(ChickWeight)
> aa <- lm(weight ~ Time + Diet, data=ChickWeight)
> summary(aa)
Call:
lm(formula = weight ~ Time + Diet, data = ChickWeight)
Residuals:
     Min       1Q   Median       3Q      Max
-136.851  -17.151   -2.595   15.033  141.816
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.9244     3.3607   3.251  0.00122 **
Time          8.7505     0.2218  39.451  < 2e-16 ***
Diet2        16.1661     4.0858   3.957 8.56e-05 ***
Diet3        36.4994     4.0858   8.933  < 2e-16 ***
Diet4        30.2335     4.1075   7.361 6.39e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 35.99 on 573 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7435
F-statistic: 419.2 on 4 and 573 DF, p-value: < 2.2e-16
Now I want to estimate the standardized coefficients using the scale() function, which leads to the following error message:
> bb <- lm(scale(weight) ~ scale(Time) + scale(Diet), data=ChickWeight)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
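The cause can be checked in isolation. The following is a minimal sketch, assuming only the built-in ChickWeight data; the names is_num and scaled are hypothetical and used for illustration only. It inspects the class of Diet and then scales just the numeric columns:

```r
data(ChickWeight)

# Diet is stored as a factor, so scale() cannot compute its column mean
class(ChickWeight$Diet)
is.numeric(ChickWeight$Diet)

# Flag the numeric columns (is_num is a hypothetical helper name)
is_num <- vapply(ChickWeight, is.numeric, logical(1))

# Work on a plain data.frame copy and scale only the numeric columns;
# as.numeric() drops the matrix attributes that scale() returns
scaled <- as.data.frame(ChickWeight)
scaled[is_num] <- lapply(scaled[is_num], function(x) as.numeric(scale(x)))

# The factor columns (Chick, Diet) are left untouched
str(scaled)
```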
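As I found out myself, the error occurs because Diet is of class factor, not the numeric variable that scale() requires. As an alternative, I tried the following, including Diet without scale():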
> cc <- lm(scale(weight) ~ scale(Time) + Diet, data=ChickWeight)
> summary(cc)
Call:
lm(formula = scale(weight) ~ scale(Time) + Diet, data = ChickWeight)
Residuals:
     Min       1Q   Median       3Q      Max
-1.92552 -0.24132 -0.03652  0.21151  1.99538
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.24069    0.03415  -7.048 5.25e-12 ***
scale(Time)  0.83210    0.02109  39.451  < 2e-16 ***
Diet2        0.22746    0.05749   3.957 8.56e-05 ***
Diet3        0.51356    0.05749   8.933  < 2e-16 ***
Diet4        0.42539    0.05779   7.361 6.39e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5064 on 573 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7435
F-statistic: 419.2 on 4 and 573 DF, p-value: < 2.2e-16
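To see what scaling does to each coefficient, the two fits can be related by hand. This is a rough sketch under the assumption that scale() divides by the sample standard deviation, as sd() computes it; sd_y and sd_t are hypothetical names introduced here:

```r
data(ChickWeight)

aa <- lm(weight ~ Time + Diet, data = ChickWeight)
cc <- lm(scale(weight) ~ scale(Time) + Diet, data = ChickWeight)

sd_y <- sd(ChickWeight$weight)  # sample SD of the response
sd_t <- sd(ChickWeight$Time)    # sample SD of the numeric predictor

# Time is scaled on both sides: raw slope times sd(Time) / sd(weight)
beta_time <- coef(aa)["Time"] * sd_t / sd_y

# The Diet dummies stay 0/1 and only the response is scaled:
# raw coefficients divided by sd(weight)
beta_diet <- coef(aa)[c("Diet2", "Diet3", "Diet4")] / sd_y

# Compare with the non-intercept coefficients of cc
all.equal(unname(coef(cc)[-1]), unname(c(beta_time, beta_diet)))
```

Under this reading, the scale(Time) coefficient is in standard deviations of weight per standard deviation of Time, while each Diet coefficient is the difference from Diet 1 expressed in standard deviations of weight. The Diet coefficients are therefore only partially standardized.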
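My question now is whether this is the correct way to estimate standardized coefficients for a model with both numeric and factor variables.
Thank you very much in advance for your answers.
Regards,
Magnus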