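Orthogonal polynomials are not the same as "normal" (raw) polynomials, so the regression output (coefficients, standard errors, etc.) will differ.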
library(ISLR)
df = ISLR::Auto
reg1 = lm(mpg~poly(weight,3,raw=F),data=df)
summary(reg1)
reg2 = lm(mpg~poly(weight,3,raw=T),data=df)
summary(reg2)
The results for reg1 are:
Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                 23.4459     0.2112 111.008  < 2e-16 ***
poly(weight, 3, raw = F)1 -128.4436     4.1817 -30.716  < 2e-16 ***
poly(weight, 3, raw = F)2   23.1589     4.1817   5.538 5.65e-08 ***
poly(weight, 3, raw = F)3    0.2204     4.1817   0.053    0.958
The results for reg2 are:
Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                6.170e+01  1.104e+01   5.587 4.36e-08 ***
poly(weight, 3, raw = T)1 -1.793e-02  1.091e-02  -1.643    0.101
poly(weight, 3, raw = T)2  1.515e-06  3.450e-06   0.439    0.661
poly(weight, 3, raw = T)3  1.846e-11  3.503e-10   0.053    0.958
However, the predicted (a.k.a. "fitted") values of the two models are identical:
predict(reg1,newdata=df)[1:3]
       1        2        3
18.26982 17.07799 18.72854
predict(reg2,newdata=df)[1:3]
       1        2        3
18.26982 17.07799 18.72854
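The agreement between the two fits can also be checked over every observation rather than just the first three. The following is a minimal sketch, assuming the ISLR package is installed; the names reg_orth, reg_raw, max_diff, r2_orth and r2_raw are introduced here for illustration only.

```r
# Sketch: check that the raw and orthogonal cubic fits describe the same model.
# Assumes the ISLR package is installed; object names are illustrative.
library(ISLR)
df <- ISLR::Auto

reg_orth <- lm(mpg ~ poly(weight, 3), data = df)              # orthogonal basis (the default)
reg_raw  <- lm(mpg ~ poly(weight, 3, raw = TRUE), data = df)  # raw powers of weight

# Largest absolute difference between the two sets of fitted values
max_diff <- max(abs(fitted(reg_orth) - fitted(reg_raw)))
print(max_diff)

# R-squared from each fit, printed side by side for comparison
r2_orth <- summary(reg_orth)$r.squared
r2_raw  <- summary(reg_raw)$r.squared
print(c(orthogonal = r2_orth, raw = r2_raw))
```

Any difference should be at the level of floating-point rounding, because both sets of columns span the same space of cubic polynomials in weight.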
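You can also look at the actual numbers behind the polynomials (here, the first three values of the degree-1 column):
poly(df$weight,3,raw=F)[1:3, 1]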
[1] 0.03134202 0.04259480 0.02729340
poly(df$weight,3,raw=T)[1:3, 1]
[1] 3504 3693 3436
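The poly function returns "normal" (raw) polynomials when raw=T. Because the raw terms are correlated with one another (rather than orthogonal), and because they are simply different numbers from the orthogonal polynomial terms, the estimated coefficients, standard errors, etc. differ between the two models above. Only the highest-degree term has the same t value and p-value (0.053 and 0.958) in both outputs.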
poly(c(1,2,3,4),degree = 2,raw=T)
     1  2
[1,] 1  1
[2,] 2  4
[3,] 3  9
[4,] 4 16
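To see the "correlated versus orthogonal" distinction directly, the same small vector can be passed through both forms of poly. This is only a sketch using base R; the names x, P_orth, P_raw, gram and raw_cor are illustrative and not from the original answer.

```r
# Sketch: compare raw and orthogonal polynomial columns for a small vector.
x <- c(1, 2, 3, 4)

P_orth <- poly(x, degree = 2)              # raw = FALSE is the default
P_raw  <- poly(x, degree = 2, raw = TRUE)  # columns are x and x^2

# Cross-product of the orthogonal columns: off-diagonal entries are
# (numerically) zero and diagonal entries are one, i.e. orthonormal columns
gram <- crossprod(P_orth)
print(round(gram, 10))

# Column sums are (numerically) zero, so each column is also orthogonal
# to the intercept
print(round(colSums(P_orth), 10))

# The raw columns x and x^2 are strongly correlated with each other
raw_cor <- cor(P_raw[, 1], P_raw[, 2])
print(raw_cor)
```

Because the orthogonal columns are uncorrelated with each other and with the intercept, each coefficient in the raw=F model can be estimated without interference from the other terms, which is not the case for the raw columns.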