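I have built on skaluzny's answer, in case you want a more intuitive way to do this: instead of saving the scale attributes, you can rely on knowing what scale() does by default (you really only need the last few lines of this answer).
By default, scale() centers each column (subtracts its mean) and then scales it (divides by its standard deviation):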
sdist <- scale(cars$dist)
head(sdist)
[,1]
[1,] -1.5902596
[2,] -1.2798136
[3,] -1.5126481
[4,] -0.8141446
[5,] -1.0469791
[6,] -1.2798136
sdist2<-(cars$dist-mean(cars$dist))/sd(cars$dist)
head(sdist2)
[1] -1.5902596 -1.2798136 -1.5126481 -0.8141446 -1.0469791 -1.2798136
# Note: this is only oriented differently because scale() returns a matrix:
sdist2<-as.matrix(sdist2)
head(sdist2)
# The output now looks identical
[,1]
[1,] -1.5902596
[2,] -1.2798136
[3,] -1.5126481
[4,] -0.8141446
[5,] -1.0469791
[6,] -1.2798136
So instead of storing the scaling attributes in a list, we can simply use the mean and standard deviation of the original data.
# Scale cars data:
scars <- scale(cars)
# Save scaled attributes:
scaleList <- list(scale = attr(scars, "scaled:scale"),
center = attr(scars, "scaled:center"))
scaleList
$`scale`
speed dist
5.287644 25.769377
$center
speed dist
15.40 42.98
sapply(cars, mean) # note that these values are the same as the `center` values above
speed dist
15.40 42.98
sapply(cars, sd) # note that these values are the same as the `scale` values above
speed dist
5.287644 25.769377
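The comparison above is made by eye. As a minimal sketch, the same check can be done programmatically with all.equal(); the names raw_center and raw_scale are introduced here only for illustration and are not part of the original answer.

```r
# Scale the built-in cars data; scale() stores the centering and scaling
# values it used as attributes on the returned matrix
scars <- scale(cars)

# Column means and standard deviations computed directly from the raw data
# (raw_center and raw_scale are illustrative names)
raw_center <- sapply(cars, mean)
raw_scale  <- sapply(cars, sd)

# all.equal() compares values (within a small numeric tolerance) and names
all.equal(attr(scars, "scaled:center"), raw_center)
all.equal(attr(scars, "scaled:scale"), raw_scale)
```
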
So now we can check whether the predictions come out the same if we just use mean() and sd() instead of the stored scale attributes:
# scars is a matrix, make it a data frame like cars for modeling:
scars <- as.data.frame(scars)
smod <- lm(speed ~ dist, data = scars)
# Predictions on scaled data:
sp <- predict(smod, scars)
# Fit the same model to the original cars data:
omod <- lm(speed ~ dist, data = cars)
op <- predict(omod, cars)
# Now the original answer was to use these stored attributes to modify the predictions:
usp1 <- sp * scaleList$scale["speed"] + scaleList$center["speed"]
# We can also simply use the standard deviation and mean from the original dataset:
usp2 <- sp * sd(cars$speed) + mean(cars$speed)
identical(usp1,usp2)
[1] TRUE
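# all.equal() compares only two objects (a third positional argument is
# taken as the tolerance), so check each back-transformed vector separately:
all.equal(op, usp1)
[1] TRUE
all.equal(op, usp2)
[1] TRUE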
Doing it the following way may be faster/more efficient, since no extra data frames/objects need to be created:
Mod <- lm(scale(speed) ~ scale(dist), data = cars) # add scale() function directly to model
Unscaled_Pred <- predict(Mod, cars) * sd(cars$speed) + mean(cars$speed)
all.equal(op, Unscaled_Pred)
[1] TRUE # predictions are the same as the model that was never scaled
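The same back-transformation should also work for observations that were not in the training data. The following is a sketch under assumptions: new_cars is a hypothetical data frame made up for illustration, and it relies on predict() reusing the training center and scale for scale(dist), rather than recomputing them from the new data, when scale() appears directly in the model formula.

```r
# Fit on standardized variables by calling scale() inside the formula
mod <- lm(scale(speed) ~ scale(dist), data = cars)

# Hypothetical new observations on the original (unscaled) dist values
new_cars <- data.frame(dist = c(10, 50, 100))

# Predictions come back in standardized speed units ...
pred_scaled <- predict(mod, newdata = new_cars)

# ... so undo the standardization with the training mean and sd of speed
pred_unscaled <- pred_scaled * sd(cars$speed) + mean(cars$speed)

# Reference: the model fitted on the raw data
omod <- lm(speed ~ dist, data = cars)
pred_reference <- predict(omod, newdata = new_cars)

# The two sets of predictions should agree up to numeric tolerance
all.equal(unname(pred_unscaled), unname(pred_reference))
```

Note that the mean and standard deviation used to unscale always come from the training data (cars), not from new_cars.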