在 R 中预测季节性时间序列

机器算法验证 r 时间序列 预测 有马 自相关
2022-03-31 11:12:48

使用预测航空公司乘客季节性时间序列auto.arima()

我正在尝试对一些航空公司数据进行建模,以尝试使用 2003 年 1 月以后的月度数据来提供今年 6 月至 12 月的准确月度预测。数据取自:http ://www.transtats.bts.gov/Data_Elements.aspx?Data=1

这是时间序列图和 ACF

我使用 auto.arima 开发了两个模型并检查了它们是否对应于自相关函数。基本上我无法决定是否使用:

  1. 以下季节性 ARIMA 模型

  1. 的非季节性ARIMA模型是我先用12点移动平均把模型分解成趋势、季节性分量和随机分量 (基本上和手动做函数一样)NtXt=Tt+St+Ntdecompose()

我已经分析了这两个模型的重要属性,例如确保残差接近白噪声过程等,但我不确定上述两个模型中的哪一个最适合预测目的,为什么?

,我也不确定如何计算趋势分量向量的预测甚至可以使用这种类型的模型创建预测吗?Xt=Tt+St+Nt

编辑:这是dput(IAP)(未删除趋势或季节性成分的原始数据)的输出

dput(IAP) structure(c(9726436L, 8283372L, 9538653L, 8309305L, 8801873L, 10347900L, 11705206L, 11799672L, 9454647L, 9608358L, 9481886L, 10512547L, 10252443L, 9310317L, 10976440L, 10802022L, 10971254L, 12159514L, 13502913L, 13203566L, 10570682L, 10772177L, 10174320L, 11244427L, 11387275L, 9945067L, 12479643L, 11521174L, 12164600L, 13140061L, 14421209L, 13703334L, 11325800L, 11107586L, 10580099L, 11812574L, 11724098L, 10167275L, 12707241L, 12619137L, 12610793L, 13690835L, 14912621L, 14171796L, 12010922L, 11517228L, 11222687L, 12385958L, 12072442L, 10590281L, 13246293L, 12795517L, 12978086L, 14170877L, 15470687L, 15120200L, 12321953L, 12381689L, 12004268L, 13098697L, 12767516L, 11648482L, 14194753L, 12961165L, 13602014L, 14413771L, 15449821L, 15327739L, 11731364L, 11921490L, 11256163L, 12463351L, 12075267L,10412676L, 12508793L, 12629805L, 11806548L, 13199636L, 14953615L, 14844821L, 11659775L, 11905529L, 11093714L, 12659154L, 12393439L, 10694165L, 13279320L, 12398700L, 13380664L, 14406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L, 12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L, 14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L, 13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L ), .Tsp = c(2003, 2014.33333333333, 12), class = "ts")14953615L, 14844821L, 11659775L, 11905529L, 11093714L, 12659154L, 12393439L, 10694165L, 13279320L, 12398700L, 13380664L, 14406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L, 12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L, 14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L, 13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L), .Tsp = c(2003, 20133)334.333333 =3314953615L, 14844821L, 11659775L, 11905529L, 11093714L, 12659154L, 12393439L, 10694165L, 13279320L, 12398700L, 13380664L, 14406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L, 12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L, 14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L, 13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L), .Tsp = c(2003, 20133)334.333333 =3314406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L, 12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L, 14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L, 13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L ), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")14406776L, 16026852L, 15317926L, 12599149L, 12874707L, 11651314L, 12915663L, 12668763L, 10944610L, 13473705L, 13537152L, 13935132L, 14814672L, 16623674L, 15753387L, 13220884L, 13185627L, 12144742L, 13546071L, 13206682L, 11732944L, 14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L ), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")14387677L, 13995377L, 14291285L, 15582335L, 16969590L, 16621336L, 13791714L, 13397785L, 12762536L, 14096567L, 13766673L, 12023339L, 15177069L, 14278932L, 15306328L, 16232176L, 17645538L, 17517022L, 14239561L, 14209627L, 13133257L, 15083929L, 14589637L, 12385546L, 15486317L, 14857685L, 15615732L), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")

这是dput(IAP.res)(分解中的随机分量)的输出

dput(IAP.res) structure(c(NA, NA, NA, NA, NA, NA, -669127.347569446, -168943.285069446, 225871.456597222, 271337.106597223, 711896.11076389, 284583.435763889, 165401.360763887, 622993.194097221, -268299.21423611, -9406.73506944434, -233904.910069446, - 147124.755902779, -260973.055902776, -163628.243402778, -43056.7100694457, 121365.814930555, 205106.485763889, -107464.272569445, 247575.569097221, 279399.444097225, 309270.160763888, -166333.068402778, 129823.798263889, 22571.1190972265, -113455.59756944, -384199.160069444, 62061.8315972222, -155858.226736111, 13600.0274305546, -87564.1475694429, 71845.7357638887, 8145.86076388881 , 47627.494097226, 442212.72326389, 73639.5065972234, 60882.5774305568, -135204.389236112, -437744.576736112, 203832.581597222, -264145.435069444, 179945.61076389, 15812.1024305553, -49648.0975694434, -61460.8059027772, 89656.3690972241, 118205.931597224, -84196.4517361106, 4197.78576389072, -134118.722569442, -87234.4517361117, -126555.418402776, -57714.9350694417, 293250.152430556, 59462.6857638892, 10340.8190972245, 416646.652430557, 526459.702430556, -135041.068402776, 239767.631597222, 67034.9940972247, -221066.180902774, 207611.839930556, -424486.00173611 , -94779.3517361115, 89796.4857638886, 130285.644097223, 104776.152430555, 16099.8607638888, -317097.047569448, 335867.264930556, -796342.285069446, -446777.464236111, -93681.7225694442, 242962.798263888, -143380.293402778, 135423.439930556, 28934.7357638923, 186390.185763891, 116969.777430558, -113617.264236109, -39733.9225694438, -471572.526736109, 130389.423263891, 80446.7857638926、298895.444097222、38486.7982638846、143712.123263886、419260.898263889, -113385.347569445, -181233.730902779, -178686.680902779, -412733.597569445, -380106.797569444, 172783.973263888, 220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117, 176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109, -206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236, - 6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")730902779, -178686.680902779, -412733.597569445, -380106.797569444, 172783.973263888, 220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117, 176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109, -206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")730902779, -178686.680902779, -412733.597569445, -380106.797569444, 172783.973263888, 220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117, 176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109, -206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117, 176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109, -206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12 ), 类 = "ts")220863.173263891, 11443.2440972247, 392297.319097224, -62825.8267361117, 176278.664930556, 139372.439930556, -174159.88923611, -111755.439236109, -206233.264236111, -197431.097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12 ), 类 = "ts")097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936 , -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")097569445, -55065.5892361099, 48314.3065972236, -6745.32673610683, 193492.494097225, 155009.569097224, 241747.214930556, 209670.99826389, -173438.47673611, -101510.63923611, -128948.689236113, -222773.597569443, -498474.472569441, 146856.619097224, -275463.026736109, 386273.214930557, 213400.994097223, 171865.11076389, 464391.381597217, 1489.99826388643, -9918.39340277936 , -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")99826388643, -9918.39340277936, -362009.847569447, NA, NA, NA, NA, NA, NA), .Tsp = c(2003, 2014.33333333333, 12), 类 = "ts")

1个回答

关于模型的比较,@forecaster 提出的想法可能会有所帮助,因为您的系列相对较长。

关于您关于如何获取趋势分量而不是整个系列的预测的最后一个问题:据我所知,CRAN 上没有将拟合的 ARIMA 模型分解为趋势和季节性分量的软件包。ArDec基于自回归分解了一系列,但我认为将其应用于 ARIMA 模型并不简单。

您通过移动平均线分解系列的想法可能是一个解决方案。由于您对预测趋势分量感兴趣,因此最好将 ARIMA 模型拟合到趋势分量而不是噪声分量,然后根据该模型获得预测。但是,这可能不是一种有效的方法,因为您使用的是数据的平滑版本,而不是使用观察到的数据。Nt

我会尝试一个结构化的时间序列模型,其中为每个组件明确定义了一个模型。例如,使用核心包中的函数StructTSstats我们得到以下分解:

fit1 <- StructTS(IAP, type = "BSM")
fit1
# Variances:
#     level      slope       seas    epsilon  
# 4.987e+10  0.000e+00  1.575e+11  0.000e+00  
plot(tsSmooth(fit1), main = "")
mtext(text = "decomposition of the basic structural model. StructTS() stats package", side = 3, adj = 0, line = 1)

基于StructTS的分解

该函数predict可用于获取预测,predict(fit1)但返回的是观测序列的预测,而不是分量的预测。要获得基于结构模型的组件预测,您可以使用包stsm

require("stsm")
require("stsm.class") # this package will be merged into package "stsm" 
m <- stsm.model(model = "BSM", y = IAP, transPars = "StructTS")
fit2 <- stsmFit(m, stsm.method = "maxlik.td.optim", method = "L-BFGS-B", 
  KF.args = list(P0cov = TRUE))
fit2
# Parameter estimates:
#              var1       var2       var3       var4
# Estimate    0.000  4.987e+10  0.000e+00  1.575e+11
# Std. error  3.201  1.194e+00  6.392e-06  8.366e-01
# Log-likelihood: -2048.649 
# Convergence code: 0 
# CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH 
# Number of function calls: 16 
# Variance-covariance matrix: optimHessian

参数var1var2分别表示观测方程和水平、斜率和季节分量中扰动项的方差var3var4

基于拟合模型的组件由以下方式返回tsSmooth

fit2.comps <- tsSmooth(fit2, P0cov = FALSE)$states
plot(fit2.comps, main = "")
mtext(text = "decomposition of the basic structural model. stsm package", 
  side = 3, adj = 0, line = 1)

基于stsm的分解

请注意,基于StructTS和的参数估计stsm是相同的,但是,与基于 的估计相比,后一种情况下的估计分量看起来更好StructTS:趋势分量更平滑,样本开始时没有波动,季节性分量的方差在整个过程中更加稳定。图中出现这种差异的原因是stsm使用P0cov = FALSE(初始状态向量的对角协方差矩阵,但这是另一篇文章的主题)。

可以使用包 KFKSDS中的函数获得组件及其置信区间的预测,如下所示95%predict

require("KFKSDS")
m2 <- set.pars(m, pmax(fit2$par, .Machine$double.eps))
ss <- char2numeric(m2)
pred <- predict(ss, IAP, n.ahead = 12)

预测和置信区间图:

par(mfrow = c(3,1), mar = c(3,3,3,3))
# observed series
plot(cbind(IAP, pred$pred), type = "n", plot.type = "single", ylab = "", ylim = c(8283372, 19365461))
lines(IAP)
polygon(c(time(pred$pred), rev(time(pred$pred))), c(pred$pred + 2 * pred$se, rev(pred$pred)), col = "gray85", border = NA)
polygon(c(time(pred$pred), rev(time(pred$pred))), c(pred$pred - 2 * pred$se, rev(pred$pred)), col = " gray85", border = NA)
lines(pred$pred, col = "blue", lwd = 1.5)
mtext(text = "forecasts of the observed series", side = 3, adj = 0)
# level component
plot(cbind(IAP, pred$a[,1]), type = "n", plot.type = "single", ylab = "", ylim = c(8283372, 19365461))
lines(IAP)
polygon(c(time(pred$a[,1]), rev(time(pred$a[,1]))), c(pred$a[,1] + 2 * sqrt(pred$P[,1]), rev(pred$a[,1])), col = "gray85", border = NA)
polygon(c(time(pred$a[,1]), rev(time(pred$a[,1]))), c(pred$a[,1] - 2 * sqrt(pred$P[,1]), rev(pred$a[,1])), col = " gray85", border = NA)
lines(pred$a[,1], col = "blue", lwd = 1.5)
mtext(text = "forecasts of the level component", side = 3, adj = 0)
# seasonal component
plot(cbind(fit2.comps[,3], pred$a[,3]), type = "n", plot.type = "single", ylab = "", ylim = c(-3889253, 3801590))
lines(fit2.comps[,3])
polygon(c(time(pred$a[,3]), rev(time(pred$a[,3]))), c(pred$a[,3] + 2 * sqrt(pred$P[,3]), rev(pred$a[,3])), col = "gray85", border = NA)
polygon(c(time(pred$a[,3]), rev(time(pred$a[,3]))), c(pred$a[,3] - 2 * sqrt(pred$P[,3]), rev(pred$a[,3])), col = " gray85", border = NA)
lines(pred$a[,3], col = "blue", lwd = 1.5)
mtext(text = "forecasts of the seasonal component", side = 3, adj = 0)

系列和组件的预测