Residuals are an indicator of model quality. According to Rob J Hyndman's book "Forecasting: Principles and Practice", a residual in forecasting is the difference between an observed value and its forecast based on all previous observations. Residuals are useful for checking whether a model has adequately captured the information in the data. All the patterns should end up in the model, so that only randomness remains in the residuals.
Therefore the residuals of an ideal model should:
- be uncorrelated
- have zero mean
Two further properties are useful, although not essential:
- constant variance
- normal distribution
First, I will load the library we will be using.
library(forecast)
For our example I will use the dowjones index as the data set. The idea is to set up three well-known simple models: the Mean model, the Naive model and the Drift model. I described them in more detail in a previous post. Next, knowing what properties the residuals of an ideal model should have, we can check which of the three models are reasonably good and which are clearly not. Let's plot the dowjones index.
As we can see, the data has a trend: the series keeps moving in one direction.
Now time to set up our simple models:
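A minimal sketch of this step, assuming the forecast package's meanf(), naive() and rwf(..., drift = TRUE) functions. The simulated series y, the horizon h = 10 and the names naivedjmodel and driftdjmodel are my own illustrative assumptions; meandjmodel is the name used in the rest of this post.

```r
library(forecast)

# Simulated trending series standing in for the dowjones data (assumption).
set.seed(1)
y <- ts(110 + cumsum(rnorm(100, mean = 0.1, sd = 0.4)))

h <- 10  # forecast horizon, chosen only for illustration

meandjmodel  <- meanf(y, h = h)              # mean method
naivedjmodel <- naive(y, h = h)              # naive method (hypothetical name)
driftdjmodel <- rwf(y, h = h, drift = TRUE)  # drift method (hypothetical name)

# Each fitted object stores its residuals in $residuals.
head(meandjmodel$residuals)
head(naivedjmodel$residuals)  # the first value is NA
```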
Due to the nature of the naive and drift methods, the first element of their residual vectors is NA. We need to delete it.
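Why the NA appears: the naive and drift methods forecast each value from the one before it, so the first observation has no forecast and therefore no residual. Here is a small base R sketch on made-up numbers (the vector y and all the names ending in _res are hypothetical, not the objects from this post):

```r
y <- c(110.9, 110.7, 110.4, 110.6, 111.0)  # made-up values

# Naive method: the fitted value at time t is the observation at t - 1.
naive_fitted <- c(NA, head(y, -1))
naive_res <- y - naive_fitted             # first residual is NA

# Drift method: the same, plus the average change per step.
drift <- (y[length(y)] - y[1]) / (length(y) - 1)
drift_res <- y - (naive_fitted + drift)   # first residual is NA as well

# Drop the leading NA before computing any statistics.
naive_res_clean <- naive_res[!is.na(naive_res)]
drift_res_clean <- drift_res[!is.na(drift_res)]
length(naive_res_clean)
```

Calling na.omit() on the residual vector should give the same cleaned values.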
Let's plot the dowjones index together with our 3 simple methods.
We have now built three very simple models; thanks to the residuals we can assess the quality of each one.
First, we will check the Mean model.
- Variance and mean
[1] 30.31672
# Time plot of the residuals from the mean model
plot(meandjmodel$residuals, main = "Residuals from Mean model")
Conclusion: The relatively large variance doesn't look good; the mean, however, is not far from zero.
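The two numbers can be computed with var() and mean(). Below is a sketch on a simulated series (an assumption, not the dowjones data). Note that the residuals of the mean method average to zero by construction, up to floating-point error, so the variance is the more informative figure for this model.

```r
set.seed(1)
y <- 110 + cumsum(rnorm(100, mean = 0.1, sd = 0.4))  # simulated stand-in

mean_res <- y - mean(y)   # residuals of the mean method

var(mean_res)    # spread of the residuals
mean(mean_res)   # zero up to floating-point error
```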
- A histogram of the residuals will help us to check for a normal distribution
Conclusion: The residuals are not normally distributed.
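One way to draw such a histogram in base R is to overlay the normal density that has the same mean and standard deviation as the residuals. This is only a sketch: the vector res is made up and deliberately skewed so the mismatch is visible, and breaks = 20 is an arbitrary choice.

```r
set.seed(1)
res <- rexp(200) - 1   # deliberately skewed, made-up residuals

h <- hist(res, breaks = 20, freq = FALSE,
          main = "Histogram of residuals", xlab = "Residual")

# Overlay the normal density with the same mean and standard deviation.
curve(dnorm(x, mean = mean(res), sd = sd(res)), add = TRUE, lwd = 2)
```

If the bars follow the curve closely, normality is plausible; a long tail on one side, as here, suggests it is not.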
A Q-Q plot is also useful for determining whether the residuals follow a normal distribution. If the points fall close to the straight reference line drawn by qqline(), the data is approximately normally distributed.
qqnorm(meandjmodel$residuals)
qqline(meandjmodel$residuals)
It also confirms that the residuals are not normally distributed here.
- Autocorrelation
Conclusion: The residuals are correlated across lags. Several bars are clearly above the threshold levels.
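The same check can be done numerically with base R's acf(). The threshold lines on an ACF plot sit at roughly plus or minus 1.96 divided by the square root of the series length. A sketch on a made-up, strongly autocorrelated series (res is not the mean model's residuals):

```r
set.seed(1)
res <- cumsum(rnorm(100))   # made-up, strongly autocorrelated series
res <- res - mean(res)

a <- acf(res, lag.max = 20, plot = FALSE)
threshold <- qnorm(0.975) / sqrt(length(res))   # approximate 95% bounds

# Lags (excluding lag 0) whose autocorrelation falls outside the bounds.
outside <- which(abs(a$acf[-1]) > threshold)
outside
```

For white noise we would expect only about 1 lag in 20 to land outside the bounds by chance.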
To check whether the residuals are white noise we can use the Box-Pierce test and the Ljung-Box test. If the p-values are relatively large, we can conclude that the residuals are not distinguishable from a white noise series.
Box.test(meandjmodel$residuals, lag = 20, fitdf = 0, type = "Ljung-Box")
Conclusion: The residuals don't look like white noise; the p-value is small.
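To see how the two tests behave, here is a sketch that runs Box.test() on two simulated series: pure white noise and a random walk. Both series are made up for illustration, and lag = 20 simply mirrors the call above.

```r
set.seed(1)
white_noise <- rnorm(100)
random_walk <- cumsum(rnorm(100))

bp <- Box.test(random_walk, lag = 20, fitdf = 0, type = "Box-Pierce")
lb <- Box.test(random_walk, lag = 20, fitdf = 0, type = "Ljung-Box")
wn <- Box.test(white_noise, lag = 20, fitdf = 0, type = "Ljung-Box")

# Small p-values reject the white noise hypothesis.
c(box_pierce = bp$p.value, ljung_box = lb$p.value, white_noise = wn$p.value)
```

The random walk should give p-values very close to zero in both tests, while the white noise series would typically give a much larger one.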
It is useful to know that all of these methods for checking residuals are packaged into one R function checkresiduals(). It will produce a time plot, ACF plot and histogram of the residuals (with an overlaid normal distribution for comparison), and do a Ljung-Box test with the correct degrees of freedom.
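A minimal sketch of the call, assuming the forecast package is installed. The series y is simulated and the object name fit is hypothetical; in this post you would pass one of the fitted models instead.

```r
library(forecast)

set.seed(1)
y <- ts(110 + cumsum(rnorm(100, mean = 0.1, sd = 0.4)))  # simulated stand-in

fit <- naive(y, h = 10)

# Draws the time plot, ACF plot and histogram, and prints the Ljung-Box test.
checkresiduals(fit)
```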
Conclusion: The mean model seems to be a very weak one. Autocorrelation is present, the residuals are not normally distributed and their variance is large.
Checking the Naive model forecast:
- Variance and mean
Conclusion: The variance is relatively small and the mean is not far from zero, although not exactly zero.
- Histogram of distribution and Q-Q plot
qqnorm(naivedjwithoutNA)
qqline(naivedjwithoutNA)
Conclusion: The residuals are not normally distributed.
- Autocorrelation
Conclusion: There seems to be no autocorrelation present. For uncorrelated data, we would expect each autocorrelation to be close to zero.
Next we test whether the residuals are white noise, again using the Box-Pierce test and the Ljung-Box test.
Conclusion: The naive model seems to be a better one, although not ideal. The variance is small and no autocorrelation is present, but the mean is not zero and the residuals are not normally distributed.
Checking the Drift model
- Variance and mean
Conclusion: The variance is relatively small, but the mean is not too close to zero.
- Histogram of distribution and Q-Q plot
qqnorm(driftdjwithoutNA)
qqline(driftdjwithoutNA)
- Autocorrelation
Conclusion: The drift model is similar to the naive model, with a mean further from zero. All three models are far from ideal, but of the three I would choose the naive one.
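To put the three models side by side, the checks can be collected into one small table. This is a base R sketch under my own assumptions: the helper residual_summary() is hypothetical, the series is simulated rather than the dowjones data, and the residuals are computed by hand from the definitions of the three methods instead of being taken from the fitted objects.

```r
# Summarise a residual vector: mean, variance and Ljung-Box p-value.
residual_summary <- function(res, lag = 10) {
  res <- res[!is.na(res)]   # drop the leading NA, if any
  c(mean = mean(res),
    variance = var(res),
    ljung_box_p = Box.test(res, lag = lag, type = "Ljung-Box")$p.value)
}

set.seed(1)
y <- 110 + cumsum(rnorm(100, mean = 0.1, sd = 0.4))  # simulated stand-in

n <- length(y)
drift <- (y[n] - y[1]) / (n - 1)   # average change per step

residuals_by_model <- list(
  mean  = y - mean(y),
  naive = c(NA, diff(y)),
  drift = c(NA, diff(y) - drift)
)

# One row per model, one column per statistic.
comparison <- t(sapply(residuals_by_model, residual_summary))
round(comparison, 4)
```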