
Simple Time Series Models as primitive forecast methods

Let's start with what time series data is. According to Rob J Hyndman's well-known book "Forecasting: Principles and Practice", a time series is a sequence of observations collected over time. One characteristic of a time series is that its values have a natural successive order, in contrast to a vector, where a unique ID does not necessarily impose any particular order on the data. Forecasting, according to the same book, is estimating how the sequence of observations will continue into the future.

As you can imagine, time series data appears in many sectors of business life, and managers would like to know what to expect in the near future. This is crucial because, by knowing in advance that something is coming, we can prepare better and avoid some losses.

R supports time series forecasting very well. We can build advanced ARIMA models as well as Exponential Smoothing or even Neural network models, and more. Today I would like to focus on some basic, primitive methods: the Mean method, the Naïve method, the Seasonal naïve method and the Drift method, which work well with random data sets. These methods come with the 'forecast' library, so it is best to install it right away. What are they all about?

Mean method - returns the mean of all observations as the forecast value; function meanf()

Naïve method - returns the last observation as the forecast value; function naive()

Seasonal naïve method - returns the last observation from the same season as the forecast value; function snaive()

Drift method - extrapolates the average change between the first and last observation into the future; function rwf() with drift = TRUE
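To make the definitions above concrete, here is a hand-rolled sketch of all four forecasts in base R, using a toy series and an assumed seasonal period of 2. The 'forecast' package functions meanf(), naive(), snaive() and rwf() compute the same point forecasts (plus prediction intervals).

```r
y <- c(10, 12, 11, 13, 12, 14)   # toy series (assumed example data)
n <- length(y)
h <- 3                           # forecast horizon
m <- 2                           # seasonal period (an assumption for this toy)

mean_fc   <- rep(mean(y), h)                         # Mean method
naive_fc  <- rep(y[n], h)                            # Naive method
snaive_fc <- y[n - m + ((1:h - 1) %% m) + 1]         # Seasonal naive: repeat last season
drift_fc  <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)  # Drift: extend first-to-last slope
```

For this toy series, the mean forecast is a flat 12, the naive forecast is a flat 14, the seasonal naive forecast cycles 12, 14, 12, and the drift forecast climbs 14.8, 15.6, 16.4.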

Let's see how to use them in practice. We will start by applying the simple methods to a random data set; these methods are most helpful for this type of data, where there is no trend, no seasonality and no other statistical structure.

library(forecast)
set.seed(60) # We want to make results reproducible

We generate 400 normally distributed random numbers. The time series starts in Q1 of 1900 and the data is quarterly.

randomtseries <- ts(rnorm(400), start = c(1900, 1), frequency = 4)
plot(randomtseries)

This is what randomly distributed data looks like: no particular trend, no seasonality. Now we define our three models using the functions mentioned above. Each forecast will contain 15 observations (parameter h).

meanrtsmodel <- meanf(randomtseries, h = 15)
naivertsmodel <- naive(randomtseries, h = 15)
driftrtsmodel <- rwf(randomtseries, h = 15, drift = TRUE)  # drift = TRUE gives the drift method; plain rwf() is just the naive method

Now each object contains the original data and the 15 forecast values. We can plot the original data and our models in the same graph.
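Putting the steps above together, one way to draw the series and the three forecast means in a single graph could look like this (a minimal sketch; the colours, title and legend position are arbitrary choices, and plot() on a forecast object draws the series plus the mean forecast in blue by default):

```r
library(forecast)
set.seed(60)
randomtseries <- ts(rnorm(400), start = c(1900, 1), frequency = 4)

# plot() on a forecast object draws the original series and its forecast
plot(meanf(randomtseries, h = 15),
     main = "Simple forecasts on random data")

# Overlay the other two forecast means on the same graph
lines(naive(randomtseries, h = 15)$mean, col = "red")
lines(rwf(randomtseries, h = 15, drift = TRUE)$mean, col = "green")
legend("topleft", lty = 1, col = c("blue", "red", "green"),
       legend = c("Mean", "Naive", "Drift"))
```

The $mean component of each forecast object holds the point forecasts as a time series, which is why lines() can add them directly to the existing plot.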


