
Basic Statistics in Time Series - examples

Let's use some of the statistics I mentioned before to describe a few time series. We can start with the dowjones dataset, which is available through the fpp library. It contains the Dow Jones Index, a stock market index that measures the stock performance of 30 large companies listed on stock exchanges in the United States.

library(fpp)
dowjones # This is our dataset. It already has class ts, so we don't have to convert it.
Time Series:
Start = 1 
End = 78 
Frequency = 1 
 [1] 110.94 110.69 110.43 110.56 110.75 110.84 110.46 110.56 110.46 110.05 109.60 109.31 109.31 109.25
[15] 109.02 108.54 108.77 109.02 109.44 109.38 109.53 109.89 110.56 110.56 110.72 111.23 111.48 111.58
[29] 111.90 112.19 112.06 111.96 111.68 111.36 111.42 112.00 112.22 112.70 113.15 114.36 114.65 115.06
[43] 115.86 116.40 116.44 116.88 118.07 118.51 119.28 119.79 119.70 119.28 119.66 120.14 120.97 121.13
[57] 121.55 121.96 122.26 123.79 124.11 124.14 123.37 123.02 122.86 123.02 123.11 123.05 123.05 122.83
[71] 123.18 122.67 122.73 122.86 122.67 122.09 122.00 121.23

Let's now check some basic statistics on this data.

mean(dowjones)
[1] 115.6833
median(dowjones) 
[1] 113.755

The mean and median are fairly close to each other: they differ by about 1.9 index points.

sort(dowjones) # We can sort our time series
 [1] 108.54 108.77 109.02 109.02 109.25 109.31 109.31 109.38 109.44 109.53 109.60 109.89 110.05 110.43
[15] 110.46 110.46 110.56 110.56 110.56 110.56 110.69 110.72 110.75 110.84 110.94 111.23 111.36 111.42
[29] 111.48 111.58 111.68 111.90 111.96 112.00 112.06 112.19 112.22 112.70 113.15 114.36 114.65 115.06
[43] 115.86 116.40 116.44 116.88 118.07 118.51 119.28 119.28 119.66 119.70 119.79 120.14 120.97 121.13
[57] 121.23 121.55 121.96 122.00 122.09 122.26 122.67 122.67 122.73 122.83 122.86 122.86 123.02 123.02
[71] 123.05 123.05 123.11 123.18 123.37 123.79 124.11 124.14
quantile(dowjones)
      0%      25%      50%      75%     100% 
108.5400 110.5925 113.7550 121.8575 124.1400 

We can extract the deciles as follows:

quantile(dowjones, probs = seq(0, 1, length.out = 11), type = 5)
     0%     10%     20%     30%     40%     50%     60%     70%     80%     90%    100% 
108.540 109.398 110.470 110.831 111.834 113.755 118.202 120.986 122.629 123.041 124.140 
var(dowjones)
[1] 30.31672
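The variance is in squared index points, so it is hard to read next to the mean. Here is a minimal sketch that puts the spread back on the original scale with sd() and IQR(), and recomputes the deciles. The name dj is my own; I copied the values from the printout above so the sketch runs without fpp (with fpp loaded you can use dowjones directly).

```r
# Dow Jones values copied from the printout above (dj is an illustrative name).
dj <- ts(c(
  110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25,
  109.02, 108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 110.56, 110.72, 111.23, 111.48, 111.58,
  111.90, 112.19, 112.06, 111.96, 111.68, 111.36, 111.42, 112.00, 112.22, 112.70, 113.15, 114.36, 114.65, 115.06,
  115.86, 116.40, 116.44, 116.88, 118.07, 118.51, 119.28, 119.79, 119.70, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02, 122.86, 123.02, 123.11, 123.05, 123.05, 122.83,
  123.18, 122.67, 122.73, 122.86, 122.67, 122.09, 122.00, 121.23
))

dj_sd  <- sd(dj)                  # square root of var(dj), in index points
dj_iqr <- IQR(dj)                 # 75% quantile minus 25% quantile (default type 7)
gap    <- mean(dj) - median(dj)   # how far the mean sits above the median

# Deciles, using the full argument name `probs`
deciles <- quantile(dj, probs = (0:10) / 10, type = 5)

print(round(c(sd = dj_sd, IQR = dj_iqr, mean_minus_median = gap), 3))
print(deciles)
```

Comparing the mean-median gap with the standard deviation gives a rough feeling for whether "close" is a fair description.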

Visualization of Time Series 

plot(dowjones) 

The series drifts upward over most of the sample, so it has a trend.

Now let's check stationarity with the Augmented Dickey-Fuller test.

adf.test(dowjones)
Augmented Dickey-Fuller Test
data:  dowjones
Dickey-Fuller = -1.8053, Lag order = 4, p-value = 0.6552
alternative hypothesis: stationary

The p-value is above 0.05, so we cannot reject the null hypothesis of a unit root, and we treat the data as non-stationary.
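A common next step for a trending series like this one is to look at the changes between consecutive observations instead of the levels. The sketch below is only an illustration of that idea with base R's diff(). The names dj and dj_diff are my own, the values are copied from the printout above, and the ADF rerun is skipped if the tseries package is not installed.

```r
# Dow Jones values copied from the printout above (dj is an illustrative name).
dj <- ts(c(
  110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25,
  109.02, 108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 110.56, 110.72, 111.23, 111.48, 111.58,
  111.90, 112.19, 112.06, 111.96, 111.68, 111.36, 111.42, 112.00, 112.22, 112.70, 113.15, 114.36, 114.65, 115.06,
  115.86, 116.40, 116.44, 116.88, 118.07, 118.51, 119.28, 119.79, 119.70, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02, 122.86, 123.02, 123.11, 123.05, 123.05, 122.83,
  123.18, 122.67, 122.73, 122.86, 122.67, 122.09, 122.00, 121.23
))

# First differences: x[t] - x[t-1]; the result is one observation shorter.
dj_diff <- diff(dj)

length(dj_diff)   # number of changes
mean(dj_diff)     # average change per step, equal to (last - first) / (n - 1)

# Optional: rerun the ADF test on the differences if tseries is available.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(dj_diff))
}
```

I am not claiming a result for the second test here; run it and read the p-value the same way as above.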

Let's check the autocorrelation.

acf(dowjones)

A slowly decreasing ACF indicates a trend and no seasonality.
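If you prefer numbers to the picture, acf() can return the autocorrelations without plotting. This is a small sketch under my own naming (dj, r, bound), with the values copied from the printout above. The bound is the usual approximate 95% limit for white noise, which is what the dashed lines in the default plot show.

```r
# Dow Jones values copied from the printout above (dj is an illustrative name).
dj <- ts(c(
  110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25,
  109.02, 108.54, 108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 110.56, 110.72, 111.23, 111.48, 111.58,
  111.90, 112.19, 112.06, 111.96, 111.68, 111.36, 111.42, 112.00, 112.22, 112.70, 113.15, 114.36, 114.65, 115.06,
  115.86, 116.40, 116.44, 116.88, 118.07, 118.51, 119.28, 119.79, 119.70, 119.28, 119.66, 120.14, 120.97, 121.13,
  121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37, 123.02, 122.86, 123.02, 123.11, 123.05, 123.05, 122.83,
  123.18, 122.67, 122.73, 122.86, 122.67, 122.09, 122.00, 121.23
))

# Autocorrelations for lags 0..10, without drawing the plot.
dj_acf <- acf(dj, lag.max = 10, plot = FALSE)
r <- drop(dj_acf$acf)                       # r[1] is lag 0, r[2] is lag 1, ...

# Approximate 95% limit for a white-noise series of this length.
bound <- qnorm(0.975) / sqrt(length(dj))

print(round(r, 3))
print(round(bound, 3))
sum(abs(r[-1]) > bound)                     # how many lags 1..10 exceed the limit
```

Values that stay high and shrink slowly from lag to lag are the numeric counterpart of the slowly decreasing bars in the plot.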

pacf(dowjones)

The data does not look seasonal, but let's check it with ggseasonplot():
ggseasonplot(dowjones)
Error in ggseasonplot(dowjones) : Data are not seasonal
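As far as I can tell, this error comes up because dowjones has Frequency = 1, so there is no seasonal period to plot. A quick way to avoid the error is to look at frequency() first. The helper has_seasonal_period() and the two toy series below are hypothetical, made up for this sketch; they are not part of forecast or fpp.

```r
# Hypothetical helper: TRUE when a ts object has more than one observation per period.
has_seasonal_period <- function(x) {
  stats::is.ts(x) && stats::frequency(x) > 1
}

# Two made-up series, only to show the check.
yearly_like  <- ts(c(5, 7, 6, 8, 9, 10), frequency = 1)            # like dowjones: Frequency = 1
monthly_like <- ts(1:24, start = c(2000, 1), frequency = 12)       # monthly data

has_seasonal_period(yearly_like)    # no seasonal period, a seasonal plot makes no sense
has_seasonal_period(monthly_like)   # seasonal period of 12, a seasonal plot is possible
```

Note that the check only says whether a seasonal plot can be drawn, not whether the data really shows a seasonal pattern.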

Now let's take a seasonal time series such as the usdeaths data. This time series presents the monthly totals of accidental deaths in the United States (Jan 1973-Dec 1978).

usdeaths
     Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938  9161  8927
1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129  8710  8680
1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8433  8160  8034
1976  7717  7461  7776  7925  8634  8945 10078  9179  8037  8488  7874  8647
1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
1978  7836  6892  7791  8129  9115  9434 10484  9827  9110  9070  8633  9240
mean(usdeaths)
[1] 8787.736
median(usdeaths) 
[1] 8728.5

Again, the mean and median are close to each other.

quantile(usdeaths)
      0%      25%      50%      75%     100% 
 6892.00  8089.00  8728.50  9323.25 11317.00 
var(usdeaths)
[1] 918411.7
plot(usdeaths) 

We can see a seasonal pattern and no clear trend.

adf.test(usdeaths)

Augmented Dickey-Fuller Test

data:  usdeaths
Dickey-Fuller = -3.8111, Lag order = 4, p-value = 0.02318
alternative hypothesis: stationary

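Checking stationarity: the p-value is below 0.05, so we reject the null hypothesis of a unit root and the test points to stationarity. Keep in mind that this test is about a unit root only, so the seasonal pattern still has to be examined separately.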

acf(usdeaths) # checking the autocorrelation

pacf(usdeaths)

ggseasonplot(usdeaths) #checking the seasonality

monthplot(usdeaths)

plot(decompose(usdeaths))



What conclusions can we draw from the plots above? Seasonality is evident in all of them, while no cyclic behaviour or trend stands out.
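The seasonal pattern can also be read off as numbers. decompose() stores one seasonal value per month in its figure element. The sketch below rebuilds the series from the table above so it runs without fpp; the names deaths, parts and seasonal_idx are my own.

```r
# Monthly accidental deaths, copied from the usdeaths table above (Jan 1973 - Dec 1978).
deaths <- ts(c(
  9007, 8106, 8928, 9137, 10017, 10826, 11317, 10744, 9713, 9938, 9161, 8927,
  7750, 6981, 8038, 8422,  8714,  9512, 10120,  9823, 8743, 9129, 8710, 8680,
  8162, 7306, 8124, 7870,  9387,  9556, 10093,  9620, 8285, 8433, 8160, 8034,
  7717, 7461, 7776, 7925,  8634,  8945, 10078,  9179, 8037, 8488, 7874, 8647,
  7792, 6957, 7726, 8106,  8890,  9299, 10625,  9302, 8314, 8850, 8265, 8796,
  7836, 6892, 7791, 8129,  9115,  9434, 10484,  9827, 9110, 9070, 8633, 9240
), start = c(1973, 1), frequency = 12)

# Classical additive decomposition (the default of decompose()).
parts <- decompose(deaths)

# One seasonal effect per calendar month, centred so the effects sum to zero.
seasonal_idx <- setNames(parts$figure, month.abb)
print(round(seasonal_idx))

names(which.max(seasonal_idx))   # month with the largest positive seasonal effect
names(which.min(seasonal_idx))   # month with the largest negative seasonal effect
```

Positive values mean the month usually sits above the yearly level and negative values mean it sits below, which is the same story monthplot() tells with lines.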

