Basic Statistics in Time Series - examples

Let's use some of the statistics I mentioned before to describe a few time series. We can start with the dowjones dataset from the fpp package. It contains the Dow Jones Index, a stock market index that measures the stock performance of 30 large companies listed on stock exchanges in the United States.

library(fpp)
dowjones # Our dataset; it already has class ts, so we don't have to convert it.

Time Series:
Start = 1
End = 78
Frequency = 1
[1] 110.94 110.69 110.43 110.56 110.75 110.84 110.46 110.56 110.46 110.05 109.60 109.31 109.31 109.25
[15] 109.02 108.54 108.77 109.02 109.44 109.38 109.53 109.89 110.56 110.56 110.72 111.23 111.48 111.58
[29] 111.90 112.19 112.06 111.96 111.68 111.36 111.42 112.00 112.22 112.70 113.15 114.36 114.65 115.06
[43] 115.86 116.40 116.44 116.88 118.07 118.51 119.28 119.79 119.70 119.28 119.66 120.14 120.97 121.13
[57] 121.55 121.96 122.26 123.79 124.11 124.14 123.37 123.02 122.86 123.02 123.11 123.05 123.05 122.83
[71] 123.18 122.67 122.73 122.86 122.67 122.09 122.00 121.23

Let's now compute some basic statistics for this data.

mean(dowjones)
[1] 115.6833
median(dowjones)
[1] 113.755

The mean and median are fairly close to each other: they differ by about 1.9 index points.

sort(dowjones) # We can sort our time series
[1] 108.54 108.77 109.02 109.02 109.25 109.31 109.31 109.38 109.44 109.53 109.60 109.89 110.05 110.43
[15] 110.46 110.46 110.56 110.56 110.56 110.56 110.69 110.72 110.75 110.84 110.94 111.23 111.36 111.42
[29] 111.48 111.58 111.68 111.90 111.96 112.00 112.06 112.19 112.22 112.70 113.15 114.36 114.65 115.06
[43] 115.86 116.40 116.44 116.88 118.07 118.51 119.28 119.28 119.66 119.70 119.79 120.14 120.97 121.13
[57] 121.23 121.55 121.96 122.00 122.09 122.26 122.67 122.67 122.73 122.83 122.86 122.86 123.02 123.02
[71] 123.05 123.05 123.11 123.18 123.37 123.79 124.11 124.14

quantile(dowjones)
0% 25% 50% 75% 100%
108.5400 110.5925 113.7550 121.8575 124.1400

We can extract the deciles as follows:

quantile(dowjones, probs = seq(0, 1, length.out = 11), type = 5)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
108.540 109.398 110.470 110.831 111.834 113.755 118.202 120.986 122.629 123.041 124.140
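quantile() offers nine estimation rules through its type argument; the default is 7, while the call above uses 5. Here is a small sketch on a made-up vector (toy values, not from the dataset) showing that the two rules can give different deciles while still agreeing on the median:

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20)   # toy data, for illustration only
probs <- seq(0, 1, length.out = 11)         # 0%, 10%, ..., 100%

q7 <- quantile(x, probs = probs)            # default rule, type = 7
q5 <- quantile(x, probs = probs, type = 5)  # rule used in the call above

rbind(type7 = q7, type5 = q5)               # compare the two rules side by side
```

For a long series like dowjones the differences between rules are usually small, but it is worth stating which type was used when reporting quantiles.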

var(dowjones)
[1] 30.31672
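As a quick check on what var() returns, the sketch below recomputes the sample variance by hand (dividing by n - 1) and takes its square root to get the standard deviation. The vector x holds only the first eight dowjones values from the printout above, and the variable names are just illustrative:

```r
# First eight dowjones values, copied from the printout above
x <- c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 110.56)

n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)  # sample variance, n - 1 denominator

all.equal(manual_var, var(x))        # var() uses the same formula
all.equal(sqrt(manual_var), sd(x))   # sd() is the square root of the variance
```

The standard deviation is often easier to read than the variance because it is in the same units as the data.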

Visualization of Time Series

plot(dowjones)

The series moves mostly in one direction (upward) over the sample, which suggests that it has a trend.

Now we check stationarity with the Augmented Dickey-Fuller test (adf.test comes from the tseries package, which is loaded together with fpp).

adf.test(dowjones)

Augmented Dickey-Fuller Test
data: dowjones
Dickey-Fuller = -1.8053, Lag order = 4, p-value = 0.6552
alternative hypothesis: stationary

As we can see, the p-value is above 0.05, so we cannot reject the null hypothesis of a unit root and we treat the data as non-stationary.
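The decision rule applied here can be written down as a tiny helper. This is only a sketch: adf_verdict and its alpha argument are hypothetical names of mine, not part of tseries, and the random walk in the optional part is simulated data rather than anything from this post.

```r
# Hypothetical helper: turn an ADF p-value into the verdict used in the text.
# The null hypothesis of the test is a unit root (non-stationarity).
adf_verdict <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject unit root: treat as stationary"
  } else {
    "cannot reject unit root: treat as non-stationary"
  }
}

adf_verdict(0.6552)   # p-value reported for dowjones above

# Optional: if tseries is installed, take the p-value from the test object.
if (requireNamespace("tseries", quietly = TRUE)) {
  set.seed(42)
  random_walk <- ts(cumsum(rnorm(100)))   # simulated series, assumption only
  result <- tseries::adf.test(random_walk)
  print(adf_verdict(result$p.value))
}
```

Note that a large p-value does not prove non-stationarity; it only means the test found no evidence against a unit root.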

Let's check the autocorrelation.

acf(dowjones)

A slowly decreasing ACF indicates a trend; there are no seasonal spikes, so no seasonality.
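To make the ACF less of a black box, here is a rough sketch that computes the lag-1 autocorrelation by hand and compares it with what acf() reports. It uses a short made-up trending series instead of dowjones, and lag1_manual is just an illustrative name:

```r
x <- c(1, 2, 4, 5, 7, 8, 10, 11, 13, 14)  # toy upward-trending series
n <- length(x)
m <- mean(x)

# Lag-1 autocorrelation: cross-products of neighbouring deviations from the
# overall mean, divided by the total sum of squared deviations
lag1_manual <- sum((x[-n] - m) * (x[-1] - m)) / sum((x - m)^2)

# acf() returns lag 0 first, so element 2 is lag 1
lag1_acf <- acf(x, plot = FALSE)$acf[2]

all.equal(lag1_manual, lag1_acf)
```

In a trending series neighbouring values sit on the same side of the mean for long stretches, which is why the autocorrelations stay high and decay slowly.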

pacf(dowjones)

The data do not look seasonal, but let's check that with ggseasonplot:

ggseasonplot(dowjones)
Error in ggseasonplot(dowjones) : Data are not seasonal
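The error appears because dowjones is stored with Frequency = 1, so there is no seasonal period to plot against. A minimal base R sketch of checking this up front; has_season is a hypothetical helper and the monthly series contains toy values:

```r
# A series stored without a seasonal period, like dowjones (Frequency = 1)
no_season <- ts(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84))

# A monthly series: 12 observations per year (toy values)
monthly <- ts(1:24, start = c(1973, 1), frequency = 12)

frequency(no_season)  # 1
frequency(monthly)    # 12

# Hypothetical guard: only attempt a seasonal plot when a period is defined
has_season <- function(x) frequency(x) > 1
has_season(no_season)
has_season(monthly)
```

Keep in mind that this only tells us how the series was declared, not whether the values really follow a seasonal pattern.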

Now let's take a seasonal time series, the usdeaths data. This series gives the monthly totals of accidental deaths in the United States (Jan 1973 to Dec 1978).

usdeaths

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938  9161  8927
1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129  8710  8680
1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8433  8160  8034
1976  7717  7461  7776  7925  8634  8945 10078  9179  8037  8488  7874  8647
1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
1978  7836  6892  7791  8129  9115  9434 10484  9827  9110  9070  8633  9240

mean(usdeaths)
[1] 8787.736
median(usdeaths)
[1] 8728.5

Again, the mean and median are close to each other (about 59 deaths apart).

quantile(usdeaths)
      0%      25%      50%      75%     100%
 6892.00  8089.00  8728.50  9323.25 11317.00

var(usdeaths)
[1] 918411.7

plot(usdeaths) 

The plot shows a clearly seasonal series with no obvious trend.

adf.test(usdeaths)

Augmented Dickey-Fuller Test
data: usdeaths
Dickey-Fuller = -3.8111, Lag order = 4, p-value = 0.02318
alternative hypothesis: stationary

Checking stationarity: the p-value is below 0.05, so we reject the null hypothesis of a unit root and treat the data as stationary.

acf(usdeaths) # checking the autocorrelation

pacf(usdeaths)

ggseasonplot(usdeaths) # checking the seasonality

monthplot(usdeaths)

plot(decompose(usdeaths))

What conclusions can we draw from the plots above? Seasonality is evident in all of them; however, there is no apparent cyclicity or trend.
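The components behind plot(decompose(...)) can also be inspected as numbers. So that it runs without fpp, the sketch below uses USAccDeaths from base R's datasets package, which to my knowledge holds the same monthly counts as usdeaths (that equivalence is my assumption):

```r
dec <- decompose(USAccDeaths)      # additive decomposition by default

round(dec$figure)                  # one seasonal effect per month, Jan to Dec
month.abb[which.max(dec$figure)]   # month with the largest seasonal effect
month.abb[which.min(dec$figure)]   # month with the smallest seasonal effect

# The trend comes from a centred moving average, so it is NA for the
# first and last six months of the series
range(dec$trend, na.rm = TRUE)
```

Comparing the spread of dec$figure with the range of dec$trend gives a rough feel for how strong the seasonal swing is relative to any slow movement in the level.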

My journey through the data science - by Karolina M'Goma
