Posts

Model Residuals in Time Series Data

Residuals are an indicator of model quality. Following Rob J Hyndman's book "Forecasting: Principles & Practice", a residual in forecasting is the difference between an observed value and its forecast based on all previous observations. Residuals are useful for checking whether a model has adequately captured the information in the data: all the patterns should be in the model, and only randomness should remain in the residuals. Therefore the residuals of an ideal model should be uncorrelated and have zero mean. Two further properties are useful but not essential: constant variance and a normal distribution. First I will load the libraries we will be using.
library(fpp)
library(forecast)
For our example I will use the dowjones index as the data set. The idea is to set up some already well-known simple models: the Mean model, the Naive model and the Drift model. I described them in more detail in a previous post. Next, knowing what properties the residuals of an ideal model should have, we can check which of those 3 are quite good or def...
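As a rough illustration of the checks described in this excerpt, the sketch below computes residuals for the mean, naive and drift methods in base R and looks at their mean and autocorrelation. It is not the code from the post: the variable names are made up for the example, only the first 14 Dow Jones values printed further down this page are used, and the mean and the drift are estimated from the whole sample for simplicity.

```r
# First 14 Dow Jones observations as printed on this page (illustrative subset).
y <- c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
       110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25)
n <- length(y)

# Residual = observed value minus fitted (one-step) value.
res_mean  <- y - mean(y)                          # mean method: fitted value is the sample mean
res_naive <- diff(y)                              # naive method: fitted value is the previous observation
res_drift <- diff(y) - (y[n] - y[1]) / (n - 1)    # drift method: previous observation plus average change

# Property 1: residuals should have a mean close to zero.
res_means <- c(mean = mean(res_mean), naive = mean(res_naive), drift = mean(res_drift))
print(round(res_means, 4))

# Property 2: residuals should be uncorrelated. The Ljung-Box test from the
# stats package gives a small p-value when autocorrelation is present.
lb_p <- sapply(list(mean = res_mean, naive = res_naive, drift = res_drift),
               function(r) Box.test(r, lag = 3, type = "Ljung-Box")$p.value)
print(round(lb_p, 4))
```

With only 14 points the test has little power, so this is only a demonstration of the mechanics. On the full 78-observation series a larger lag would be more informative.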
Recent posts

Basic Statistics in Time Series - examples

Let's use some of the statistics I mentioned before to describe a few time series. We can start with the Dow Jones dataset, which is in the fpp library. The Dow Jones Index is a stock market index that measures the stock performance of 30 large companies listed on stock exchanges in the United States.
library(fpp)
dowjones # This is our dataset; it already has class ts, so we don't have to convert it.
Time Series:
Start = 1
End = 78
Frequency = 1
 [1] 110.94 110.69 110.43 110.56 110.75 110.84 110.46 110.56 110.46 110.05 109.60 109.31 109.31 109.25
[15] 109.02 108.54 108.77 109.02 109.44 109.38 109.53 109.89 110.56 110.56 110.72 111.23 111.48 111.58
[29] 111.90 112.19 112.06 111.96 111.68 111.36 111.42 112.00 112.22 112.70 113.15 114.36 114.65 115.06
[43] 115.86 116.40 116.44 116.88 118.07 118.51 119.28 119.79 119.70 119.28 119.66 120.14 120.97 121.13
[57] 121.55 121.96 122.26 123.79 124.11 124.14 123.37 123.02 122.86 123.02 123.11 123.05 123.05...

Basic Statistics for Time Series

What can we say about time series data at the outset? How can we describe it, and which of its features determine the method we will use to forecast it? For my own use I have prepared some notes that help me answer these questions. I used some definitions from the book "Forecasting: Principles & Practice" by Rob J Hyndman, as well as other blog articles such as: https://towardsdatascience.com/descriptive-statistics-in-time-series-modelling
Once you have made sure that your data has the time series class, you can inspect it with the basic functions available in R.
ts() builds a time series from scratch.
mean() shows the average of a set of data.
median() shows the middle value of the sorted set of data.
plot() shows on a graph what the time series looks like.
sort() sorts the data.
quantile() returns quantiles, which are cut points dividing the range of a probability distribution into continuous ...
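To make the function list in this excerpt concrete, here is a small base R sketch. It is only an illustration: the object name dj is made up, and the data are the first 14 Dow Jones values printed earlier on this page, typed in by hand so that no extra package is needed.

```r
# Build a time series from scratch with ts(), using the first 14
# Dow Jones observations printed on this page (illustrative subset).
dj <- ts(c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
           110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25),
         start = 1, frequency = 1)

class(dj)                         # check that the object has the time series class

dj_mean   <- mean(dj)             # average of the observations
dj_median <- median(dj)           # middle value of the sorted observations
dj_sorted <- sort(as.numeric(dj)) # observations in increasing order
dj_quant  <- quantile(dj)         # minimum, quartiles and maximum

print(dj_mean)
print(dj_median)
print(dj_sorted)
print(dj_quant)
```

Converting to a plain numeric vector before sorting is a deliberate choice here, because a sorted series no longer has a meaningful time index.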

Simple Time Series Models as primitive forecast methods

Let's start with what time series data is. According to Rob J Hyndman's well-known book "Forecasting: Principles & Practice", it is a sequence of observations collected over time. One characteristic of a time series is that its values have a successive order, in contrast to a plain vector, where a unique ID doesn't necessarily impose a specific order on the data. Forecasting, according to Hyndman's book, is estimating how the sequence of observations will continue into the future. As we can imagine, time series data appears in many sectors of business life, and managers would like to know what to expect in the near future. This is crucial because, by knowing in advance that something is coming, we can prepare better and avoid some losses.  R has well-developed support for time series forecasting. We can build advanced ARIMA models as well as exponential smoothing or even neural network models and more. Today I would like to focus on some b...
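The excerpt is cut off before it names the methods. Assuming it means the mean, naive and drift methods mentioned in the residuals post at the top of this page, their point forecasts can be sketched in base R as follows. The variable names and the horizon h = 3 are hypothetical choices for the example, and the data are the first 14 Dow Jones values printed on this page.

```r
# First 14 Dow Jones observations as printed on this page (illustrative subset).
y <- c(110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46,
       110.56, 110.46, 110.05, 109.60, 109.31, 109.31, 109.25)
n <- length(y)
h <- 3  # forecast horizon (hypothetical choice)

# Mean method: every future value is forecast as the historical average.
mean_fc <- rep(mean(y), h)

# Naive method: every future value is forecast as the last observation.
naive_fc <- rep(y[n], h)

# Drift method: the last observation plus h times the average historical change,
# i.e. a straight line through the first and last observations extended forward.
drift_fc <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)

forecasts <- data.frame(step = 1:h, mean = mean_fc, naive = naive_fc, drift = drift_fc)
print(forecasts)
```

The forecast package provides ready-made versions of these methods. The hand-written formulas are used here only to show what each method computes.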

Building my first R Shiny app -> Covid19 status for: France, UK, Italy, Germany, Poland, Spain.

The report I will show here consists of two charts: a line chart and a scatterplot. In the top left corner you can choose a country (one of six), and the report will then present these two graphs for the selected country. On the right side of the report you will see a small table and a check box for the scatterplot. You can choose a scatterplot with or without a regression line. The report looks like this: https://karolinamgoma.shinyapps.io/covid19/ I will explain step by step how to build such a Shiny app yourself. It is worth reading my previous post first, so that you already know how to create your app.R script. If your app.R already exists, we can add the libraries I am going to use. Install them if you don't have them yet.
Libraries
library(shiny) # for building web apps
library(readr) # data import tool, part of the Tidyverse.
library(dplyr) # perfect package for data manipulation, queries and much more, part of the Tidyverse.
library(ggplot2) #package for data visua...

Interactive charts with R shiny app

Let's start with some motivational material. I recommend visiting the website: https://shiny.rstudio.com/gallery/ Impressive, isn't it? I watched all these visualizations with my mouth open, changing parameters and observing how they changed. Creating such reports is possible with an R Shiny app. Shiny is an open source R package for building web applications. First install it on your computer.
install.packages("shiny")
R Shiny Framework
In RStudio, choose: File -> New Project -> New Directory -> Shiny Web Application. RStudio will create the script app.R. Delete its content, type shinyapp and press Shift+Tab. You should see the following:
library(shiny)
ui <- fluidPage( )
server <- function(input, output, session) {  }
shinyApp(ui, server)
This is the main structure when you are building your Shiny app. You need a user interface (ui), a server and the shinyApp() function. There are three pieces of an interactive component: 1. User interface will collect user...

Ggplot2 visualizations - examples - in the subject of Covid19 again!

I have prepared an example of two charts, a multi-series line chart and a scatterplot, to illustrate how ggplot2 works. Additionally, I have added some formatting elements to show how we can improve the look of our charts. We need the libraries below to create our graphs. Install them if you don't have them yet.
library(readr) # data import tool, part of the Tidyverse.
library(dplyr) # perfect package for data manipulation, queries and much more, part of the Tidyverse.
library(ggplot2) # the subject of this post, package for data visualizations, part of the Tidyverse.
library(RColorBrewer) # package which is helpful when choosing colors.
I downloaded the data file "COVID_data_2021_01_19" from the website: https://shiny.rstudio.com/gallery/covid19-tracker. Using the readr package, I imported the dataset into R and transformed it a bit. I used dplyr to pick the observations I wanted to visualize, and so I created the "covid2" data.frame.
covid<-COVID_data_2021_01_19
covid$country=as.fa...