ggplot2 visualizations - examples - on the subject of Covid19 again!

I have prepared two example charts, a multiseries line chart and a scatterplot, to illustrate how ggplot2 works. I have also added some formatting elements to show how we can improve the look of our charts.

We need the libraries below to create our graphs. Install them if you don't have them yet.

library(readr)        # data import tool, part of the Tidyverse
library(dplyr)        # data manipulation, queries and much more, part of the Tidyverse
library(ggplot2)      # the subject of this post: data visualization, part of the Tidyverse
library(RColorBrewer) # helpful when choosing colors
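If you are not sure which of these packages are already installed, a small check like the one below can tell you. This is only a sketch: missing_packages() is a helper name made up for illustration, and the install.packages() call is left commented out so that nothing gets installed by accident.

```r
# Hypothetical helper (not part of any package): returns the names of
# the packages whose namespace cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("readr", "dplyr", "ggplot2", "RColorBrewer")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```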

I downloaded the data file "COVID_data_2021_01_19" from https://shiny.rstudio.com/gallery/covid19-tracker. Using the readr package I imported the dataset into R and transformed it a bit. Then I used dplyr to pick the observations I wanted to visualize, which gave me the "covid2" data.frame.
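The import call itself is not shown in this post. As a rough sketch of what such an import can look like, the example below writes a tiny CSV to a temporary file and reads it back with read_csv(). The three rows are copied from the output printed further down, but the file layout, the column selection and the date format "%Y-%m-%d" are assumptions made for this example, not a description of the real file.

```r
library(readr)

# Miniature stand-in for the real file so that the example runs on its own.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,date,cumulative_cases",
  "France,2020-02-04,6",
  "UK,2020-02-25,34",
  "Italy,2020-03-24,69176"
), path)

# Declaring the column types up front means the columns arrive
# as factor, Date and numeric without any later conversion.
toy <- read_csv(
  path,
  col_types = cols(
    country = col_factor(),
    date = col_date(format = "%Y-%m-%d"),
    cumulative_cases = col_double()
  )
)

str(toy)
```

Setting col_types in read_csv() is one alternative to converting the columns afterwards with as.factor() and as.Date().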

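covid <- COVID_data_2021_01_19 # the object created when the file was imported with readr
covid$country <- as.factor(covid$country) # make sure this variable is a factor
covid$date <- as.Date(covid$date, format='%m%d%y') # convert to Date; the format string has to match how dates are written in the raw file
covid <- as.data.frame(covid) # store the data set as a plain data.frame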

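# %in% keeps every row of the six countries; country == c(...) would recycle the vector and silently drop most rows
covid2 <- covid %>% filter(country %in% c("France","Poland","UK","Germany","Spain","Italy"))

head(covid2)

The first six rows of my data.frame looked as follows (the printout wraps the ten columns into three blocks, and the exact rows depend on the file and on the filter used):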

  country       date cumulative_cases new_cases_past_week cumulative_deaths new_
1  France 2020-02-04                6                   2                 0    0
2      UK 2020-02-25               34                  15                 0    0
3  France 2020-03-03              212                 198                 4    3
4      UK 2020-03-03              189                 155                 0    0
5  France 2020-03-10             1783                1571                33   29
6   Italy 2020-03-24            69176               37670              6820 4317

  cumulative_cases_per_million new_cases_per_million_past_week cumul
1                          0.1                             0.0   0.0
2                          0.5                             0.2   0.0
3                          3.2                             3.0   0.1
4                          2.8                             2.3   0.0
5                         27.3                            24.1   0.5
6                       1144.1                           623.0 112.8

  new_deaths_per_million_past_week
1                              0.0
2                              0.0
3                              0.0
4                              0.0
5                              0.4
6                             71.4

The covid2 data.frame contains the countries I have chosen: France, Poland, UK, Germany, Spain and Italy. The variable "country" is categorical. The remaining variables are numeric, except for "date".
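When picking several countries at once, it matters whether the filter uses == or %in%. The sketch below shows the difference on a small made-up data.frame (the names toy and wanted and all the numbers are invented for illustration).

```r
library(dplyr)

# Made-up data: three countries, two rows each.
toy <- data.frame(
  country = c("France", "Poland", "UK", "France", "Poland", "UK"),
  cases   = c(1, 2, 3, 4, 5, 6)
)

wanted <- c("Poland", "France")

# == compares element by element and recycles `wanted`:
# row 1 vs "Poland", row 2 vs "France", row 3 vs "Poland", and so on.
with_equals <- toy %>% filter(country == wanted)

# %in% asks, for every row, whether the country appears anywhere in `wanted`.
with_in <- toy %>% filter(country %in% wanted)

nrow(with_equals)  # 2: only rows that happen to line up with the recycled vector
nrow(with_in)      # 4: all rows for France and Poland
```

With ==, a row is kept only when its position happens to match the recycled vector, so the result depends on row order. For "is this value one of these" questions, %in% is the safer choice.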

Multiseries line chart

ggplot(data = covid2, aes(x = date, y = cumulative_cases, color = country)) +
  geom_line(size = 1.25) +
  theme_gray(base_size = 12) +
  ggtitle("Number of covid 19 cases") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  xlab("Month") +
  ylab("Number of cases") +
  theme(panel.background = element_rect(fill = "cornsilk")) +
  guides(color = guide_legend(title = "Country", label.position = "right", reverse = TRUE))

The result of this code looks like this:

Let me now explain what each of the ggplot2 layers means. We start by defining the data we are going to use, which is the covid2 data.frame. Then we define the aesthetics: the date goes on the x-axis, the numeric variable "cumulative_cases" on the y-axis, and the categorical variable "country" is shown through color. Next we need to specify what type of chart we want. I did that with geom_line(), which returns a line chart; for a different type of chart we would use a different geom function. To make the lines easier to see I set size to 1.25 (in ggplot2 3.4.0 and later this argument is called linewidth for lines, and size triggers a deprecation warning).
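The idea that data and aesthetics come first and the geom decides the chart type can be tried out on a few rows. The sketch below uses made-up numbers for two hypothetical countries "A" and "B". It assumes ggplot2 3.4.0 or later because of the linewidth argument; with older versions use size instead.

```r
library(ggplot2)

# Made-up numbers for two hypothetical countries "A" and "B".
toy <- data.frame(
  date = rep(as.Date(c("2020-03-01", "2020-04-01", "2020-05-01")), times = 2),
  cumulative_cases = c(10, 40, 90, 5, 25, 60),
  country = rep(c("A", "B"), each = 3)
)

# Data and aesthetics only: on its own this draws empty axes, no geom yet.
base <- ggplot(data = toy, aes(x = date, y = cumulative_cases, color = country))

# Adding a geom decides the chart type.
p_line   <- base + geom_line(linewidth = 1.25)  # one line per country
p_points <- base + geom_point()                 # the same data as points

p_line
```

Keeping the base object separate makes it easy to compare several geoms on the same data and aesthetics.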

The remaining functions help me format the chart. With ggtitle() I defined the main title, and with theme() I made the title bold and centered it. I added the names of the x and y axes with xlab() and ylab(). With a second theme() call I changed the background color to "cornsilk". Finally, I formatted the legend with guides(): it gets the title "Country", labels to the right of the keys, and a reversed order of entries.
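The same formatting can be written more compactly. The sketch below is one possible variant, not the code used for the chart above: labs() sets the plot title, both axis titles and the legend title in one call, and scale_color_brewer() picks the line colors from a ColorBrewer palette. The data are made up, and the palette "Dark2" is an arbitrary choice for this example.

```r
library(ggplot2)

# Made-up numbers for two hypothetical countries "A" and "B".
toy <- data.frame(
  date = rep(as.Date(c("2020-03-01", "2020-04-01", "2020-05-01")), times = 2),
  cumulative_cases = c(10, 40, 90, 5, 25, 60),
  country = rep(c("A", "B"), each = 3)
)

p <- ggplot(toy, aes(x = date, y = cumulative_cases, color = country)) +
  geom_line() +
  # labs() replaces ggtitle(), xlab(), ylab() and the legend title in guides().
  labs(title = "Number of covid 19 cases", x = "Month",
       y = "Number of cases", color = "Country") +
  # Colors taken from a ColorBrewer palette instead of the ggplot2 defaults.
  scale_color_brewer(palette = "Dark2") +
  theme_gray(base_size = 12) +
  # Both theme settings combined in a single theme() call.
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.background = element_rect(fill = "cornsilk")
  )

p
```

To see which palettes are available, RColorBrewer::display.brewer.all() draws all of them in one plot.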

Scatterplot

ggplot(data = covid2, aes(x = cumulative_deaths, y = cumulative_cases, color = country)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Correlation between number of deaths and number of cases per country") +
  xlab("Number of deaths") +
  ylab("Number of cases") +
  theme_gray(base_size = 12) +
  theme(plot.title = element_text(hjust = 0.5), title = element_text(face = "bold")) +
  theme(panel.background = element_rect(fill = "cornsilk")) +
  guides(color = guide_legend(title = "Country", label.position = "left", reverse = TRUE))

The result of this code looks like this:

Let me now explain what each of the ggplot2 layers means. We start by defining the data we are going to use, which is the covid2 data.frame. Then we define the aesthetics: the numeric variable "cumulative_deaths" goes on the x-axis, the numeric variable "cumulative_cases" on the y-axis, and the categorical variable "country" is shown through color. Next we need to specify what type of chart we want. I did that with geom_point(), which returns a scatterplot. With geom_smooth(method="lm") I added a linear regression line with its confidence band. Because color is mapped to country in ggplot(), a separate line is fitted for each country.
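Where the color mapping is placed decides how many regression lines geom_smooth() draws. The sketch below compares the two placements on made-up numbers for two hypothetical countries "A" and "B"; formula = y ~ x is spelled out only to avoid the message ggplot2 prints when it picks the formula itself.

```r
library(ggplot2)

# Made-up numbers for two hypothetical countries "A" and "B".
toy <- data.frame(
  cumulative_deaths = c(1, 2, 3, 4, 2, 4, 6, 8),
  cumulative_cases  = c(12, 19, 33, 41, 15, 22, 38, 49),
  country = rep(c("A", "B"), each = 4)
)

# Color mapped in ggplot(): every layer inherits it,
# so the linear model is fitted separately for each country.
p_per_country <- ggplot(toy, aes(x = cumulative_deaths, y = cumulative_cases, color = country)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped only in geom_point(): geom_smooth() sees a single group
# and fits one line through all the points.
p_overall <- ggplot(toy, aes(x = cumulative_deaths, y = cumulative_cases)) +
  geom_point(aes(color = country)) +
  geom_smooth(method = "lm", formula = y ~ x)

p_per_country
```

The first variant matches the chart in this post; the second is useful when the question is about the overall relationship rather than the one within each country.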

The remaining functions help me format the chart. With ggtitle() I defined the main title. I added the names of the x and y axes with xlab() and ylab(). With theme() I centered the main title and set title to bold; because title is the parent of all title elements, this makes the axis and legend titles bold as well. I also changed the background color to "cornsilk". Finally, I formatted the legend with guides(), this time with the labels to the left of the keys.

My journey through the data science - by Karolina M'Goma
