My journey through the data science

Posts

Showing posts from November, 2020

The Power of dplyr in R - part 3

Today I would like to present the pipe operator, which simplifies our code and makes it more readable. As we have seen, all of the dplyr functions take a data frame (or tibble) as the first argument. dplyr provides the %>% operator from magrittr, which chains functions so that x %>% f(y) turns into f(x, y). The result of one step is therefore "piped" into the next step. We will use the pipe operator in the examples that follow. Additionally, we will focus on grouping, ordering and summarising functions. As before, I will continue using the mtcars dataset, which is included in base R.

count() # count the unique values of one or more variables
n()
n_distinct() # number of unique observations found in a category
group_by() # group by a column; allows grouped operations in the "split-apply-combine" concept

library(dplyr)
data("mtcars")
head(mtcars)
mpg cyl disp hp drat...
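As a minimal sketch of the pipe and the grouping functions listed above, the snippet below compares a nested call with its piped equivalent on mtcars. The result names (nested, piped, by_cyl, cyl_counts) and the summary column names are hypothetical, chosen only for illustration, and summarise(), arrange() and desc() are assumed to be the summarising and ordering functions the post goes on to cover.

```r
library(dplyr)
data("mtcars")

# x %>% f(y) is the same call as f(x, y)
nested <- head(mtcars, 3)
piped  <- mtcars %>% head(3)
identical(nested, piped)

# Split-apply-combine: group by cylinder count, then summarise each group
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),         # average fuel economy per group
    n_cars   = n(),               # number of rows in the current group
    n_gears  = n_distinct(gear)   # number of distinct gear values per group
  ) %>%
  arrange(desc(mean_mpg))         # order groups from most to least economical

# count() is a shortcut for group_by() followed by a row count
cyl_counts <- mtcars %>% count(cyl)

print(by_cyl)
print(cyl_counts)
```

Reading the piped version top to bottom follows the order in which the steps run, which is the main readability gain over nesting the calls inside each other.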

The Power of dplyr in R - part 2

Let's continue our adventure with the dplyr package. In the previous article I introduced the select() function, which selects a subset of columns. Today we will focus on how to pick observations and add a new column. We will continue using the mtcars dataset, which is included in base R. I would also like to say that all the posts I publish here require basic knowledge of R and RStudio. If you are totally new to R and don't have it installed on your computer, I strongly recommend that you find some online tutorials and start with the R fundamentals.

library(dplyr)
data("mtcars")
head(mtcars)

Now, let's introduce the function:

filter() # filter a subset of rows (pick the observations)

head(filter(.data = mtcars, mpg > 20 & vs == 0)) # filter rows where mpg > 20 and vs == 0
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21 6 160.0 110 3.90 2.620 16.46...
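The filter() call above can be sketched as a small runnable example. The excerpt is cut off before it shows how the new column is added, so the use of mutate() here is an assumption, and the column name kpl (kilometres per litre) is hypothetical.

```r
library(dplyr)
data("mtcars")

# Keep rows where mpg > 20 and vs == 0
efficient <- filter(mtcars, mpg > 20 & vs == 0)

# Conditions separated by commas are combined with AND, so this is equivalent
efficient_alt <- filter(mtcars, mpg > 20, vs == 0)

# Assumption: mutate() adds the new column; kpl is an illustrative name
# 1 mile per US gallon is about 0.425144 km per litre
with_kpl <- mutate(mtcars, kpl = mpg * 0.425144)

print(efficient)
print(head(with_kpl))
```

Both forms of the filter return the same rows; the comma form can be easier to read when there are several conditions.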

The Power of dplyr in R - part 1

dplyr is one of the packages in the tidyverse, a collection of R packages that work together to produce clean and tidy data. I started exploring it while learning data pre-processing and data aggregation. It turns out to be a very efficient, easy-to-use and fast tool, so a lot of people, including me, use it very often. It will help you with manipulating a data.frame, queries, sorting, summary statistics, joining tables and more. My maths teacher used to say that when you are trying to solve a problem, it matters which way you choose to reach the goal. It is up to us to choose the most efficient tool so that the whole process goes smoothly. This is the reason why the dplyr package is worth learning! It not only lets you do your tasks, it does them in quite an easy and fast way. Pay attention to the type of data you pass to dplyr: it can be a tibble or a data.frame. I will use the mtcars dataset, which is i...
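To illustrate the tibble versus data.frame remark, here is a small sketch, assuming the tibble package is available (it is installed as a dependency of dplyr). The object names and the model column name are hypothetical; select() is used because the next post names it as the function introduced here.

```r
library(dplyr)
data("mtcars")

cars_df <- mtcars  # a plain data.frame with car names stored as row names

# Tibbles do not keep row names, so move them into a column first
cars_tbl <- tibble::as_tibble(mtcars, rownames = "model")

class(cars_df)
class(cars_tbl)

# dplyr verbs accept both and return the same kind of object they received
df_subset  <- select(cars_df, mpg, cyl)
tbl_subset <- select(cars_tbl, model, mpg, cyl)

print(head(df_subset))
print(tbl_subset)
```

The practical difference to watch for is the row names: mtcars keeps the car models there, and they are lost on conversion to a tibble unless they are moved into a column as above.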

My journey through the data science - by Karolina M'Goma
