My journey through the data science

Posts

Showing posts from December, 2020

Random number generators, reproducibility and sampling with dplyr

Let's assume that you want to take some random observations from your data set. dplyr helps you with the function sample_n(). To make your code reproducible, you set a seed, which identifies one particular “random” sequence of values. You need to indicate the number of rows you want to extract and specify whether the rows should be sampled with replacement or not. To show you how it works I will again use the mtcars dataset, which is included in base R. Let's look at the first six rows of this data frame.

library(dplyr)
data("mtcars")
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61...
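As a minimal sketch of the idea described in this teaser, the snippet below draws the same sample twice by resetting the seed before each call. The seed value 123 and the sample sizes are arbitrary choices made for this illustration, not values taken from the post.

```r
library(dplyr)

data("mtcars")

# Setting the same seed before each draw makes the "random" sample repeatable.
set.seed(123)  # 123 is an arbitrary seed chosen for this sketch
first_draw <- sample_n(mtcars, 5, replace = FALSE)

set.seed(123)
second_draw <- sample_n(mtcars, 5, replace = FALSE)

# Both draws contain exactly the same rows.
print(identical(first_draw, second_draw))

# With replace = TRUE the same row may be drawn more than once,
# so the sample can be larger than the 32-row data set itself.
set.seed(123)
with_replacement <- sample_n(mtcars, 40, replace = TRUE)
print(nrow(with_replacement))
```

In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(); sample_n() still works, so the sketch keeps the function named in the post.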

Joining observation units with dplyr

Today I would like to show examples of different ways you can join data frames. Let's define and display them first. In the first data.frame I will collect some information about certain people. It will contain name, height (stored in the column high) and nationality.

df1 <- data.frame(name = c("Ania", "Marek", "Kamil", "Joanna", "Patrice"), high = c(178, 190, 175, 168, 175), nationality = c("polish", "polish", "polish", "polish", "french"))
df1

     name high nationality
1    Ania  178      polish
2   Marek  190      polish
3   Kamil  175      polish
4  Joanna  168      polish
5 Patrice  175      french

In the second data.frame I will put observations about another group, containing their name and weight. What do these two data.frames have in common? We can see that both contain a column with the name of the person and, what is more, some persons like Ania and Pat...
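The teaser is cut off before the second data frame is shown, so the sketch below uses a hypothetical df2: its names and weights are invented for illustration only. Under that assumption, it shows how a few common dplyr joins behave when the two data frames share the name column.

```r
library(dplyr)

# df1 as defined in the post.
df1 <- data.frame(
  name = c("Ania", "Marek", "Kamil", "Joanna", "Patrice"),
  high = c(178, 190, 175, 168, 175),
  nationality = c("polish", "polish", "polish", "polish", "french")
)

# Hypothetical second data frame: these names and weights are invented
# for this sketch and are not the values used in the post.
df2 <- data.frame(
  name = c("Ania", "Kamil", "Zofia"),
  weight = c(60, 80, 55)
)

# Keep only the people present in both data frames.
in_both <- inner_join(df1, df2, by = "name")
print(in_both)

# Keep everyone from df1; weight is NA where df2 has no match.
all_df1 <- left_join(df1, df2, by = "name")
print(all_df1)

# Keep everyone who appears in either data frame.
everyone <- full_join(df1, df2, by = "name")
print(everyone)

# Keep the rows of df1 that have no match in df2.
only_df1 <- anti_join(df1, df2, by = "name")
print(only_df1)
```

Passing by = "name" explicitly documents the key column and avoids the message dplyr prints when it has to guess the join columns.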

My journey through the data science - by Karolina M'Goma
