Posts

Showing posts from December, 2020

Random number generators, reproducibility and sampling with dplyr

Let's assume that you want to take some random observations from your data set. Dplyr helps you with the function sample_n(). To make your code reproducible, you set a seed, which identifies one particular “random” sequence of values. You need to indicate the number of rows you want to extract and specify whether the rows should be sampled with replacement or not. To show you how it works I will again use the mtcars dataset, which is included with base R. Let's look at the first six rows of this data frame.

library(dplyr)
data("mtcars")
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61...
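
The sampling step described above can be sketched roughly as follows. This is only a minimal illustration, not necessarily the code used in the full post: the seed value 123 and the sample size of 5 are arbitrary assumptions chosen for the example.

```r
library(dplyr)

data("mtcars")

# Fix the seed so that the "random" draw is the same on every run.
# The value 123 is an arbitrary assumption; any integer works.
set.seed(123)

# Draw 5 rows without replacement (each row can appear at most once).
sample_a <- sample_n(mtcars, size = 5, replace = FALSE)

# Resetting the same seed reproduces exactly the same rows.
set.seed(123)
sample_b <- sample_n(mtcars, size = 5, replace = FALSE)

identical(sample_a, sample_b)

# With replace = TRUE a row may be drawn more than once,
# so the sample can even be larger than the data set (32 rows).
set.seed(123)
sample_c <- sample_n(mtcars, size = 40, replace = TRUE)
nrow(sample_c)
```

Without the second call to set.seed(), the two draws would generally differ, because the generator continues from wherever the previous draw left it.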

Joining observation units with dplyr

Today I would like to show examples of the different ways you can join data frames. Let's define and display them first. In the first data frame I will collect some information about certain people. It will contain their name, height (stored in the column high) and nationality.

df1 <- data.frame(name = c("Ania", "Marek", "Kamil", "Joanna", "Patrice"), high = c(178, 190, 175, 168, 175), nationality = c("polish", "polish", "polish", "polish", "french"))
df1

     name high nationality
1    Ania  178      polish
2   Marek  190      polish
3   Kamil  175      polish
4  Joanna  168      polish
5 Patrice  175      french

In the second data frame I will put observations about another group, containing their names and weights. What do these two data frames have in common? We can see that both contain a column with the name of the person and, what is more, some people like Ania and Pat...
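
As a rough sketch of how joining on the shared name column might look: df1 is taken from the excerpt above, while df2 is a hypothetical stand-in, because the second data frame is not shown here. Its names and weights are invented purely for illustration.

```r
library(dplyr)

# First data frame, as defined in the post.
df1 <- data.frame(
  name = c("Ania", "Marek", "Kamil", "Joanna", "Patrice"),
  high = c(178, 190, 175, 168, 175),
  nationality = c("polish", "polish", "polish", "polish", "french"),
  stringsAsFactors = FALSE
)

# Hypothetical second data frame (assumed values, not from the post).
df2 <- data.frame(
  name = c("Ania", "Kamil", "Tomek"),
  weight = c(60, 80, 75),
  stringsAsFactors = FALSE
)

# Keep only people present in both data frames.
inner <- inner_join(df1, df2, by = "name")

# Keep everyone from df1; weight is NA where df2 has no match.
left <- left_join(df1, df2, by = "name")

# Keep everyone from either data frame.
full <- full_join(df1, df2, by = "name")

# Keep people from df1 that have no match in df2.
anti <- anti_join(df1, df2, by = "name")
```

With these assumed data, the inner join keeps only the two people who appear in both data frames, while the full join keeps all six distinct names and fills the missing columns with NA.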