Data.table is an extremely fast and memory efficient package for transforming data. Many people use it while struggling a big dataset to save some time and memory space. The second class of data.table is data.frame which is a good news because it means that functions that work with data.frame also work with data.table. Data.table has sql like query commands. It looks like this:
dt[ i, j, by]
i= subset (rows) to be extracted based on a condition
j= calculation to be performed on the subset
by= grouping parameter that serves as a base for aggregation. Very often it is column or a vector.
Quering data with data.table
I will use again mtcars dataset which is included in your base R program to present some queries using data.table . Mtcars has 32 observations on 11 (numeric) variables.
library(data.table)
dt=data.table(mtcars)
By using commends bellow we will check the class of dt and of its content:
class(dt)
[1] "data.table" "data.frame"
sapply(dt,class)
mpg cyl disp hp drat wt qsec vs am gear carb
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
dt[3,] # subset of row nr 3
mpg cyl disp hp drat wt qsec vs am gear carb
1: 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
dt[,3] #subset of column number 3
disp
1: 160.0
2: 160.0
3: 108.0
4: 258.0
5: 360.0
6: 225.0
7: 360.0
8: 146.7
9: 140.8
10: 167.6
11: 167.6
12: 275.8
13: 275.8
14: 275.8
15: 472.0
16: 460.0
17: 440.0
18: 78.7
19: 75.7
20: 71.1
21: 120.1
22: 318.0
23: 304.0
24: 350.0
25: 400.0
26: 79.0
27: 120.3
28: 95.1
29: 351.0
30: 145.0
31: 301.0
32: 121.0
dt[cyl==4,] #subset of rows where cyl=4
mpg cyl disp hp drat wt qsec vs am gear carb
1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
2: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
3: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
4: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
5: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
6: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
7: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
8: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
9: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
10: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
dt[cyl==4 & gear > 4] # subset of rows where cyl=4 and gear >4
mpg cyl disp hp drat wt qsec vs am gear carb
1: 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
2: 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2

Comments
Post a Comment