@vascoosx
Last active September 7, 2017 06:02
data.table optimization tip
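library(rbenchmark)
library(data.table)

# Build a 100,000-row table: two uniform numeric columns and a character column with three values.
n <- 100000
f <- sample(c("a","b","c"), n, replace=TRUE)
a <- data.table(a=runif(n,0,100), b=runif(n,0,100), f=f)

# Create a secondary index on each column used in the filters.
setindex(a, "a")
setindex(a, "b")
setindex(a, "f")

# Silence data.table's verbose messages; the previous setting is kept in `op` (restore with options(op)).
op <- options(datatable.verbose=FALSE)

# Baseline: all three conditions combined in a single i expression.
benchmark(a[a<49 & b >49 & f %in% c("a","b"),], replications = 1000)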
#                                        test replications elapsed relative user.self sys.self user.child sys.child
# 1 a[a < 49 & b > 49 & f %in% c("a", "b"), ]         1000   12.89        1      5.03     0.88         NA        NA
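
# Chain the filters, one condition per subset. Each step works on the already-reduced result,
# and a standalone %in% (or ==) subset is the form data.table's automatic indexing targets;
# range filters such as < and > are not index-optimized.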
benchmark(a[a<49,][b >49,][f %in% c("a","b"),],replications = 1000)
#                                          test replications elapsed relative user.self sys.self user.child sys.child
# 1 a[a < 49, ][b > 49, ][f %in% c("a", "b"), ]         1000   10.91        1      4.69     0.34         NA        NA
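
# A minimal sketch, not part of the original benchmark: check that the single-expression and
# chained forms return the same rows, and ask data.table to report how it ran the %in% subset.
# The table `dt`, its size, and the seed are illustrative assumptions; they are not the data timed above.
library(data.table)
set.seed(1)
m <- 1000
dt <- data.table(a = runif(m, 0, 100),
                 b = runif(m, 0, 100),
                 f = sample(c("a", "b", "c"), m, replace = TRUE))
setindex(dt, f)

single  <- dt[a < 49 & b > 49 & f %in% c("a", "b"), ]
chained <- dt[a < 49, ][b > 49, ][f %in% c("a", "b"), ]

# Both forms keep the original row order, so the results should match row for row.
# check.attributes = FALSE ignores any index attribute attached to only one of the results.
stopifnot(isTRUE(all.equal(single, chained, check.attributes = FALSE)))

# verbose = TRUE prints data.table's internal messages for this one call, including
# whether an existing index was used for the subset.
invisible(dt[f %in% c("a", "b"), verbose = TRUE])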