Session information for reproducibility:

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0 yaml_2.2.1      Rcpp_1.0.3      stringi_1.4.6  
##  [8] rmarkdown_2.1   knitr_1.28      stringr_1.4.0   xfun_0.12       digest_0.6.24   rlang_0.4.4     evaluate_0.14

Advanced R

To gain a deep understanding of how R works, the book Advanced R by Hadley Wickham is a must-read. Read it now to save the many hours you might otherwise waste in the future.

We cover selected topics in coding style, benchmarking, profiling, debugging, parallel computing, byte-code compilation, Rcpp, and package development.

Style

Benchmark

Sources:

To identify performance issues, we need to measure runtime accurately.

system.time

set.seed(280)
x <- runif(1e6)

system.time({sqrt(x)})
##    user  system elapsed 
##   0.005   0.003   0.007
system.time({x ^ 0.5})
##    user  system elapsed 
##   0.033   0.000   0.033
system.time({exp(log(x) / 2)})
##    user  system elapsed 
##   0.022   0.000   0.022

From William Dunlap:

“User CPU time” gives the CPU time spent by the current process (i.e., the current R session) and “system CPU time” gives the CPU time spent by the kernel (the operating system) on behalf of the current process. The operating system is used for things like opening files, doing input or output, starting other processes, and looking at the system clock: operations that involve resources that many processes must share. Different operating systems will have different things done by the operating system.
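The difference between CPU time and elapsed (wall-clock) time is easy to see with a base-R sketch. Sys.sleep() waits without using the CPU, so elapsed time grows while user time stays near zero. A busy computation shows the opposite pattern. Exact numbers will vary by machine.

```r
# Waiting: wall-clock (elapsed) time passes, but the R process uses little CPU
st_sleep <- system.time(Sys.sleep(0.5))
print(st_sleep)

# Computing: the R process itself burns CPU, so user time tracks elapsed time
st_busy <- system.time(for (i in 1:20) sort(runif(1e5)))
print(st_busy)

# system.time() returns a named numeric vector of class "proc_time"
st_sleep[["elapsed"]]
st_sleep[["user.self"]]
```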

microbenchmark

library("microbenchmark")
library("ggplot2")

mbm <- microbenchmark(
  sqrt(x),
  x ^ 0.5,
  exp(log(x) / 2),
  times = 100
)
mbm
## Unit: milliseconds
##           expr       min        lq      mean    median        uq       max neval
##        sqrt(x)  2.183783  2.560901  3.667116  2.978123  3.284181  9.939705   100
##          x^0.5 22.404323 26.751552 28.667978 28.183113 30.500082 38.932607   100
##  exp(log(x)/2) 15.661730 18.651952 20.454934 20.053379 21.303626 26.354250   100

Results from microbenchmark can be plotted with base R (boxplot) or with ggplot2 (autoplot).

boxplot(mbm)

autoplot(mbm)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
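The core idea behind microbenchmark is to evaluate each expression many times and report the distribution of run times, not a single measurement. Below is a rough base-R sketch of that idea. It uses a hypothetical helper, time_reps(), built on Sys.time(), which is a much coarser timer than the one microbenchmark uses. Treat it as an illustration of the approach, not a replacement for the package.

```r
set.seed(280)
x <- runif(1e6)

# Hypothetical helper: call f() `times` times and record each run in milliseconds
time_reps <- function(f, times = 20) {
  vapply(seq_len(times), function(i) {
    t0 <- Sys.time()
    f()
    as.numeric(Sys.time() - t0, units = "secs") * 1000
  }, numeric(1))
}

reps <- list(
  sqrt   = time_reps(function() sqrt(x)),
  pow    = time_reps(function() x ^ 0.5),
  explog = time_reps(function() exp(log(x) / 2))
)

# Five-number summary per expression, similar in spirit to microbenchmark's table
t(sapply(reps, quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))
```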

bench

The bench package is another tool for microbenchmarking. The output is a tibble:

library(bench)

(lb <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  exp(log(x) / 2)
))
## # A tibble: 3 x 6
##   expression         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 sqrt(x)         2.05ms   2.75ms     343.     7.63MB    174. 
## 2 x^0.5          24.47ms  26.42ms      37.1    7.63MB     18.5
## 3 exp(log(x)/2)  17.33ms  18.91ms      51.8    7.63MB     25.9
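You can sanity-check the mem_alloc column by hand. Each expression returns one new double vector of length 1e6, and a double takes 8 bytes. The reported 7.63MB is consistent with binary (1024-based) megabytes:

```r
# One result vector: 1e6 doubles at 8 bytes each
n_bytes <- 1e6 * 8
n_bytes / 1024^2      # about 7.63, matching mem_alloc above

# object.size() agrees, up to a small fixed vector header
as.numeric(object.size(numeric(1e6)))
```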

To visualize the results:

plot(lb)
## Loading required namespace: tidyr

The points are colored by garbage collection (GC) level.
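Speed comparisons only make sense if the expressions compute the same thing. bench::mark() checks this by default through its check argument. In base R, the same check can be done with all.equal(). Unlike identical(), all.equal() tolerates floating-point rounding:

```r
set.seed(280)
x <- runif(1e6)

r_sqrt   <- sqrt(x)
r_pow    <- x ^ 0.5
r_explog <- exp(log(x) / 2)

# TRUE if the results agree up to the default tolerance (about 1.5e-8)
isTRUE(all.equal(r_sqrt, r_pow))
isTRUE(all.equal(r_sqrt, r_explog))

# Largest absolute discrepancy between the fastest and the exp/log method
max(abs(r_sqrt - r_explog))
```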

Profiling

Premature optimization is the root of all evil (or at least most of it) in programming.
-Don Knuth

Sources:

First example

library(profvis)

profvis({
  data(diamonds, package = "ggplot2")

  plot(price ~ carat, data = diamonds)
  m <- lm(price ~ carat, data = diamonds)
  abline(m, col = "red")
})
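profvis is built on R's own sampling profiler, Rprof(), which records the call stack at fixed intervals. The sketch below runs the same workflow in plain base R. It uses simulated carat/price data in place of diamonds, an assumption made so the example has no package dependencies.

```r
set.seed(1)
n <- 1e5
sim <- data.frame(carat = runif(n, 0.2, 3))
sim$price <- 4000 * sim$carat + rnorm(n, sd = 1000)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling every 10 ms
for (i in 1:30) {
  m <- lm(price ~ carat, data = sim)
}
Rprof(NULL)                         # stop sampling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent in each function itself
head(prof$by.total)   # time including the functions it calls
prof$sampling.time
```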

Example: profiling time

First, generate test data:

times <- 4e5
cols <- 150
data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
data <- cbind(id = paste0("g", seq_len(times)), data)
head(data)
##   id       V1       V2       V3       V4       V5       V6       V7       V8       V9      V10      V11      V12
## 1 g1 5.969774 3.386956 4.358790 6.237727 5.771275 5.174312 5.037446 4.430052 3.945362 6.991074 6.534173 6.317850
## 2 g2 4.921927 5.633818 5.694171 4.865483 4.153967 5.061355 4.371293 5.463183 5.313754 5.346877 5.352962 4.297411
## 3 g3 5.800428 5.549560 3.976693 4.171676 4.666550 7.123001 5.098244 5.293764 4.865347 4.281242 4.907693 4.644739
## 4 g4 7.898909 4.615669 4.377937 4.701138 5.101872 5.295811 4.312929 4.085740 3.162529 3.530056 3.844779 5.075276
## 5 g5 6.452353 5.880938 5.926650 4.594801 4.482622 5.954435 4.586131 3.444331 5.776090 3.128314 5.455446 5.906633
## 6 g6 4.469903 6.975048 4.217334 6.039186 3.682259 4.181690 5.154283 5.046682 6.352984 6.289363 5.674194 3.793744
##        V13      V14      V15      V16      V17      V18      V19      V20      V21      V22      V23      V24      V25
## 1 5.068947 5.795827 5.498428 4.983893 4.021139 5.065696 6.310364 5.490599 5.055548 4.238561 5.797920 4.650697 6.232738
## 2 3.143680 5.640137 5.002181 2.896327 4.800560 4.762682 3.607615 4.842496 6.059028 6.391931 7.143750 4.092829 5.430874
## 3 4.526487 5.236702 2.669363 4.749898 5.418325 4.843147 4.703208 5.158282 4.913269 5.997505 6.404710 5.901887 5.027204
## 4 4.674053 4.944213 5.056897 6.646479 4.596020 7.193654 5.363815 5.828282 4.825783 4.223200 4.941217 5.421718 6.068461
## 5 6.148552 4.399970 4.660836 3.235290 5.168906 3.363993 3.267161 6.299007 3.990615 4.938966 6.021251 4.856988 5.456461
## 6 3.325485 5.174891 4.695926 4.282914 3.766488 5.115333 3.881455 5.711343 5.260267 6.811890 6.109790 5.321400 6.915472
##        V26      V27      V28      V29      V30      V31      V32      V33      V34      V35      V36      V37      V38
## 1 5.156774 3.934398 3.779090 4.736236 3.174204 6.293351 5.757692 4.955279 6.137107 3.573192 4.722608 6.093101 5.095926
## 2 4.537045 6.011686 4.729784 4.512812 5.574587 5.457530 4.388429 4.955159 4.581849 5.322105 5.565669 5.911121 5.071331
## 3 4.896893 4.960644 4.795403 4.603998 4.743484 4.563350 3.839632 4.702261 5.063408 5.118905 2.957961 4.564707 2.932893
## 4 4.881969 5.554265 4.336586 6.011450 5.118163 4.219831 5.042094 4.108503 3.184510 4.287867 5.577840 5.290784 6.163332
## 5 6.307981 4.052145 5.367455 5.202699 4.090536 4.852753 3.555741 6.238919 4.414152 4.087591 4.255676 4.117047 5.126822
## 6 5.672714 3.100835 7.315498 5.420842 5.900181 4.055118 4.149567 5.278965 4.367095 2.899642 4.237844 6.699517 5.530437
##        V39      V40      V41      V42      V43      V44      V45      V46      V47      V48      V49      V50      V51
## 1 4.211594 5.593647 4.753439 5.169350 4.883482 3.657573 5.669892 4.241260 4.851962 5.456667 3.641526 5.823936 6.202179
## 2 3.647852 6.422367 6.187101 4.716270 5.597624 4.413653 4.230248 4.871157 5.293433 6.435590 5.709755 5.460538 4.667820
## 3 4.673558 4.913145 5.438515 4.850280 4.626902 4.459188 5.827907 5.663024 3.457956 5.721593 4.349273 5.278583 3.925600
## 4 5.269505 3.863132 5.892379 5.762037 5.179667 3.942267 5.778172 5.932535 5.524665 4.771691 6.172233 6.314465 6.106169
## 5 5.187829 5.426037 4.901255 3.280638 5.652811 4.206320 5.858897 6.658522 5.633509 4.963152 4.258391 6.554396 5.381009
## 6 5.831208 3.738705 3.961880 4.871338 5.252205 4.892354 5.132825 5.681368 7.427848 3.656597 5.196717 5.721946 6.507628
##        V52      V53      V54      V55      V56      V57      V58      V59      V60      V61      V62      V63      V64
## 1 4.682491 6.020682 2.888328 4.884376 5.080382 5.731241 3.716672 3.728804 4.225844 5.893676 5.236128 3.998496 5.052559
## 2 4.270379 4.517615 5.204095 4.635607 5.231288 3.800630 5.785667 3.602729 5.613489 4.841687 4.655119 5.548548 3.668189
## 3 5.579634 4.434822 5.277278 5.749743 5.783613 4.409653 4.344321 4.891546 4.541498 5.192576 5.162769 5.449993 4.251714
## 4 5.754318 5.706827 6.369271 5.991514 3.357682 5.511138 4.634418 4.294338 5.769028 5.738561 4.282375 5.683305 3.788938
## 5 4.826608 5.104414 5.925495 6.495626 3.528083 5.250808 6.254993 7.384261 5.711009 4.899243 4.683286 5.427239 4.776874
## 6 4.720609 4.800197 4.981737 4.804950 4.029324 4.003427 4.317158 5.344575 7.102476 3.520562 3.071539 5.494856 6.203147
##        V65      V66      V67      V68      V69      V70      V71      V72      V73      V74      V75      V76      V77
## 1 2.297626 5.044528 4.422164 4.317439 5.039205 5.306640 4.291307 5.752981 4.452937 4.556165 4.878821 5.494998 4.525805
## 2 5.042682 5.209917 6.233088 3.673400 5.550484 4.341351 5.621920 5.107727 5.582111 5.070413 3.553271 3.302500 4.883199
## 3 4.954122 3.964389 3.632149 5.902254 5.180800 5.905438 6.001834 4.165093 7.069034 5.077702 4.823577 3.950463 2.968174
## 4 4.699785 5.317041 4.387484 2.815468 5.945561 4.348289 4.949502 3.373685 3.325205 5.391997 5.576931 4.869578 5.218397
## 5 5.688282 4.922704 5.282254 5.433178 7.002321 5.236936 6.255325 3.910518 4.194588 5.936640 5.947325 6.494725 6.002182
## 6 6.325418 4.847684 7.013564 5.339205 3.909243 5.400927 5.143125 3.502081 3.909446 4.450924 4.379779 4.845526 5.499161
##        V78      V79      V80      V81      V82      V83      V84      V85      V86      V87      V88      V89      V90
## 1 5.598838 5.510052 4.335159 3.841143 4.854982 5.862762 6.055590 5.570398 5.605576 5.207664 4.721803 4.638047 4.648935
## 2 5.669165 6.934187 5.756974 6.290887 3.717678 5.352763 4.009332 4.243106 6.890147 6.300223 6.049974 5.293047 4.015187
## 3 2.981413 5.337635 5.745397 4.865970 3.963592 4.502974 4.874401 4.185517 4.027433 4.543304 4.727077 4.302911 4.715384
## 4 3.661650 4.783694 7.297424 5.910201 6.051057 3.789773 4.745911 4.428981 3.917806 4.718456 4.085395 6.146197 5.816736
## 5 4.566777 3.771062 5.382190 4.958616 2.225116 5.862221 4.738276 5.119915 3.244853 5.478629 5.415818 7.350988 5.664298
## 6 2.473238 4.043085 3.965896 3.419392 5.939352 4.620255 3.502446 7.679182 7.079275 4.400633 5.172185 5.316849 5.509925
##        V91      V92      V93      V94      V95      V96      V97      V98      V99     V100     V101     V102     V103
## 1 5.864618 4.781209 4.581879 4.869569 4.365018 3.933875 4.926531 5.451521 5.017248 5.636656 4.776153 6.359240 5.665790
## 2 6.509665 4.338551 4.430226 4.618964 4.688742 5.389455 4.288900 5.201093 5.817945 5.068234 5.083228 6.793019 5.144909
## 3 7.099666 4.696381 4.224483 3.484758 4.979978 5.162419 4.980340 3.123095 4.738060 4.542765 4.638650 5.814557 7.978221
## 4 2.364329 5.978111 3.916872 5.119176 6.135774 6.309547 4.931209 4.736936 3.554819 3.171603 4.795017 5.165080 4.827155
## 5 5.541353 6.000726 5.227135 5.361953 3.361382 4.935839 5.301755 4.509777 5.518396 4.667021 5.465201 5.009373 5.611512
## 6 4.510909 3.316033 5.467789 5.780469 4.307894 5.820533 3.036929 5.952751 3.954832 5.889513 5.194754 4.391003 5.226084
##       V104     V105     V106     V107     V108     V109     V110     V111     V112     V113     V114     V115     V116
## 1 6.934336 4.916882 3.682590 3.553191 6.178398 6.588264 3.722993 4.291036 6.209619 3.578261 6.869507 2.238355 4.383842
## 2 3.751889 5.029695 4.816148 3.061924 4.118335 5.784075 5.031129 4.164354 6.112392 5.941705 4.701242 3.920104 7.204458
## 3 5.774194 4.389153 4.837424 4.265490 5.130126 5.268602 5.803634 5.390342 6.347929 7.672582 3.462561 5.254552 4.463835
## 4 4.330540 4.128308 5.855140 4.941786 4.834643 4.263134 3.589032 5.405492 4.955846 4.916321 4.609474 5.251307 4.959063
## 5 5.683406 6.508588 4.252722 4.999025 4.962796 5.694972 4.879281 4.167013 4.673221 4.497916 6.566364 3.930197 4.736816
## 6 4.291935 5.436822 4.295753 3.682477 4.373195 5.968806 4.794256 4.186473 4.809896 4.757196 5.410635 5.014765 5.003775
##       V117     V118     V119     V120     V121     V122     V123     V124     V125     V126     V127     V128     V129
## 1 4.417523 6.048996 4.675138 5.842443 5.167029 5.750582 4.907758 5.865223 5.261798 4.956968 5.643643 5.812596 5.078079
## 2 4.485306 7.334596 5.865684 4.203705 3.606904 5.627711 5.587762 5.654150 5.269067 4.922916 4.823450 5.121979 5.395293
## 3 4.016373 6.311201 4.517438 4.700522 4.729896 4.529697 5.358662 5.112413 4.884113 6.012762 6.285833 4.477839 3.045543
## 4 3.014013 6.210320 4.482665 5.731177 3.996006 6.642559 4.243635 6.820607 6.025009 4.956143 5.423347 5.900742 6.472146
## 5 5.845934 6.098909 4.571074 6.489460 4.710165 5.496734 4.419287 4.525932 4.282508 2.885029 5.337152 4.868927 5.371391
## 6 4.754222 4.567107 3.474825 6.064451 5.758978 3.747149 4.523980 5.090233 4.284195 5.217521 5.638891 3.792649 7.184338
##       V130     V131     V132     V133     V134     V135     V136     V137     V138     V139     V140     V141     V142
## 1 6.353728 4.885178 5.647268 5.396727 4.887594 3.792632 5.802780 3.985640 5.488433 5.278389 6.098580 4.490176 4.915071
## 2 3.354223 5.184184 3.709929 4.930617 4.181895 5.545752 6.747582 5.242106 6.356912 4.750170 5.434490 5.296510 5.959623
## 3 4.332591 3.238270 5.711071 5.764195 4.717790 4.795234 5.768002 5.869586 5.333952 4.018724 7.537137 2.985009 6.095209
## 4 4.903896 3.245853 5.172849 5.101201 4.803509 6.542996 5.211635 6.129498 4.581313 5.013140 4.977602 5.929854 5.276177
## 5 4.888848 4.789275 4.892961 7.136214 6.075162 5.259250 4.575743 4.704665 4.872247 5.141363 6.538962 3.871012 5.762745
## 6 5.835156 4.187135 5.513385 2.575642 5.078832 4.306050 5.178530 4.919545 6.425867 3.329013 4.154823 3.671133 6.976441
##       V143     V144     V145     V146     V147     V148     V149     V150
## 1 4.637860 6.020174 3.860709 4.360477 4.848465 5.005363 6.678806 3.734484
## 2 5.673556 2.154634 6.513392 3.879315 4.890934 5.383120 5.062598 6.181172
## 3 4.210550 4.409794 5.687756 7.131847 6.448625 3.967368 5.851488 5.865080
## 4 4.564731 4.240755 3.590409 5.770194 5.654670 6.992913 6.333515 5.449932
## 5 5.553001 5.418704 4.207660 5.257442 4.447403 3.551037 4.092570 5.093482
## 6 5.135457 4.695289 5.825259 4.921798 3.497042 3.861398 4.491844 5.183752
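For scale, the numeric part of this test data is 4e5 rows by 150 columns of doubles at 8 bytes each, before counting the id column. Any step that duplicates it in full therefore costs hundreds of megabytes:

```r
n_rows <- 4e5
n_cols <- 150
n_bytes <- n_rows * n_cols * 8   # 8 bytes per double
n_bytes / 1024^2                 # roughly 458 MiB
```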

Original code for centering the columns of a data frame:

profvis({
  # Store in another variable for this run
  data1 <- data
  
  # Get column means
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  
  # Subtract mean from each column
  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <-
      data1[, names(data1) != "id"][, i] - means[i]
  }
})

Profile apply vs colMeans vs lapply vs vapply:

profvis({
  data1 <- data
  # Four different ways of getting column means
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  means <- colMeans(data1[, names(data1) != "id"])
  means <- lapply(data1[, names(data1) != "id"], mean)
  means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))
})
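The four approaches differ in speed but should agree in value. When apply() is given a data frame, it first coerces it to a matrix with as.matrix(). That extra conversion is one plausible reason it profiles slowly here. The check below uses a small hypothetical 5 x 3 data frame. It confirms the results match once the list returned by lapply() is unlisted:

```r
set.seed(1)
df <- data.frame(id = paste0("g", 1:5), matrix(rnorm(15, mean = 5), ncol = 3))
num <- df[, names(df) != "id"]

m_apply    <- apply(num, 2, mean)            # coerces the data frame to a matrix first
m_colMeans <- colMeans(num)                  # dedicated internal routine
m_lapply   <- lapply(num, mean)              # returns a list
m_vapply   <- vapply(num, mean, numeric(1))  # returns a numeric vector, type-checked

str(m_lapply)
isTRUE(all.equal(m_apply, m_colMeans))
isTRUE(all.equal(m_colMeans, m_vapply))
isTRUE(all.equal(m_vapply, unlist(m_lapply)))
```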

We decide to use vapply:

profvis({
  data1 <- data
  means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))

  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
})

Calculate mean and center in one pass:

profvis({
  data1 <- data

  # Given a column, center its values and return them
  col_norm <- function(col) {
    col - mean(col)
  }

  # Apply the centering function over all columns except id
  data1[, names(data1) != "id"] <-
    lapply(data1[, names(data1) != "id"], col_norm)
})
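Before adopting a faster version, check that it returns the same data frame as the original. The sketch below runs that check on a small data frame. It wraps the loop approach and the one-pass lapply approach in hypothetical functions, center_loop() and center_lapply():

```r
# Loop approach: column means, then subtract column by column
center_loop <- function(data1) {
  cols <- names(data1) != "id"
  means <- vapply(data1[, cols], mean, numeric(1))
  for (i in seq_along(means)) {
    data1[, cols][, i] <- data1[, cols][, i] - means[i]
  }
  data1
}

# One-pass approach: center every non-id column with lapply
center_lapply <- function(data1) {
  cols <- names(data1) != "id"
  data1[, cols] <- lapply(data1[, cols], function(col) col - mean(col))
  data1
}

set.seed(2)
small <- data.frame(id = paste0("g", 1:100), matrix(rnorm(500, mean = 5), ncol = 5))

out_loop   <- center_loop(small)
out_lapply <- center_lapply(small)

isTRUE(all.equal(out_loop, out_lapply))
colMeans(out_lapply[, -1])   # all essentially zero after centering
```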

Example: profiling memory

Original code for cumulative sums:

profvis({
  data <- data.frame(value = runif(1e5))

  data$sum[1] <- data$value[1]
  for (i in seq(2, nrow(data))) {
    data$sum[i] <- data$sum[i-1] + data$value[i]
  }
})

Write a function to avoid the expensive $ indexing into the data frame:

profvis({
  csum <- function(x) {
    if (length(x) < 2) return(x)

    sum <- x[1]
    for (i in seq(2, length(x))) {
      sum[i] <- sum[i-1] + x[i]
    }
    sum
  }
  data$sum <- csum(data$value)
})

Pre-allocate the result vector:

profvis({
  csum2 <- function(x) {
    if (length(x) < 2) return(x)

    sum <- numeric(length(x))  # Preallocate
    sum[1] <- x[1]
    for (i in seq(2, length(x))) {
      sum[i] <- sum[i-1] + x[i]
    }
    sum
  }
  data$sum <- csum2(data$value)
})
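Base R also has a built-in, vectorized cumsum(). It makes a convenient reference for checking hand-written loops. The sketch below compares the growing version (csum), the preallocated version (csum2), and cumsum() on the same input. Timings will vary by machine and R version. cumsum() can differ from the loops in the last bits, so that comparison uses a tolerance.

```r
# Grows `sum` one element at a time
csum <- function(x) {
  if (length(x) < 2) return(x)
  sum <- x[1]
  for (i in seq(2, length(x))) {
    sum[i] <- sum[i-1] + x[i]
  }
  sum
}

# Preallocates `sum` to its final length
csum2 <- function(x) {
  if (length(x) < 2) return(x)
  sum <- numeric(length(x))
  sum[1] <- x[1]
  for (i in seq(2, length(x))) {
    sum[i] <- sum[i-1] + x[i]
  }
  sum
}

set.seed(3)
v <- runif(1e5)

system.time(s1 <- csum(v))
system.time(s2 <- csum2(v))
system.time(s3 <- cumsum(v))

identical(s1, s2)               # same arithmetic, same result
isTRUE(all.equal(s2, s3))       # agrees with the built-in up to tolerance
```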

Advice

Modularize big projects into small functions. Profile functions as early and as frequently as possible.

Debugging

Learning sources:

Demo code: parlindrome.R, crazy-talk.R.