From 57445342ad1596b418796910c4bc22b4d847240b Mon Sep 17 00:00:00 2001
From: David Budzynski <56514985+davidbudzynski@users.noreply.github.com>
Date: Fri, 3 Nov 2023 07:57:31 +0000
Subject: [PATCH] Update vignettes/datatable-benchmarking.Rmd

Co-authored-by: Michael Chirico
---
 vignettes/datatable-benchmarking.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/datatable-benchmarking.Rmd b/vignettes/datatable-benchmarking.Rmd
index 58c94afb6..b9741d6ee 100644
--- a/vignettes/datatable-benchmarking.Rmd
+++ b/vignettes/datatable-benchmarking.Rmd
@@ -24,7 +24,7 @@ sudo lshw -class disk
 sudo hdparm -t /dev/sda
 ```
 
-When comparing `fread` to non-R solutions be aware that R requires values of character columns to be added to _R's global string cache_. This takes time when reading data but later operations benefit since the character strings have already been cached. Consequently, as well as timing isolated tasks (such as `fread` alone), it's a good idea to benchmark a pipeline of tasks such as reading data, computing operators and producing final output and report the total time of the pipeline.
+When comparing `fread` to non-R solutions be aware that R requires values of character columns to be added to _R's global string cache_. This takes time when reading data but later operations benefit since the character strings have already been cached. Consequently, in addition to timing isolated tasks (such as `fread` alone), it's a good idea to benchmark the total time of an end-to-end pipeline of tasks such as reading data, manipulating it, and producing final output.
 
 # subset: threshold for index optimization on compound queries
 