diff --git a/vignettes/datatable-benchmarking.Rmd b/vignettes/datatable-benchmarking.Rmd
index 7198471b4..86a1a2391 100644
--- a/vignettes/datatable-benchmarking.Rmd
+++ b/vignettes/datatable-benchmarking.Rmd
@@ -80,7 +80,7 @@ Protecting your `data.table` from being updated by reference operations can be a
 
 If your benchmark is meant to be published it will be much more insightful if you will split it to measure time of atomic processes. This way your readers can see how much time was spent on reading data from source, cleaning, actual transformation, exporting results. Of course if your benchmark is meant to present to present an _end-to-end workflow_, then it makes perfect sense to present the overall timing. Nevertheless, separating out timing of individual steps is useful for understanding which steps are the main bottlenecks of a workflow.
 
-There are another cases when it might not be desirable, for example when _reading a csv_, followed by _grouping_. R requires populating _R's global string cache_ which adds extra overhead when importing character data to an R session. On the other hand, the _global string cache_ might speed up processes like _grouping_. In such cases when comparing R to other languages it might be useful to include total timing.
+There are other cases when it might not be desirable, for example when _reading a csv_, followed by _grouping_. R requires populating _R's global string cache_ which adds extra overhead when importing character data to an R session. On the other hand, the _global string cache_ might speed up processes like _grouping_. In such cases when comparing R to other languages it might be useful to include total timing.
 
 # avoid class coercion
 
@@ -97,7 +97,7 @@ This is very valid. The smaller time measurement is the relatively bigger noise
 
 # multithreaded processing
 
-One of the main factors that is likely to impact timings is a number of threads in your machine. In recent versions of `data.table` some functions have been parallelized.
+One of the main factors that is likely to impact timings is the number of threads in your machine. In recent versions of `data.table`, some functions have been parallelized.
 You can control the number of threads you want to use with `setDTthreads`.
 
 ```r
diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd
index 31b19581a..283be9596 100644
--- a/vignettes/datatable-importing.Rmd
+++ b/vignettes/datatable-importing.Rmd
@@ -21,7 +21,7 @@ Importing `data.table` is no different from importing other R packages. This vig
 
 ## Why to import `data.table`
 
-One of the biggest features of `data.table` is its concise syntax which makes exploratory analysis faster and easier to write and perceive; this convenience can drive packages authors to use `data.table` in their own packages. Another perhaps more important reason is high performance. When outsourcing heavy computing tasks from your package to `data.table`, you usually get top performance without needing to re-invent any high of these numerical optimization tricks on your own.
+One of the biggest features of `data.table` is its concise syntax which makes exploratory analysis faster and easier to write and perceive; this convenience can drive packages authors to use `data.table` in their own packages. Another, perhaps more important reason is high performance. When outsourcing heavy computing tasks from your package to `data.table`, you usually get top performance without needing to re-invent any of these numerical optimization tricks on your own.
 
 ## Importing `data.table` is easy
 