Skip to content

Commit

Permalink
fix typos and adhere to British English spelling
Browse files Browse the repository at this point in the history
  • Loading branch information
davidbudzynski committed Jun 5, 2022
1 parent bcbdf93 commit b0036bc
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions vignettes/datatable-keys-fast-subset.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ knitr::opts_chunk$set(
collapse = TRUE)
```

This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.
This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.

***

Expand Down Expand Up @@ -146,7 +146,7 @@ head(flights)

#### set* and `:=`:

In *data.table*, the `:=` operator and all the `set*` (e.g., `setkey`, `setorder`, `setnames` etc..) functions are the only ones which modify the input object *by reference*.
In *data.table*, the `:=` operator and all the `set*` (e.g., `setkey`, `setorder`, `setnames` etc...) functions are the only ones which modify the input object *by reference*.

Once you *key* a *data.table* by certain columns, you can subset by querying those key columns using the `.()` notation in `i`. Recall that `.()` is an *alias to* `list()`.

Expand Down Expand Up @@ -238,7 +238,7 @@ flights[.(unique(origin), "MIA")]

#### What's happening here?

* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching values in `dest` key column *on the matching rows provided by the first key column `origin`*. We can not skip the values of key columns *before*. Therefore we provide *all* unique values from key column `origin`.
* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching values in `dest` key column *on the matching rows provided by the first key column `origin`*. We can not skip the values of key columns *before*. Therefore, we provide *all* unique values from key column `origin`.

* *"MIA"* is automatically recycled to fit the length of `unique(origin)` which is *3*.

Expand Down Expand Up @@ -307,7 +307,7 @@ key(flights)

* And on those row indices, we replace the `key` column with the value `0`.

* Since we have replaced values on the *key* column, the *data.table* `flights` isn't sorted by `hour` any more. Therefore, the key has been automatically removed by setting to NULL.
* Since we have replaced values on the *key* column, the *data.table* `flights` isn't sorted by `hour` anymore. Therefore, the key has been automatically removed by setting to NULL.

Now, there shouldn't be any *24* in the `hour` column.

Expand Down Expand Up @@ -393,7 +393,7 @@ flights[origin == "JFK" & dest == "MIA"]

One advantage very likely is shorter syntax. But even more than that, *binary search based subsets* are **incredibly fast**.

As the time goes `data.table` gets new optimization and currently the latter call is automatically optimized to use *binary search*.
As the time goes `data.table` gets new optimisation and currently the latter call is automatically optimized to use *binary search*.
To use slow *vector scan* key needs to be removed.

```{r eval = FALSE}
Expand Down

0 comments on commit b0036bc

Please sign in to comment.