pivot_wider: unexpected behavior specifying id_cols to omit that are included in names/values_from #1506

jwhendy · 2023-06-28T17:06:19Z

I was running pivot_wider on some data and was surprised by the inability to use -c(col1, col2) to choose my id_cols, resulting in the error:

Error in `pivot_wider()`:
`id_cols` can't select a column already selected by `names_from`.
Column `type` has already been selected.

Repro:

library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

Base case:

tmp %>% pivot_wider(id_cols = id1, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4

But say you had a lot of columns; it's more concise to remove a few than to name them all. Neither of these works; both produce the error above:

tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
tmp %>% pivot_wider(id_cols = c(-type, -values, -unused), names_from = type, values_from = values)

My failure mode may be covered by this statement from the docs:

id_cols [...] Defaults to all columns in data except for the columns specified through names_from and values_from. If a tidyselect expression is supplied, it will be evaluated on data after removing the columns specified through names_from and values_from.

This is why I included the "unused" column. For data with many columns, one has to reason: "OK, type and values are removed 'for free' because they are used in other args, but I still need to remember to remove the other columns myself." Dropping only the unused column works:

tmp %>% pivot_wider(id_cols = -unused, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4
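Two possible workarounds, sketched under the assumption that the documented behavior quoted above holds (the selection is evaluated after the names_from and values_from columns are removed). The first wraps the names in any_of(), which ignores names that are not present; the second drops the unwanted column up front and relies on the default id_cols. The object names wide_any_of and wide_default are only for illustration.

```r
library(tidyr)

tmp <- data.frame(
  id1 = c("a", "a", "b", "b"),
  unused = c(NA, NA, NA, NA),
  type = c("c", "d", "c", "d"),
  values = c(1, 2, 3, 4)
)

# Workaround 1: any_of() skips names that do not exist, so listing `type`
# and `values` should be harmless even though they are already removed
# before `id_cols` is evaluated.
wide_any_of <- pivot_wider(
  tmp,
  id_cols = -any_of(c("type", "values", "unused")),
  names_from = type,
  values_from = values
)

# Workaround 2: drop the unwanted column first, then rely on the default
# id_cols (everything except names_from and values_from).
wide_default <- pivot_wider(
  tmp[setdiff(names(tmp), "unused")],
  names_from = type,
  values_from = values
)

names(wide_any_of)
names(wide_default)
```

Both calls should give the same id1/c/d layout as the base case.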

Thoughts:

- This is a bug, in that there should be no problem specifying columns to drop, even if they are implicitly dropped by being passed to names_from or values_from.
- This is not a bug, but the documentation could be improved. I was confused by the message "Column type has already been selected"... by what, and how? It was non-intuitive to me that it's "selected" when I'm trying to explicitly not select it as an id_col.
- This is not a bug, and the documentation is perfectly clear. I admittedly don't use pivot functions that often, so it could be a misunderstanding on my part.


hadley · 2023-11-01T18:52:55Z

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run in a local session. Thanks!

jwhendy · 2023-11-01T20:41:37Z

@hadley I admit I haven't used this before, so hopefully the below is the RightWay. I don't see much advantage other than the setup code and the failing code aren't separated by one line of my dialog.

Here's a reprex for the case that surprised me:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.
#> Backtrace:
#>      ▆
#>   1. ├─tmp %>% ...
#>   2. ├─tidyr::pivot_wider(...)
#>   3. ├─tidyr:::pivot_wider.data.frame(., id_cols = -c(type, values, unused), names_from = type, values_from = values)
#>   4. │ └─tidyr:::build_wider_id_cols_expr(...)
#>   5. │   └─tidyr:::select_wider_id_cols(...)
#>   6. │     ├─rlang::try_fetch(...)
#>   7. │     │ └─base::withCallingHandlers(...)
#>   8. │     └─tidyselect::eval_select(...)
#>   9. │       └─tidyselect:::eval_select_impl(...)
#>  10. │         ├─tidyselect:::with_subscript_errors(...)
#>  11. │         │ └─rlang::try_fetch(...)
#>  12. │         │   └─base::withCallingHandlers(...)
#>  13. │         └─tidyselect:::vars_select_eval(...)
#>  14. │           └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  15. │             └─tidyselect:::eval_minus(expr, data_mask, context_mask, error_call)
#>  16. │               └─tidyselect:::eval_bang(expr, data_mask, context_mask)
#>  17. │                 └─tidyselect:::walk_data_tree(expr[[2]], data_mask, context_mask)
#>  18. │                   └─tidyselect:::eval_c(expr, data_mask, context_mask)
#>  19. │                     └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  20. │                       └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  21. │                         └─tidyselect:::as_indices_sel_impl(...)
#>  22. │                           └─tidyselect:::as_indices_impl(...)
#>  23. │                             └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
#>  24. │                               └─vctrs::vec_as_location(...)
#>  25. ├─vctrs (local) `<fn>`()
#>  26. │ └─vctrs:::stop_subscript_oob(...)
#>  27. │   └─vctrs:::stop_subscript(...)
#>  28. │     └─rlang::abort(...)
#>  29. │       └─rlang:::signal_abort(cnd, .file)
#>  30. │         └─base::signalCondition(cnd)
#>  31. ├─rlang (local) `<fn>`(`<vctrs___>`)
#>  32. │ └─handlers[[1L]](cnd)
#>  33. │   └─rlang::cnd_signal(cnd)
#>  34. │     └─rlang:::signal_abort(cnd)
#>  35. │       └─base::signalCondition(cnd)
#>  36. └─rlang (local) `<fn>`(`<vctrs___>`)
#>  37.   └─handlers[[1L]](cnd)
#>  38.     └─tidyr:::rethrow_id_cols_oob(...)
#>  39.       └─tidyr:::stop_id_cols_oob(i, "names_from", call = call)
#>  40.         └─cli::cli_abort(...)
#>  41.           └─rlang::abort(...)
Created on 2023-11-01 with [reprex v2.0.2](https://reprex.tidyverse.org/)

hadley · 2023-11-01T21:06:29Z

@jwhendy it doesn't look like you copied and pasted it correctly. And it does make my life easier having all the code in one block, because there's just one thing to copy and paste, rather than having to stitch together multiple pieces. You can also remove the call to dplyr, because that doesn't seem necessary.

library(tidyr)

tmp <- data.frame(
  id1 = c("a", "a", "b", "b"),
  unused = c(NA, NA, NA, NA),
  type = c("c", "d", "c", "d"),
  values = c(1, 2, 3, 4)
)
tmp |> pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.

Created on 2023-11-01 with [reprex v2.0.2](https://reprex.tidyverse.org)

The error message doesn't seem correct because you're not actually selecting type; you're unselecting it.
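A rough sketch of where the error likely originates, based on the documented behavior and the backtrace earlier in the thread (which ends in tidyselect::eval_select() and an out-of-bounds subscript error). This is an approximation, not tidyr's actual internals: the names reduced, result, and ok are made up for illustration, and the column removal is done by hand.

```r
library(tidyselect)

tmp <- data.frame(
  id1 = c("a", "a", "b", "b"),
  unused = c(NA, NA, NA, NA),
  type = c("c", "d", "c", "d"),
  values = c(1, 2, 3, 4)
)

# Step 1 (assumed, per the docs): remove the names_from and values_from
# columns before the id_cols selection is evaluated.
reduced <- tmp[setdiff(names(tmp), c("type", "values"))]

# Step 2: evaluate the user's selection on what is left. `type` no longer
# exists, so even a negative selection of it raises an error.
result <- tryCatch(
  eval_select(rlang::expr(-c(type, values, unused)), reduced),
  error = function(cnd) cnd
)
inherits(result, "error")

# The same selection without the already-removed columns succeeds.
ok <- eval_select(rlang::expr(-unused), reduced)
names(ok)
```

If this reading is right, the "already selected" wording comes from how the missing-column error is re-reported, not from the negative selection itself.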

jwhendy · 2023-11-02T00:23:00Z

Understood, and don't want to be a hassle! Boy, the docs are ~~confusing~~ (edit: not a good word choice... just mean for how simple the steps are, to not have obtained the right result is a bummer) for how simple this is supposed to be!

> Let’s say you copy this code onto your clipboard (or, on RStudio Server or Cloud, select it):

I selected this, then copied it to the clipboard.

library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)

> Then call reprex(), where the default target venue is GitHub:

So I ran reprex() at the RStudio R prompt.

> The relevant bit of GitHub-flavored Markdown is ready to be pasted from your clipboard.

Coming back here to paste:

tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, : could not find function "%>%"
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))
tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
#> Error in `pivot_wider()`:
#> ! `id_cols` can't select a column already selected by `names_from`.
#> ℹ Column `type` has already been selected.
#> Backtrace:
#>      ▆
#>   1. ├─tmp %>% ...

### snipped for brevity, but same as above

#>  41.           └─rlang::abort(...)

<sup>Created on 2023-11-01 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

At least the output seems reproducible 2x in a row :)

> You can also remove the call to dplyr, because that doesn't seem necessary.

My bad. I thought %>% came from dplyr.

> The error message doesn't seem correct because you're not actually selecting type; you're unselecting it.

That was my thinking, though I regularly consider myself in noob territory despite having used R a long time! At the least, I thought the message could clarify why this is problematic. The example seems trivial, but at the time, there were a bunch of columns, so I'd much rather id_cols = -c(start_col:end_col).
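For the many-columns case, a range-based negative selection should already work as long as the range only names columns that are not passed to names_from or values_from. A minimal sketch with made-up column names (start_col, mid_col, end_col, and the objects wide_tmp and out are hypothetical):

```r
library(tidyr)

# Hypothetical data: one id column, a run of columns to ignore, then the
# columns used for pivoting.
wide_tmp <- data.frame(
  id1 = c("a", "a", "b", "b"),
  start_col = NA,
  mid_col = NA,
  end_col = NA,
  type = c("c", "d", "c", "d"),
  values = c(1, 2, 3, 4)
)

# The range excludes only columns that still exist once `type` and
# `values` have been set aside, so the selection does not error.
out <- pivot_wider(
  wide_tmp,
  id_cols = -c(start_col:end_col),
  names_from = type,
  values_from = values
)

names(out)
```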

Thanks for taking a look!

hadley · 2023-11-02T12:22:41Z

Do you have the latest version of reprex? And how are you running R? (e.g. in RStudio on your desktop?)

jwhendy · 2023-11-02T12:35:57Z

Would you like me to create a ticket in the reprex repo? I just installed it yesterday. After installing, I wasn't sure if any loaded environment objects would goof things up, so my process was:

1. install.packages("reprex")
2. quit/restart RStudio
3. put the code above into a random .Rmd file, select, cmd+c
4. reprex()
5. copy the output here

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_3.2.1     vroom_1.6.1      readr_2.1.4      writexl_1.4.2    stringr_1.5.0    readxl_1.4.2     openxlsx_4.2.5.2 odbc_1.3.4      
 [9] dotenv_1.0.3     DBI_1.1.3        tidyr_1.3.0      dplyr_1.1.2      reprex_2.0.2    

loaded via a namespace (and not attached):
 [1] zip_2.3.0        Rcpp_1.0.10      cellranger_1.1.0 compiler_4.2.3   pillar_1.9.0     tools_4.2.3      digest_0.6.31    bit_4.0.5       
 [9] evaluate_0.20    lifecycle_1.0.3  pkgconfig_2.0.3  rlang_1.1.1      cli_3.6.1        rstudioapi_0.14  yaml_2.3.7       xfun_0.39       
[17] fastmap_1.1.1    withr_2.5.0      knitr_1.42       generics_0.1.3   fs_1.6.1         vctrs_0.6.3      hms_1.1.3        bit64_4.0.5     
[25] tidyselect_1.2.0 glue_1.6.2       R6_2.5.1         processx_3.8.2   fansi_1.0.4      rmarkdown_2.21   tzdb_0.3.0       purrr_1.0.1     
[33] callr_3.7.3      clipr_0.8.0      blob_1.2.4       magrittr_2.0.3   ps_1.7.5         htmltools_0.5.5  utf8_1.2.3       stringi_1.7.12  
[41] crayon_1.5.2

hadley · 2023-11-02T14:16:51Z

@jwhendy yes please!

jwhendy · 2023-11-11T00:47:47Z

@hadley I'm delayed, but tis done. Thanks!

hadley added the `reprex` (needs a minimal reproducible example) label on Nov 1, 2023

hadley added the `feature` (a feature request or enhancement) and `pivoting ♻️` (pivot rectangular data to different "shapes") labels and removed the `reprex` (needs a minimal reproducible example) label on Nov 1, 2023

jwhendy mentioned this issue Nov 11, 2023

Output when following documentation instructions does not match expected result tidyverse/reprex#451

Closed
