diff --git a/articles/data-validation-rules.html b/articles/data-validation-rules.html
index 0b2c212..65fceeb 100644
--- a/articles/data-validation-rules.html
+++ b/articles/data-validation-rules.html
@@ -79,7 +79,7 @@
Duncan
Garmonsway
- 2023-10-28
+ 2023-10-29
Source: vignettes/data-validation-rules.Rmd
data-validation-rules.Rmd
diff --git a/articles/smells.html b/articles/smells.html
index 96673ef..ad87b57 100644
--- a/articles/smells.html
+++ b/articles/smells.html
@@ -79,7 +79,7 @@
Duncan
Garmonsway
- 2023-10-28
+ 2023-10-29
Source: vignettes/smells.Rmd
smells.Rmd
diff --git a/articles/tidyxl.html b/articles/tidyxl.html
index fb3ca25..6bd2e89 100644
--- a/articles/tidyxl.html
+++ b/articles/tidyxl.html
@@ -79,7 +79,7 @@
Duncan
Garmonsway
- 2023-10-28
+ 2023-10-29
Source: vignettes/tidyxl.Rmd
tidyxl.Rmd
@@ -103,7 +103,7 @@ tidyxl
position, formatting and comments in a tidy structure for further
manipulation, especially by the unpivotr package. It
supports the xml-based file formats ‘.xlsx’ and ‘.xlsm’ via the embedded
-RapidXML C++ library. It
+RapidXML C++ library. It
does not support the binary file formats ‘.xlsb’ or ‘.xls’.
It also provides a function xlex()
for tokenizing
formulas. See the vignette
diff --git a/index.html b/index.html
index 3c2030e..a475bba 100644
--- a/index.html
+++ b/index.html
@@ -8,7 +8,7 @@
Read Untidy Excel Files • tidyxl
@@ -19,7 +19,7 @@
-tidyxl imports non-tabular data from Excel files into R. It exposes cell content, position, formatting and comments in a tidy structure for further manipulation, especially by the unpivotr package. It supports the xml-based file formats ‘.xlsx’ and ‘.xlsm’ via the embedded RapidXML C++ library. It does not support the binary file formats ‘.xlsb’ or ‘.xls’.
+tidyxl imports non-tabular data from Excel files into R. It exposes cell content, position, formatting and comments in a tidy structure for further manipulation, especially by the unpivotr package. It supports the xml-based file formats ‘.xlsx’ and ‘.xlsm’ via the embedded RapidXML C++ library. It does not support the binary file formats ‘.xlsb’ or ‘.xls’.
It also provides a function xlex()
for tokenizing formulas. See the vignette for details. It is useful for detecting ‘spreadsheet smells’ (poor practice such as embedding constants in formulas, or using deep levels of nesting), and for understanding the dependency structures within spreadsheets.
@@ -311,18 +311,18 @@
Formulas c("address", "formula", "is_array", "formula_ref", "formula_group",
"error", "logical", "numeric", "date", "character")]
#> # A tibble: 32 × 10
-#> address formula is_array formula_ref formula_group error logical numeric date character
-#> <chr> <chr> <lgl> <chr> <int> <chr> <lgl> <dbl> <dttm> <chr>
-#> 1 A1 "1/0" FALSE <NA> NA #DIV/0! NA NA NA <NA>
-#> 2 A14 "1=1" FALSE <NA> NA <NA> TRUE NA NA <NA>
-#> 3 A15 "A4+1" FALSE <NA> NA <NA> NA 1338 NA <NA>
-#> 4 A16 "DATE(2017,1,18)" FALSE <NA> NA <NA> NA NA 2017-01-18 00:00:00 <NA>
+#> address formula is_array formula_ref formula_group error logical numeric date character
+#> <chr> <chr> <lgl> <chr> <int> <chr> <lgl> <dbl> <dttm> <chr>
+#> 1 A1 "1/0" FALSE <NA> NA #DIV/0! NA NA NA <NA>
+#> 2 A14 "1=1" FALSE <NA> NA <NA> TRUE NA NA <NA>
+#> 3 A15 "A4+1" FALSE <NA> NA <NA> NA 1338 NA <NA>
+#> 4 A16 "DATE(2017,1,18)" FALSE <NA> NA <NA> NA NA 2017-01-18 00:00:00 <NA>
#> 5 A17 "\"Hello, World!\"" FALSE <NA> NA <NA> NA NA NA Hello, Wo…
-#> 6 A19 "$A$18+1" FALSE <NA> NA <NA> NA 2 NA <NA>
-#> 7 B19 "A18+2" FALSE <NA> NA <NA> NA 3 NA <NA>
-#> 8 A20 "$A$18+1" FALSE A20:A21 0 <NA> NA 2 NA <NA>
-#> 9 B20 "A19+2" FALSE B20:B21 1 <NA> NA 4 NA <NA>
-#> 10 A21 "$A$18+1" FALSE <NA> 0 <NA> NA 2 NA <NA>
+#> 6 A19 "$A$18+1" FALSE <NA> NA <NA> NA 2 NA <NA>
+#> 7 B19 "A18+2" FALSE <NA> NA <NA> NA 3 NA <NA>
+#> 8 A20 "$A$18+1" FALSE A20:A21 0 <NA> NA 2 NA <NA>
+#> 9 B20 "A19+2" FALSE B20:B21 1 <NA> NA 4 NA <NA>
+#> 10 A21 "$A$18+1" FALSE <NA> 0 <NA> NA 2 NA <NA>
#> # … with 22 more rows
The top five cells show that the results of formulas are available as usual in the columns error
, logical
, numeric
, date
, and character
.
@@ -351,7 +351,7 @@
Tokenizing formulas
x <- xlex("MIN(3,MAX(2,A1))")
x
-#> root
+#> root
#> ¦-- MIN function
#> °-- ( fun_open
#> ¦-- 3 number
@@ -388,21 +388,21 @@ Data validation rulesxlsx_validation(examples)
#> # A tibble: 15 × 14
#> sheet ref type opera…¹ formu…² formu…³ allow…⁴ show_…⁵ promp…⁶ promp…⁷ show_…⁸ error…⁹ error…˟ error…˟
-#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr> <lgl> <chr> <chr> <chr>
-#> 1 Sheet1 A106 whole between 0 9 TRUE TRUE messag… messag… TRUE error … error … stop
+#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr> <lgl> <chr> <chr> <chr>
+#> 1 Sheet1 A106 whole between 0 9 TRUE TRUE messag… messag… TRUE error … error … stop
#> 2 Sheet1 A108 list <NA> $B$108 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> warning
-#> 3 Sheet1 A110 date between 2017-0… 2017-0… TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 4 Sheet1 A111 time between 00:00:… 09:00:… TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 5 Sheet1 A112 textLe… between 0 9 TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 6 Sheet1 A114 whole notBet… 0 9 TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 7 Sheet1 A115,A121:A122 whole equal 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 8 Sheet1 A116 whole notEqu… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 9 Sheet1 A117 whole greate… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 10 Sheet1 A119 whole greate… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 11 Sheet1 A120 whole lessTh… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 12 Sheet1 A118 whole lessTh… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
-#> 13 Sheet1 A107 decimal notBet… 0 9 FALSE FALSE <NA> <NA> FALSE <NA> <NA> stop
-#> 14 Sheet1 A113 custom <NA> A113<=… <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 3 Sheet1 A110 date between 2017-0… 2017-0… TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 4 Sheet1 A111 time between 00:00:… 09:00:… TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 5 Sheet1 A112 textLe… between 0 9 TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 6 Sheet1 A114 whole notBet… 0 9 TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 7 Sheet1 A115,A121:A122 whole equal 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 8 Sheet1 A116 whole notEqu… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 9 Sheet1 A117 whole greate… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 10 Sheet1 A119 whole greate… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 11 Sheet1 A120 whole lessTh… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 12 Sheet1 A118 whole lessTh… 0 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
+#> 13 Sheet1 A107 decimal notBet… 0 9 FALSE FALSE <NA> <NA> FALSE <NA> <NA> stop
+#> 14 Sheet1 A113 custom <NA> A113<=… <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> stop
#> 15 Sheet1 A109 list <NA> $B$108 <NA> TRUE TRUE <NA> <NA> TRUE <NA> <NA> inform…
#> # … with abbreviated variable names ¹operator, ²formula1, ³formula2, ⁴allow_blank, ⁵show_input_message, ⁶prompt_title,
#> # ⁷prompt_body, ⁸show_error_message, ⁹error_title, ˟error_body, ˟error_symbol
diff --git a/pkgdown.yml b/pkgdown.yml
index 54006e7..5752193 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -5,7 +5,7 @@ articles:
data-validation-rules: data-validation-rules.html
smells: smells.html
tidyxl: tidyxl.html
-last_built: 2023-10-28T23:46Z
+last_built: 2023-10-29T00:10Z
urls:
reference: https://nacnudus.github.io/tidyxl/reference
article: https://nacnudus.github.io/tidyxl/articles
diff --git a/search.json b/search.json
index 8af42da..cc07351 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://nacnudus.github.io/tidyxl/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2020 Duncan Garmonsway Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://nacnudus.github.io/tidyxl/articles/data-validation-rules.html","id":"what-data-validation-rules-are","dir":"Articles","previous_headings":"","what":"What data validation rules are","title":"Data Validation Rules","text":"Data validation rules control constants can entered cell, e.g. whole number 0 9, one several values another part spreadsheet. ‘xlsx_validation()’ returns data validation rules xlsx file, ranges cells rule applies. rule restricts input integers 0 9 inclusive, value (blank). value attempted, error message displayed imaginative title “message title”, informative body text “message body”, “stop” symbol. gamut possible rules given examples xlsx_validation().","code":"library(tidyxl) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(tidyr) examples <- system.file(\"extdata/examples.xlsx\", package = \"tidyxl\") glimpse(xlsx_validation(examples)[1, ]) #> Rows: 1 #> Columns: 14 #> $ sheet \"Sheet1\" #> $ ref \"A106\" #> $ type \"whole\" #> $ operator \"between\" #> $ formula1 \"0\" #> $ formula2 \"9\" #> $ allow_blank TRUE #> $ show_input_message TRUE #> $ prompt_title \"message title\" #> $ prompt_body \"message body\" #> $ show_error_message TRUE #> $ error_title \"error title\" #> $ error_body \"error body\" #> $ error_symbol \"stop\" as.data.frame(xlsx_validation(examples)) #> sheet ref type operator formula1 #> 1 Sheet1 A106 whole between 0 #> 2 Sheet1 A108 list $B$108 #> 3 Sheet1 A110 date between 2017-01-01 00:00:00 #> 4 Sheet1 A111 time between 00:00:00 #> 5 Sheet1 A112 textLength between 0 #> 6 Sheet1 A114 whole notBetween 0 #> 7 Sheet1 A115,A121:A122 whole equal 0 #> 8 Sheet1 A116 whole notEqual 0 #> 9 Sheet1 A117 whole greaterThan 0 #> 10 Sheet1 A119 whole greaterThanOrEqual 0 #> 11 Sheet1 A120 whole lessThanOrEqual 0 #> 12 Sheet1 A118 whole lessThan 0 #> 13 Sheet1 A107 decimal notBetween 0 #> 14 Sheet1 A113 custom A113<=LEN(B113) #> 15 Sheet1 A109 list $B$108 #> formula2 allow_blank show_input_message prompt_title #> 1 9 TRUE TRUE message title #> 2 TRUE TRUE #> 3 2017-01-09 09:00:00 TRUE TRUE #> 4 09:00:00 TRUE TRUE #> 5 9 TRUE TRUE #> 6 9 TRUE TRUE #> 7 TRUE TRUE #> 8 TRUE TRUE #> 9 TRUE TRUE #> 10 TRUE TRUE #> 11 TRUE TRUE #> 12 TRUE TRUE #> 13 9 FALSE FALSE #> 14 TRUE TRUE #> 15 TRUE TRUE #> prompt_body show_error_message error_title error_body error_symbol #> 1 message body TRUE error title error body stop #> 2 TRUE warning #> 3 TRUE stop #> 4 TRUE stop #> 5 TRUE stop #> 6 TRUE stop #> 7 TRUE stop #> 8 TRUE stop #> 9 TRUE stop #> 10 TRUE stop #> 11 TRUE stop #> 12 TRUE stop #> 13 FALSE stop #> 14 TRUE stop #> 15 TRUE information"},{"path":"https://nacnudus.github.io/tidyxl/articles/data-validation-rules.html","id":"joining-rules-to-cells","dir":"Articles","previous_headings":"","what":"Joining rules to cells","title":"Data Validation Rules","text":"built-functions joining ranges like A1:D5,G8 single cells like B3. now, use snippets section. future might develop dplyr-like join function (hard currently dplyr doesn’t yet join arbitrary functions, even standard inequalities like >=). Help advice gratefully accepted! join rules cells, naive method use sheet ref columns match sheet address columns output xlsx_cells(). Notice 9 cells joined, even though 15 rules defined. Surely least 15 cells joined? reason cells 6 rules don’t exist – rules can defined cells value, cells value returned xlsx_cells(), otherwise 17179869184 cells worksheet must returned. subtle reason certain cells joined successfully ref column rules sometimes refers one cell, can even refer several, non-contiguous ranges cells. Specifically, seventh rule’s ref column A115,A121:A122. Special treatment needed . Ideally, kind join function defined can compare indidual cells ranges. haven’t written one, follows workaround. First, two ranges cells must unnested A115 A121:122. range A121:122 must ‘unranged’ A121 A122. unnest_ref() function can also defined whole data frames, unnesting column references. Finally new data frame rules can joined data frame cells usual ways, via sheet ref columns. Problems approach occur rules defined large ranges cells: ‘unnesting’ ranges results long vectors individual cell addresses, (worse) huge data frames rules. cases commonplace, rules often defined entire columns spreadsheet, column 1048576 rows.","code":"rules <- xlsx_validation(examples) cells <- filter(xlsx_cells(examples), row >= 106, col == 1) rules #> # A tibble: 15 × 14 #> sheet ref type operator formula1 formula2 allow_blank show_input_message #> #> 1 Sheet1 A106 whole between 0 9 TRUE TRUE #> 2 Sheet1 A108 list NA $B$108 NA TRUE TRUE #> 3 Sheet1 A110 date between 2017-01… 2017-01… TRUE TRUE #> 4 Sheet1 A111 time between 00:00:00 09:00:00 TRUE TRUE #> 5 Sheet1 A112 text… between 0 9 TRUE TRUE #> 6 Sheet1 A114 whole notBetw… 0 9 TRUE TRUE #> 7 Sheet1 A115,… whole equal 0 NA TRUE TRUE #> 8 Sheet1 A116 whole notEqual 0 NA TRUE TRUE #> 9 Sheet1 A117 whole greater… 0 NA TRUE TRUE #> 10 Sheet1 A119 whole greater… 0 NA TRUE TRUE #> 11 Sheet1 A120 whole lessTha… 0 NA TRUE TRUE #> 12 Sheet1 A118 whole lessThan 0 NA TRUE TRUE #> 13 Sheet1 A107 deci… notBetw… 0 9 FALSE FALSE #> 14 Sheet1 A113 cust… NA A113<=L… NA TRUE TRUE #> 15 Sheet1 A109 list NA $B$108 NA TRUE TRUE #> # ℹ 6 more variables: prompt_title , prompt_body , #> # show_error_message , error_title , error_body , #> # error_symbol cells #> # A tibble: 93 × 24 #> sheet address row col is_blank content data_type error logical numeric #> #> 1 Sheet1 A106 106 1 FALSE 0 numeric NA NA 0 #> 2 Sheet1 A107 107 1 FALSE 0.1 numeric NA NA 0.1 #> 3 Sheet1 A108 108 1 FALSE 137 character NA NA NA #> 4 Sheet1 A109 109 1 FALSE 137 character NA NA NA #> 5 Sheet1 A110 110 1 FALSE 42736 date NA NA NA #> 6 Sheet1 A111 111 1 FALSE 0.354166… date NA NA NA #> 7 Sheet1 A112 112 1 FALSE 149 character NA NA NA #> 8 Sheet1 A113 113 1 FALSE 10 numeric NA NA 10 #> 9 Sheet1 A114 114 1 FALSE -1 numeric NA NA -1 #> 10 Sheet1 A115 115 1 FALSE 0 numeric NA NA 0 #> # ℹ 83 more rows #> # ℹ 14 more variables: date , character , #> # character_formatted , formula , is_array , #> # formula_ref , formula_group , comment , height , #> # width , row_outline_level , col_outline_level , #> # style_format , local_format_id inner_join(rules, cells, by = c(\"sheet\" = \"sheet\", \"ref\" = \"address\")) #> # A tibble: 9 × 36 #> sheet ref type operator formula1 formula2 allow_blank show_input_message #> #> 1 Sheet1 A106 whole between 0 9 TRUE TRUE #> 2 Sheet1 A108 list NA $B$108 NA TRUE TRUE #> 3 Sheet1 A110 date between 2017-01… 2017-01… TRUE TRUE #> 4 Sheet1 A111 time between 00:00:00 09:00:00 TRUE TRUE #> 5 Sheet1 A112 textLe… between 0 9 TRUE TRUE #> 6 Sheet1 A114 whole notBetw… 0 9 TRUE TRUE #> 7 Sheet1 A107 decimal notBetw… 0 9 FALSE FALSE #> 8 Sheet1 A113 custom NA A113<=L… NA TRUE TRUE #> 9 Sheet1 A109 list NA $B$108 NA TRUE TRUE #> # ℹ 28 more variables: prompt_title , prompt_body , #> # show_error_message , error_title , error_body , #> # error_symbol , row , col , is_blank , content , #> # data_type , error , logical , numeric , date , #> # character , character_formatted , formula , is_array , #> # formula_ref , formula_group , comment , height , #> # width , row_outline_level , col_outline_level , … unrange <- function(x) { limits <- cellranger::as.cell_limits(x) rows <- seq(limits$ul[1], limits$lr[1]) cols <- seq(limits$ul[2], limits$lr[2]) rowcol <- expand.grid(rows, cols) cell_addrs <- cellranger::cell_addr(rowcol[[1]], rowcol[[2]]) cellranger::to_string(cell_addrs, fo = \"A1\", strict = FALSE) } unnest_ref <- function(x, ref) { UseMethod(\"unnest_ref\") } unnest_ref.default <- function(x, ref_col = ref) { stopifnot(is.character(x), length(x) == 1L) refs <- unlist(strsplit(x, \",\", fixed = TRUE)) unlist(lapply(refs, unrange)) } unrange(\"A121:A122\") #> [1] \"A121\" \"A122\" unnest_ref(\"A115,A121:A122\") #> [1] \"A115\" \"A121\" \"A122\" unnest_ref.data.frame <- function(x, ref_col) { ref <- rlang::enquo(ref_col) x[[rlang::quo_name(ref)]] <- lapply(x[[rlang::quo_name(ref)]], unnest_ref) tidyr::unnest(x, rlang::UQ(ref)) } (nested_rule <- slice(rules, 7)) #> # A tibble: 1 × 14 #> sheet ref type operator formula1 formula2 allow_blank show_input_message #> #> 1 Sheet1 A115,A… whole equal 0 NA TRUE TRUE #> # ℹ 6 more variables: prompt_title , prompt_body , #> # show_error_message , error_title , error_body , #> # error_symbol unnest_ref(nested_rule, ref) #> Warning: Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0. #> Please use the non-prefixed form or `!!` instead. #> #> # Bad: rlang::expr(mean(rlang::UQ(var) * 100)) #> #> # Ok: rlang::expr(mean(UQ(var) * 100)) #> #> # Good: rlang::expr(mean(!!var * 100)) #> This warning is displayed once every 8 hours. #> # A tibble: 3 × 14 #> sheet ref type operator formula1 formula2 allow_blank show_input_message #> #> 1 Sheet1 A115 whole equal 0 NA TRUE TRUE #> 2 Sheet1 A121 whole equal 0 NA TRUE TRUE #> 3 Sheet1 A122 whole equal 0 NA TRUE TRUE #> # ℹ 6 more variables: prompt_title , prompt_body , #> # show_error_message , error_title , error_body , #> # error_symbol "},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"inspecting-the-parse-tree","dir":"Articles","previous_headings":"","what":"Inspecting the parse tree","title":"Detecting Spreadsheet Smells with xlex()","text":"’s example simple formula MIN(3,MAX(2,A1)) (= symbol beginning formula implied, Excel doesn’t write file).","code":"library(tidyxl) x <- xlex(\"MIN(3,MAX(2,A1))\") x ## root ## ¦-- MIN function ## °-- ( fun_open ## ¦-- 3 number ## ¦-- , separator ## ¦-- MAX function ## °-- ( fun_open ## ¦-- 2 number ## ¦-- , separator ## °-- A1 ref ## °-- ) fun_close ## °-- ) fun_close"},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"detecting-constants-inside-formulas","dir":"Articles","previous_headings":"","what":"Detecting constants inside formulas","title":"Detecting Spreadsheet Smells with xlex()","text":"smelly spreadsheet distributed tidyxl package. comes famous Enron subpoena, made available Felienne Hermans. look glance? ’s screenshot part one sheet, showing formulas rather cell values. ’s financial plan, using formulas forecast rest year, plan following year. want see whether formulas embedded constants; ones hidden attention, driving forecasts. read formula, one one, lot easier visualise ones containing constants. can xlex() graph plotting library like ggplot2. first step, importing spreadsheet, tokenize formulas, using xlsx(). Let’s tokenize one formula see looks like.","code":""},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"one-formula","dir":"Articles","previous_headings":"Detecting constants inside formulas","what":"One formula","title":"Detecting Spreadsheet Smells with xlex()","text":"formula (C8/7)*12-48000, xlex() separates components. parentheses, operators (division, multiplication, subtraction), reference another cell (C8), numeric constants: 7, 12, 48000. ? 7 probably 7th month, July, column header “July YTD”. year--date figure divided 7, multiplied 12 forecast year-end figure. 48000 mysterious – perhaps future payment expected. Embedding constants inside formula bad practice. Better practice put constants cells, annotated meaning, perhaps even named. formulas refer , name, e.g.","code":"library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union library(tidyr) library(purrr) library(ggplot2) # The original filename was \"barry_tycholiz__848__2002 Plan Worksheet CC107322.xlsx\" sheet <- tidy_xlsx(system.file(\"extdata/enron-constants.xlsx\", package = \"tidyxl\"), \"Detail Breakdown\")$data[[1]] ## Warning: 'tidy_xlsx()' is deprecated. ## Use 'xlsx_cells()' or 'xlsx_formats()' instead. sheet$formula[22] ## [1] \"(C8/7)*12-48000\" xlex(sheet$formula[22]) ## root ## °-- ( paren_open ## ¦-- C8 ref ## ¦-- / operator ## °-- 7 number ## ¦-- ) paren_close ## ¦-- * operator ## ¦-- 12 number ## ¦-- - operator ## °-- 48000 number (Compensation/MonthsToDate)*12Months-FuturePayments"},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"many-formulas","dir":"Articles","previous_headings":"Detecting constants inside formulas","what":"Many formulas","title":"Detecting Spreadsheet Smells with xlex()","text":"xlex() function isn’t vectorized (returns data frame), map formulas create ‘nest’ column individual data frames. can unnest data frames filter tokens constants, find cells constants formulas. constants common? Unsurprisingly, 12 7 almost equally abundant, also lots 6s 9s – two- three-quarterly figures? 150000s familiar 48000s, followed fractions look like percentages, several one-offs.","code":"tokens <- sheet %>% filter(!is.na(formula)) %>% select(row, col, formula) %>% mutate(tokens = map(formula, xlex)) %>% select(-formula) tokens ## # A tibble: 154 × 3 ## row col tokens ## ## 1 8 4 ## 2 9 4 ## 3 10 4 ## 4 12 4 ## 5 13 4 ## 6 14 3 ## 7 14 4 ## 8 14 5 ## 9 14 6 ## 10 17 4 ## # ℹ 144 more rows constants <- tokens %>% unnest(tokens) %>% filter(type %in% c(\"error\", \"bool\", \"number\", \"text\")) constants ## # A tibble: 201 × 5 ## row col level type token ## ## 1 8 4 1 number 7 ## 2 8 4 0 number 12 ## 3 8 4 0 number 48000 ## 4 9 4 1 number 7 ## 5 9 4 0 number 12 ## 6 10 4 1 number 7 ## 7 10 4 0 number 12 ## 8 12 4 1 number 7 ## 9 12 4 0 number 12 ## 10 13 4 1 number 7 ## # ℹ 191 more rows constants %>% count(token, sort = TRUE) %>% print(n = Inf) ## # A tibble: 24 × 2 ## token n ## ## 1 12 59 ## 2 7 58 ## 3 6 30 ## 4 9 30 ## 5 150000 4 ## 6 48000 2 ## 7 0.05 1 ## 8 0.1 1 ## 9 0.35 1 ## 10 0.5 1 ## 11 1.05 1 ## 12 10 1 ## 13 10000 1 ## 14 12000 1 ## 15 13000 1 ## 16 15000 1 ## 17 2000 1 ## 18 25000 1 ## 19 5000 1 ## 20 5320 1 ## 21 7314 1 ## 22 7800 1 ## 23 866 1 ## 24 95000 1"},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"visualising-constants","dir":"Articles","previous_headings":"Detecting constants inside formulas","what":"Visualising constants","title":"Detecting Spreadsheet Smells with xlex()","text":"final step visualize spreadsheet, highlighting cells hide constants formulas. already data frame cells constants, join back full dataset, pass result ggplot. time doesn’t seem particular pattern, perhaps suspicious .","code":"has_constants <- constants %>% distinct(row, col) %>% mutate(has_constant = TRUE) %>% right_join(sheet, by = c(\"row\", \"col\")) %>% filter(!is_blank) %>% select(row, col, has_constant) %>% replace_na(list(has_constant = FALSE)) has_constants ## # A tibble: 412 × 3 ## row col has_constant ## ## 1 8 4 TRUE ## 2 9 4 TRUE ## 3 10 4 TRUE ## 4 12 4 TRUE ## 5 13 4 TRUE ## 6 17 4 TRUE ## 7 17 6 TRUE ## 8 18 4 TRUE ## 9 19 4 TRUE ## 10 20 4 TRUE ## # ℹ 402 more rows has_constants %>% # filter(row <= 28) %>% ggplot(aes(col, row, fill = has_constant)) + geom_tile() + scale_y_reverse() + theme(legend.position = \"top\")"},{"path":"https://nacnudus.github.io/tidyxl/articles/smells.html","id":"detecting-deeply-nested-formulas","dir":"Articles","previous_headings":"","what":"Detecting deeply nested formulas","title":"Detecting Spreadsheet Smells with xlex()","text":"Using techniques detecting constants, map xlex() formulas spreadsheet, unnest result, filter tokens particular properties. case, interested level token, tells deeply token nested functions expressions. time, use another spreadsheet Enron corpus. First, illustration. Notice inside first function, level increases 1. Inside second function, level increases 2. Now let’s apply test formulas sheet. deepest level nesting turns 7, seen cells row 171. wonder formulas look like?","code":"xlex(\"MAX(3,MIN(2,4))\") ## root ## ¦-- MAX function ## °-- ( fun_open ## ¦-- 3 number ## ¦-- , separator ## ¦-- MIN function ## °-- ( fun_open ## ¦-- 2 number ## ¦-- , separator ## °-- 4 number ## °-- ) fun_close ## °-- ) fun_close # The original filename was \"albert_meyers__1__1-25act.xlsx\" sheet <- tidy_xlsx(system.file(\"extdata/enron-nested.xlsx\", package = \"tidyxl\"), \"Preschedule\")$data[[1]] ## Warning: 'tidy_xlsx()' is deprecated. ## Use 'xlsx_cells()' or 'xlsx_formats()' instead. deepest <- sheet %>% filter(!is.na(formula)) %>% mutate(tokens = map(formula, xlex)) %>% select(row, col, tokens) %>% unnest(tokens) %>% filter(level == max(level)) %>% distinct(row, col, level) deepest ## # A tibble: 48 × 3 ## row col level ## ## 1 171 2 7 ## 2 171 3 7 ## 3 171 4 7 ## 4 171 5 7 ## 5 171 6 7 ## 6 171 7 7 ## 7 171 8 7 ## 8 171 9 7 ## 9 171 10 7 ## 10 171 11 7 ## # ℹ 38 more rows sheet %>% filter(row == 171, col == 2) %>% pull(formula) # Aaaaaaaaaaarghhhhhhhh! ## [1] \"((IF((103-B$89)=103,0,(103-B$89)))+(IF((200-B$95)=200,0,(200-B$95)))+(IF((196-B$98)=196,0,(196-B$98)))+(IF((200-B$101)=200,0,(200-B$101)))+(IF((70-B$104)=70,0,(MIN(40,(70-B$104))))+(IF((78-B$109)=78,0,(MIN(50,(78-B$109)))))+(IF((103-B$114)=103,0,(MIN(66,(103-B$114)))))+(IF((195-B$119-B$124-B$129-B$134-B$139)=195,0,(MIN(70,(195-B$119-B$124-B$129-B$134-B$139)))))+(IF((64-B$144)=64,0,(MIN(50,(64-B$144)))))+(IF((48-B$149)=48,0,(MIN(20,(48-B$149)))))+(IF((44-B$154)=44,0,(MIN(20,(44-B$154)))))+(IF((130-B$159)=130,0,(MIN(20,(130-B$159)))))))\""},{"path":"https://nacnudus.github.io/tidyxl/articles/tidyxl.html","id":"tidyxl","dir":"Articles","previous_headings":"","what":"tidyxl","title":"Tidyxl","text":"tidyxl imports non-tabular data Excel files R. exposes cell content, position, formatting comments tidy structure manipulation, especially unpivotr package. supports xml-based file formats ‘.xlsx’ ‘.xlsm’ via embedded RapidXML C++ library. support binary file formats ‘.xlsb’ ‘.xls’. also provides function xlex() tokenizing formulas. See vignette details. useful detecting ‘spreadsheet smells’ (poor practice embedding constants formulas, using deep levels nesting), understanding dependency structures within spreadsheets.","code":""},{"path":"https://nacnudus.github.io/tidyxl/articles/tidyxl.html","id":"mailing-list","dir":"Articles","previous_headings":"tidyxl","what":"Mailing list","title":"Tidyxl","text":"bugs /issues, create new issue GitHub questions comments, please subscribe tidyxl-devel mailing list. must member post messages, anyone can read archived discussions.","code":""},{"path":"https://nacnudus.github.io/tidyxl/articles/tidyxl.html","id":"installation","dir":"Articles","previous_headings":"tidyxl","what":"Installation","title":"Tidyxl","text":"","code":"devtools::install_github(\"nacnudus/tidyxl\")"},{"path":"https://nacnudus.github.io/tidyxl/articles/tidyxl.html","id":"examples","dir":"Articles","previous_headings":"tidyxl","what":"Examples","title":"Tidyxl","text":"package includes spreadsheet, ‘titanic.xlsx’, contains following pivot table: multi-row column headers make difficult import. popular package importing spreadsheets coerces pivot table dataframe. treats second header row though observations. tidyxl doesn’t coerce pivot table data frame. Instead, represents cell row, describes cell’s address, value properties. structure, cells can found filtering. Specific sheets can requested using xlsx_cells(file, sheet), names sheets file given xlsx_sheet_names().","code":"ftable(Titanic, row.vars = 1:2) #> Age Child Adult #> Survived No Yes No Yes #> Class Sex #> 1st Male 0 5 118 57 #> Female 0 1 4 140 #> 2nd Male 0 11 154 14 #> Female 0 13 13 80 #> 3rd Male 35 13 387 75 #> Female 17 14 89 76 #> Crew Male 0 0 670 192 #> Female 0 0 3 20 titanic <- system.file(\"extdata/titanic.xlsx\", package = \"tidyxl\") readxl::read_excel(titanic) #> New names: #> • `` -> `...1` #> • `` -> `...2` #> • `` -> `...5` #> • `` -> `...7` #> # A tibble: 10 × 7 #> ...1 ...2 Age Child ...5 Adult ...7 #> #> 1 NA NA Survived No Yes No Yes #> 2 Class Sex NA NA NA NA NA #> 3 1st Male NA 0 5 118 57 #> 4 NA Female NA 0 1 4 140 #> 5 2nd Male NA 0 11 154 14 #> 6 NA Female NA 0 13 13 80 #> 7 3rd Male NA 35 13 387 75 #> 8 NA Female NA 17 14 89 76 #> 9 Crew Male NA 0 0 670 192 #> 10 NA Female NA 0 0 3 20 library(tidyxl) x <- xlsx_cells(titanic) dplyr::glimpse(x) #> Rows: 60 #> Columns: 24 #> $ sheet \"Sheet1\", \"Sheet1\", \"Sheet1\", \"Sheet1\", \"Sheet1\", … #> $ address \"C1\", \"D1\", \"E1\", \"F1\", \"G1\", \"C2\", \"D2\", \"E2\", \"F… #> $ row 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4,… #> $ col 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 1, 2, 1, 2, 4, 5, 6,… #> $ is_blank FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FAL… #> $ content \"0\", \"1\", NA, \"2\", NA, \"3\", \"4\", \"5\", \"4\", \"5\", \"6… #> $ data_type \"character\", \"character\", \"blank\", \"character\", \"b… #> $ error NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ logical NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ numeric NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ date NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N… #> $ character \"Age\", \"Child\", NA, \"Adult\", NA, \"Survived\", \"No\",… #> $ character_formatted [], [], , [… #> $ formula NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ is_array FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F… #> $ formula_ref NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ formula_group NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ comment NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… #> $ height 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15… #> $ width 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.… #> $ row_outline_level 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… #> $ col_outline_level 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,… #> $ style_format \"Normal\", \"Normal\", \"Normal\", \"Normal\", \"Normal\", … #> $ local_format_id 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 1, 1, 1,… x[x$data_type == \"character\", c(\"address\", \"character\")] #> # A tibble: 22 × 2 #> address character #> #> 1 C1 Age #> 2 D1 Child #> 3 F1 Adult #> 4 C2 Survived #> 5 D2 No #> 6 E2 Yes #> 7 F2 No #> 8 G2 Yes #> 9 A3 Class #> 10 B3 Sex #> # ℹ 12 more rows x[x$row == 4, c(\"address\", \"character\", \"numeric\")] #> # A tibble: 6 × 3 #> address character numeric #> #> 1 A4 1st NA #> 2 B4 Male NA #> 3 D4 NA 0 #> 4 E4 NA 5 #> 5 F4 NA 118 #> 6 G4 NA 57"},{"path":"https://nacnudus.github.io/tidyxl/articles/tidyxl.html","id":"formatting","dir":"Articles","previous_headings":"tidyxl > Examples","what":"Formatting","title":"Tidyxl","text":"original spreadsheet formatting applied cells. can also retrieved using tidyxl, xlsx_formats() function. Formatting available using columns local_format_id style_format indexes separate list--lists structure. ‘Local’ formatting common kind, applied individual cells. ‘Style’ formatting usually applied blocks cells, defines several formats . screenshot styles buttons Excel. Formatting can looked follows. see available kinds formats, use str(formats).","code":"# Bold formats <- xlsx_formats(titanic) formats$local$font$bold #> [1] FALSE TRUE FALSE FALSE x[x$local_format_id %in% which(formats$local$font$bold), c(\"address\", \"character\")] #> # A tibble: 4 × 2 #> address character #> #> 1 C1 Age #> 2 C2 Survived #> 3 A3 Class #> 4 B3 Sex # Yellow fill formats$local$fill$patternFill$fgColor$rgb #> [1] NA NA NA \"FFFFFF00\" x[x$local_format_id %in% which(formats$local$fill$patternFill$fgColor$rgb == \"FFFFFF00\"), c(\"address\", \"numeric\")] #> # A tibble: 2 × 2 #> address numeric #> #> 1 F11 3 #> 2 G11 20 # Styles by name formats$style$font$name[\"Normal\"] #> Normal #> \"Calibri\" head(x[x$style_format == \"Normal\", c(\"address\", \"character\")]) #> # A tibble: 6 × 2 #> address character #> #> 1 C1 Age #> 2 D1 Child #> 3 E1 NA #> 4 F1 Adult #> 5 G1 NA #> 6 C2 Survived # In-cell formatting is available in the `character_formatted` column as a data # frame, one row per substring. examples <- system.file(\"/extdata/examples.xlsx\", package = \"tidyxl\") xlsx_cells(examples)$character_formatted[77] #> [[1]] #> # A tibble: 16 × 14 #> character bold italic underline strike vertAlign size color_rgb color_theme #>