From 07f19df84c499c3aa27df447677f0327499b0551 Mon Sep 17 00:00:00 2001
From: nrennie
Date: Mon, 2 Dec 2024 11:47:51 +0000
Subject: [PATCH] update readme

---
 README.md | 243 +++--------------------------------------------------
 1 file changed, 11 insertions(+), 232 deletions(-)

diff --git a/README.md b/README.md
index 2a11706..acae4df 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,13 @@ When teaching examples using R, instructors often using *nice* datasets - but th
 
 ## Installation
 
-Install from GitHub using:
+Install from CRAN using:
+
+```r
+install.packages("messy")
+```
+
+Install development version from GitHub using:
 
 ```r
 remotes::install_github("nrennie/messy")
@@ -16,7 +22,9 @@ remotes::install_github("nrennie/messy")
 
 ## Usage
 
-### `messy()`
+For more in-depth usage instructions, see the package documentation at [nrennie.rbind.io/messy](https://nrennie.rbind.io/messy/) which has examples of each function.
+
+The simplest way to use the {messy} package is applying the `messy()` function:
 
 ```r
 set.seed(1234)
@@ -37,208 +45,7 @@ messy(ToothGrowth[1:10,])
 10 7 VC 0.5
 ```
 
-Increase how *messy* the data is:
-
-```r
-set.seed(1234)
-messy(ToothGrowth[1:10,], messiness = 0.7)
-```
-
-```r
- len supp dose
-1
-2 11.5
-3
-4 5.8
-5 .v*c
-6
-7
-8 0.5
-9 v@c
-10
-```
-
-### `add_whitespace()`
-
-Randomly adds a whitespace to the ends of some values, meaning that numeric columns may be converted to characters:
-
-```r
-set.seed(1234)
-add_whitespace(ToothGrowth[1:10,])
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10 VC 0.5
-7 11.2 VC 0.5
-8 11.2 VC 0.5
-9 5.2 VC 0.5
-10 7 VC 0.5
-```
-
-Apply to only some columns:
-
-```r
-set.seed(1234)
-add_whitespace(ToothGrowth[1:10,], cols = "supp")
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 VC 0.5
-8 11.2 VC 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-### `change_case()`
-
-Randomly switches the case between upper case, lower case, and no change of character or factor columns:
-
-```r
-set.seed(1234)
-change_case(ToothGrowth[1:10,], messiness = 0.5)
-```
-
-```r
- len supp dose
-1 4.2 vc 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 vc 0.5
-8 11.2 vc 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-By default, the case of the entire string is changes. Alternatively, you can specify to change the case of each individual letter:
-
-```r
-set.seed(1234)
-change_case(ToothGrowth[1:10,], messiness = 0.5, case_type = "letter")
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 vC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 Vc 0.5
-8 11.2 Vc 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-### `add_special_chars()`
-
-Randomly add special characters to character strings:
-
-```r
-set.seed(1234)
-add_special_chars(ToothGrowth[1:10,])
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 (VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 VC 0.5
-8 11.2 VC 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-### `make_missing()`
-
-Randomly make some values missing using `NA`:
-
-```r
-set.seed(1234)
-make_missing(ToothGrowth[1:10,])
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC NA
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 NA VC 0.5
-8 11.2 VC NA
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-Add a different missing value representation for some columns:
-
-```r
-set.seed(1234)
-make_missing(ToothGrowth[1:10,], cols = "supp", missing = "999")
-```
-
-```r
- len supp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 999 0.5
-8 11.2 VC 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-### `messy_colnames()`
-
-Create messy column names:
-
-```r
-set.seed(1234)
-messy_colnames(ToothGrowth[1:10,])
-```
-
-```r
- )len s(upp dose
-1 4.2 VC 0.5
-2 11.5 VC 0.5
-3 7.3 VC 0.5
-4 5.8 VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 VC 0.5
-8 11.2 VC 0.5
-9 5.2 VC 0.5
-10 7.0 VC 0.5
-```
-
-### Combining functions
-
-You can pipe together multiple functions to create custom messy transformations:
+You can vary the amount of *messiness* for each function, and chain together different functions to create customised messy data:
 
 ```r
 set.seed(1234)
@@ -262,31 +69,3 @@ ToothGrowth[1:10,] |>
 9 5.2 !VC 0.5
 10 7.0 VC* 0.5
 ```
-
-If you're adding `messy_colnames()` to a chain (and you specify only some columns in other functions), make sure `messy_colnames()` comes at the end:
-
-```r
-set.seed(1234)
-ToothGrowth[1:10,] |>
-  make_missing(cols = "supp", missing = " ") |>
-  make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |>
-  add_whitespace(cols = "supp", messiness = 0.5) |>
-  add_special_chars(cols = "supp") |>
-  messy_colnames()
-```
-
-```r
- !l_e)n S^UPP d^o)se
-1 4.2 VC 0.5
-2 11.5 VC NA
-3 7.3 VC 0.5
-4 5.8 *VC 0.5
-5 6.4 VC 0.5
-6 10.0 VC 0.5
-7 11.2 0.5
-8 11.2 V#C NA
-9 5.2 !VC 0.5
-10 7.0 VC* 0.5
-```
-
-Otherwise, the column names you try to select may no longer exist!
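
The ordering caveat in the removed text (put `messy_colnames()` last when other steps select columns by name) can be illustrated without the package. This is a minimal sketch in base R only. `scramble_names()` is a hypothetical stand-in for `messy_colnames()`, not the package's implementation. It only shows why a `cols = "supp"` style selection could fail once the names have changed.

```r
# Hypothetical stand-in for messy_colnames(): NOT the package's implementation.
# It prefixes each column name with a special character.
scramble_names <- function(data) {
  names(data) <- paste0("!", names(data))
  data
}

df <- ToothGrowth[1:10, ]

# While the original names are intact, "supp" can be selected by name.
supp_before <- "supp" %in% names(df)

# Once the names are scrambled, "supp" no longer matches any column.
renamed <- scramble_names(df)
supp_after <- "supp" %in% names(renamed)

# supp_before is TRUE, supp_after is FALSE
print(c(before = supp_before, after = supp_after))
```

Any step that selects columns by their original names would therefore need to run before the renaming step, which matches the advice in the removed paragraph.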