update readme
nrennie committed Dec 2, 2024
1 parent 6f82346 commit 07f19df
Showing 1 changed file with 11 additions and 232 deletions.
243 changes: 11 additions & 232 deletions README.md
@@ -8,15 +8,23 @@ When teaching examples using R, instructors often using *nice* datasets - but th

## Installation

Install from GitHub using:
Install from CRAN using:

```r
install.packages("messy")
```

Install the development version from GitHub using:

```r
remotes::install_github("nrennie/messy")
```

## Usage

### `messy()`
For more in-depth usage instructions, see the package documentation at [nrennie.rbind.io/messy](https://nrennie.rbind.io/messy/), which has examples of each function.

The simplest way to use the {messy} package is applying the `messy()` function:

```r
set.seed(1234)
@@ -37,208 +45,7 @@ messy(ToothGrowth[1:10,])
10 7 VC 0.5
```

Increase how *messy* the data is:

```r
set.seed(1234)
messy(ToothGrowth[1:10,], messiness = 0.7)
```

```r
len supp dose
1 <NA> <NA> <NA>
2 11.5 <NA> <NA>
3 <NA> <NA> <NA>
4 5.8 <NA> <NA>
5 <NA> .v*c <NA>
6 <NA> <NA> <NA>
7 <NA> <NA> <NA>
8 <NA> <NA> 0.5
9 <NA> v@c <NA>
10 <NA> <NA> <NA>
```

### `add_whitespace()`

Randomly adds whitespace to the ends of some values, which means that numeric columns may be converted to character:

```r
set.seed(1234)
add_whitespace(ToothGrowth[1:10,])
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10 VC 0.5
7 11.2 VC 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7 VC 0.5
```

Apply to only some columns:

```r
set.seed(1234)
add_whitespace(ToothGrowth[1:10,], cols = "supp")
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 VC 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

### `change_case()`

Randomly switches values in character or factor columns to upper case or lower case, or leaves them unchanged:

```r
set.seed(1234)
change_case(ToothGrowth[1:10,], messiness = 0.5)
```

```r
len supp dose
1 4.2 vc 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 vc 0.5
8 11.2 vc 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

By default, the case of the entire string is changed. Alternatively, you can change the case of each individual letter:

```r
set.seed(1234)
change_case(ToothGrowth[1:10,], messiness = 0.5, case_type = "letter")
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 vC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 Vc 0.5
8 11.2 Vc 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

### `add_special_chars()`

Randomly adds special characters to character strings:

```r
set.seed(1234)
add_special_chars(ToothGrowth[1:10,])
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 (VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 VC 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

### `make_missing()`

Randomly makes some values missing by replacing them with `NA`:

```r
set.seed(1234)
make_missing(ToothGrowth[1:10,])
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC NA
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 NA VC 0.5
8 11.2 VC NA
9 5.2 VC 0.5
10 7.0 VC 0.5
```

Add a different missing value representation for some columns:

```r
set.seed(1234)
make_missing(ToothGrowth[1:10,], cols = "supp", missing = "999")
```

```r
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 999 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

### `messy_colnames()`

Create messy column names:

```r
set.seed(1234)
messy_colnames(ToothGrowth[1:10,])
```

```r
)len s(upp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 VC 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
```

### Combining functions

You can pipe together multiple functions to create custom messy transformations:
You can vary the amount of *messiness* for each function, and chain together different functions to create customised messy data:

```r
set.seed(1234)
@@ -262,31 +69,3 @@ ToothGrowth[1:10,] |>
9 5.2 !VC 0.5
10 7.0 VC* 0.5
```

If you're adding `messy_colnames()` to a chain (and you specify only some columns in other functions), make sure `messy_colnames()` comes at the end:

```r
set.seed(1234)
ToothGrowth[1:10,] |>
make_missing(cols = "supp", missing = " ") |>
make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |>
add_whitespace(cols = "supp", messiness = 0.5) |>
add_special_chars(cols = "supp") |>
messy_colnames()
```

```r
!l_e)n S^UPP d^o)se
1 4.2 VC 0.5
2 11.5 VC NA
3 7.3 VC 0.5
4 5.8 *VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 0.5
8 11.2 V#C NA
9 5.2 !VC 0.5
10 7.0 VC* 0.5
```

Otherwise, the column names you try to select may no longer exist!
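
The ordering problem can be illustrated without {messy} at all. In this base R sketch the messy names are assigned by hand (copied from the output above as an assumption about what `messy_colnames()` might produce), and a simple `%in%` check shows that the original name can no longer be selected:

```r
# Original data with clean column names
df <- ToothGrowth[1:10, ]

# Hand-assigned names mimicking the messy column names shown above
messy_names <- df
names(messy_names) <- c("!l_e)n", "S^UPP", "d^o)se")

# The clean name exists before renaming but not after
"supp" %in% names(df)
"supp" %in% names(messy_names)
```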
