-
Notifications
You must be signed in to change notification settings - Fork 189
/
README.Rmd
145 lines (106 loc) · 5.55 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
library(stringr)
```
# stringr <a href='https://stringr.tidyverse.org'><img src='man/figures/logo.png' align="right" height="139" /></a>
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/stringr)](https://cran.r-project.org/package=stringr)
[![R-CMD-check](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/stringr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/stringr?branch=main)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
<!-- badges: end -->
## Overview
Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you're not familiar with strings, the best place to start is the [chapter on strings](https://r4ds.hadley.nz/strings) in R for Data Science.
stringr is built on top of [stringi](https://github.com/gagolews/stringi), which uses the [ICU](https://icu.unicode.org) C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you've mastered stringr, you should find stringi similarly easy to use.
## Installation
```r
# The easiest way to get stringr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just stringr:
install.packages("stringr")
```
## Cheatsheet
<a href="https://github.com/rstudio/cheatsheets/blob/main/strings.pdf"><img src="https://raw.githubusercontent.com/rstudio/cheatsheets/main/pngs/thumbnails/strings-cheatsheet-thumbs.png" width="630" height="242"/></a>
## Usage
All functions in stringr start with `str_` and take a vector of strings as the first argument:
```{r}
x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x)
str_c(x, collapse = ", ")
str_sub(x, 1, 2)
```
Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression `"[aeiou]"` matches any single character that is a vowel:
```{r}
str_subset(x, "[aeiou]")
str_count(x, "[aeiou]")
```
There are seven main verbs that work with patterns:
* `str_detect(x, pattern)` tells you if there's any match to the pattern:
```{r}
str_detect(x, "[aeiou]")
```
* `str_count(x, pattern)` counts the number of patterns:
```{r}
str_count(x, "[aeiou]")
```
* `str_subset(x, pattern)` extracts the matching components:
```{r}
str_subset(x, "[aeiou]")
```
* `str_locate(x, pattern)` gives the position of the match:
```{r}
str_locate(x, "[aeiou]")
```
* `str_extract(x, pattern)` extracts the text of the match:
```{r}
str_extract(x, "[aeiou]")
```
* `str_match(x, pattern)` extracts parts of the match defined by parentheses:
```{r}
# extract the characters on either side of the vowel
str_match(x, "(.)[aeiou](.)")
```
* `str_replace(x, pattern, replacement)` replaces the matches with new text:
```{r}
str_replace(x, "[aeiou]", "?")
```
* `str_split(x, pattern)` splits up a string into multiple pieces:
```{r}
str_split(c("a,b", "c,d,e"), ",")
```
As well as regular expressions (the default), there are three other pattern matching engines:
* `fixed()`: match exact bytes
* `coll()`: match human letters
* `boundary()`: match boundaries
## RStudio Addin
The [RegExplain RStudio addin](https://www.garrickadenbuie.com/project/regexplain/) provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.
This addin can easily be installed with devtools:
```r
# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")
```
## Compared to base R
R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R.
* Uses consistent function and argument names. The first argument is always
the vector of strings to modify, which makes stringr work particularly well
in conjunction with the pipe:
```{r}
letters %>%
.[1:10] %>%
str_pad(3, "right") %>%
str_c(letters[2:11])
```
* Simplifies string operations by eliminating options that you don't need
95% of the time.
* Produces outputs than can easily be used as inputs. This includes ensuring
that missing inputs result in missing outputs, and zero length inputs
result in zero length outputs.
Learn more in `vignette("from-base")`