Showing 9 changed files with 259 additions and 0 deletions.
26 changes: 26 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/1. Match Literals.md
@@ -0,0 +1,26 @@
# Match Literals

## Escaping

## Case-insensitive

Regex is case-sensitive by default. To ignore case, add a leading `(?i)`.

:::code-group

```regex
(?i)ascii
```

```cs
_ = new Regex(@"ascii", RegexOptions.IgnoreCase);
```

:::

To ignore case for only part of a pattern, wrap that part as `(?i)<regex>(?-i)`.
The following matches `ASCIIasciiascii` but not `asciiasciiASCII`:

```regex
ASCII(?i)aScIi(?-i)ascii
```
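
A minimal C# sketch of the partial case-insensitivity described above, assuming the .NET `System.Text.RegularExpressions` engine. The variable name `partial` and the extra test strings are illustrative and not part of the original notes.

```cs
using System;
using System.Text.RegularExpressions;

// Only the middle part of the pattern ignores case.
var partial = new Regex(@"ASCII(?i)aScIi(?-i)ascii");

Console.WriteLine(partial.IsMatch("ASCIIasciiascii")); // True
Console.WriteLine(partial.IsMatch("ASCIIASCIIascii")); // True: the middle part may use any case
Console.WriteLine(partial.IsMatch("asciiasciiASCII")); // False: the first and last parts stay case-sensitive

// A leading (?i) behaves like RegexOptions.IgnoreCase for the whole pattern.
Console.WriteLine(Regex.IsMatch("AsCiI", @"(?i)ascii"));                      // True
Console.WriteLine(Regex.IsMatch("AsCiI", @"ascii", RegexOptions.IgnoreCase)); // True
```
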
57 changes: 57 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/2. Match Nonprintable.md
@@ -0,0 +1,57 @@
# Match Nonprintable Characters

## ASCII control characters

Seven commonly used ASCII control characters can be written in two ways:

- as an escape sequence, like `\a`
- as a hex escape, like `\x07`

|Character|Meaning|Hex|
|---|---|---|
|`\a`|alert|`0x07`|
|`\e`|escape|`0x1B`|
|`\f`|form feed|`0x0C`|
|`\n`|new line|`0x0A`|
|`\r`|carriage return|`0x0D`|
|`\t`|horizontal tab|`0x09`|
|`\v`|vertical tab|`0x0B`|

:::details click to check all ASCII control characters

|Control Character|Description|Keymap (Ctrl +)|Regex|Explanation|
|---|---|---|---|---|
|NUL|Null Character|`^@`|`\c@`|Ctrl + @|
|SOH|Start of Header|`^A`|`\cA`|Ctrl + A|
|STX|Start of Text|`^B`|`\cB`|Ctrl + B|
|ETX|End of Text|`^C`|`\cC`|Ctrl + C|
|EOT|End of Transmission|`^D`|`\cD`|Ctrl + D|
|ENQ|Enquiry|`^E`|`\cE`|Ctrl + E|
|ACK|Acknowledge|`^F`|`\cF`|Ctrl + F|
|BEL|Bell|`^G`|`\cG`|Ctrl + G|
|BS|Backspace|`^H`|`\cH`|Ctrl + H|
|TAB|Horizontal Tab|`^I`|`\cI`|Ctrl + I|
|LF|Line Feed|`^J`|`\cJ`|Ctrl + J|
|VT|Vertical Tab|`^K`|`\cK`|Ctrl + K|
|FF|Form Feed|`^L`|`\cL`|Ctrl + L|
|CR|Carriage Return|`^M`|`\cM`|Ctrl + M|
|SO|Shift Out|`^N`|`\cN`|Ctrl + N|
|SI|Shift In|`^O`|`\cO`|Ctrl + O|
|DLE|Data Link Escape|`^P`|`\cP`|Ctrl + P|
|DC1|Device Control 1|`^Q`|`\cQ`|Ctrl + Q|
|DC2|Device Control 2|`^R`|`\cR`|Ctrl + R|
|DC3|Device Control 3|`^S`|`\cS`|Ctrl + S|
|DC4|Device Control 4|`^T`|`\cT`|Ctrl + T|
|NAK|Negative Acknowledge|`^U`|`\cU`|Ctrl + U|
|SYN|Synchronous Idle|`^V`|`\cV`|Ctrl + V|
|ETB|End of Transmission Block|`^W`|`\cW`|Ctrl + W|
|CAN|Cancel|`^X`|`\cX`|Ctrl + X|
|EM|End of Medium|`^Y`|`\cY`|Ctrl + Y|
|SUB|Substitute|`^Z`|`\cZ`|Ctrl + Z|
|ESC|Escape|`^[`|`\c[`|Ctrl + [|
|FS|File Separator|`^\`|`\c\`|Ctrl + \\|
|GS|Group Separator|`^]`|`\c]`|Ctrl + ]|
|RS|Record Separator|`^^`|`\c^`|Ctrl + ^|
|US|Unit Separator|`^_`|`\c_`|Ctrl + _|

:::
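
As a rough check of the table above, the following C# sketch (assuming the .NET regex engine) shows that the escape, hex and control-letter notations all match the same character. The variable name `bell` is illustrative.

```cs
using System;
using System.Text.RegularExpressions;

string bell = "\u0007"; // BEL, the character behind \a

Console.WriteLine(Regex.IsMatch(bell, @"\a"));   // True
Console.WriteLine(Regex.IsMatch(bell, @"\x07")); // True
Console.WriteLine(Regex.IsMatch(bell, @"\cG"));  // True

// The same holds for a horizontal tab.
Console.WriteLine(Regex.IsMatch("\t", @"\x09")); // True
Console.WriteLine(Regex.IsMatch("\t", @"\cI"));  // True
```
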
69 changes: 69 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/3. Match One of Many Characters.md
@@ -0,0 +1,69 @@
# Match One of Many Characters

## Character class

Use a *character class*, written inside `[]`, to match exactly one of the listed characters.

```regex
c[ae]l[ae]nd[ae]r
```
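
A small C# sketch of this pattern in use, assuming the .NET regex engine. The word list and the variable name `calendar` are made up for illustration.

```cs
using System;
using System.Text.RegularExpressions;

var calendar = new Regex(@"c[ae]l[ae]nd[ae]r");

// Each [ae] accepts exactly one character, either 'a' or 'e'.
foreach (var word in new[] { "calendar", "celender", "calandar", "cilendar" })
{
    Console.WriteLine($"{word}: {calendar.IsMatch(word)}");
}
// calendar: True
// celender: True
// calandar: True
// cilendar: False
```
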

## Range operator

Specify a range of characters with `-`.

Match any single hexadecimal character:

```regex
[a-fA-F0-9]
```

> A reversed range such as `[f-a]` is not valid.

## Negation operator

Negate a character class with a leading `^`.

Match any single non-hexadecimal character:

```regex
[^a-fA-F0-9]
```
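
The range and negation operators can be tried out with a short C# sketch like the one below, assuming the .NET regex engine. The names `hexDigit`, `nonHexDigit` and `rejected` are illustrative.

```cs
using System;
using System.Text.RegularExpressions;

var hexDigit = new Regex(@"[a-fA-F0-9]");
var nonHexDigit = new Regex(@"[^a-fA-F0-9]");

Console.WriteLine(hexDigit.IsMatch("c"));     // True
Console.WriteLine(hexDigit.IsMatch("g"));     // False
Console.WriteLine(nonHexDigit.IsMatch("g"));  // True
Console.WriteLine(nonHexDigit.IsMatch("7F")); // False: every character is a hex digit

// A reversed range is rejected when the pattern is parsed.
bool rejected = false;
try
{
    _ = new Regex(@"[f-a]");
}
catch (ArgumentException)
{
    rejected = true;
}
Console.WriteLine(rejected); // True
```
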

## Escape inside character class

Four special characters may need to be escaped inside a character class:

- `-` range operator
- `^` negation operator
- `[` and `]` start and end of character class

Any other character does not need to be escaped:

```regex
[$()*+.?{|]
```

A `^` that is not the first character of the class does not act as negation and does not need to be escaped:

```regex
[a-f^A-F\^0-9]
```

The same applies to `-` and `[`/`]` when they appear where they cannot act as operators.

:::hint
It's recommended to always escape metacharacters in character classes.
:::
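
A hedged C# sketch of these escaping rules, assuming the .NET regex engine. The last pattern, `[\^\-\[\]]`, is an illustrative class that follows the hint by escaping every operator.

```cs
using System;
using System.Text.RegularExpressions;

// Metacharacters from outside a class are plain literals inside one.
Console.WriteLine(Regex.IsMatch("+", @"[$()*+.?{|]")); // True
Console.WriteLine(Regex.IsMatch("a", @"[$()*+.?{|]")); // False

// A ^ that is not first in the class is a literal, escaped or not.
Console.WriteLine(Regex.IsMatch("^", @"[a-f^A-F\^0-9]")); // True

// Escaping every operator keeps the intent obvious.
Console.WriteLine(Regex.IsMatch("-", @"[\^\-\[\]]")); // True
Console.WriteLine(Regex.IsMatch("]", @"[\^\-\[\]]")); // True
Console.WriteLine(Regex.IsMatch("a", @"[\^\-\[\]]")); // False
```
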

## Shorthand character classes

- `\d` matches any single *digit*, equivalent to `[\d]` and `[0-9]`
- `\D` matches any character that is *not a digit*, equivalent to `[^\d]` and `[^0-9]`
- `\w` matches any *word character*, equivalent to `[a-zA-Z0-9_]`
- `\W` matches any character that is *not a word character*, equivalent to `[^\w]`
- `\s` matches any *whitespace character*, such as tabs, spaces and line breaks
- `\S` matches any character that is *not a whitespace character*, equivalent to `[^\s]`

> In `.NET`, `\w` matches not only `[a-zA-Z0-9_]` but also letters from other scripts, such as Cyrillic and Thai.
> `\s` also matches Unicode whitespace characters in `.NET` and `JavaScript`.
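
The shorthand classes and the Unicode note can be checked with a C# sketch along these lines, assuming the .NET regex engine. The use of `RegexOptions.ECMAScript` to get ASCII-only `\w` is an addition for illustration and is not mentioned in the original notes.

```cs
using System;
using System.Text.RegularExpressions;

Console.WriteLine(Regex.IsMatch("7", @"\d"));  // True
Console.WriteLine(Regex.IsMatch("x", @"\D"));  // True
Console.WriteLine(Regex.IsMatch("_", @"\w"));  // True
Console.WriteLine(Regex.IsMatch("-", @"\W"));  // True
Console.WriteLine(Regex.IsMatch("\t", @"\s")); // True
Console.WriteLine(Regex.IsMatch("\t", @"\S")); // False

string cyrillic = "\u0436"; // Cyrillic small letter zhe

// By default \w is Unicode-aware in .NET, so a Cyrillic letter matches.
Console.WriteLine(Regex.IsMatch(cyrillic, @"\w")); // True
// RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9].
Console.WriteLine(Regex.IsMatch(cyrillic, @"\w", RegexOptions.ECMAScript)); // False
```
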
20 changes: 20 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/4. Match Any Character.md
@@ -0,0 +1,20 @@
# Match Any Character

- `.` matches any character except line breaks
- `.` matches any character, including line breaks, when the singleline option is enabled through the API or with `(?s)`

```cs
_ = new Regex(@".", RegexOptions.Singleline);
// or
_ = new Regex(@"(?s).");
```

- `[\s\S]`, `[\w\W]` and `[\d\D]` match any character regardless of options

:::warning
Use `.` only when you really want to allow any character. Use a character class or negated character class in any other situation.
:::

## Mode modifier

Use `(?s)`/`(?-s)` to enable/disable `singleline` mode inside the pattern itself.
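
The effect of singleline mode can be seen in a short C# sketch like this one, assuming the .NET regex engine. The subject string `twoLines` is illustrative.

```cs
using System;
using System.Text.RegularExpressions;

string twoLines = "a\nb";

Console.WriteLine(Regex.IsMatch(twoLines, @"a.b"));                          // False
Console.WriteLine(Regex.IsMatch(twoLines, @"a.b", RegexOptions.Singleline)); // True
Console.WriteLine(Regex.IsMatch(twoLines, @"(?s)a.b"));                      // True
Console.WriteLine(Regex.IsMatch(twoLines, @"a[\s\S]b"));                     // True

// (?-s) switches the mode off again for the rest of the pattern.
Console.WriteLine(Regex.IsMatch(twoLines, @"(?s)a(?-s).b")); // False
```
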
30 changes: 30 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/5. Match Start and End of Line.md
@@ -0,0 +1,30 @@
# Match Start and End of Line

- `^abc` and `\Aabc` match `abc` at the start of the whole string
- `abc$` and `abc\Z` match `abc` at the end of the whole string
- In multiline mode, `^abc` matches `abc` at the start of each line
- In multiline mode, `abc$` matches `abc` at the end of each line

```cs
_ = new Regex(@"^abcefg$", RegexOptions.Multiline);
```

> In multiline mode, `^` always matches after `\n`, so `\n^` is redundant.
> In multiline mode, `$` always matches before `\n`, so `$\n` is redundant.
> `\A\Z` matches an empty string and a string that consists of a single line break.
> `\A\z` matches only an empty string.

:::hint
Prefer `\A` and `\Z` over `^` and `$` when you need to match the start or end of the whole string.
:::

## Mode modifier

Use `(?m)`/`(?-m)` to enable/disable `multiline` mode inside the pattern itself.

## Conclusion

- `\A` and `\Z` always match the start and end of the subject string, regardless of multiline mode
- `(?-m)^abc` and `(?-m)abc$` are equivalent to `\Aabc` and `abc\Z`
- `\z` matches only at the very end of the subject string
- `abc\Z` matches before a final line break, while `abc\z` does not match if a line break follows `abc`
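
The anchors summarized above can be compared with a C# sketch like the following, assuming the .NET regex engine. The subject string `text` is illustrative.

```cs
using System;
using System.Text.RegularExpressions;

string text = "abc\nabc\n";

// Without Multiline, ^ matches only at the start of the string.
Console.WriteLine(Regex.Matches(text, @"^abc").Count);                         // 1
// With Multiline, ^ and $ match at the start and end of every line.
Console.WriteLine(Regex.Matches(text, @"^abc", RegexOptions.Multiline).Count); // 2
Console.WriteLine(Regex.Matches(text, @"(?m)abc$").Count);                     // 2

// \Z tolerates one trailing line break, \z does not.
Console.WriteLine(Regex.IsMatch("abc\n", @"abc\Z")); // True
Console.WriteLine(Regex.IsMatch("abc\n", @"abc\z")); // False
Console.WriteLine(Regex.IsMatch("abc", @"abc\z"));   // True

// A string holding a single line break.
Console.WriteLine(Regex.IsMatch("\n", @"\A\Z")); // True
Console.WriteLine(Regex.IsMatch("\n", @"\A\z")); // False
```
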
25 changes: 25 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/6. Match Whole Words.md
@@ -0,0 +1,25 @@
# Match Whole Words

## Word boundary

`\b<word>\b` matches a whole word.

`\b` matches only at the following positions:

- Before the first character in the subject string, if it is a word character
- After the last character in the subject string, if it is a word character
- Between a word character and a character that is not a word character

Consequently:

- `\b<wordchar>` and `<nonwordchar>\b` only match at the start of a word
- `<wordchar>\b` and `\b<nonwordchar>` only match at the end of a word
- `<wordchar>\b<wordchar>` and `<nonwordchar>\b<nonwordchar>` never match

## Nonboundary

`\B` matches only at the following positions:

- Before the first character in the subject string, if it is not a word character
- After the last character in the subject string, if it is not a word character
- Between two word characters
- Between two nonword characters
- In an empty string
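
A short C# sketch of `\b` and `\B`, assuming the .NET regex engine. The subject string is made up for illustration.

```cs
using System;
using System.Text.RegularExpressions;

string text = "cat concat cat's";

// Whole-word matches: the first word and the "cat" in "cat's".
Console.WriteLine(Regex.Matches(text, @"\bcat\b").Count); // 2
// \B before "cat" requires a word character in front, so only "concat" qualifies.
Console.WriteLine(Regex.Matches(text, @"\Bcat").Count);   // 1
// Every "cat" here is followed by a nonword character, so \B never holds after it.
Console.WriteLine(Regex.Matches(text, @"cat\B").Count);   // 0
```
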
10 changes: 10 additions & 0 deletions
...document/Regular Expression/docs/1. Basics/7. Match One of Many Alternatives.md
@@ -0,0 +1,10 @@
# Match One of Many Alternatives

`cat|dog|bird` matches one of `cat`, `dog` and `bird`.

> The order of the alternatives in the regex matters only when two of them *can match at the same position* in the string.
> Alternatives are *short-circuited* (or *eager*): if an earlier alternative matches, the later ones are not tried at the current position.

So `Jane|Janet` can't match `Janet` in `Her name is Janet`; only `Jane` is matched.
To match whole words, use `\bJane\b|\bJanet\b` instead.
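
The eager behaviour of alternation can be observed with a C# sketch like this one, assuming the .NET regex engine. The reordered pattern `Janet|Jane` is added for comparison and is not in the original notes.

```cs
using System;
using System.Text.RegularExpressions;

string text = "Her name is Janet";

// The first alternative that succeeds wins, even though a longer one exists.
Console.WriteLine(Regex.Match(text, @"Jane|Janet").Value);         // Jane
// Reordering the alternatives changes the result.
Console.WriteLine(Regex.Match(text, @"Janet|Jane").Value);         // Janet
// Word boundaries make "Jane" fail inside "Janet", so the second alternative is tried.
Console.WriteLine(Regex.Match(text, @"\bJane\b|\bJanet\b").Value); // Janet
```
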
22 changes: 22 additions & 0 deletions
docs/document/Regular Expression/docs/1. Basics/8. Grouping and Captured Groups.md
@@ -0,0 +1,22 @@
# Grouping and Captured Groups

## Grouping alternatives

A more compact way to write `\bJane\b|\bJanet\b` is `\b(Jane|Janet)\b`, using `()` to group the alternatives.
However, this also creates a capturing group. If you don't need to reuse the captured text, see [Noncapturing](#noncapturing-groups).

## Noncapturing groups

`\b(?:Jane|Janet)\b` groups the alternatives with `(?:)` without capturing them. Skipping the capture can also improve performance.
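
The difference between the two group types shows up in the `Groups` collection of a match. Below is a minimal C# sketch, assuming the .NET regex engine. The subject string and the names `capturing` and `nonCapturing` are illustrative.

```cs
using System;
using System.Text.RegularExpressions;

var capturing = Regex.Match("Hi Janet", @"\b(Jane|Janet)\b");
var nonCapturing = Regex.Match("Hi Janet", @"\b(?:Jane|Janet)\b");

Console.WriteLine(capturing.Value);           // Janet
Console.WriteLine(capturing.Groups[1].Value); // Janet
Console.WriteLine(capturing.Groups.Count);    // 2: group 0 (the whole match) and group 1

Console.WriteLine(nonCapturing.Value);        // Janet
Console.WriteLine(nonCapturing.Groups.Count); // 1: only group 0
```
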

## Group with mode modifier

Use `(?i:<regex>)`, `(?s:<regex>)` or `(?m:<regex>)` to apply a mode to a group only:

- `\b(?i:Jane|Janet)\b`

To combine different modes:

- `(?ism:<regex>)` enables `case-insensitive`, `singleline` and `multiline`
- `(?-ism:<regex>)` disables `case-insensitive`, `singleline` and `multiline`
- `(?i-sm:<regex>)` enables `case-insensitive` and disables `singleline` and `multiline`
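
A hedged C# sketch of group-scoped modes, assuming the .NET regex engine. The patterns and subject strings are illustrative. The last two lines combine an inline group with a global `RegexOptions.Singleline` to show that the group setting takes precedence inside the group.

```cs
using System;
using System.Text.RegularExpressions;

// Only the group is case-insensitive; "Hello" must still match exactly.
Console.WriteLine(Regex.IsMatch("Hello JANET", @"Hello (?i:Jane|Janet)\b")); // True
Console.WriteLine(Regex.IsMatch("HELLO JANET", @"Hello (?i:Jane|Janet)\b")); // False

// (?s:...) lets the dot cross a line break inside the group only.
Console.WriteLine(Regex.IsMatch("a\nb", @"(?s:a.)b")); // True
Console.WriteLine(Regex.IsMatch("a\nb", @"(?:a.)b"));  // False

// (?i-s:...) enables case-insensitive matching and disables singleline
// inside the group, even when Singleline is on globally.
Console.WriteLine(Regex.IsMatch("A\nb", @"(?i-s:a.)b", RegexOptions.Singleline)); // False
Console.WriteLine(Regex.IsMatch("Axb", @"(?i-s:a.)b", RegexOptions.Singleline));  // True
```
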
Empty file.