update docs

sharpchen · Jun 15, 2024 · 2c12341
1 parent 8b1341a
commit 2c12341
Showing 9 changed files with 259 additions and 0 deletions.
diff --git a/docs/document/Regular Expression/docs/1. Basics/1. Match Literals.md b/docs/document/Regular Expression/docs/1. Basics/1. Match Literals.md
@@ -0,0 +1,26 @@
+# Match Literals
+
+## Escaping
+
+## Case-insensitive
+
+Regex is case-sensitive by default. To ignore case, add a leading `(?i)`.
+
+:::code-group
+
+```regex
+(?i)ascii
+```
+
+```cs
+_ = new Regex(@"ascii", RegexOptions.IgnoreCase);
+```
+
+:::
+
+To ignore case for only part of a regex, wrap that part as `(?i)<regex>(?-i)`.
+The following matches `ASCIIasciiascii` (with the middle `ascii` in any case) but not `asciiasciiASCII`:
+
+```regex
+ASCII(?i)aScIi(?-i)ascii
+```
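+
+A minimal C# sketch of the partial case-insensitivity above, assuming the .NET `System.Text.RegularExpressions` engine. The variable names are only illustrative.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// Only the part between (?i) and (?-i) ignores case.
+var partial = new Regex(@"ASCII(?i)aScIi(?-i)ascii");
+
+bool expectedCase = partial.IsMatch("ASCIIasciiascii");   // true
+bool middleUpper = partial.IsMatch("ASCIIASCIIascii");    // true: the middle part ignores case
+bool outerWrongCase = partial.IsMatch("asciiasciiASCII"); // false: the outer parts stay case-sensitive
+
+Console.WriteLine($"{expectedCase} {middleUpper} {outerWrongCase}");
+```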
diff --git a/docs/document/Regular Expression/docs/1. Basics/2. Match Nonprintable.md b/docs/document/Regular Expression/docs/1. Basics/2. Match Nonprintable.md
@@ -0,0 +1,57 @@
+# Match Nonprintable Characters
+
+## ASCII control characters
+
+Seven commonly used ASCII control characters can be written in two ways:
+
+- as an escape sequence like `\a`
+- as a hex code like `\x07`
+
+|Character|Meaning|Hex|
+|---|---|---|
+|`\a`|alert|`0x07`|
+|`\e`|escape|`0x1B`|
+|`\f`|form feed|`0x0C`|
+|`\n`|new line|`0x0A`|
+|`\r`|carriage return|`0x0D`|
+|`\t`|horizontal tab|`0x09`|
+|`\v`|vertical tab|`0x0B`|
+
+:::details click to check all ASCII control characters
+
+|Control Character|Description|Keymap (Ctrl +)|Regex|Explanation|
+|---|---|---|---|---|
+|NUL|Null Character|`^@`|`\c@`|Ctrl + @|
+|SOH|Start of Header|`^A`|`\cA`|Ctrl + A|
+|STX|Start of Text|`^B`|`\cB`|Ctrl + B|
+|ETX|End of Text|`^C`|`\cC`|Ctrl + C|
+|EOT|End of Transmission|`^D`|`\cD`|Ctrl + D|
+|ENQ|Enquiry|`^E`|`\cE`|Ctrl + E|
+|ACK|Acknowledge|`^F`|`\cF`|Ctrl + F|
+|BEL|Bell|`^G`|`\cG`|Ctrl + G|
+|BS|Backspace |`^H`|`\cH`|Ctrl + H|
+|TAB|Horizontal Tab|`^I`|`\cI`|Ctrl + I|
+|LF|Line Feed |`^J`|`\cJ`|Ctrl + J|
+|VT|Vertical Tab|`^K`|`\cK`|Ctrl + K|
+|FF|Form Feed |`^L`|`\cL`|Ctrl + L|
+|CR|Carriage Return|`^M`|`\cM`|Ctrl + M|
+|SO|Shift Out |`^N`|`\cN`|Ctrl + N|
+|SI|Shift In  |`^O`|`\cO`|Ctrl + O|
+|DLE|Data Link Escape|`^P`|`\cP`|Ctrl + P|
+|DC1|Device Control 1|`^Q`|`\cQ`|Ctrl + Q|
+|DC2|Device Control 2|`^R`|`\cR`|Ctrl + R|
+|DC3|Device Control 3|`^S`|`\cS`|Ctrl + S|
+|DC4|Device Control 4|`^T`|`\cT`|Ctrl + T|
+|NAK|Negative Acknowledge|`^U`|`\cU`|Ctrl + U|
+|SYN|Synchronous Idle|`^V`|`\cV`|Ctrl + V|
+|ETB|End of Transmission Block|`^W`|`\cW`|Ctrl + W|
+|CAN|Cancel|`^X`|`\cX`|Ctrl + X|
+|EM|End of Medium|`^Y`|`\cY`|Ctrl + Y|
+|SUB|Substitute|`^Z`|`\cZ`|Ctrl + Z|
+|ESC|Escape|`^[`|`\c[`|Ctrl + [|
+|FS|File Separator|`^\`|`\c\`|Ctrl + \\|
+|GS|Group Separator|`^]`|`\c]`|Ctrl + ]|
+|RS|Record Separator|`^^`|`\c^`|Ctrl + ^|
+|US|Unit Separator|`^_`|`\c_`|Ctrl + _|
+
+:::
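+
+As a rough check of the tables above, the following C# sketch (assuming the .NET regex engine) matches the bell character with its escape, its hex code, and its control-character form. The variable names are illustrative.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// "\a" in a C# string literal is the bell character U+0007.
+string bell = "\a";
+
+bool byEscape = Regex.IsMatch(bell, @"\a");    // escape sequence
+bool byHex = Regex.IsMatch(bell, @"\x07");     // hex code
+bool byControl = Regex.IsMatch(bell, @"\cG");  // Ctrl + G
+
+Console.WriteLine($"{byEscape} {byHex} {byControl}");
+```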
diff --git a/docs/document/Regular Expression/docs/1. Basics/3. Match One of Many Characters.md b/docs/document/Regular Expression/docs/1. Basics/3. Match One of Many Characters.md
@@ -0,0 +1,69 @@
+# Match One of Many Characters
+
+## Character class
+
+A *character class* lists characters inside `[]` and matches exactly one of them. The following matches `calendar` as well as misspellings such as `celender`:
+
+```regex
+c[ae]l[ae]nd[ae]r
+```
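+
+A small C# sketch of how this character class behaves, assuming the .NET regex engine. The sample words are made up for illustration.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+var calendar = new Regex(@"c[ae]l[ae]nd[ae]r");
+
+bool correctSpelling = calendar.IsMatch("calendar"); // true
+bool misspelled = calendar.IsMatch("celander");      // true: each [ae] accepts a or e
+bool otherVowel = calendar.IsMatch("calindar");      // false: i is not in [ae]
+
+Console.WriteLine($"{correctSpelling} {misspelled} {otherVowel}");
+```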
+
+## Range operator
+
+Specify a range of characters using `-`.
+
+Match a single hexadecimal character:
+
+```regex
+[a-fA-F0-9]
+```
+
+> Reversed ranges like `[f-a]` are not valid.
+
+## Negation operator
+
+Negate a character class using a leading `^`.
+
+Match any non-hexadecimal character:
+
+```regex
+[^a-fA-F0-9]
+```
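+
+The range and negation operators can be compared side by side. This is a minimal C# sketch assuming the .NET regex engine; the input string is invented for the example.
+
+```cs
+using System;
+using System.Linq;
+using System.Text.RegularExpressions;
+
+string input = "0xBEEF-g";
+
+// Collect every character matched by the class and by its negation.
+string hexChars = string.Concat(
+    Regex.Matches(input, "[a-fA-F0-9]").Cast<Match>().Select(m => m.Value));
+string nonHexChars = string.Concat(
+    Regex.Matches(input, "[^a-fA-F0-9]").Cast<Match>().Select(m => m.Value));
+
+// A reversed range is rejected when the pattern is parsed.
+bool reversedRangeRejected = false;
+try { _ = new Regex("[f-a]"); }
+catch (ArgumentException) { reversedRangeRejected = true; }
+
+Console.WriteLine($"{hexChars} {nonHexChars} {reversedRangeRejected}");
+```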
+
+## Escape inside character class
+
+Four special characters may need to be escaped:
+
+- `-` range operator
+- `^` negation operator
+- `[` and `]` start and end of character class
+
+Any other metacharacter does not need to be escaped inside a character class:
+
+```regex
+[$()*+.?{|]
+```
+
+A `^` that does not act as negation (that is, not the first character) does not need to be escaped either:
+
+```regex
+[a-f^A-F\^0-9]
+```
+
+The same applies to `-` and `[`/`]` where they cannot be read as operators.
+
+:::hint
+It's recommended to always escape metacharacters in character classes.
+:::
+
+## Shorthand character classes
+
+- `\d` matches any single *digit*, equivalent to `[\d]` and `[0-9]`
+- `\D` matches any character that is *not a digit*, equivalent to `[^\d]` and `[^0-9]`
+- `\w` matches any *word character*, equivalent to `[a-zA-Z0-9_]`
+- `\W` matches any character that is *not a word character*, equivalent to `[^\w]`
+- `\s` matches any *whitespace character*, like tabs, spaces, line breaks.
+- `\S` matches any character that is *not a whitespace character*
+
+> In `.NET`, `\w` matches more than `[a-zA-Z0-9_]`: it also includes letters from other scripts, such as Cyrillic and Thai.
+> In `.NET` and `JavaScript`, `\s` also matches Unicode whitespace characters.
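+
+A short C# sketch of the `.NET` behavior noted above. It uses `RegexOptions.ECMAScript` as one way to restrict `\w` to ASCII; the variable names are illustrative.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// U+0416 is the Cyrillic capital letter Zhe.
+string cyrillic = "\u0416";
+
+bool unicodeWord = Regex.IsMatch(cyrillic, @"\w");                          // true
+bool asciiClass = Regex.IsMatch(cyrillic, @"[a-zA-Z0-9_]");                 // false
+bool ecmaWord = Regex.IsMatch(cyrillic, @"\w", RegexOptions.ECMAScript);    // false
+
+Console.WriteLine($"{unicodeWord} {asciiClass} {ecmaWord}");
+```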
diff --git a/docs/document/Regular Expression/docs/1. Basics/4. Match Any Character.md b/docs/document/Regular Expression/docs/1. Basics/4. Match Any Character.md
@@ -0,0 +1,20 @@
+# Match Any Character
+
+- By default, `.` matches any character except line breaks
+- With the singleline option enabled, `.` matches any character, including line breaks
+
+```cs
+_ = new Regex(@".", RegexOptions.Singleline);
+// or
+_ = new Regex(@"(?s).");
+```
+
+- `[\s\S]` and `[\w\W]` and `[\d\D]` match any character
+
+:::Warning
+Use `.` only when you really want to allow any character. Use a character class or negated character class in any other situation.
+:::
+
+## Mode modifier
+
+Use `(?s)`/`(?-s)` to enable/disable `singleline` mode within the regex itself.
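+
+The different ways to let a pattern cross a line break can be sketched in C# as follows, assuming the .NET regex engine. The two-line input is made up for the example.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+string twoLines = "a\nb";
+
+bool defaultDot = Regex.IsMatch(twoLines, @"a.b");                          // false: . stops at \n
+bool optionDot = Regex.IsMatch(twoLines, @"a.b", RegexOptions.Singleline);  // true
+bool inlineDot = Regex.IsMatch(twoLines, @"(?s)a.b");                       // true
+bool classAny = Regex.IsMatch(twoLines, @"a[\s\S]b");                       // true, no option needed
+
+Console.WriteLine($"{defaultDot} {optionDot} {inlineDot} {classAny}");
+```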
diff --git a/docs/document/Regular Expression/docs/1. Basics/5. Match Start and End of Line.md b/docs/document/Regular Expression/docs/1. Basics/5. Match Start and End of Line.md
@@ -0,0 +1,30 @@
+# Match Start and End of Line
+
+- `^abc` and `\Aabc` match `abc` at the start of the whole string
+- `abc$` and `abc\Z` match `abc` at the end of the whole string
+- In multiline mode, `^abc` matches `abc` at the start of each line
+- In multiline mode, `abc$` matches `abc` at the end of each line
+
+```cs
+_ = new Regex(@"^abcefg$", RegexOptions.Multiline);
+```
+
+> - In multiline mode, `^` always matches after `\n`, so `\n^` is redundant
+> - In multiline mode, `$` always matches before `\n`, so `$\n` is redundant
+> - `\A\Z` matches an empty string and a string that consists of a single line break
+> - `\A\z` matches only an empty string
+
+:::hint
+Always use `\A` and `\Z` instead of `^` and `$` to match the start/end of the whole string.
+:::
+
+## Mode modifier
+
+Use `(?m)`/`(?-m)` to enable/disable `multiline` mode within the regex itself.
+
+## Conclusion
+
+- `\A` and `\Z` always match at the start and end of the subject string, regardless of multiline mode
+- `(?-m)^abc` and `(?-m)abc$` are equivalent to `\Aabc` and `abc\Z`
+- `\z` matches only at the very end of the subject string
+- `abc\Z` also matches before a final line break, while `abc\z` won't match if a line break follows `abc`
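+
+The anchors can be compared with a short C# sketch, assuming the .NET regex engine. The sample strings are invented for the example.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// \Z tolerates one final line break, \z does not.
+string trailingBreak = "abc\n";
+bool upperZ = Regex.IsMatch(trailingBreak, @"abc\Z"); // true
+bool lowerZ = Regex.IsMatch(trailingBreak, @"abc\z"); // false
+
+// In multiline mode ^ matches at every line start, \A only at the string start.
+string twoLines = "abc\nabc";
+int lineStarts = Regex.Matches(twoLines, @"^abc", RegexOptions.Multiline).Count;    // 2
+int stringStarts = Regex.Matches(twoLines, @"\Aabc", RegexOptions.Multiline).Count; // 1
+
+Console.WriteLine($"{upperZ} {lowerZ} {lineStarts} {stringStarts}");
+```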
diff --git a/docs/document/Regular Expression/docs/1. Basics/6. Match Whole Words.md b/docs/document/Regular Expression/docs/1. Basics/6. Match Whole Words.md
@@ -0,0 +1,25 @@
+# Match Whole Words
+
+## Word boundary
+
+`\b<word>\b` matches a whole word
+
+`\b` matches only at the following positions:
+
+- Before the first character in the subject string, if it is a word character
+- After the last character in the subject string, if it is a word character
+- Between a word character and a character that is not a word character
+
+As a consequence:
+
+- `\b<wordchar>` and `<nonwordchar>\b` only match at the start of a word
+- `<wordchar>\b` and `\b<nonwordchar>` only match at the end of a word
+- `<wordchar>\b<wordchar>` and `<nonwordchar>\b<nonwordchar>` never match
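+
+A minimal C# sketch of whole-word matching with `\b`, assuming the .NET regex engine. The sample text is made up for illustration.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+string text = "cat concat category";
+
+// Without boundaries, "cat" is found inside the longer words as well.
+int anywhere = Regex.Matches(text, @"cat").Count;        // 3
+
+// With boundaries, only the standalone word is found.
+MatchCollection whole = Regex.Matches(text, @"\bcat\b");
+int wholeCount = whole.Count;                             // 1
+int wholeIndex = whole[0].Index;                          // 0
+
+Console.WriteLine($"{anywhere} {wholeCount} {wholeIndex}");
+```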
+
+## Nonboundary
+
+`\B` matches only at the following positions:
+
+- Before the first character in the subject string, if it's not a word character
+- After the last character in the subject string, if it's not a word character
+- Between two word characters
+- Between two nonword characters
+- In an empty string
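+
+The `\B` positions can be sketched in C# as follows, assuming the .NET regex engine. The sample words are chosen only for illustration.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// "cat" surrounded by word characters on both sides.
+bool insideWord = Regex.IsMatch("staccato", @"\Bcat\B"); // true
+
+// "cat" as a whole word has boundaries on both sides, so \B fails.
+bool wholeWord = Regex.IsMatch("cat", @"\Bcat\B");       // false
+
+// An empty string contains no word boundary.
+bool emptyString = Regex.IsMatch("", @"\B");             // true
+
+Console.WriteLine($"{insideWord} {wholeWord} {emptyString}");
+```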
diff --git a/...document/Regular Expression/docs/1. Basics/7. Match One of Many Alternatives.md b/...document/Regular Expression/docs/1. Basics/7. Match One of Many Alternatives.md
@@ -0,0 +1,10 @@
+# Match One of Many Alternatives
+
+`cat|dog|bird` matches one of `cat`, `dog` and `bird`.
+
+> The order of the alternatives in the regex matters only when two of them *can match at the same position* in the string.
+
+Alternatives are *short-circuited* (or *eager*): they are tried from left to right, and once one alternative leads to a match, the remaining ones are not tried at the current position.
+
+So `Jane|Janet` can't match `Janet` in `Her name is Janet`; only `Jane` is matched.
+To match whole words, use `\bJane\b|\bJanet\b` instead.
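+
+A short C# sketch of the eager behavior described above, assuming the .NET regex engine. The variable names are illustrative.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+string sentence = "Her name is Janet";
+
+// The first alternative that matches wins, even if a later one is longer.
+string eager = Regex.Match(sentence, @"Jane|Janet").Value;             // "Jane"
+
+// Word boundaries make the first alternative fail, so the second is tried.
+string wholeWord = Regex.Match(sentence, @"\bJane\b|\bJanet\b").Value; // "Janet"
+
+// Reordering the alternatives also changes the result.
+string reordered = Regex.Match(sentence, @"Janet|Jane").Value;         // "Janet"
+
+Console.WriteLine($"{eager} {wholeWord} {reordered}");
+```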
diff --git a/docs/document/Regular Expression/docs/1. Basics/8. Grouping and Captured Groups.md b/docs/document/Regular Expression/docs/1. Basics/8. Grouping and Captured Groups.md
@@ -0,0 +1,22 @@
+# Grouping and Captured Groups
+
+## Grouping alternatives
+
+A more compact syntax for `\bJane\b|\bJanet\b` is `\b(Jane|Janet)\b`, using `()` to group the alternatives.
+However, this also creates a capturing group. If you don't need to reuse the captured text, see [Noncapturing](#noncapturing-groups).
+
+## Noncapturing groups
+
+`\b(?:Jane|Janet)\b` uses `(?:)` to group without capturing. The group's text is not captured during matching, which can improve performance.
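+
+The difference between the two kinds of group shows up in the match's group collection. This is a minimal C# sketch assuming the .NET regex engine; it does not measure performance.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+Match capturing = Regex.Match("Janet", @"\b(Jane|Janet)\b");
+Match nonCapturing = Regex.Match("Janet", @"\b(?:Jane|Janet)\b");
+
+// Group 0 is always the whole match; group 1 exists only for the capturing form.
+int capturingGroups = capturing.Groups.Count;       // 2
+string firstGroup = capturing.Groups[1].Value;      // "Janet"
+int nonCapturingGroups = nonCapturing.Groups.Count; // 1
+
+Console.WriteLine($"{capturingGroups} {firstGroup} {nonCapturingGroups}");
+```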
+
+## Group with mode modifier
+
+Use `(?i:<regex>)`, `(?s:<regex>)` or `(?m:<regex>)` to apply a mode only to the grouped alternatives:
+
+- `\b(?i:Jane|Janet)\b`
+
+To combine different modes:
+
+- `(?ism:<regex>)` enables `case-insensitive`, `singleline` and `multiline`
+- `(?-ism:<regex>)` disables `case-insensitive`, `singleline` and `multiline`
+- `(?i-sm:<regex>)` enables `case-insensitive` and disables `singleline` and `multiline`
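+
+A small C# sketch of a mode modifier scoped to one group, assuming the .NET regex engine. The trailing ` is here` text is added only to show that the rest of the pattern stays case-sensitive.
+
+```cs
+using System;
+using System.Text.RegularExpressions;
+
+// Case is ignored only inside the (?i:...) group.
+var scoped = new Regex(@"\b(?i:Jane|Janet)\b is here");
+
+bool nameAnyCase = scoped.IsMatch("JANET is here");   // true
+bool restWrongCase = scoped.IsMatch("JANET IS HERE"); // false
+
+Console.WriteLine($"{nameAnyCase} {restWrongCase}");
+```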
diff --git a/docs/document/Regular Expression/docs/1.md b/docs/document/Regular Expression/docs/1.md