A pure Swift implementation of a Regular Expression Engine
V2 is a second attempt that uses DFAs instead of NFAs, aiming for grep-like performance.
To avoid repeated compilation overhead, create a `Regex` instance once and reuse it:
// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")
let string = "RegEx is tough, but useful."
// Search for matches
let words = regex.match(string)
/*
words = [
RegexMatch(match: "RegEx", groups: []),
RegexMatch(match: "is", groups: []),
RegexMatch(match: "tough", groups: []),
RegexMatch(match: "but", groups: []),
RegexMatch(match: "useful", groups: []),
]
*/
If compilation overhead is not an issue, the `=~` operator can match a string directly:
let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
/*
fourLetterWords = [
RegexMatch(match: "beer", groups: []),
RegexMatch(match: "very", groups: []),
RegexMatch(match: "nice", groups: []),
]
*/
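For readers curious how an operator of this shape can be declared, here is a minimal sketch. It is not this library's implementation: it uses Foundation's `NSRegularExpression` as a stand-in engine, returns plain strings instead of `RegexMatch` values, and the `RegexMatchPrecedence` group is a hypothetical name. The precedence group is placed above `??` so that `a =~ b ?? []` parses as `(a =~ b) ?? []`.

```swift
import Foundation

// Hypothetical precedence group: binds tighter than `??`.
precedencegroup RegexMatchPrecedence {
    higherThan: NilCoalescingPrecedence
}
infix operator =~: RegexMatchPrecedence

/// Stand-in for the library operator: returns every match of `pattern`
/// in `input`, or nil when the pattern does not compile (an assumption).
func =~ (input: String, pattern: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let range = NSRange(input.startIndex..., in: input)
    return regex.matches(in: input, range: range).compactMap { result in
        Range(result.range, in: input).map { String(input[$0]) }
    }
}

let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
print(fourLetterWords)
```

Because the pattern is compiled on every call, this form trades speed for brevity, which is the trade-off described above.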
By default the Global flag is active. To change which flags are active, add a `/` at the start of the pattern and `/<flags>` at the end. The available flags are:
- `g` (Global): Allows multiple matches
- `i` (Case Insensitive): Case insensitive matching
- `m` (Multiline): `^` and `$` also match the beginning and end of a line
// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
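The `/pattern/flags` form can be split into its two parts with a few lines of string handling. The sketch below is illustrative only: `RegexFlags` and `splitPattern` are hypothetical names, and it assumes that explicitly listed flags replace the default Global flag and that unknown flag letters are rejected.

```swift
// Hypothetical flag set; the cases mirror the flags listed above.
struct RegexFlags: OptionSet {
    let rawValue: Int
    static let global = RegexFlags(rawValue: 1 << 0)
    static let caseInsensitive = RegexFlags(rawValue: 1 << 1)
    static let multiline = RegexFlags(rawValue: 1 << 2)
}

/// Splits "/body/flags" into its body and flag set.
/// A pattern without the slash delimiters is returned unchanged with only
/// the Global flag set. Returns nil for an unknown flag letter.
func splitPattern(_ source: String) -> (body: String, flags: RegexFlags)? {
    guard source.hasPrefix("/"),
          let lastSlash = source.lastIndex(of: "/"),
          lastSlash != source.startIndex else {
        return (body: source, flags: RegexFlags.global)
    }
    let body = String(source[source.index(after: source.startIndex)..<lastSlash])
    var flags: RegexFlags = []
    for letter in source[source.index(after: lastSlash)...] {
        switch letter {
        case "g": flags.insert(.global)
        case "i": flags.insert(.caseInsensitive)
        case "m": flags.insert(.multiline)
        default: return nil  // unknown flag letter
        }
    }
    return (body: body, flags: flags)
}

let parsed = splitPattern("/\\w+/ig")
let plain = splitPattern("[a-z]+")
```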
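| Pattern | Description | Supported |
| --- | --- | --- |
| `.` | `[^\n\r]` | |
| `[^]` | `[\s\S]` | |
| `\w` | `[A-Za-z0-9_]` | |
| `\W` | `[^A-Za-z0-9_]` | |
| `\d` | `[0-9]` | |
| `\D` | `[^0-9]` | |
| `\s` | `[\ \r\n\t\v\f]` | |
| `\S` | `[^\ \r\n\t\v\f]` | |
| `[ABC]` | Any in the set | |
| `[^ABC]` | Any not in the set | |
| `[A-Z]` | Any in the range inclusively | |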
Anchors (Match positions not characters)
| Pattern | Description | Supported |
| --- | --- | --- |
| `^` | Beginning of string | |
| `$` | End of string | |
| `\b` | Word boundary | |
| `\B` | Not word boundary | |
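Because anchors match positions rather than characters, a word boundary can be tested by comparing the characters on either side of a position. The following is a rough sketch of that idea, not the engine's code; `isWordCharacter` and `isWordBoundary` are hypothetical helpers, and "word character" is taken to mean `[A-Za-z0-9_]` as in the `\w` row of the character class table.

```swift
/// True for the characters that `\w` stands for: [A-Za-z0-9_].
func isWordCharacter(_ c: Character) -> Bool {
    return c.isASCII && (c.isLetter || c.isNumber || c == "_")
}

/// True when position `index` (0...text.count) has a word character on
/// exactly one side. The start and end of the text count as non-word.
func isWordBoundary(in text: [Character], at index: Int) -> Bool {
    let before = index > 0 && isWordCharacter(text[index - 1])
    let after = index < text.count && isWordCharacter(text[index])
    return before != after
}

let text = Array("it's nice")
let boundaries = (0...text.count).filter { isWordBoundary(in: text, at: $0) }
print(boundaries)  // [0, 2, 3, 4, 5, 9]
```

A `\B` check is simply the negation of `isWordBoundary` at the same position.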
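| Pattern | Description | Supported |
| --- | --- | --- |
| `\0` | Octal escaped character | |
| `\00` | Octal escaped character | |
| `\000` | Octal escaped character | |
| `\xFF` | Hex escaped character | |
| `\uFFFF` | Unicode escaped character | |
| `\cA` | Control character | |
| `\t` | Tab | |
| `\n` | Newline | |
| `\v` | Vertical tab | |
| `\f` | Form feed | |
| `\r` | Carriage return | |
| `\0` | Null | |
| `\.` | `.` | |
| `\\` | `\` | |
| `\+` | `+` | |
| `\*` | `*` | |
| `\?` | `?` | |
| `\^` | `^` | |
| `\$` | `$` | |
| `\{` | `{` | |
| `\}` | `}` | |
| `\[` | `[` | |
| `\]` | `]` | |
| `\(` | `(` | |
| `\)` | `)` | |
| `\/` | `/` | |
| `\\|` | `\|` | |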
| Pattern | Description | Supported |
| --- | --- | --- |
| `(ABC)` | Capture group | |
| `(<name>ABC)` | Named capture group | |
| `\1` | Back reference | |
| `\'name'` | Named back reference | |
| `(?:ABC)` | Non-capturing group | |
| `(?=ABC)` | Positive lookahead | |
| `(?!ABC)` | Negative lookahead | |
| `(?<=ABC)` | Positive lookbehind | |
| `(?<!ABC)` | Negative lookbehind | |
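| Pattern | Description | Supported |
| --- | --- | --- |
| `+` | One or more | |
| `*` | Zero or more | |
| `?` | Optional | |
| `{n}` | Exactly n | |
| `{,}` | Same as `*` | |
| `{,n}` | n or less | |
| `{n,}` | n or more | |
| `{n,m}` | n to m | |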
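| Pattern | Description | Supported |
| --- | --- | --- |
| `+?` | One or more (lazy) | |
| `*?` | Zero or more (lazy) | |
| `??` | Optional (lazy) | |
| `{n}?` | Exactly n (lazy) | |
| `{,n}?` | n or less (lazy) | |
| `{n,}?` | n or more (lazy) | |
| `{n,m}?` | n to m (lazy) | |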
| Pattern | Description | Supported |
| --- | --- | --- |
| `\|` | Everything before or everything after | |
| Pattern | Description | Supported |
| --- | --- | --- |
| `i` | Case insensitive | |
| `g` | Global | |
| `m` | Multiline | |
(Similar to before)
Lexer (String input to Tokens)
Parser (Tokens to NFA)
Compiler (NFA to DFA)
Optimizer (Simplify DFA (e.g. char(a), char(b) -> string(ab)) for better performance)
Engine (Matches an input String using the DFA)
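The optimizer step named in the list above, merging `char(a), char(b)` into `string(ab)`, can be sketched on a flat list of nodes. This is only an illustration under assumptions: the `Node` enum and `mergeLiterals` are hypothetical, and the engine's real DFA representation is not shown in this README.

```swift
// Hypothetical node type standing in for the engine's internal representation.
enum Node: Equatable {
    case char(Character)   // a single literal character
    case string(String)    // a run of literal characters
    case any               // placeholder for any non-literal node
}

/// Merges runs of adjacent literal characters into a single `.string` node,
/// so the matcher can compare a whole run in one step.
func mergeLiterals(_ nodes: [Node]) -> [Node] {
    var result: [Node] = []
    for node in nodes {
        switch (result.last, node) {
        case let (.char(a)?, .char(b)):
            // Two single characters in a row become a two-character string.
            result[result.count - 1] = .string(String(a) + String(b))
        case let (.string(s)?, .char(b)):
            // Extend an existing run with the next character.
            result[result.count - 1] = .string(s + String(b))
        default:
            result.append(node)
        }
    }
    return result
}

let optimized = mergeLiterals([.char("a"), .char("b"), .any, .char("c")])
print(optimized == [.string("ab"), .any, .char("c")])  // true
```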
Swift treats `\r\n` as a single `Character`. Use `\n\r` to have both.
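The behaviour is easy to check with the standard library alone. The snippet below only uses `String` and its views, so it does not depend on this package.

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

// CR followed by LF forms one grapheme cluster, so it is one Character.
print(crlf.count)                 // 1
// The underlying Unicode scalars are still two.
print(crlf.unicodeScalars.count)  // 2
// In the reverse order each one is its own Character.
print(lfcr.count)                 // 2
```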