Regex (V2 WIP)

A pure Swift implementation of a Regular Expression Engine

V2 is a second attempt that uses DFAs instead of NFAs to get grep-like performance.

Usage

To avoid recompiling the pattern on every match, create a Regex instance once and reuse it:

// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")

let string = "RegEx is tough, but useful."

// Search for matches
let words = regex.match(string)

/*
words = [
	RegexMatch(match: "RegEx", groups: []),
	RegexMatch(match: "is", groups: []),
	RegexMatch(match: "tough", groups: []),
	RegexMatch(match: "but", groups: []),
	RegexMatch(match: "useful", groups: []),
]
*/

If compilation overhead is not an issue, use the =~ operator to match a string against a pattern directly:

let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []

/*
fourLetterWords = [
	RegexMatch(match: "beer", groups: []),
	RegexMatch(match: "very", groups: []),
	RegexMatch(match: "nice", groups: []),
]
*/

By default the Global flag is active. To change which flags are active, add a / at the start of the pattern and /<flags> at the end. The available flags are:

  • g Global - Allows multiple matches
  • i Case Insensitive - Case insensitive matching
  • m Multiline - ^ and $ also match the beginning and end of a line

// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
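
To make the /pattern/flags convention concrete, here is a minimal sketch of how such a string could be split into its parts. ParsedPattern and parsePattern are hypothetical names, not part of this library's API, and the sketch assumes that a delimited pattern enables exactly the flags listed after the closing /.

```swift
// Hypothetical helper types, for illustration only (not the library's API).
struct ParsedPattern: Equatable {
    var source: String
    var global = true            // Global is active by default
    var caseInsensitive = false
    var multiline = false
}

func parsePattern(_ raw: String) -> ParsedPattern {
    // Treat the input as delimited only if it starts with "/" and has a later "/".
    guard raw.hasPrefix("/"),
          let close = raw.lastIndex(of: "/"),
          close != raw.startIndex else {
        return ParsedPattern(source: raw)
    }
    let flags = Set(raw[raw.index(after: close)...])
    // Anything other than g, i, m after the last "/" means it was not a flag list.
    guard flags.isSubset(of: Set("gim")) else {
        return ParsedPattern(source: raw)
    }
    let source = String(raw[raw.index(after: raw.startIndex)..<close])
    // Assumption: with explicit delimiters, only the listed flags are active.
    return ParsedPattern(source: source,
                         global: flags.contains("g"),
                         caseInsensitive: flags.contains("i"),
                         multiline: flags.contains("m"))
}

let parsed = parsePattern("/\\w+/ig")
print(parsed.source)           // \w+
print(parsed.caseInsensitive)  // true
```

An undelimited pattern such as "[a-zA-Z]+" passes through unchanged with only the Global flag set.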

Supported Operations

Character Classes

Pattern Description Supported
. [^\n\r]
[^] [\s\S]
\w [A-Za-z0-9_]
\W [^A-Za-z0-9_]
\d [0-9]
\D [^0-9]
\s [\ \r\n\t\v\f]
\S [^\ \r\n\t\v\f]
[ABC] Any in the set
[^ABC] Any not in the set
[A-Z] Any in the range inclusively

Anchors (Match positions not characters)

Pattern Description Supported
^ Beginning of string
$ End of string
\b Word boundary
\B Not word boundary
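
Since anchors match positions rather than characters, \b can be read as "the characters on either side of this position differ in whether they are \w". The sketch below illustrates that reading using the [A-Za-z0-9_] definition of \w from the table above; isWordCharacter and isWordBoundary are hypothetical helpers, not the library's implementation.

```swift
// Hypothetical helpers, for illustration only.
func isWordCharacter(_ c: Character) -> Bool {
    // \w is documented as [A-Za-z0-9_]; non-ASCII characters never qualify.
    guard let ascii = c.asciiValue else { return false }
    switch ascii {
    case UInt8(ascii: "A")...UInt8(ascii: "Z"),
         UInt8(ascii: "a")...UInt8(ascii: "z"),
         UInt8(ascii: "0")...UInt8(ascii: "9"),
         UInt8(ascii: "_"):
        return true
    default:
        return false
    }
}

/// `position` is a gap between characters: 0 is before the first character,
/// `chars.count` is after the last one.
func isWordBoundary(_ chars: [Character], at position: Int) -> Bool {
    let before = position > 0 && isWordCharacter(chars[position - 1])
    let after = position < chars.count && isWordCharacter(chars[position])
    return before != after   // \B is simply the negation of this
}

let chars = Array("hi there")
print(isWordBoundary(chars, at: 0))  // true  (start of "hi")
print(isWordBoundary(chars, at: 1))  // false (inside "hi")
print(isWordBoundary(chars, at: 2))  // true  (end of "hi")
```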

Escaped Characters

Pattern Description Supported
\0 Octal escaped character
\00 Octal escaped character
\000 Octal escaped character
\xFF Hex escaped character
\uFFFF Unicode escaped character
\cA Control character
\t Tab
\n Newline
\v Vertical tab
\f Form feed
\r Carriage return
\0 Null
\. .
\\ \
\+ +
\* *
\? ?
\^ ^
\$ $
\{ {
\} }
\[ [
\] ]
\( (
\) )
\/ /
\| |

Groups and Lookaround

Pattern Description Supported
(ABC) Capture group
(<name>ABC) Named capture group
\1 Back reference
\'name' Named back reference
(?:ABC) Non-capturing group
(?=ABC) Positive lookahead
(?!ABC) Negative lookahead
(?<=ABC) Positive lookbehind
(?<!ABC) Negative lookbehind

Greedy Quantifiers

Pattern Description Supported
+ One or more
* Zero or more
? Optional
{n} n
{,} Same as *
{,n} n or less
{n,} n or more
{n,m} n to m

Lazy Quantifiers

Pattern Description Supported
+? One or more
*? Zero or more
?? Optional
{n}? n
{,n}? n or less
{n,}? n or more
{n,m}? n to m
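
Every quantifier in the two tables above reduces to a lower bound, an optional upper bound, and a lazy flag. The following sketch shows one way to express that mapping; Quantifier and parseQuantifier are hypothetical names chosen for this example and are not taken from the library.

```swift
// Hypothetical types, for illustration only.
struct Quantifier: Equatable {
    var lower: Int
    var upper: Int?   // nil means unbounded
    var lazy: Bool
}

func parseQuantifier(_ text: String) -> Quantifier? {
    var body = Substring(text)
    var lazy = false
    // A "?" that follows a complete quantifier makes it lazy.
    if body.count > 1, body.hasSuffix("?") {
        lazy = true
        body = body.dropLast()
    }
    switch body {
    case "+": return Quantifier(lower: 1, upper: nil, lazy: lazy)
    case "*": return Quantifier(lower: 0, upper: nil, lazy: lazy)
    case "?": return Quantifier(lower: 0, upper: 1, lazy: lazy)
    default: break
    }
    guard body.hasPrefix("{"), body.hasSuffix("}") else { return nil }
    let inner = body.dropFirst().dropLast()
    let parts = inner.split(separator: ",", omittingEmptySubsequences: false)
    switch parts.count {
    case 1:   // {n}
        guard let n = Int(parts[0]), n >= 0 else { return nil }
        return Quantifier(lower: n, upper: n, lazy: lazy)
    case 2:   // {,}  {,n}  {n,}  {n,m}
        var lower = 0
        var upper: Int? = nil
        if !parts[0].isEmpty {
            guard let n = Int(parts[0]), n >= 0 else { return nil }
            lower = n
        }
        if !parts[1].isEmpty {
            guard let m = Int(parts[1]), m >= lower else { return nil }
            upper = m
        }
        return Quantifier(lower: lower, upper: upper, lazy: lazy)
    default:
        return nil
    }
}

print(parseQuantifier("{,}") == parseQuantifier("*"))  // true
```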

Alternation

Pattern Description Supported
| Everything before or everything after

Flags

Pattern Description Supported
i Case insensitive
g Global
m Multiline

Inner Workings

(Similar to the previous version)

  • Lexer (String input to Tokens)
  • Parser (Tokens to NFA)
  • Compiler (NFA to DFA)
  • Optimizer (Simplifies the DFA for better performance, e.g. char(a), char(b) -> string(ab))
  • Engine (Matches an input String using the DFA)
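
The last two stages can be sketched in a few lines. This is a simplified illustration, not the library's code: Step, optimize and DFA are hypothetical names, the optimizer works on a flat list of steps rather than a real automaton, and the DFA is built by hand instead of being compiled from an NFA.

```swift
// Hypothetical types, for illustration only; the real representation may differ.
enum Step: Equatable {
    case char(Character)
    case string(String)
    case any
}

/// Optimizer idea: merge runs of single-character steps into one string step.
func optimize(_ steps: [Step]) -> [Step] {
    var result: [Step] = []
    for step in steps {
        switch (result.last, step) {
        case let (.char(a)?, .char(b)):
            result[result.count - 1] = .string(String([a, b]))
        case let (.string(s)?, .char(b)):
            result[result.count - 1] = .string(s + String(b))
        default:
            result.append(step)
        }
    }
    return result
}

/// Engine idea: a table-driven DFA is in exactly one state at a time and
/// does one lookup per input character, with no backtracking.
struct DFA {
    let start: Int
    let accepting: Set<Int>
    let transitions: [Int: [Character: Int]]

    /// Returns true if the whole input is accepted.
    func accepts(_ input: String) -> Bool {
        var state = start
        for c in input {
            guard let next = transitions[state]?[c] else { return false }
            state = next
        }
        return accepting.contains(state)
    }
}

// Hand-built DFA for the pattern "ab+": 0 -a-> 1 -b-> 2 -b-> 2
let abPlus = DFA(start: 0,
                 accepting: [2],
                 transitions: [0: ["a": 1], 1: ["b": 2], 2: ["b": 2]])

print(optimize([.char("a"), .char("b")]) == [.string("ab")])  // true
print(abPlus.accepts("abbb"))                                 // true
print(abPlus.accepts("a"))                                    // false
```

The single pass over the input is what gives a DFA-based engine its predictable, linear-time matching.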

Note

Swift treats \r\n as a single Character, because the pair forms one extended grapheme cluster. Use \n\r to get two separate characters.
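
This behaviour comes from the Swift standard library and can be checked without this package:

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

print(crlf.count)                 // 1 (one Character)
print(crlf.unicodeScalars.count)  // 2 (still two Unicode scalars)
print(lfcr.count)                 // 2 (two Characters)
```

A pattern that consumes input one Character at a time therefore sees \r\n as a single unit.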

Resources