
Regexes

DJMcMayhem edited this page Jun 8, 2017 · 9 revisions

V's regex system is very similar to vim's regex system, which is very similar to perl's regex. A nice simple introduction to vim regex can be found at vimregex.com.

Commands

There are 5 kinds of command that use regex in V. Each one takes a regex that runs up to the first /. What comes after the slash depends on the command: it will be an offset, a substitute string, or a sequence of V commands. Each command ends with a carriage return (enter), but if the command is at the end of the program or of a macro, this carriage return is supplied implicitly.

Search

The two commands for searching are / and ?. / will move forward until the next match, whereas ? will move backwards until the previous match. If the regex does not match anywhere, it will throw a breaking error. After using one of these commands, you may use n to repeat the search, or N to repeat the search in the opposite direction.

After the first slash (or '?') comes an offset, which is a more advanced feature so I won't go into too much detail here. More information can be found in the vim help docs.

Substitute

There are 4 substitute commands. Each one is a minor variation on vim's "Substitute" command. The syntax of the substitute command is

<command><pattern>/<string>

Depending on which command you use, this will look for matches of <pattern> and replace them with <string>.

Each of these applies to either the current line or every line. If the substitute command is "global", it will replace every match on a line; otherwise it will replace only the first one.

Command   Mnemonic   Line(s)   Global
ó         <M-s>      Current   No
Ó         <M-S>      Current   Yes
í         <M-m>      All       No
Í         <M-M>      All       Yes

If you are familiar with vim's "substitute" command, it is worth noting that any extra slashes are filled in for you. For example, if you want to remove every single digit from a file in vim, you would need

:%s/\d//g

This is two extraneous slashes just to specify that the operation is global. In V, you can just do

Í\d

And the //g will be filled in for you.
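
To make the table concrete, here is a minimal Python sketch of how each V substitute command could be written out as a vim ex command. It is only an illustration built from the table above, not V's actual implementation; the names SUBSTITUTE_COMMANDS and expand_substitute are hypothetical, and the sketch assumes the pattern contains no escaped slash.

```python
# Hypothetical lookup built from the table above: command -> (ex range, flags).
SUBSTITUTE_COMMANDS = {
    "ó": ("", ""),    # current line, first match only
    "Ó": ("", "g"),   # current line, every match
    "í": ("%", ""),   # every line, first match on each
    "Í": ("%", "g"),  # every line, every match
}


def expand_substitute(v_command: str) -> str:
    """Return the vim ex command that a V substitute command stands for."""
    command, body = v_command[0], v_command[1:]
    ex_range, flags = SUBSTITUTE_COMMANDS[command]
    # Text before the first slash is the pattern, the rest is the replacement.
    # Simplification: an escaped slash inside the pattern is not handled.
    pattern, _, replacement = body.partition("/")
    return f":{ex_range}s/{pattern}/{replacement}/{flags}"


print(expand_substitute("Í\\d"))
print(expand_substitute("ófoo/bar"))
```

When the replacement is left out, as in the first call, it is treated as empty, which is how the missing slashes get filled in.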

Sort

Sorting in vim is done using the :sort command. By default, all lines are sorted by ASCII values. There are some flags that allow you to change the sorting behavior. A good overview of the available flags can be found in vim's help for :sort.

There is also an optional {pattern} that will get skipped over if matched. For example, if you have some text with letters followed by numbers, like this:

a 3
b 2
c 1

If you call the default sort, this will sort the lines by the letters, because they come first. To sort the lines by the numbers, specify the pattern to skip. For example:

:sort /\a/

In V the sorting command supports the same regex compression as the other commands, so this would be

ú/á

Note that you need the / to mark where the flags end and the pattern starts, although you do not need the second slash like vim does.
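
The effect of the skipped pattern can be imitated in Python. This is a rough sketch of the idea only, not vim's or V's code: sort_skipping is a hypothetical helper, Python's [A-Za-z] stands in for vim's \a, and every line is assumed to contain a match.

```python
import re


def sort_skipping(lines, pattern):
    """Sort lines by the text that follows the first match of `pattern`."""
    regex = re.compile(pattern)

    def key(line):
        match = regex.search(line)
        # Assumption: every line matches. A line without a match falls back
        # to being compared as a whole, which is a simplification.
        return line[match.end():] if match else line

    return sorted(lines, key=key)


lines = ["a 3", "b 2", "c 1"]
print(sorted(lines))                      # default sort: the letters decide
print(sort_skipping(lines, r"[A-Za-z]"))  # letters skipped: the numbers decide
```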

Command   Mnemonic     Description
ú         <M-z>        Sort, but you can specify flags and/or a pattern after you call it
Ú         <M-Z>        Standard sort, shortcut of :sort<cr>
úú        <M-z><M-z>   Sort the line

Count

The count operator will replace a line with the number of matches of a certain regex. The count command is ø, or <M-x>. You can also use the uppercase variant, Ø, or <M-X>, which replaces the entire buffer with the number of matches of a given regex.

The syntax is simply:

ø<regex>
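
As a rough illustration of the difference between the two variants, here is a Python sketch. The helpers count_line and count_buffer are hypothetical, the buffer is modelled as a list of lines, and Python's re module stands in for vim's regex engine, so edge cases such as matches spanning several lines may behave differently in V.

```python
import re


def count_line(buffer, row, pattern):
    """Like ø: replace the line at index `row` with its number of matches."""
    result = list(buffer)
    result[row] = str(len(re.findall(pattern, buffer[row])))
    return result


def count_buffer(buffer, pattern):
    """Like Ø: replace the whole buffer with the total number of matches."""
    total = sum(len(re.findall(pattern, line)) for line in buffer)
    return [str(total)]


buffer = ["a1b22", "c3"]
print(count_line(buffer, 0, r"\d"))
print(count_buffer(buffer, r"\d"))
```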

Global

Our last regex-based command is "global". This command applies a set of V-commands to every line that matches a given regex. The syntax is

<command><regex>/<commands>

The two commands are ç (Mnemonic: <M-g>) and Ç (Mnemonic: <M-G>). They behave the same except that Ç is inverted. That is, whereas ç applies commands to every matching line, Ç applies commands to every non-matching line.

If you are familiar with vim's "Global" command, it is worth noting that V's global command defaults to normal mode commands, not ex commands.
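
The matching and non-matching behaviour can be sketched in Python. This is only an illustration under simplifying assumptions: global_command is a hypothetical helper, the V commands are represented by a Python function applied to one line at a time, and commands that add or delete lines are not modelled.

```python
import re


def global_command(lines, pattern, action, invert=False):
    """Apply `action` to matching lines (like ç) or, if invert is True,
    to non-matching lines (like Ç). Other lines are left unchanged."""
    regex = re.compile(pattern)
    result = []
    for line in lines:
        matched = regex.search(line) is not None
        result.append(action(line) if matched != invert else line)
    return result


lines = ["ab", "12", "cd"]
print(global_command(lines, r"\d", lambda line: line + "!"))
print(global_command(lines, r"\d", lambda line: line + "!", invert=True))
```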

Magicness

Vim's regex system has a feature known as magicness, which determines which characters need to be escaped and which ones do not. For example, with default magicness a dot character . matches any character except for a newline. To match an actual . character, you must precede it with a backslash, and the search is

/\.

There are also some characters known as "metachars", which are pretty standard in most regex engines. For example, \w will match any "word character", i.e. [0-9A-Za-z_]. So let's say you want to match any full word, and capture it in a group. This would be:

/\(\w*\)

Note the backslashes before the parentheses. This is because without the backslash, it will match literal "(" and ")" characters. Now, when you're writing a regex system to be easy to work with and remember, this is fine. However, it's not ideal for golf. So, this is shortened. To get the alternate meaning of a character, rather than preceding it with a backslash, just set the high bit. That ends up looking like this:

/¨÷*©

Or, in more readable vim-key syntax,

/<M-(><M-w>*<M-)>

Note: This is the most efficient way, but not the most readable. If you choose to use a backslash instead, that works just fine.
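
Setting the high bit is simply a bitwise OR with 0x80 on the character's ASCII value. The Python sketch below shows the conversion in both directions; set_high_bit and clear_high_bit are hypothetical names, characters are treated as Latin-1 code points, and the dedicated shortcut bytes described under Shortcuts are deliberately ignored.

```python
def set_high_bit(regex: str) -> str:
    """Replace each backslash + ASCII character with that character | 0x80."""
    out = []
    i = 0
    while i < len(regex):
        if regex[i] == "\\" and i + 1 < len(regex) and ord(regex[i + 1]) < 0x80:
            out.append(chr(ord(regex[i + 1]) | 0x80))
            i += 2
        else:
            out.append(regex[i])
            i += 1
    return "".join(out)


def clear_high_bit(compressed: str) -> str:
    """Expand each character in 0x80-0xFF back to backslash + ASCII character.

    Simplification: shortcut bytes are not treated specially here."""
    return "".join(
        "\\" + chr(ord(ch) & 0x7F) if 0x80 <= ord(ch) <= 0xFF else ch
        for ch in compressed
    )


print(set_high_bit(r"\(\w*\)"))             # the compressed form shown above
print(clear_high_bit(set_high_bit(r"\(\w*\)")))
```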

By default, these are the only ASCII characters that have special meaning:

$	    matches end-of-line
.	    matches any character
*	    any number of the previous atom
~	    latest substitute string

Every other character requires the high-bit or preceding backslash to get special meaning.

\(\)    grouping into an atom
\|	    separating alternatives
\+      one or more of the previous atom
\\	    literal backslash

Shortcuts

Certain regex idioms take several bytes, even with this more efficient way of storing a character's alternate meaning. For example, \zs and \ze (or with the high bit set, ús and úe) mean "selection start" and "selection end". If you want to match the second digit of a two-digit combo, you could do this:

/\d\zs\d

This can be compressed down to

/äúsä

Here, ä is <M-d> and expands to \d, and ú is <M-z> and expands to \z. Since selection start and selection end are both very useful but take multiple bytes, they have shortcuts. Here is the full list of regex shortcuts and their hex values:

Hex value   Regex
0x81        .*
0x82        .+
0x83        .\{-}
0x84        [^
0x85        \ze
0x87        \{-}
0x88        \(.\)
0x93        \zs
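
As an illustration, the list can be treated as a lookup table that is applied byte by byte. The Python sketch below is not V's implementation: SHORTCUTS and expand_shortcuts are hypothetical names, and each byte is represented as the character with the same code point.

```python
# Shortcut bytes and their expansions, copied from the list above.
SHORTCUTS = {
    0x81: r".*",
    0x82: r".+",
    0x83: r".\{-}",
    0x84: r"[^",
    0x85: r"\ze",
    0x87: r"\{-}",
    0x88: r"\(.\)",
    0x93: r"\zs",
}


def expand_shortcuts(regex: str) -> str:
    """Replace every shortcut byte with the regex text it stands for."""
    return "".join(SHORTCUTS.get(ord(ch), ch) for ch in regex)


# The two-digit example, with \zs written as the single byte 0x93.
print(expand_shortcuts("\\d\x93\\d"))
```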