WARNING: This is currently in beta while I finalize the API and write docs and examples.
Teg is a tiny declarative parser toolkit written in TypeScript. It aims to be a semantic and approachable library for parsing. Teg's semantics are mostly based on PEGs (Parsing Expression Grammars).
- 0 dependencies
- Browser or Node
- 4.4kb minified (but highly tree-shakeable!)
- Well-tested
- Helpful error messages
- Straightforward and semantic by default
- But also a powerful and composable API
npm install teg-parser
import assert from "node:assert"
import { template, line } from "teg-parser"
/** Parse markdown level 1 headings */
const h1Parser = template`# ${line}`
const result = h1Parser.run("# heading\n")
assert(result.isSuccess())
assert.deepEqual(result.value, ["heading"])
const failResult = h1Parser.run("not a heading")
assert(failResult.isFailure())
console.log(failResult)
/**
* Logs
Parse Failure
| not a heading
| ^
Failed at index 0: Char did not match "#"
In middle of parsing text("#") at 0
In middle of parsing text("# ") at 0
In middle of parsing template(text("# "), line, text("")) at 0
*/
Often, you'll want to do some processing on a successful parse. To make this ergonomic, every parser has a map method that transforms the successfully parsed value and returns a new parser.
import assert from "node:assert"
import { template, zeroOrMore, line, Parser } from "teg-parser"
type Blockquote = {
content: string
}
// Each template match yields a one-element array like ["Line 1"]
const blockquote: Parser<Blockquote> = zeroOrMore(template`> ${line}`)
.map((lines) => lines.map(([content]) => content).join("\n"))
.map((content) => ({ content }))
const result = blockquote.run(`> Line 1\n> Line 2\n> Line 3\n`)
assert(result.isSuccess())
assert.deepEqual(result.value, {
content: "Line 1\nLine 2\nLine 3",
})
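To make the idea behind map concrete, here is a minimal self-contained sketch of how a map method can be layered on top of a parser. ToyParser, Result, and digits are hypothetical names for illustration; this is the general technique, not Teg's actual implementation.

```typescript
// Toy parser model for illustration only; these names are hypothetical
// and are not Teg's internal types.
type Result<T> =
  | { ok: true; value: T; index: number }
  | { ok: false; error: string; index: number }

class ToyParser<T> {
  readonly parse: (input: string, index: number) => Result<T>

  constructor(parse: (input: string, index: number) => Result<T>) {
    this.parse = parse
  }

  run(input: string): Result<T> {
    return this.parse(input, 0)
  }

  // Transform the value of a successful parse. A failure passes through
  // untouched, so its error message and position are preserved.
  map<U>(fn: (value: T) => U): ToyParser<U> {
    return new ToyParser<U>((input, index) => {
      const result = this.parse(input, index)
      return result.ok
        ? { ok: true, value: fn(result.value), index: result.index }
        : result
    })
  }
}

// Match one or more ASCII digits and return them as a string.
const digits = new ToyParser<string>((input, index) => {
  let end = index
  while (end < input.length && input.charAt(end) >= "0" && input.charAt(end) <= "9") end++
  return end > index
    ? { ok: true, value: input.slice(index, end), index: end }
    : { ok: false, error: "Expected a digit", index }
})

// Each map call builds a new parser; `digits` itself is unchanged.
const integer = digits.map((s) => Number(s))
const doubled = integer.map((n) => n * 2)

console.log(doubled.run("21")) // { ok: true, value: 42, index: 2 }
```

Because map only ever touches successful results, chaining several maps (as the blockquote example does) never hides where a parse failed.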
Since Teg is written in TypeScript, types are inferred wherever possible.
Much of the idea comes from Chet Corcos's article on parsers. Although Parsers currently implement bimap, fold, and chain methods as described in the article, I haven't found them as useful in real-world usage and may remove or change them.
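For reference, chain (often called flatMap or bind) runs one parser and then uses its result to choose the next parser. Below is a minimal sketch of that technique using hypothetical ToyParser, digit, and take helpers; it is not Teg's implementation.

```typescript
// Toy parser model for illustration only; not Teg's implementation.
type Result<T> =
  | { ok: true; value: T; index: number }
  | { ok: false; error: string; index: number }

class ToyParser<T> {
  readonly parse: (input: string, index: number) => Result<T>

  constructor(parse: (input: string, index: number) => Result<T>) {
    this.parse = parse
  }

  run(input: string): Result<T> {
    return this.parse(input, 0)
  }

  // Run this parser, then feed its value to `next` to decide which
  // parser runs on the remaining input. Failures short-circuit.
  chain<U>(next: (value: T) => ToyParser<U>): ToyParser<U> {
    return new ToyParser<U>((input, index) => {
      const result = this.parse(input, index)
      return result.ok ? next(result.value).parse(input, result.index) : result
    })
  }
}

// A single digit, returned as a number.
const digit = new ToyParser<number>((input, index) => {
  const c = input.charAt(index)
  return c >= "0" && c <= "9"
    ? { ok: true, value: Number(c), index: index + 1 }
    : { ok: false, error: "Expected a digit", index }
})

// Take exactly n characters.
const take = (n: number) =>
  new ToyParser<string>((input, index) =>
    index + n <= input.length
      ? { ok: true, value: input.slice(index, index + n), index: index + n }
      : { ok: false, error: `Expected ${n} more characters`, index }
  )

// Length-prefixed field: the digit decides how many characters to read next.
const lengthPrefixed = digit.chain((n) => take(n))

console.log(lengthPrefixed.run("3abcdef")) // { ok: true, value: 'abc', index: 4 }
```

Here the digit 3 decides how many characters take reads, something a fixed sequence of parsers can't express because the second parser isn't known until the first one has run.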
There are some examples available in the examples directory. Building out more is still a TODO; help out if you want!
- Markdown
- CLI args
- Unordered list
- JSON
- LaTeX
You can also see an example of a bigger parser I use for my custom blog post format here: https://github.com/tanishqkancharla/tk-parser/blob/main/src/index.ts (although it's using an older version of teg right now).
/** Matches a text string */
export const text: <T extends string>(value: T) => Parser<T>
/**
* Tagged template text for parsing.
*
* "template`# ${line}`" will parse "# Heading" to ["Heading"]
*
* Can use multiple parsers together. Keep in mind parsers run greedily,
* so "template`${word}content`" will fail on "textcontent" because the `word`
* parser will match all of "textcontent", leaving nothing for the text "content"
* to match.
*/
export const template
/**
* Match the given parser n or more times, with an optional delimiter parser
* in between.
*/
const nOrMore: <T, D>(
n: number,
parser: Parser<T>,
delimiter?: Parser<D>
) => Parser<T[]>
/**
* Match the given parser zero or more times, with an optional delimiter parser
* in between.
* NOTE: this will always succeed.
*/
const zeroOrMore: <T, D>(parser: Parser<T>, delimiter?: Parser<D>) => Parser<T[]>
/**
* Match the given parser one or more times, with an optional delimiter parser
* in between.
*/
const oneOrMore: <T, D>(parser: Parser<T>, delimiter?: Parser<D>) => Parser<T[]>
/** Matches exactly one of the given parsers, checked in the given order */
const oneOf: <ParserArray extends Parser<any>[]>(
parsers: ParserArray
) => ParserArray[number]
/**
* Match the given parsers in sequence
*
* @example
* sequence([text("a"), text("b"), text("c")]) => Parser<"abc">
*/
const sequence: (
parsers: Parser[],
delimiter?: Parser
) => Parser
/**
* Look ahead in the stream to match the given parser.
* NOTE: This consumes no tokens.
*/
const lookahead: <T>(parser: Parser<T>) => Parser<T>
/**
* Tries matching a parser; returns undefined if it fails.
* NOTE: This parser always succeeds
*/
const maybe: <T>(parser: Parser<T>) => Parser<T | undefined>
/**
* Keep consuming until the given parser succeeds, and consume its match too.
* Returns all the characters consumed before the parser succeeded.
*
* @example
* `takeUntilAfter(text("\n"))` takes until after the newline but
* doesn't include the newline itself in the result
*/
const takeUntilAfter: <T>(parser: Parser<T>) => Parser<string>
/**
* Keep consuming until just before the given parser would succeed.
* Returns all the characters consumed before the parser succeeded.
*
* @example
* `takeUpTo(text("\n"))` takes all chars until before the newline
*/
export const takeUpTo: <T>(parser: Parser<T>) => Parser<string>
/**
* Takes the first line in the stream, i.e. everything up to the first newline.
* The newline is consumed but not included in the result.
*/
const line = takeUntilAfter(text("\n"));
/** Matches a single lowercase English letter */
const lower: Parser<string>
/** Matches a single uppercase English letter */
const upper: Parser<string>
/** Matches a single English letter, case insensitive */
const letter: Parser<string>
/**
* Match an English word
*/
const word: Parser<string>
/** Match a single digit from 0 to 9 */
const digit: Parser<string>
const integer: Parser<number>
/** Match a single hexadecimal digit (0-9, A-F), case insensitive */
const hexDigit: Parser<string>
/** Match a single English letter or digit */
const alphaNumeric: Parser<string>
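Two behaviors noted in the comments above are worth seeing in isolation: parsers run greedily, and oneOf checks its alternatives in order and keeps the first success. The sketch below models parsers as plain functions; text, word, pair, and oneOf here are simplified hypothetical stand-ins, not Teg's implementations, so details such as error messages differ.

```typescript
// Toy model for illustration only: parsers as plain functions.
type Result<T> =
  | { ok: true; value: T; index: number }
  | { ok: false; error: string; index: number }
type ToyParser<T> = (input: string, index: number) => Result<T>

// Match an exact string.
const text = <S extends string>(s: S): ToyParser<S> => (input, index) =>
  input.startsWith(s, index)
    ? { ok: true, value: s, index: index + s.length }
    : { ok: false, error: `Expected "${s}"`, index }

// Greedily consume as many ASCII letters as possible; never gives any back.
const word: ToyParser<string> = (input, index) => {
  let end = index
  while (end < input.length && /[A-Za-z]/.test(input.charAt(end))) end++
  return end > index
    ? { ok: true, value: input.slice(index, end), index: end }
    : { ok: false, error: "Expected a word", index }
}

// Run two parsers back to back; fail as soon as either fails.
const pair = <A, B>(a: ToyParser<A>, b: ToyParser<B>): ToyParser<[A, B]> =>
  (input, index) => {
    const ra = a(input, index)
    if (!ra.ok) return ra
    const rb = b(input, ra.index)
    if (!rb.ok) return rb
    return { ok: true, value: [ra.value, rb.value], index: rb.index }
  }

// PEG ordered choice: the first alternative that succeeds wins,
// even if a later one would match more input.
const oneOf = <T>(parsers: ToyParser<T>[]): ToyParser<T> => (input, index) => {
  for (const parser of parsers) {
    const result = parser(input, index)
    if (result.ok) return result
  }
  return { ok: false, error: "No alternative matched", index }
}

// Fails: `word` swallows "textcontent", leaving nothing for text("content").
console.log(pair(word, text("content"))("textcontent", 0).ok) // false

// Matches only "a": ordered choice stops at the first success.
console.log(oneOf<string>([text("a"), text("ab")])("ab", 0)) // { ok: true, value: 'a', index: 1 }
```

A practical consequence of ordered choice is that longer or more specific alternatives usually belong first in the array passed to oneOf.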
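To write your own parser, construct a Parser from a function that takes the input stream and returns either a ParseSuccess or a ParseFailure:
const custom = new Parser((stream) => {
// ... inspect the stream and decide whether the input matches
// On success, return the parsed result along with the stream:
return new ParseSuccess(result, stream)
// or, on failure, return an error message along with the stream:
return new ParseFailure(errorMessage, stream)
})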
All primitive parsers and combinators are built using these constructors, so you can look at those for examples.
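As a rough illustration of building primitives from scratch, here is a toy version of the difference between takeUpTo and takeUntilAfter. It uses hypothetical plain-function parsers instead of Teg's Parser, ParseSuccess, and ParseFailure classes, and in this toy both parsers fail if the end parser never matches; Teg's own behavior may differ.

```typescript
// Toy model for illustration only; not Teg's implementation.
type Result<T> =
  | { ok: true; value: T; index: number }
  | { ok: false; error: string; index: number }
type ToyParser<T> = (input: string, index: number) => Result<T>

// Match an exact string.
const text = <S extends string>(s: S): ToyParser<S> => (input, index) =>
  input.startsWith(s, index)
    ? { ok: true, value: s, index: index + s.length }
    : { ok: false, error: `Expected "${s}"`, index }

// Scan forward until `end` matches, stopping *before* the match.
const takeUpTo = <T>(end: ToyParser<T>): ToyParser<string> => (input, index) => {
  for (let i = index; i <= input.length; i++) {
    if (end(input, i).ok) return { ok: true, value: input.slice(index, i), index: i }
  }
  return { ok: false, error: "End parser never matched", index }
}

// Same scan, but also consume the match; it is still left out of the value.
const takeUntilAfter = <T>(end: ToyParser<T>): ToyParser<string> => (input, index) => {
  for (let i = index; i <= input.length; i++) {
    const result = end(input, i)
    if (result.ok) return { ok: true, value: input.slice(index, i), index: result.index }
  }
  return { ok: false, error: "End parser never matched", index }
}

const newline = text("\n")
console.log(takeUpTo(newline)("abc\ndef", 0)) // { ok: true, value: 'abc', index: 3 }
console.log(takeUntilAfter(newline)("abc\ndef", 0)) // { ok: true, value: 'abc', index: 4 }
```

Both return the same value; only the end index differs, which decides whether the next parser starts on the newline or after it.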
Teg ships utilities for testing parsers at teg-parser/testParser. It is used like this:
import { testParser } from "teg-parser/testParser";
const test = testParser(parser)
/** Assert the content passed in completely parses to the expected value */
test.parses(content, expected)
/**
* Assert the content gets parsed to the expected value, but without asserting
* all the content is consumed
*/
test.parsePartial(content, expected)
/** Assert the parser successfully matches the given content */
test.matches(content)
/** Assert the parser fails on the given content */
test.fails(content)
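The difference between parses and parsePartial comes down to whether leftover input is allowed. The helpers below are hypothetical stand-ins written against a toy function-based parser, based only on the comments above; they are not testParser's source.

```typescript
import { deepStrictEqual } from "node:assert"

// Toy parser model for illustration only.
type Result<T> =
  | { ok: true; value: T; index: number }
  | { ok: false; error: string; index: number }
type ToyParser<T> = (input: string, index: number) => Result<T>

// Match one or more ASCII digits.
const digits: ToyParser<string> = (input, index) => {
  let end = index
  while (end < input.length && input.charAt(end) >= "0" && input.charAt(end) <= "9") end++
  return end > index
    ? { ok: true, value: input.slice(index, end), index: end }
    : { ok: false, error: "Expected a digit", index }
}

// Hypothetical stand-in for test.parsePartial: the value must match,
// but unconsumed input is allowed.
function parsePartial<T>(parser: ToyParser<T>, content: string, expected: T): void {
  const result = parser(content, 0)
  if (!result.ok) throw new Error(`Parse failed at index ${result.index}: ${result.error}`)
  deepStrictEqual(result.value, expected)
}

// Hypothetical stand-in for test.parses: the value must match AND
// every character of the content must be consumed.
function parses<T>(parser: ToyParser<T>, content: string, expected: T): void {
  const result = parser(content, 0)
  if (!result.ok) throw new Error(`Parse failed at index ${result.index}: ${result.error}`)
  deepStrictEqual(result.value, expected)
  if (result.index !== content.length) {
    throw new Error(`Only consumed ${result.index} of ${content.length} characters`)
  }
}

parsePartial(digits, "42abc", "42") // passes: "abc" is left over
parses(digits, "42", "42") // passes: all input consumed
// parses(digits, "42abc", "42") would throw: only 2 of 5 characters consumed
```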
Teg comes with out-of-the-box support for both ESM and CJS. The correct format is used depending on whether you use import (ESM) or require (CJS). However, many parsers in Teg are just simple utilities, so if you use ESM, you will probably be able to tree-shake away a significant portion of the library.
(Tiny or Typed) Parser Expression Grammar
Please open an issue on GitHub or email/DM me if you have feedback or suggestions!