feat(parse): Kickstart with `vte`s code base #41

epage · 2023-03-02T16:37:15Z

In #4, I researched different libs I could use for this but none fit the
bill. I approached `vte` about making changes to fit our needs but there
didn't seem to be interest (alacritty/vte#82), so instead we are using it
as the base for what we need.

When modifying table.rs.in, `cargo run` must be run in the `codegen` crate. The result of expansion is included in the source tree so that consumers don't need to pull in syntex just to compile.

This shows the variant in addition to the packed value - much more helpful when debugging.

Debugging

Includes an example `parselog` which prints all of the actions a Parser implementation is given the opportunity to handle. One way to test this is to pipe vim into it with `vim | target/release/examples/parselog`, then type `:q` to quit. Vim won't show up, but it still accepts input.

This version of the parser doesn't handle UTF-8. It's implemented as described by http://vt100.net/emu/dec_ansi_parser which did not include UTF-8 support. Next steps are adding UTF-8 support.

This adds a table-driven UTF-8 parser which only has a single branch for the entire parser. UTF-8 support is essentially bolted onto the VTE parser. Not the most elegant, but it does prevent the transition tables from blowing up. Instead of refactoring the syntax extension to handle both table definitions, I've opted to copy/paste for now, both for simplicity's sake and because I can't see a clear path to a minimal shared solution.

Apparently byte character literals are a thing :).

Apparently 0x07 is frequently used. Not handling this causes SSH prompts to never appear!
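To illustrate why the missing terminator matters, here is a minimal sketch, not the crate's actual parser. The helper `osc_payload` is hypothetical: it scans an OSC sequence (`ESC ]`) and accepts either BEL (`0x07`) or ST (`ESC \`) as the end of the string.

```rust
/// Hypothetical helper (not part of `vte`): returns the payload of an OSC
/// sequence that starts at the beginning of `input`, plus the number of
/// bytes consumed, or `None` if no terminator has been seen yet.
fn osc_payload(input: &[u8]) -> Option<(&[u8], usize)> {
    // An OSC sequence starts with ESC ].
    let body = input.strip_prefix(b"\x1b]")?;
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            // BEL terminates the string.
            0x07 => return Some((&body[..i], 2 + i + 1)),
            // ST, written as ESC \, is the 7-bit terminator.
            0x1b if body.get(i + 1) == Some(&b'\\') => {
                return Some((&body[..i], 2 + i + 2));
            }
            _ => i += 1,
        }
    }
    // No terminator found: a parser that only recognizes ST would keep
    // swallowing input here when the sender used BEL.
    None
}
```

With this sketch, `osc_payload(b"\x1b]0;title\x07rest")` yields the payload `0;title`, leaving `rest` to be handled as ordinary output rather than consumed as part of the OSC string.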

adds test for UTF-8 parsing

Add travis.yml

This resolves an issue with parsing of DCS escapes, where it would try to write parameters beyond the maximum parameter count limit. Fixes #50.

This adds support for CSI subparameters like `\x1b[38:2:255:0:255m`, which allows combining truecolor SGR commands with other SGR parameters like bold text, without any ambiguity.

This implements subparameters by storing them in a list together with all other parameters and having a separate slice to indicate which parameter is a subparameter and how long the subparameter list is. This allows for static memory allocation and good performance while still having the option for dynamic sizing of the parameters. Since the subparameters are now also counted as parameters, the number of allowed parameters has been increased from `16` to `32`.

Since the existing structures combine the handling of parameters for CSI and DCS escape sequences, it is now also possible for DCS parameters to have subparameters, even though that is currently never used. Considering that DCS is rarely supported by terminal emulators, handling these separately would likely just cause unnecessary issues. The performance should also be better by using this existing subparam structure rather than having two separate structures for DCS and CSI parameters.

The only API provided for accessing the list of parameters is an iterator. This is intentional, to make the internal structure clear and allow for easy optimizations downstream. Since it makes little sense to access parameters out of order, this limitation should not have any negative effects on performance. The main drawback is that direct access to the first parameter while ignoring all other subparameters is less efficient, since it requires indexing a slice after iterating to the element. However, while this is often useful, it's mostly done for the first few parameters, which reduces the overhead to a negligible amount. At the same time this forces people to support subparameters or at least consider their existence, which should make it more difficult to implement things improperly downstream.

Fixes #22.
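The storage scheme described above can be sketched roughly as follows. This is a simplified illustration, not the crate's actual layout: the names `Params`, `push_group`, `group_len`, and `ParamsIter` are assumptions made for the example. Values live in one fixed-size buffer, a parallel array records how many values belong to each parameter, and the only read access is an iterator that yields each parameter together with its subparameters.

```rust
const MAX_PARAMS: usize = 32;

/// Simplified, hypothetical stand-in for the parameter list.
struct Params {
    /// Parameters and subparameters share one flat, fixed-size buffer.
    values: [u16; MAX_PARAMS],
    /// For the slot that starts a parameter: how many values belong to it
    /// (1 for a plain parameter, more when subparameters follow).
    group_len: [u8; MAX_PARAMS],
    /// Number of used slots in `values`.
    len: usize,
}

impl Params {
    fn new() -> Self {
        Params { values: [0; MAX_PARAMS], group_len: [0; MAX_PARAMS], len: 0 }
    }

    /// Appends one parameter together with its subparameters.
    /// Returns `false` and stores nothing if the group does not fit.
    fn push_group(&mut self, group: &[u16]) -> bool {
        if group.is_empty() || self.len + group.len() > MAX_PARAMS {
            return false;
        }
        self.values[self.len..self.len + group.len()].copy_from_slice(group);
        self.group_len[self.len] = group.len() as u8;
        self.len += group.len();
        true
    }

    /// Iterates in order; each item is a parameter and its subparameters.
    fn iter(&self) -> ParamsIter<'_> {
        ParamsIter { params: self, index: 0 }
    }
}

struct ParamsIter<'a> {
    params: &'a Params,
    index: usize,
}

impl<'a> Iterator for ParamsIter<'a> {
    type Item = &'a [u16];

    fn next(&mut self) -> Option<Self::Item> {
        let params: &'a Params = self.params;
        if self.index >= params.len {
            return None;
        }
        // `push_group` never records an empty group, so `count` is at least 1.
        let count = params.group_len[self.index] as usize;
        let item = &params.values[self.index..self.index + count];
        self.index += count;
        Some(item)
    }
}
```

For an input such as `\x1b[1;38:2:255:0:255m`, a parser built on this sketch would push `[1]` and then `[38, 2, 255, 0, 255]`; iterating returns those two slices in order, so a consumer that ignores subparameters reads only the first element of each slice.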

This limits CSI parameters to be within range of `u16`, rather than `i64`. This should effectively prevent downstream users from running into DoS problems with excessively big escape sequence requests. An example of a problematic escape would be `CSI Ps b` (repeat char).

According to https://vt100.net/emu/dec_ansi_parser, the smallest possible size limit would be `16383`:

> The VT500 Programmer Information is inconsistent regarding the maximum
> value that a parameter can take. In section 4.3.3.2 of EK-VT520-RM it
> says that “any parameter greater than 9999 (decimal) is set to 9999
> (decimal)”. However, in the description of DECSR (Secure Reset), its
> parameter is allowed to range from 0 to 16383. Because individual
> control functions need to make sure that numeric parameters are within
> specific limits, the supported maximum is not critical, but it must be
> at least 16383.
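One way to keep a parameter inside `u16` while parsing digit by digit is saturating arithmetic. The sketch below is an illustration under that assumption; `push_digit` and `parse_param` are hypothetical names, not functions from the crate.

```rust
/// Hypothetical helper: folds one ASCII digit into a parameter, saturating
/// at `u16::MAX` instead of overflowing or growing without bound.
fn push_digit(param: u16, byte: u8) -> u16 {
    debug_assert!(byte.is_ascii_digit());
    param
        .saturating_mul(10)
        .saturating_add(u16::from(byte - b'0'))
}

/// Parses a run of ASCII digits the way a byte-at-a-time parser would.
fn parse_param(digits: &[u8]) -> u16 {
    digits.iter().fold(0, |acc, &b| push_digit(acc, b))
}
```

Under this scheme a request like `CSI 99999999999 b` can ask for at most `u16::MAX` (65535) repetitions, which still covers the `16383` minimum quoted above.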

While playing around with vte in a personal project, I noticed that I didn't need all of the methods of the `Perform` trait. In Alacritty we also don't react to everything, and other crates like `strip-ansi-escapes` basically don't respond to anything. Of course it's always easy to just copy/paste the entire trait and move on, but I think it's probably worth making the life of downstream easier by not enforcing this.
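The idea can be sketched with default method bodies. The trait below is a cut-down, hypothetical version with simplified signatures, not the crate's real `Perform` definition; `TextOnly` and `feed` are likewise invented for the example.

```rust
/// Simplified, hypothetical `Perform`-style trait. Every method has an empty
/// default body, so implementors override only what they need.
trait Perform {
    fn print(&mut self, _c: char) {}
    fn execute(&mut self, _byte: u8) {}
    fn csi_dispatch(&mut self, _params: &[u16], _action: char) {}
}

/// Collects printable text and ignores everything else.
struct TextOnly(String);

impl Perform for TextOnly {
    fn print(&mut self, c: char) {
        self.0.push(c);
    }
}

/// Stand-in driver: a real parser would make these calls as it recognizes input.
fn feed<P: Perform>(performer: &mut P) {
    performer.print('h');
    performer.execute(b'\n'); // falls through to the default no-op
    performer.csi_dispatch(&[1], 'm'); // also a no-op for `TextOnly`
    performer.print('i');
}
```

An implementor that only strips escape sequences needs nothing beyond `print`; the other callbacks compile away to empty calls.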

This resolves a bug when transitioning between DCS and ESC sequences, which would cause the intermediates of the ESC dispatch to contain data from the DCS sequence.

This changes the test code to use only a single dispatcher instead of having a dispatcher for every single type of escape sequence. This makes it trivial to test transitions between the two separate escape sequence types.

When the params list for the CSI/DCS escapes is filled with all 32 parameters but ends in a subparameter, it would not properly stage the length of the added subparameters, causing the param iterator to get stuck in place. To ensure we always update the subparameter length even when no parameter is staged after it, the length of subparameters is now updated immediately while the subparameters themselves are added. Fixes #77.


jwilm and others added 30 commits September 12, 2016 10:51

WIP

dca1c5c

Finish Transition parser

fd3e436

Finish implementing codegen for state table

18ced20

When modifying table.rs.in, `cargo run` must be run in the `codegen` crate. The result of expansion is included in the source tree so that consumers don't need to pull in syntex just to compile.

wip parser

5fdda06

Add custom Debug for ext::Transition

0e9785c

This shows the variant in addition to the packed value - much more helpful when debugging.

Add test for ext::Transition

171ad08

Debugging

Fix errors in codegen

19c2710

Add fixed table.rs

4c2932c

Fix some comments

8553c85

Expand unpack state/action tests

8da0d9e

Rename crate

5505121

Remove UTF-8 TODO comment

9827671

Rename and document vte crate

85388ab

Move utf8 parsing into separate crate

917080a

Fix import in example

02301e8

Add README.md

2c82ecc

Add developer note to README

a7711f9

Update Cargo.tomls for publishing and add LICENSEs

458eb17

Specify version for utf8parse dependency

0caff0d

Add crates.io badge

b40dff0

Lightly clean up code

edcb3d6

Apparently byte character literals are a thing :).

Add inline attributes to vte stuff

a15a9c3

Publish vte 0.1.1

955bc84

Fix bug with OSC string termination

5509849

Apparently 0x07 is frequently used. Not handling this causes SSH prompts to never appear!

adds UTF8parse test and associated UTF-8 test file

b016827

Merge pull request #2 from lizbaillie/test-utf8-parsing

9aa5c2a

adds test for UTF-8 parsing

Add travis.yml

69cedd0

Merge pull request #3 from jwilm/add-travis

7cec8a5

Add travis.yml

chrisduerr and others added 26 commits February 11, 2020 19:59

Bump version to 0.7.0

9e2fc2f

Fix OOB in DCS parser

cb3b717

This resolves an issue with parsing of DCS escapes, where it would try to write parameters beyond the maximum parameter count limit. Fixes #50.

Bump version to 0.7.1

204893a

Remove C1 ST support from OSCs

c8454b7

Bump version to 0.8.0

2a92abe

Remove redundant .to_string()

582731c

Rename example variable to match trait

408b158

Remove outdated documentation

335aaf3

Move CI to sourcehut

f53d1b7

Improve parser performance

0310be1

Bump version to 0.9.0

3cafbad

Bump version to 0.10.0

8a0c57b

Fix intermediate reset when going from DCS to ESC

59bb331

This resolves a bug when transitioning between DCS and ESC sequences, which would cause the intermediates of the ESC dispatch to contain data from the DCS sequence.

Refactor test code

1f1a929

This changes the test code to use only a single dispatcher instead of having a dispatcher for every single type of escape sequence. This makes it trivial to test transitions between the two separate escape sequence types.

Bump version to 0.10.1

fe1022f

Migrate to 2021 edition

dfac57e

Bump arrayvec to 0.7.2

45670c4

Bump version to 0.11.0

dc861b1

chore: Remove utf8parse / vte_generate_state_changes

89039f8

style: Use default fmt

a55df30

chore: Move files into place

f1e65aa

epage force-pushed the fork branch from 3cc5092 to 7058d0b Compare March 2, 2023 16:46

epage merged commit ba13831 into main Mar 2, 2023

epage deleted the fork branch March 2, 2023 16:49

epage mentioned this pull request Mar 10, 2023

Provide stream parser #4

Closed
