feat(parse): Kickstart with vtes code base #41

Merged
merged 105 commits into from
Mar 2, 2023
@epage epage commented Mar 2, 2023

In #4, I researched different libs I could use for this but none fit the
bill. I approached vte about making changes to fit our needs but it
didn't seem like there was interest
(alacritty/vte#82), so instead we are using it
as the base for what we need.

jwilm and others added 30 commits September 12, 2016 10:51
When modifying table.rs.in, `cargo run` must be run in the `codegen`
crate. The result of expansion is included in the source tree so that
consumers don't need to pull in syntex just to compile.
This shows the variant in addition to the packed value - much more
helpful when debugging.
Includes an example `parselog` which prints all of the actions a Parser
implementation is given the opportunity to handle. One way to test this
is to pipe vim into it:

    vim | target/release/examples/parselog

And type `:q` to quit. Vim won't show up, but it still accepts input.

This version of the parser doesn't handle UTF-8. It's implemented as
described by http://vt100.net/emu/dec_ansi_parser which did not include
UTF-8 support.

Next steps are adding UTF-8 support.
This adds a table-driven UTF-8 parser which only has a single branch for
the entire parser. UTF-8 support is essentially bolted onto the VTE
parser. Not the most elegant, but it does prevent the transition tables
from blowing up.

Instead of refactoring the syntax extension to handle both table
definitions, I've opted to copy/paste for now, both for simplicity's
sake and because I can't see a clear path to a minimal shared solution.
Apparently byte character literals are a thing :).
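As a small illustration of byte character literals, here is a sketch of matching raw input bytes against `b'…'` literals instead of numeric constants. The `ByteKind` enum and `classify` function are hypothetical names for this example and are not taken from the vte code base.

```rust
/// Coarse byte categories, invented for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum ByteKind {
    Escape,
    Bell,
    Printable,
    Other,
}

/// Classify one input byte using byte character literals.
fn classify(byte: u8) -> ByteKind {
    match byte {
        // b'\x1b' has type u8, so it can be matched against directly.
        b'\x1b' => ByteKind::Escape,
        b'\x07' => ByteKind::Bell,
        // Byte literals also work as range endpoints.
        b' '..=b'~' => ByteKind::Printable,
        _ => ByteKind::Other,
    }
}
```

Because a byte literal is just a `u8`, `classify(0x1b)` and `classify(b'\x1b')` are the same call.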
Apparently 0x07 is frequently used. Not handling this causes SSH prompts
to never appear!
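One common place where 0x07 (BEL) shows up is as a terminator for OSC strings, alongside the two-byte ST sequence `ESC \`. The commit message does not say which case it covers, so the following is only a sketch under that assumption; `split_osc` is a hypothetical helper, not a vte function.

```rust
/// If `input` starts with an OSC sequence (`ESC ]`), return its payload and
/// the total number of bytes consumed, including the terminator.
/// Accepts either BEL (0x07) or ST (`ESC \`) as the terminator.
fn split_osc(input: &[u8]) -> Option<(&[u8], usize)> {
    let body = input.strip_prefix(b"\x1b]")?;
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            // BEL terminator: one byte.
            0x07 => return Some((&body[..i], 2 + i + 1)),
            // ST terminator: ESC followed by a backslash, two bytes.
            0x1b if body.get(i + 1) == Some(&b'\\') => {
                return Some((&body[..i], 2 + i + 2));
            }
            _ => i += 1,
        }
    }
    // No terminator seen yet; a streaming parser would wait for more input.
    None
}
```

A parser that only recognized ST would return `None` forever on BEL-terminated input, which is one way output after the sequence could appear to be swallowed.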
chrisduerr and others added 26 commits February 11, 2020 19:59
This resolves an issue with parsing of DCS escapes, where it would try
to write parameters beyond the maximum parameter count limit.

Fixes #50.
This adds support for CSI subparameters like `\x1b[38:2:255:0:255m`,
which allows the combination of truecolor SGR commands together with
other SGR parameters like bold text, without any ambiguity.

This implements subparameters by storing them in a list together with
all other parameters and having a separate slice to indicate which
parameter is a subparameter and how long the subparameter list is. This
allows for static memory allocation and good performance while still
having the option for dynamic sizing of the parameters. Since the
subparameters are now also counted as parameters, the number of allowed
parameters has been increased from `16` to `32`.

Since the existing structures combine the handling of parameters for CSI
and DCS escape sequences, it is now also possible for DCS parameters to
have subparameters, even though that is currently never used.
Considering that DCS is rarely supported by terminal emulators, handling
these separately would likely just cause unnecessary issues. The
performance should also be better by using this existing subparam
structure rather than having two separate structures for DCS and CSI
parameters.

The only API provided for accessing the list of parameters is an
iterator. This is intentional, to make the internal structure clear and
allow for easy optimizations downstream. Since it makes little sense to
access parameters out of order, this limitation should not have any
negative effects on performance. The main drawback is that direct access
to the first parameter while ignoring all other subparameters is less
efficient, since it requires indexing a slice after iterating to the
element. However, while this is often useful, it's mostly done for the
first few parameters which significantly reduces the overhead to a
negligible amount. At the same time this forces people to support
subparameters or at least consider their existence, which should make it
more difficult to implement things improperly downstream.

Fixes #22.
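The storage scheme described above can be sketched roughly as follows: one flat, fixed-size array for all values, a parallel array recording how long each parameter group is, and an iterator that yields one slice per parameter. The type, field, and method names here are assumptions made for illustration and do not claim to match the actual vte internals.

```rust
const MAX_PARAMS: usize = 32;

/// Flat, statically sized storage for parameters and subparameters.
struct Params {
    values: [u16; MAX_PARAMS],
    /// group_len[i] is the number of values in the group starting at index i.
    group_len: [u8; MAX_PARAMS],
    len: usize,
    /// Index where the most recently started group begins.
    group_start: usize,
}

impl Params {
    fn new() -> Self {
        Params { values: [0; MAX_PARAMS], group_len: [0; MAX_PARAMS], len: 0, group_start: 0 }
    }

    /// Start a new parameter (the value after a `;`). Returns false when full.
    fn push(&mut self, value: u16) -> bool {
        if self.len == MAX_PARAMS {
            return false;
        }
        self.group_start = self.len;
        self.values[self.len] = value;
        self.group_len[self.group_start] = 1;
        self.len += 1;
        true
    }

    /// Append a subparameter (the value after a `:`) to the current group.
    fn extend(&mut self, value: u16) -> bool {
        if self.len == 0 {
            return self.push(value);
        }
        if self.len == MAX_PARAMS {
            return false;
        }
        self.values[self.len] = value;
        // The group length is updated right away, so it is never stale.
        self.group_len[self.group_start] += 1;
        self.len += 1;
        true
    }

    fn iter(&self) -> ParamsIter<'_> {
        ParamsIter { params: self, index: 0 }
    }
}

/// Yields each parameter as a slice: the value followed by its subparameters.
struct ParamsIter<'a> {
    params: &'a Params,
    index: usize,
}

impl<'a> Iterator for ParamsIter<'a> {
    type Item = &'a [u16];

    fn next(&mut self) -> Option<Self::Item> {
        let params = self.params;
        if self.index >= params.len {
            return None;
        }
        let n = params.group_len[self.index] as usize;
        let group = &params.values[self.index..self.index + n];
        self.index += n;
        Some(group)
    }
}
```

With this layout, `38:2:255:0:255;1` would iterate as two slices, `[38, 2, 255, 0, 255]` and `[1]`. A consumer that only wants the leading value of each parameter takes `group[0]`, which is the slice indexing cost the commit message mentions.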
This limits CSI parameters to be within range of `u16`, rather than
`i64`. This should effectively prevent downstream users from running
into DoS problems with excessively big escape sequence requests. An
example of a problematic escape would be `CSI Ps b` (repeat char).

According to https://vt100.net/emu/dec_ansi_parser, the smallest
possible size limit would be `16383`:

> The VT500 Programmer Information is inconsistent regarding the maximum
> value that a parameter can take. In section 4.3.3.2 of EK-VT520-RM it
> says that “any parameter greater than 9999 (decimal) is set to 9999
> (decimal)”. However, in the description of DECSR (Secure Reset), its
> parameter is allowed to range from 0 to 16383. Because individual
> control functions need to make sure that numeric parameters are within
> specific limits, the supported maximum is not critical, but it must be
> at least 16383.
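One way to keep a parameter inside `u16` while reading digits is saturating arithmetic, so an oversized value clamps at `u16::MAX` (65535, which is above the required 16383) instead of overflowing. This is a minimal sketch; `accumulate_param` is a hypothetical helper and the actual parser may clamp differently.

```rust
/// Accumulate leading ASCII digits into a u16, clamping at u16::MAX.
/// Stops at the first non-digit byte.
fn accumulate_param(digits: &[u8]) -> u16 {
    let mut value: u16 = 0;
    for &byte in digits {
        if !byte.is_ascii_digit() {
            break;
        }
        // saturating_* never wraps or panics; it sticks at u16::MAX.
        value = value
            .saturating_mul(10)
            .saturating_add(u16::from(byte - b'0'));
    }
    value
}
```

A request such as `CSI 99999999999 b` would then ask for at most 65535 repetitions rather than billions.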
I've noticed while playing around with vte in a personal project that I
didn't need all of the methods of the `Perform` trait. In Alacritty we
also don't react to everything, and other crates like
`strip-ansi-escapes` basically don't respond to anything.

Of course it's always easy to just copy/paste the entire trait and move
on, but I think it's probably worth making the life of downstream easier
by not enforcing this.
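The usual way to avoid forcing implementors to write every method is to give each trait method an empty default body. The sketch below uses a cut-down, hypothetical `PerformSketch` trait with simplified signatures; it is not the real `Perform` trait.

```rust
/// Simplified stand-in for a parser callback trait.
/// Every method has a no-op default, so implementors override only what they need.
trait PerformSketch {
    fn print(&mut self, _c: char) {}
    fn execute(&mut self, _byte: u8) {}
    fn esc_dispatch(&mut self, _intermediates: &[u8], _byte: u8) {}
}

/// Collects printable characters and ignores everything else,
/// in the spirit of a "strip escapes" consumer.
struct TextOnly {
    out: String,
}

impl PerformSketch for TextOnly {
    fn print(&mut self, c: char) {
        self.out.push(c);
    }
    // execute and esc_dispatch fall back to the empty defaults.
}
```

Adding a new callback to the trait later also stops being a breaking change for implementors, since the default body covers them.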
This resolves a bug when transitioning between DCS and ESC sequences,
which would cause the intermediates of the ESC dispatch to contain data
from the DCS sequence.
This changes the test code to use only a single dispatcher instead of
having a dispatcher for every single type of escape sequence.

This makes it trivial to test transitions between the two separate
escape sequence types.
When the params list for the CSI/DCS escapes is filled with all 32
parameters but ends in a subparameter, it would not properly stage the
length of the added subparameters causing the param iterator to get
stuck in place.

To ensure we always update the subparameter length even when no
parameter is staged after it, the length of subparameters is now updated
immediately while the subparameters themselves are added.

Fixes #77.
@epage epage merged commit ba13831 into main Mar 2, 2023
@epage epage deleted the fork branch March 2, 2023 16:49
@epage epage mentioned this pull request Mar 10, 2023