Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make raw field on literal AST types optional, or remove them #7254

Open
overlookmotel opened this issue Nov 12, 2024 · 9 comments
Open

Make raw field on literal AST types optional, or remove them #7254

overlookmotel opened this issue Nov 12, 2024 · 9 comments
Assignees
Labels
C-bug Category - Bug

Comments

@overlookmotel
Copy link
Contributor

The problem

#7211 brought up a problem. Most of our literal types (NumericLiteral etc) have raw field.

This makes sense when the AST has come from the parser, but little sense when the node is generated (e.g. in transformer), because the node has no "raw" representation.

Having to provide "0" here feels like a hack to me, for example:

pub fn number_0(self) -> Expression<'a> {
self.expression_numeric_literal(Span::default(), 0.0, "0", NumberBase::Decimal)
}

The exception is StringLiteral which doesn't have a raw field. This feels like an omission - if the other literal types have a raw field, it should have one too. But adding one would be a pain, as there are a lot of places we generate strings in transformer etc.

Prior art

Babel's types have the raw field under extra which is optional:

export interface StringLiteral extends BaseNode {
  type: "StringLiteral";
  value: string;
}

interface BaseNode {
  type: Node["type"];
  // ...
  extra?: Record<string, unknown>;
}

StringLiteral BaseNode

Acorn's types also have the raw field as optional:

export interface Literal extends Node {
  type: "Literal"
  value?: string | boolean | null | number | RegExp | bigint
  raw?: string
  regex?: {
    pattern: string
    flags: string
  }
  bigint?: string
}

source

Possible solutions

I can see 2 options:

Make raw fields optional

  • Change raw field on NumericLiteral etc to Option<&'a str>.
  • Add a raw: Option<&'a str> field to StringLiteral.

Remove all raw fields

Where an AST node has a non-empty span, the raw value can be obtained by slicing source text. So remove all the raw fields, and use methods to get raw value where required:

impl NumericLiteral {
    pub fn raw<'a>(&self, source_text: &'a str) -> Option<&'a str> {
        if self.span.start == 0 && self.span.end == 0 {
            None
        } else {
            Some(&source_text[self.span.start..self.span.end])
        }
    }
}

(as suggested in #5522)

Which?

Personally I prefer the 2nd. I don't think the raw fields are used much, so they're bloating the AST for little value. The methods to get raw value from Span are pretty trivial and cheap.

@ShuiRuTian
Copy link
Contributor

I did a try to remove raw fields, but there are some issues...

I think the core problem is the mix purpose of AST, different requirement leads to different design.

For example, for compiling, we do not need any origin data, so the AST could be abstract.

But for linter or formatter, some abilities should not be enabled by default. Like, for this, they might not want 1_0_2_2 be converted to 1_022, at least there should be an option to control it.

Roslyn(Biome and RA too, they refers Roslyn) keeps a lossless syntax tree for this purpose. This design makes their AST great for interreact, although it is not pretty memory friendly(because they need to maintain about 2 trees).

So, I would suggest to choose option 1. Any thoughts? @overlookmotel @Boshen

@overlookmotel
Copy link
Contributor Author

It is true that transformer/minifier and linter/formatter have different needs. However in this case, I don't think it much affects which option we go for.

For example, in linter the numeric_separators_style rule does need the raw code string. But it can equally easily get that from a raw field, or a raw() method. So either option works for linter. Ditto the formatter, I think.

The raw() method is simple and cheap, so personally I still prefer it.

@overlookmotel
Copy link
Contributor Author

@Boshen Do you have any opinion on which option? If we can agree, perhaps @ShuiRuTian is willing to work on it?

@overlookmotel
Copy link
Contributor Author

Boshen prefers raw: Option<&'a str>.

@overlookmotel
Copy link
Contributor Author

#7393 adds raw field to StringLiteral.

Still to do: Change raw fields on other literal types from Atom<'a> to Option<Atom<'a>>.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Nov 25, 2024

@ShuiRuTian I wonder if you'd like to help out with this? Changing the raw fields to Options I don't think will cause same problems as you hit before. In linter the raw fields can just be unwrapped, as AST is always straight from the parser, so they're guaranteed to be Some.

@ShuiRuTian
Copy link
Contributor

ShuiRuTian commented Nov 26, 2024

@overlookmotel I would love to, but @Boshen already have a PR #7393

Not sure what I need to do now.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Nov 26, 2024

Thanks for your willingness!

Boshen's PR will add raw field to StringLiteral. The remaining task is to make the raw fields on all the other literal types optional.

e.g. NumericLiteral's raw field is currently &'a str but should be Option<Atom<'a>>.

FYI, Atom<'a> and &'a str are basically the same - Atom is just a wrapper around &str:

pub struct Atom<'a>(&'a str);

Boshen said he ran into a problem using raw: Option<&'a str> on StringLiteral because ast_tools didn't understand that syntax, so let's go with raw: Option<Atom<'a>> to avoid that problem.

Does that make more sense?

Boshen added a commit that referenced this issue Nov 26, 2024
@ShuiRuTian
Copy link
Contributor

Yeah, it makes things much clearer, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category - Bug
Projects
None yet
Development

No branches or pull requests

3 participants