Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial RegExp #184

Merged
merged 14 commits into from
Aug 26, 2024
Merged

Initial RegExp #184

merged 14 commits into from
Aug 26, 2024

Conversation

lemueldls
Copy link
Contributor

Fixes #183
Fixes #175

image

@@ -39,6 +39,7 @@ path-absolutize = { version = "3.0", features = ["use_unix_paths_on_wasm"] }
either = "1.6"
levenshtein = "1"
ordered-float = "4.2"
regress = { version = "0.10.0", features = [] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an awesome library!

exec(input: string): RegExpExecArray | null;
}

interface RegExpExecArray extends Array<string> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man JS spec is crazy. So the returned result is an array with a bunch of added string/named properties! I think this definition where an interface extends a class works 🤞

arguments.first().unwrap().non_spread_type().expect("pattern");
let pattern_type = types.get_type_by_id(pattern_type_id);
let pattern = {
let Type::Constant(Constant::String(pattern)) = pattern_type else {
Copy link
Owner

@kaleidawave kaleidawave Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, if the exec is run with a constant string argument then we get a constant result.

I assume this example gets always true or false warnings under this branch (where TSC doesn't)


I wonder whether this can be extended for general (ie via Type::RootPolyType or Type::Constructor) strings not through the constant result but so that .groups has named members. So new RegExp("Hi (?<name>.*)").exec(event.target.value) would have groups as { name: string }. I think TS added this, but I can't find it?

I don't know whether regress allows introspection? I don't know whether this could be a super simple scan function to pick out groups from the pattern?

If not simple then that is fine. Can merge without

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I think this is being collected below? https://github.com/kaleidawave/ezno/pull/184/files#diff-be53d15efbd9ffdb15e107f75ef4867a569eb1ba61ecbd6cc1c468d3f49e8193R53

I will have a think about how this could work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TS Playground example works:

image

if let Some(flags) = flags {
for flag in flags.chars() {
match flag {
'd' => unimplemented!("indices for substring matches are not supported"),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not something regress supports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for flag in flags.chars() {
match flag {
'd' => unimplemented!("indices for substring matches are not supported"),
'g' => unimplemented!("stateful regex is not supported"),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at the moment this can be ignored rather than panicking.

I think the eventful design of Ezno can support this in the future. For example evaluating a stateful RegExp could set a regexp.#used = true property. That would be a diagnostic that I don't think any other tool does today!

Expression::RegexLiteral { pattern, flags: _, position: _ } => {
return checking_data.types.new_regex(pattern.clone());
Expression::RegexLiteral { pattern, flags, position } => {
return checking_data.types.new_regex(pattern, flags, position);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice. RegExp constructor and regex literal have the same logic

@kaleidawave
Copy link
Owner

Nice! Didn't expect this so soon after I wrote that issue. This is great, really impressed that you got it a nice working example. It is a great use case for constant_functions, where the logic extends something representable in JS.

Minor things that I haven't already brought up in the review

Looks great at the moment, if any of those things are too difficult LMK. Happy to just add the constant stuff as that is a really neat feature!

@kaleidawave kaleidawave added enhancement New feature or request checking Issues around checking labels Aug 7, 2024
@lemueldls
Copy link
Contributor Author

I've written a constant and variable version of the exec function, though I am not sure how to test non-constant strings (like Type::RootPolyType or Type::Constructor)..

I am experimenting with utf16 matching but am running into some correctness issues. regress is very standard compliant, so I'll find a test suite to find how to correctly match ranges.

I am also running into issues with optional property access of RegExpExecArray | null types on the result of the non-constant version of exec. Not sure if I am doing unions correctly.

@kaleidawave
Copy link
Owner

I've written a constant and variable version of the exec function, though I am not sure how to test non-constant strings (like Type::RootPolyType or Type::Constructor)..

Awesome. If you try

function greetingWho() {
	const r = new RegExp("Hi (?<name>.*)")
	print_type(r.exec("Hi Ben")!.groups)
}

Does it print { name : string }? For the dynamic version is it creating a result result = types.new_or_type(*result of object builder that contains special .group property*, TypeId::NULL)

I am experimenting with utf16 matching but am running into some correctness issues. regress is very standard compliant, so I'll find a test suite to find how to correctly match ranges.

Awesome, I think basic support is good for now. Maybe put a comment in checker/README.md that regex support is based on regress and mention any slight differences?

I am also running into issues with optional property access of RegExpExecArray | null types on the result of the non-constant version of exec. Not sure if I am doing unions correctly.

From last week I think property access on an or (left, right) didn't check whether both left and right contained that property (aka declare let a: { b: 2 } | null; print_type(a.b) printed 2 rather than an error). I have just merged #157 (which is a huge PR where I redid a lot of the property things) which should have fixed that case (and also added non null assertions so the above should work once main is merged in).

@kaleidawave
Copy link
Owner

Hey, putting out a release this week. Would love to have this feature in! Do you think you can maybe cut it back to what was working in the screenshot?

@lemueldls
Copy link
Contributor Author

lemueldls commented Aug 25, 2024

Non-constant examples work, but I'm still not sure how I should write specification tests for it.


Does it print { name : string }?

It prints { name: "Ben" } (but doing !. causes problems)
image

Issue with !.:
image


For the dynamic version is it creating a result result = types.new_or_type(*result of object builder that contains special .group property*, TypeId::NULL)

Yes that was the original implementation. It was an error until I pulled from main, now there's a Expression is always false warning with optional property access of constants.

However another error still happens with non-constant optional property access:
image

Only when I also remove the new_or_type does it give the expected result:
image


The original example still works fine with or without optional property access, I'll write some working specifications soon.

image

@lemueldls lemueldls marked this pull request as ready for review August 25, 2024 22:36
- Add new specification test for above
- Fix clippy lints
- Don't print [null] as prototype of object
@kaleidawave
Copy link
Owner

Awesome!

It prints { name: "Ben" } (but doing !. causes problems)

Yep its a case of #165. The parser thinks treats ! as something before an expression and so it thinks the . means the start of a decimal number like .25. Will fix eventually. I think it works if you split it between lines with a variable.

Only when I also remove the new_or_type does it give the expected result:

Hmm yes I think there are some problems with ?. and narrowing at the moment...

@kaleidawave kaleidawave merged commit c952b6d into kaleidawave:main Aug 26, 2024
9 checks passed
@kaleidawave
Copy link
Owner

Merged!

Made a few changes before merge

  • Fixed things clippy was whining about
  • Added a function error for new RegExp with a bad input (should rename ConstantFunctionError::BadCall to ConstantFunctionError::CannotComputeConstant in the future)
  • Added a test for bad expressions and the new bad function call above
  • The Constant `RegExp.exec` groups greedy test was testing against a type checker error that printed the .groups object. However it looks like the order of items in .groups is unpredictable because regress uses a HashMap along the way. I have made the changes this side to use my own crate::Map for groups which I already use in the checker for small things and importantly has a stable order. To get stable ordering it needs a (hopefully small) change on the regress side: Stable ordering of named_group_indices ridiculousfish/regress#96. In the meantime I have changed the body of the test to not rely on ordering.

I did try adding String.prototype.match and String.prototype.split but there is something wrong with events on my side. I will investigate and let you when I get it working! Also I was worried it would add a big perf drop integrating this external process in, but looks okay so far!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
checking Issues around checking enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add regex checking Implement RegExp/regular expression/regex analysis
2 participants