Add support for Java Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart() character classes via \p{javaJavaIdentifierStart} and \p{javaJavaIdentifierPart}. #21
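The property names follow the `\p{java...}` convention that the JDK's own `java.util.regex` already accepts. The sketch below uses the JDK engine, not this library, to show what the two classes are meant to match. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class JavaIdentifierRegexDemo {
    // One Java identifier: a start character followed by any number of part characters.
    // Compiled with java.util.regex, which already understands these property names.
    static final Pattern IDENTIFIER =
            Pattern.compile("\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("_count1")); // true
        System.out.println(isIdentifier("$x"));      // true
        System.out.println(isIdentifier("1abc"));    // false: a digit cannot start an identifier
        System.out.println(isIdentifier("a-b"));     // false: '-' is not an identifier part
    }
}
```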
This change takes a brute-force approach: it generates the ranges accepted by these methods by testing every code point. MakeJavaCategories.java generates the source file containing the category ranges. I've added simple tests for the parsing and matching; let me know if you'd like to see more coverage of any particular aspect. Since it reuses the technique already used for the existing Unicode character ranges and shouldn't affect any existing code paths, the change should be fairly safe.
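A minimal sketch of the brute-force idea follows. It assumes the generator only needs inclusive `[lo, hi]` ranges. The sketch walks every code point from 0 to `Character.MAX_CODE_POINT`, tests each one with the predicate, and collapses consecutive hits into ranges. The class name, the `ranges`/`emit` helpers and the printed table format are hypothetical. They are not taken from MakeJavaCategories.java, which may emit a different table layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch of a brute-force range generator (not the PR's MakeJavaCategories.java).
public class JavaCategoryRangesSketch {

    // Tests every code point and collapses runs of accepted code points into inclusive [lo, hi] ranges.
    static List<int[]> ranges(IntPredicate accepts) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // first code point of the current run, or -1 when not inside a run
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (accepts.test(cp)) {
                if (start < 0) {
                    start = cp;
                }
            } else if (start >= 0) {
                out.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) { // close a run that extends to the last code point
            out.add(new int[] {start, Character.MAX_CODE_POINT});
        }
        return out;
    }

    // Prints the ranges as a Java array initializer, one "{0x0041, 0x005a}," row per range.
    static void emit(String name, List<int[]> rs) {
        System.out.println("static final int[][] " + name + " = {");
        for (int[] r : rs) {
            System.out.printf("    {0x%04x, 0x%04x},%n", r[0], r[1]);
        }
        System.out.println("};");
    }

    public static void main(String[] args) {
        emit("JAVA_IDENTIFIER_START", ranges(Character::isJavaIdentifierStart));
        emit("JAVA_IDENTIFIER_PART", ranges(Character::isJavaIdentifierPart));
    }
}
```

Because `ranges` takes an `IntPredicate`, covering another `Character.is*` method in this sketch would need only one more `emit(...)` call.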
I only generated these two ranges since they're the ones I need, but it would be trivial to add the rest of the Character.is* ranges if required. I suspect most of them are already reasonably well covered by the existing Unicode ranges.
One thing I learned from this change: Character.isJavaIdentifierPart(0) == true. Who knew?

I haven't signed a CLA yet; let me know if this looks OK and I'll do so.
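The surprising `isJavaIdentifierPart(0)` result comes from how `isJavaIdentifierPart` is defined. It also accepts identifier-ignorable characters (`Character.isIdentifierIgnorable`), and those include the ISO control characters U+0000 through U+0008, U+000E through U+001B, and U+007F through U+009F. The demo class name below is hypothetical, and the calls are standard `java.lang.Character` methods.

```java
public class IdentifierIgnorableDemo {
    public static void main(String[] args) {
        // NUL (U+0000) is an ISO control character that Java treats as identifier-ignorable,
        // so it counts as an identifier part but not as an identifier start.
        System.out.println(Character.isJavaIdentifierPart(0));    // true
        System.out.println(Character.isIdentifierIgnorable(0));   // true
        System.out.println(Character.isJavaIdentifierStart(0));   // false
        // TAB (U+0009) is also a control character, but it is not identifier-ignorable.
        System.out.println(Character.isJavaIdentifierPart('\t')); // false
    }
}
```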