[JSOUP-2224] add wildcards #2225

Open: wants to merge 3 commits into master

beargiles

Added wildcard attributes for both per-tag and global scope. This lets the user allow an open-ended set of attributes, e.g., data-*, by giving a wildcard pattern. Unlike :all, this is selective.

The pattern must be one that java.util.regex.Pattern accepts, e.g., data-.+. It is case-insensitive and must match the entire attribute name, since '^' and '$' are quietly added.
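The matching rule described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: the class name `AttributeWildcard` and its `matches` method are hypothetical, and wrapping the pattern in a non-capturing group before adding the anchors is an assumption made here so that alternations are anchored as a whole.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of jsoup's API.
final class AttributeWildcard {
    private final Pattern pattern;

    AttributeWildcard(String regex) {
        // Quietly add '^' and '$' around the user's pattern. The non-capturing
        // group (an assumption of this sketch) keeps "a|b" anchored on both sides.
        this.pattern = Pattern.compile("^(?:" + regex + ")$", Pattern.CASE_INSENSITIVE);
    }

    boolean matches(String attributeName) {
        // matches() requires the whole name to match, consistent with the anchors.
        return pattern.matcher(attributeName).matches();
    }
}
```

With this sketch, `new AttributeWildcard("data-.+")` accepts `data-id` and `DATA-ID` but rejects `xdata-id` and a bare `data-`.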

A variant approach is to extend the existing addAttributes() and removeAttributes() methods to recognize patterns via a marker such as enclosing braces, e.g., {data-.+}. This approach could use the same implementation as below: the existing methods would simply strip the braces and call the now-private wildcard methods.

It is common for applications to use custom attributes, e.g., 'aria-*', and
HTML5 officially recognizes this with 'data-*' attributes.
It is often impossible to list all possible custom attributes, yet we want
to be more selective than using ':all'.

This patch adds support for both per-tag and global attribute wildcards.
The wildcard must be recognized by `java.util.regex.Pattern`, e.g., `data-.+`.
The patch will quietly add the '^' and '$' delimiters to the provided
pattern.
Fixed removeTags(), which had not been removing the wildcard attributes.

Also changed the Map so the key is a TagName, not a String, in
order to avoid any uncertainty regarding capitalization.
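
The effect of keying the map by a typed tag name rather than a raw String can be sketched as follows. This is an assumption-laden illustration: `TagKey`, `TagKeySketch`, `addWildcard`, and `wildcardsFor` are hypothetical names, and normalizing to lower case is assumed here rather than taken from the patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and normalization rule are assumptions.
final class TagKeySketch {
    // Typed key that normalizes once, so "DIV" and "div" are the same key.
    static final class TagKey {
        private final String value;

        private TagKey(String value) { this.value = value; }

        static TagKey valueOf(String name) {
            return new TagKey(name.trim().toLowerCase(Locale.ROOT));
        }

        @Override public boolean equals(Object o) {
            return o instanceof TagKey && value.equals(((TagKey) o).value);
        }

        @Override public int hashCode() { return value.hashCode(); }

        @Override public String toString() { return value; }
    }

    private final Map<TagKey, Set<String>> wildcardsByTag = new HashMap<>();

    void addWildcard(String tag, String regex) {
        wildcardsByTag.computeIfAbsent(TagKey.valueOf(tag), k -> new HashSet<>()).add(regex);
    }

    Set<String> wildcardsFor(String tag) {
        return wildcardsByTag.getOrDefault(TagKey.valueOf(tag), Collections.emptySet());
    }
}
```

Because every lookup goes through `TagKey.valueOf`, callers cannot accidentally miss an entry by passing a differently capitalized tag name.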
@beargiles
Author

I'll check the error...

After thinking about it overnight, I think it makes sense to remove the explicit 'wildcard' methods and use pattern:... for the attribute name. This is motivated in part by identifying a few other places where wildcards make sense (separate methods for each would cause a combinatorial explosion once there are extensions, e.g., for more advanced pattern matching), and in part by having more flexibility in how the pattern is specified. There's no need for different braces/brackets/etc.; an extension would just use a different protocol than 'pattern'.
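
A minimal sketch of the prefix idea, assuming the attribute-name string is inspected once and turned into a matcher. The names `AttributeMatcherSketch` and `forName` are hypothetical and not taken from jsoup or from this PR; the literal branch's case-insensitive comparison is likewise an assumption.

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of the "pattern:" prefix; not jsoup's API.
final class AttributeMatcherSketch {
    private static final String PATTERN_PREFIX = "pattern:";

    // Turns an attribute spec into a matcher: "pattern:<regex>" is treated as
    // an anchored, case-insensitive regex; anything else is a literal name.
    static Predicate<String> forName(String spec) {
        if (spec.startsWith(PATTERN_PREFIX)) {
            String regex = spec.substring(PATTERN_PREFIX.length());
            Pattern p = Pattern.compile("^(?:" + regex + ")$", Pattern.CASE_INSENSITIVE);
            return name -> p.matcher(name).matches();
        }
        String literal = spec.toLowerCase(Locale.ROOT);
        return name -> literal.equals(name.toLowerCase(Locale.ROOT));
    }
}
```

Under this design, supporting another matching style would mean adding one more prefix branch to `forName` rather than a new family of add/remove methods.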

I also realized that I need to add corresponding functionality to the add/remove protocol methods.

On a related note the 'All' value should be exposed. That would be used for global wildcards.

@beargiles
Author

I almost forgot: the All value should probably be complemented by FormControl, Media, and possibly Visible. These would be used with the standard event attributes; each is shorthand for multiple related tags.
