Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define OPTIMADE regex format #490

Merged
merged 32 commits into from
Mar 22, 2024
Merged

Conversation

rartino
Copy link
Contributor

@rartino rartino commented Jan 6, 2024

We keep coming back to the lack of regex:es in the OPTIMADE specification, in particular lately:

The challenges with picking one specific regex format has been debated quite intensely all the way back to, e.g., #42, for more recent discussions see far down in #160.

This PR creates a free-standing appendix that describes what I believe works as a regex format for OPTIMADE (see in particular the discussions far down in #160 to see the motivations for this particular format).
No parts of the specification itself yet refers to this appendix. This is intended to be added in #488 and #160.

It would be desirable to merge this fairly soon since it appears we have painted ourselves into a corner to a degree with regards to the need for the pattern JSON Schema validation field to properly validate the new space_group_symmetry_operations_xyz property.

@rartino rartino requested review from ml-evs, vaitkus and sauliusg January 6, 2024 13:48
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@ml-evs
Copy link
Member

ml-evs commented Jan 7, 2024

I'm not sure I see the need to couple the regexp format for search (e.g., a future LIKE) and the JSON Schema field pattern for constraining the shape of allowed values of fields (and in our only example so far, a field where we explicitly state that search is not recommended), so for me the only thing missing is to simply allow pattern in our property definitions. Perhaps this comes down to me not seeing a use case for the suggested LIKE operator given the current use of string fields though.

@rartino
Copy link
Contributor Author

rartino commented Jan 7, 2024

@ml-evs

I'm not sure I see the need to couple the regexp format for search (e.g., a future LIKE) and the JSON Schema field pattern for constraining the shape of allowed values of fields

We now have (at least) two different places in the specification for regex:es where we intend to standardize on subsets of PCRE2 so that not all implementations are forced to handle the full complexity of all of PCRE2 (or possibly worse, the specific ECMA-262 flavor of it, which may not even be 100% compatibly supported by the PCRE2 library in a few years). Does it not make sense to make these regex formats exactly the same?

I suppose a property validator usually can be written with access to the full PCRE2 library. However, one may want to give a bit of consideration to the possibility of validating other data formats against these schemas - in which case it would be nice, e.g., if the pattern property definition regex is restrained enough to be translatable into, e.g., an XML Schema regex, etc.

Co-authored-by: Antanas Vaitkus <[email protected]>
@rartino
Copy link
Contributor Author

rartino commented Jan 10, 2024

@sauliusg You are the author of #160 for better string search in the filter language, which presently sits at us agreeing to skip "LIKE" and do a "MATCH" operator for a subset of PCRE2, with is what this appendix is trying to make precise.

Can you please take a look at this quite short PR and say if you agree this is a useful appendix to standardize the right regex format for OPTIMADE filters? (The idea is that #160 can be updated to refer to this appendix once it has been merged).

Note that I've left out range quantifiers ({x}, {x,y}, {x,}). I do not feel strongly about that, I just have a gut feeling that there are db backends and schema languages that do not do them. If anyone speaks up feeling strongly to include them, I'll change that. I have a slightly stronger opinion to omit the lazy matching given support in current db backends (this I believe was pointed out by @sauliusg somewhere in the discussions about #160).

@rartino rartino added topic/filtering-language Issue discussing changes and improvements to the query and filtering language topic/property-standardization The specification of the precise data representation of properties and entries PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing labels Jan 10, 2024
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
ml-evs
ml-evs previously approved these changes Jan 17, 2024
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one formatting issue, otherwise good for me

optimade.rst Outdated Show resolved Hide resolved
@ml-evs ml-evs added the blocking-release This is a PR or issue that presently blocks the release of next version of the spec. label Jan 17, 2024
@ml-evs ml-evs added this to the v1.2 milestone Jan 17, 2024
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
sauliusg
sauliusg previously approved these changes Mar 22, 2024
Copy link
Contributor

@sauliusg sauliusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After diffing commits fb2e2d3 and cba1227...

@ml-evs ml-evs self-requested a review March 22, 2024 08:04
ml-evs added a commit that referenced this pull request Mar 22, 2024
sauliusg
sauliusg previously approved these changes Mar 22, 2024
optimade.rst Outdated Show resolved Hide resolved
merkys
merkys previously approved these changes Mar 22, 2024
ml-evs
ml-evs previously approved these changes Mar 22, 2024
@rartino rartino dismissed stale reviews from ml-evs and merkys via 30e58d8 March 22, 2024 08:55
@merkys merkys requested review from ml-evs, sauliusg and vaitkus March 22, 2024 09:03
@rartino rartino merged commit b8e3075 into Materials-Consortia:develop Mar 22, 2024
5 checks passed
@merkys merkys mentioned this pull request Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocking-release This is a PR or issue that presently blocks the release of next version of the spec. PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing topic/filtering-language Issue discussing changes and improvements to the query and filtering language topic/property-standardization The specification of the precise data representation of properties and entries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants