Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameters updater [extended syntax] #1061

Closed
wants to merge 1 commit into from
Closed

Conversation

alecandido
Copy link
Member

While the specification was clear in my mind, and the update application as well, there was still a non-trivial problem to solve: how to process in a distinct way lists' and dictionaries' access operations. I.e. the only two compound types you may have in JSON.

In particular, thanks to the __getitem__ (i.e. collection[...]) operation in Python being used by both dictionaries and lists, there are little differences between the two cases.
The only one is just that the access path will be a unique string, and it could be just .-separated, but then list indices have to be turned into integers.

In order to clarify the distinction between the two, I used the [] in the path syntax itself, favoring paths like x.y[n].z, instead of x.y.n.z.
To me, it seems clearer to write, but the price to pay is that you really need to parse paths.

The easiest way to parse would have been writing a simple grammar, but this would have led to a dependency on a parser generator, or to find one that we could use at "compile time" (requiring one that doesn't need any runtime dependency as well), adding the need of some compilation step.
To keep it "simpler", I manually wrote the parser, but it seems that the easiest way is just going through the classical step: a regex-based lexer to tokenize the string, and then a state machine-based parser.

In the end, it works pretty well, but I acknowledge is a good chunk of code. The alternative is just to rely on the collections being generated from some objects, that yields that all dictionary keys have to be valid Python identifiers. And thus they could be distinguished from lists' indices, since indices will always start by digit, and keys will never.
So, I could make the parser terribly simpler, just by using the x.y.n.z syntax, splitting on ., and parsing just indices (i.e. I could even just try int() on all of them, and append successful conversions' results, or original values for failing conversions).

I tried to explore the x.y[n].z syntax, and we have a solution for that. But we also have a solution for the other.
Now, it's just a matter to decide if the more pleasant syntax is worth the code required (we may be able to optimize it a bit, but I believe we can not make it as simple as the uniform syntax, since a parsing step would be truly required).

Copy link

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 51.95%. Comparing base (6087ba8) to head (75f01f3).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1061   +/-   ##
=======================================
  Coverage   51.95%   51.95%           
=======================================
  Files          63       63           
  Lines        2808     2808           
=======================================
  Hits         1459     1459           
  Misses       1349     1349           
Flag Coverage Δ
unittests 51.95% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alecandido
Copy link
Member Author

@stavros11, of course it's just a standalone file, but I would ask your opinion to make a choice, before integrating any implementation in the actual package, and extending the API to cater for Parameters updates.

You can play with the file content by applying the test update u to the test serialization d, with

from prototype import u, d, update
update(d, u)

and you can also play with paths by using tokenize and parse (tokenize("my.path[n]"), and parse(tokenize(...))).

I hoped it could have been simpler, and you could also try to imagine how to make it simpler. But after trying to implement it, I may be leaning for the uniform x.y.n.z syntax, to reduce maintenance burden (and avoid performances' troubles for very large updates...).

@stavros11
Copy link
Member

Thanks @alecandido for this implementation.

I played a bit with the existing code and it seems to work well, except maybe a minor issue (that is not really an issue), that it is still possible to add integer keys in a dictionary and index them as a list. For example try:

update(d, {"b[3]": 0})

I hoped it could have been simpler, and you could also try to imagine how to make it simpler. But after trying to implement it, I may be leaning for the uniform x.y.n.z syntax, to reduce maintenance burden (and avoid performances' troubles for very large updates...).

I also expected this to be simpler, or at least less code lines, maybe even a single function. If I understand correctly, the decision to be taken here is x.y[n].z vs x.y.n.z. If that's it, then I would also lean towards the x.y.n.z which I would expect is simpler to process from our side and is not much of an overhead from the user side. In fact it may actually be even simpler for the user too. At the end of the day, I believe we only need that to update Parameters which have a fixed and known structure, so we don't really need to handle every weird case. The only confusion that I see, is between list indices and qubit names (which are dict keys) that can both be integers, but I think this problem exists with either interface.

Another issue that I have with this approach is that we are kind of inventing our own language, which is something we were trying to avoid, but I also don't have a simpler solution to propose, other than the one in qiboteam/qibocal#996, but this has its own problems already discussed in qibocal.

@alecandido
Copy link
Member Author

I played a bit with the existing code and it seems to work well, except maybe a minor issue (that is not really an issue), that it is still possible to add integer keys in a dictionary and index them as a list. For example try:

Oh yes, I forgot (or forgot to mention). In principle, I can deny even that with an explicit check. In practice, the dictionaries we're interested in can't have integer keys, since they are (mostly) the result of objects serialization (so the keys should correspond to attributes).

I.e. the update() is not fully "safe", but it should be safe with some assumptions on the domain.

The only confusion that I see, is between list indices and qubit names (which are dict keys) that can both be integers, but I think this problem exists with either interface.

And yes, you're right about this. We could in principle force qubit physical names to be str, even when they are just integers, in order to simplify the types around.
Irrespectively of the chosen option, enforcing a simpler type (i.e. not a union) may be worth to consider.

Another issue that I have with this approach is that we are kind of inventing our own language, which is something we were trying to avoid, but I also don't have a simpler solution to propose, other than the one in qiboteam/qibocal#996, but this has its own problems already discussed in qibocal.

Yes, that's exactly a language, though in one case it's extremely simple. It will only be a language for update keys, and it's also a rather common one (not completely made up from scratch) as it is supported also for some JSON configurations.

The alternative is just going immediately for an existing database, either de-nesting the objects, going for relational (and so tables and joint queries), or existing document-oriented (and using their syntax for updates).

What we're reinventing is essentially a database, but using Pydantic classes in place of a more complex ORM, and the .split(".") for update(), instead of query/update language, may be still simpler...

@alecandido alecandido mentioned this pull request Oct 31, 2024
@alecandido alecandido changed the title Parameters updater Parameters updater [extended syntax] Oct 31, 2024
@alecandido
Copy link
Member Author

Now that #1086 is approaching completion, this lost all its motivation. The PR and discussion will still be available, if relevant.

Closed in favor of #1086.

@alecandido alecandido closed this Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants