-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
json Utf8 behavior #964
base: devel
Are you sure you want to change the base?
json Utf8 behavior #964
Conversation
Deals with the funny stuff seen in #771. The documentation should also make clear what we do turn on the utf8 flag on by default. We should also make sure this doesn't break other stuff at a distance -- I'm quite weary of the comment in 'serialize' that we don't utf8 the thing there because it's done "later on"...
FEAR! |
The broader concern I have is about what other utf8 handling is going on. I'm very disturbed to see If you're not really careful decoding at the I/O gateways of the application, you can get really crazy strings and it's hard to know what to do with it. I offer as evidence this nasty dzil code for guessing whether or not to encode META.yml files. From what I can see, Dancer::Request is reading the input handle raw into body and then everything else just guesses from there. Nothing appears to check the charset on content-type which is what probably should be done to decode the content body before doing anything else with it. Then you'd be working with characters and could UTF-8 decode JSON without the UTF-8 flag. (I don't know what that AJAX is doing in the #771 example. It it setting an appropriate content-type? Does the content-type match the content?) So... I'm not sure this PR is really getting at the underlying issue, but I'm not familiar enough with the Dancer1 guts to do or test the invasive surgery it probably needs. |
On 13-09-22 09:55 PM, David Golden wrote:
Hmm... So you're saying that Dancer1 need a fix, but that ain't it. Lemme try to reformulate that into a proposal: Dancer1's problem with utf8 seems to be that we make assumptions about The first is to continue the very hard job of converting every input The second, is to let the developer figure out what he wants. We make no I must say, the second path seems to be the saner of the two. At it is, Sounds like a plan? :-) Joy, |
I think the goal should be to try to get input into characters, work with characters (including serialization/deserialization) and then put output into an encoding based on what the app developer says they want (presumably UTF-8). I'm not an expert on this, but what jumps out at me are these points of input:
Generally, Dancer shouldn't have to guess at encodings. The RFCs are pretty careful about making sure charset is well defined for different kinds of fields, etc. So I think it's less about being tricky and more about just following the spec. |
This sounds very sane. I like. I still have to dig in the guts of Dancer to figure out where exactly we juggle with the encodings. Once I think I have a grasp on how it goes, I'll amend this PR with my proposals. Stay tuned. :-) |
Please at the same time switch off of JSON.pm (JSON::MaybeXS provides a pp-or-XS interface which is very battle-tested), to avoid the extra insanity that JSON::XS would drag in. |
Coming from #771
In a nutshell, we make Dancer's from_json behave like 'decode_json', but don't do the reverse for 'encode_json'. Hilarity ensues.
This fix document our helpful behavior, and make the from_json and to_json.
Now, I'm kinda scared of the comment that was there for 'encode_json', to the effect that the utf8 encoding would be done down the line. I still have to make due dilligences to make sure this fix doesn't inject any double-encoding bug.