Commit

Script updating archive at 2023-08-24T00:35:54Z. [ci skip]
ID Bot committed Aug 24, 2023
1 parent 2615af1 commit 1c7e1a2
Showing 1 changed file with 23 additions and 2 deletions.
25 changes: 23 additions & 2 deletions archive.json
@@ -1,6 +1,6 @@
{
"magic": "E!vIA5L86J2I",
"timestamp": "2023-08-22T00:36:25.905787+00:00",
"timestamp": "2023-08-24T00:35:50.645004+00:00",
"repo": "httpwg/http-extensions",
"labels": [
{
@@ -58654,7 +58654,7 @@
],
"body": "I propose we ban `%00` from serialized `DisplayString`, in order to avoid a large class of security issues (and to simplify implementations) in C/POSIX environments where `0x00` is the string terminator.\r\n\r\nI can see three alternatives here.\r\n\r\n### Ban `%00` outright\r\n\r\nIn the serialization of `Display String` we add:\r\n\r\n if byte is %x00, fail serializing.\r\n\r\nIn the parsing of `Display String` we add:\r\n\r\n if octet is zero, fail parsing.\r\n\r\nIf either end uses a strict UTF-8 encoder or decoder, which does not allow \"overlong byte sequences\" (see below), this prevents `U+0000 NULL` from being transported with DisplayString.\r\n\r\n### Specify loose UTF-8 encoding & decoding\r\n\r\nJava pioneered a handling of `U+0000 NULL` by specifying that it must be UTF-8 serialized as `0xc0 + 0x80` - a so-called \"over-long UTF-8 byte sequence\".\r\n\r\nWe have presently not specified if the UTF-8 encoding of DisplayString is \"strict\" where such over-long byte sequences are illegal or \"non-strict\" where either all or a few over-long byte sequences are accepted.\r\n\r\nOver-long byte sequences _in general_ are frowned upon, because they can be used to \"obfuscate\" UTF-8 strings, for instance by encoding a period as 0xc0 0xae to try to escape directory traversal checks.\r\n\r\nWe could change the spec to say that the UTF-8 encoding/decoding must be loose (enough) to handle `U+0000 NULL` the Java-way, but I failed to identify a suitable normative reference which didn't bring in a lot of other Unicode/UTF-8 baggage we may not want.\r\n\r\n### Optimistically serialize `0x00` as `%c0%80`\r\n\r\nWe can optimistically serialize any `0x00` bytes we encounter in the encoded UTF-8 byte_array using the \"java-trick\" and leave it to the UTF-8 decoder to either reject or accept as it sees fit.\r\n\r\nIn the serialization of `DisplayString` we add:\r\n\r\n If byte is %x00 append \"%c0%80\" to encoded_string.\r\n\r\nIn the parsing of `DisplayString` we add:\r\n\r\n if octet is zero, fail parsing.\r\n\r\n### Discussion\r\n\r\nI'm a big fan of clear text and simple solutions, and I cannot imagine why anybody would ever try to send `U+0000 NULL` through a `DisplayString` for non-nefarious purposes, so I am 100% on board with simply banning `0x00` bytes in the encoded UTF-8 byte-array.\r\n\r\nBut it is a (tiny) loss of generality, and if we want to avoid that, we either make life difficult for implementers in C/POSIX based environments by rejecting this ticket, or we throw our lot with Java's handling of `U+0000 NULL` to a greater or lesser degree.\r\n\r\nOptimistically serializing `0x00`, as specified in the third alternative, is a way to do that without tying ourselves to Java's mast, but it introduces some wiggle-room which, depending on one's point of view, is either desirable (allowing the receiver to reject `U+0000 NULL` by using a strict UTF-8 decoder) or undesirable (making it anyone's guess if `U+0000 NULL` can be transported in a DisplayString or not.)\r\n\r\nBut Java's handling is a bit of a hack, and allowing (some) over-long UTF-8 byte sequences introduces a class of security issues similar to the one this ticket tries to prevent, so all in all, I think we should just ban `%00` in serialized `DisplayString` with the first alternative above.\r\n\r\n\r\n\r\n\r\n",
"createdAt": "2023-08-17T05:19:34Z",
"updatedAt": "2023-08-21T13:38:23Z",
"updatedAt": "2023-08-23T22:55:54Z",
"closedAt": null,
"comments": [
{
@@ -58719,6 +58719,27 @@
"body": "> I'd prefer to stick with a caution here, rather than a prohibition.\r\n\r\nHow about simply adding: \"Unicode category 'Cc' (Control characters) are not permitted, unless the field definition unwisely explicitly permits them.\"",
"createdAt": "2023-08-21T13:38:23Z",
"updatedAt": "2023-08-21T13:38:23Z"
},
{
"author": "mnot",
"authorAssociation": "MEMBER",
"body": "That's a field-specific parsing rule, which doesn't seem great from an API perspective. ",
"createdAt": "2023-08-23T07:34:27Z",
"updatedAt": "2023-08-23T07:34:27Z"
},
{
"author": "bsdphk",
"authorAssociation": "CONTRIBUTOR",
"body": "> That's a field-specific parsing rule, which doesn't seem great from an API perspective.\r\n\r\nNot if it goes in the \"Defining New Structured Fields\" section?\r\n\r\nI think it is perfectly fair game to insist that fields which accept Control Characters are forced to specify that, and ban them by default.\r\n\r\nMy hope is that adding such text will mean that no field will ever be defined to allow them.",
"createdAt": "2023-08-23T21:45:57Z",
"updatedAt": "2023-08-23T21:45:57Z"
},
{
"author": "mnot",
"authorAssociation": "MEMBER",
"body": "Right, but the software handling that ban isn't a SF processor, it's the field-specific code. I think the most we could do would be something like how we handle field-specific constraint failures in 'defining new...', e.g.:\r\n\r\n> Field specifications that allow Display Strings should specify how control characters [ref?] should be handled; absent specific instructions, they are not allowed.\r\n\r\nBut that creates a situation whereby the default is that there's _nothing_ in the field's spec about control characters, the SF generic implementation passes them through, and the field-specific code is expected to know that it should reject them. Not great.\r\n\r\nSo a better approach might be:\r\n\r\n> Field specifications that allow Display Strings are required to specify how control characters [ref?] should be handled; considering them to be errors [ref to error handling] is encouraged.\r\n\r\n",
"createdAt": "2023-08-23T22:55:54Z",
"updatedAt": "2023-08-23T22:55:54Z"
}
]
},
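The first alternative in the issue body above ("Ban `%00` outright") can be sketched roughly as follows. This is an illustration only, not text from the draft or from any Structured Fields implementation: the function name `decode_display_string_body` is hypothetical, and it handles only the percent-decoding and UTF-8 steps of a Display String's contents, not the surrounding `%"..."` syntax or the draft's other character rules. It assumes a strict UTF-8 decoder, which Python's built-in `utf-8` codec is.

```python
from urllib.parse import unquote_to_bytes


def decode_display_string_body(encoded: str) -> str:
    """Hypothetical helper: decode the percent-encoded contents of a
    Display String, rejecting NUL as the first alternative proposes."""
    # Turn %xx escapes back into raw octets.
    raw = unquote_to_bytes(encoded)

    # "if octet is zero, fail parsing."
    if b"\x00" in raw:
        raise ValueError("0x00 octet is not allowed in a Display String")

    # Python's UTF-8 codec is strict: over-long sequences such as
    # 0xC0 0x80 (the Java-style encoding of U+0000) are rejected
    # with UnicodeDecodeError.
    return raw.decode("utf-8")
```

With a strict decoder, both `%00` and `%c0%80` fail, so `U+0000` cannot be carried at all, which matches the issue's observation about strict encoders and decoders. Note that `UnicodeDecodeError` is a subclass of `ValueError`, so callers that need to tell the two failures apart should catch `UnicodeDecodeError` first.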
