Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim whitespace when processing dids #1299

Merged
merged 3 commits into from
Oct 5, 2023

Conversation

ferozsalam
Copy link
Contributor

@ferozsalam ferozsalam commented Jul 7, 2023

I had some issues when enabling a custom domain using the .well-known file method because of the EOL character – as the code currently stands, any did followed by an EOL character is passed for verification as:

"responseBody": {
  "did": "<did>\n"
}

The presence of the trailing newline character leads to the user receiving the error "The server gave an invalid response and may be out of date". I solved the issue by removing the EOL character from my did file, but it would be neater if this was done for the user – especially as the EOL character is added by default by most Unix editors.

I had some issues when enabling a custom domain using the .well-known file method
because of the EOL character – as the code currently stands, any `did` followed
by an EOL character is passed for verification as:

```
"responseBody": {
  "did": "<did>\n"
}
```

The presence of the trailing newline character leads to the user receiving the
error "The server gave an invalid response and may be out of date". I solved
the issue by removing the EOL character from my `did` file, but it would be
neater if this was done for the user.

Signed-off-by: Feroz Salam <[email protected]>
@devinivy
Copy link
Collaborator

devinivy commented Jul 7, 2023

This seems pretty practical to me. @bnewbold @dholms slightly sensitive since it impacts the spec—what do you think?

@bnewbold
Copy link
Collaborator

bnewbold commented Jul 8, 2023

In the context of this .well-known text file, i'd be fine doing a bit of cleanup. Normally I think we should not try to "clean" input data, but whitespace in plain text files can be a bit messy (depending on text editor config, etc).

Parsing rules I would propose:

  • read the file in text, with UTF-8 encoding, but limited to ASCII (so ASCII encoding also permited)
  • read the first line of the file (ASCII newline-delimited)
  • remove leading and trailing ASCII whitespace from that line
  • don't normalize in any other ways (eg, don't auto-lower-case)o

We could consider skipping any #-prefixed lines, but that seems too complex.

I can update the spec if we go ahead with this.

@ferozsalam
Copy link
Contributor Author

Makes sense – I'll update the PR with the necessary changes soon.

- Read the file in text, strip out non-ASCII chars
- Read in the first line of the file and strip any remaining newline characters
@ferozsalam
Copy link
Contributor Author

I've updated the PR based on the feedback – TypeScript isn't a language I'm hugely familiar with so welcome any feedback to improve this.

@dholms
Copy link
Collaborator

dholms commented Aug 18, 2023

Catching back up on this

read the file in text, with UTF-8 encoding, but limited to ASCII (so ASCII encoding also permited)

I'm not quite sure I follow what you meant by this @bnewbold. Did you mean stripping out non-ascii characters? Or failing on non-ascii characters?

The rest sounds pretty reasonable to me 🤔 Altho seems a bit funny to allow subsequent lines of gibberish.

@bnewbold
Copy link
Collaborator

Yeah I mean failing on any non-ASCII characters. The mention of Unicode at all was meant to imply it is safe to use usual "do an HTTP GET and treat the result as Text, not Binary" when using an HTTP client, using default settings. But maybe some clients (or standard libraries) will coerce. Or maybe i'm over thinking this part.

Lines of gibberish are weird, but I feel like adding too many rules might also be weird? I guess we could say something like "the whole file must be less than XYZ KByte, and then strip all leading+trailing whitespace, then process the result". I think in the "simple text file protocol" world it is pretty normal to say "just grab the first line, ignore anything else" though.

@bnewbold bnewbold requested a review from dholms September 12, 2023 20:57
simplifies what we are normalizing: just take the first line and strip whitespace, don't try to remove any non-ASCII characters as well.
@bnewbold
Copy link
Collaborator

I made a small simplification to this PR to only strip the first line, and not try to remove non-ASCII bytes/characters as well. I think this is ready to merge.

@bnewbold bnewbold merged commit 326d849 into bluesky-social:main Oct 5, 2023
mloar pushed a commit to mloar/atproto that referenced this pull request Nov 15, 2023
* Trim whitespace when processing dids

I had some issues when enabling a custom domain using the .well-known file method
because of the EOL character – as the code currently stands, any `did` followed
by an EOL character is passed for verification as:

```
"responseBody": {
  "did": "<did>\n"
}
```

The presence of the trailing newline character leads to the user receiving the
error "The server gave an invalid response and may be out of date". I solved
the issue by removing the EOL character from my `did` file, but it would be
neater if this was done for the user.

Signed-off-by: Feroz Salam <[email protected]>

* Update did handling based on PR feedback

- Read the file in text, strip out non-ASCII chars
- Read in the first line of the file and strip any remaining newline characters

* identity: don't remove non-ASCII characters

simplifies what we are normalizing: just take the first line and strip whitespace, don't try to remove any non-ASCII characters as well.

---------

Signed-off-by: Feroz Salam <[email protected]>
Co-authored-by: bnewbold <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants