Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode URL paths (#5017) #5018

Merged
merged 1 commit into from
Nov 1, 2023
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Nov 1, 2023

Which issue does this PR close?

Closes #5017

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Nov 1, 2023
@tustvold
Copy link
Contributor Author

tustvold commented Nov 1, 2023

@roeap perhaps you might be able to take a look at this one?

@tustvold tustvold mentioned this pull request Nov 1, 2023
2 tasks
@tustvold tustvold merged commit ec788e1 into apache:master Nov 1, 2023
13 checks passed
@roeap
Copy link
Contributor

roeap commented Nov 2, 2023

@tustvold - sorry was a bit to late here, but generally always happy to review :).

The more general question around which characters are encoded or not actually came up a little while ago on delta-rs. Essentially Spark can write certain file path (that would be illegal on windows, but are valid on other os i think) that we cannot read as we consider these characters illegal. Haven't seen that edge case in the wild yet, just discovered it trying to do roundtrip tests that contain a bunch of special characters in a partition value and thus show up in the path ...

I dismissed that however as something we have to live with, as concistency across platforms and providers is a major aim for object store, right?

@tustvold
Copy link
Contributor Author

tustvold commented Nov 2, 2023

I dismissed that however as something we have to live with, as concistency across platforms and providers is a major aim for object store, right?

This PR is simply fixing a bug related to interpreting URLs as well URL-encoded URLs 😅

The broader issue of path safety, and has been relaxed by #5020 with more context on the rationale in #5019. It can broadly be summarized as "the current restrictions were insufficient and added friction, and whilst everything would be better if people didn't have special characters in paths, we have to meet our users in the middle by encouraging against it but not forbidding it". I'd be interested to hear your thoughts on this, the RC has been cut, but it isn't too late if you feel very strongly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ObjectStore parse_url Incorrectly Handles URLs with Spaces
3 participants