Add URL.from(object) constructor? #782
Comments
+1, this would be awesome, and would avoid the arbitrary roadblocks that make the "make a dummy URL and Object.assign" approach frictionful. |
This is a duplicate of #354. I'm happy to leave this open for a bit to see if someone can resolve the challenges pointed out in that issue though. |
@annevk @ljharb I have a couple ideas to fit it in without duplicating too much existing logic:
Do you follow? And if so, thoughts? |
I'm not sure I follow… if you mean, select properties in a specific order to get to a mutable instance, and then mutate it, that seems fine to me. |
@ljharb More like this:
The difference between option 1 and option 2 is this: Option 1: each "parse" step works as follows:
Username and password might be a bit tricky, as they don't cleanly map to a specific parsing state. Option 2:
Does this help? |
I actually assume the easiest method would be to just build a URL string from the parts, then pass it to the URL constructor. That requires the least corner-casing or refactoring, as it would use the existing spec machinery exactly as-is. It's a little bit wasteful, but what matters is just the author-observable behavior; an impl could, if they wished, implement the parsing more directly so long as it matched behavior. This wouldn't reduce the testing effort for the feature itself, but it would substantially reduce the unrelated testing effort caused by spec refactoring to accommodate this. In fact, it should reduce to zero. Thinking like:
dictionary URLFromInit {
  required USVString protocol;
  USVString hostname;
  USVString username;
  USVString password;
  USVString port;
  (USVString or sequence<USVString>) pathname;
  (USVString or URLSearchParams) search;
  USVString hash;
};
partial interface URL {
  static URL from(URLFromInit init);
};
Then the algo would be (stealing directly from the URL serializer algo):
Sprinkle some appropriate encode steps in this and you're golden. I skipped them for brevity and because figuring out precisely where and what kind of escaping is needed is more work than I want to do for a proposal. |
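The "build a string, then hand it to the constructor" idea can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: `urlFromParts` is a hypothetical name, and the escaping choices (`encodeURIComponent` for credentials and array segments, escaping `?` and `#` in string paths, rejecting delimiters in the hostname) are guesses at the "appropriate encode steps", not anything the proposal specifies. Dot segments are left for the existing parser to resolve.

```javascript
// Hypothetical helper sketching URL.from(); not part of any spec or library.
function urlFromParts(init) {
  const { protocol, hostname, username, password, port, pathname, search, hash } = init;

  // `protocol` is the only required member, as in the proposed dictionary.
  if (typeof protocol !== "string" || !/^[A-Za-z][A-Za-z0-9+.-]*:?$/.test(protocol)) {
    throw new TypeError("protocol must be a valid scheme");
  }
  let out = protocol.endsWith(":") ? protocol : protocol + ":";

  if (hostname !== undefined) {
    // Assumption: reject delimiters that would shift later components.
    if (/[\/\\?#@]/.test(hostname)) throw new TypeError("invalid hostname");
    out += "//";
    // Credentials are only emitted when there is a host to attach them to.
    if (username !== undefined || password !== undefined) {
      out += encodeURIComponent(username ?? "");
      if (password !== undefined) out += ":" + encodeURIComponent(password);
      out += "@";
    }
    out += hostname;
    if (port !== undefined && port !== "") {
      if (!/^\d+$/.test(port)) throw new TypeError("port must be numeric");
      out += ":" + port;
    }
  }

  if (pathname !== undefined) {
    // Array form: each entry is one segment, so "/" inside it is escaped.
    // String form: only the later delimiters "?" and "#" are escaped.
    let path = Array.isArray(pathname)
      ? "/" + pathname.map(encodeURIComponent).join("/")
      : pathname.replace(/\?/g, "%3F").replace(/#/g, "%23");
    if (hostname !== undefined && !path.startsWith("/")) path = "/" + path;
    out += path;
  }

  if (search !== undefined) {
    // String(URLSearchParams) serializes without the leading "?".
    const query = String(search).replace(/^\?/, "").replace(/#/g, "%23");
    if (query !== "") out += "?" + query;
  }

  if (hash !== undefined && hash !== "") {
    out += hash.startsWith("#") ? hash : "#" + hash;
  }

  // The existing parser does the real validation and throws on failure.
  const url = new URL(out);
  // Guard against the parser reading the path as a host (e.g. "https:/x").
  if ((hostname ?? "") === "" && url.host !== "") {
    throw new TypeError("this protocol requires a hostname");
  }
  return url;
}

const built = urlFromParts({
  protocol: "https",
  hostname: "example.com",
  port: "8080",
  pathname: ["bands", "AC/DC"],
  search: new URLSearchParams({ q: "a b" }),
  hash: "top",
});
console.log(built.href); // "https://example.com:8080/bands/AC%2FDC?q=a+b#top"
```

Because the final step is the ordinary constructor, any scheme-specific host rules or parse failures come from the existing parser rather than from new logic.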
That sounds like a solid approach to me, allowing for optimization as needed/discovered. |
Overall, I'm in favour of adding this and would like us to discuss and resolve any issues so it can be implemented in a consistent way by JS and other URL libraries. Programmers expect to parse a URL string on the web or in their native applications and to receive the same result, and that is why developers are creating libraries which implement the WHATWG URL standard's parser. I think developers have the same expectation when constructing a URL from a set of components, and so it is worth producing a specification for how that operation should behave and collaborating to ensure the implementation is robust and accounts for the various edge-cases.
I think we do need to think about it. Anne mentioned it in the previous issue before it was closed, so it seems like it was the blocking question:
I think I can answer this, and I don't think it's actually so hard. Each component's percent-encode set already includes the delimiters of later components (e.g. the query set includes
That means if we have some arbitrary string and encode it using, say, the path encode-set, it will never contain a naked
I wonder if this API should have an option which disables additional percent-escaping and fails instead. For instance, it might be important to me that:
URL({ ..., pathname = x }).pathname == x
The hostname would need to go through the host parser (which depends on the scheme and may fail), and the port would need to be validated to ensure it is a number. The other components are basically opaque so there's no validation to do.
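The encoding and validation behaviour described above can already be observed through the existing component setters. The following is a small check, assuming a runtime with the standard global `URL` class (browsers, Node.js); `example.com` is just a placeholder host.

```javascript
const u = new URL("https://example.com/");

// The path setter escapes the delimiters of later components.
u.pathname = "a?b#c";
console.log(u.pathname); // "/a%3Fb%23c"

// The query setter escapes the fragment delimiter.
u.search = "x#y";
console.log(u.search); // "?x%23y"

// An invalid host fails in the host parser; the setter leaves the URL unchanged.
u.hostname = "bad host";
console.log(u.hostname); // "example.com"

// A non-numeric port is rejected the same way.
u.port = "abc";
console.log(u.port); // ""
```

A `URL.from` built on the same machinery would presumably throw where these setters silently do nothing, which is closer to what the constructor does with an unparsable string.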
The path will need to be simplified. For instance, it might contain For instance:
// If we're only given a string, "AC/DC" looks like 2 path segments.
// We have no way to tell the difference.
URL({ ..., pathname = "/bands/AC/DC" }).pathname == "/bands/AC/DC"
// But if the user tells us "AC/DC" should be 1 segment, we'd have to escape it.
URL({ ..., pathname = ["bands", "AC/DC"] }).pathname == "/bands/AC%2FDC"
It's not a significant problem (we'd just need to add U+002F
This issue hasn't come up until now because the existing parser splits the string on
As I mentioned, this would be the first part of the JS URL API to expose the path as a collection of segments. In my survey of URL APIs, I found that surprisingly few libraries expose such a view. Of those that do,
So it's somewhat debatable what the following should return:
URL({ ..., pathname = ["foo", "..", "bar"] }).pathname // "/foo/bar" or "/bar"?
Note that we cannot escape |
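The two questions raised here, escaping a "/" inside a segment and what happens to dot segments, can be checked against the existing parser. This is a small sketch, assuming a runtime with the standard global `URL` class; the base `https://example.com` is only a placeholder, and `encodeURIComponent` stands in for whatever segment encoding the API would eventually use.

```javascript
const base = "https://example.com";

// A plain string: "AC/DC" is indistinguishable from two segments.
const fromString = new URL(base + "/bands/AC/DC");
console.log(fromString.pathname); // "/bands/AC/DC"

// Explicit segments: the "/" inside a segment has to be escaped first.
const segments = ["bands", "AC/DC"];
const fromSegments = new URL(base + "/" + segments.map(encodeURIComponent).join("/"));
console.log(fromSegments.pathname); // "/bands/AC%2FDC"

// Dot segments are resolved by the parser...
const dotted = new URL(base + "/foo/../bar");
console.log(dotted.pathname); // "/bar"

// ...and so are their percent-encoded spellings.
const encodedDots = new URL(base + "/foo/%2e%2e/bar");
console.log(encodedDots.pathname); // "/bar"
```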
The safest thing to do would be just to throw an error and force users to decide what to do. Alternatively there could be an option for some possible builtin behaviours (throw being default, as it is by far the safest):
enum URLFromRelativePathSegmentBehaviour {
  "throw",
  "omit", // ["foo", "..", "bar"] → "/foo/bar"
  "resolve", // ["foo", "..", "bar"] → "/bar"
}
dictionary URLFromOptions {
  URLFromRelativePathSegmentBehaviour relativePathSegment = "throw";
} |
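The three behaviours in the proposed enum could be sketched as a plain function over the segment list. `applyRelativeSegmentBehaviour` is a hypothetical name, and treating a lone "." the same way as ".." for "throw" and "omit" is an assumption the comment does not state.

```javascript
// Hypothetical sketch of the proposed relativePathSegment option.
function applyRelativeSegmentBehaviour(segments, behaviour = "throw") {
  const out = [];
  for (const segment of segments) {
    const isDot = segment === ".";
    const isDotDot = segment === "..";
    if (!isDot && !isDotDot) {
      out.push(segment);
      continue;
    }
    switch (behaviour) {
      case "throw":
        throw new TypeError(`relative path segment "${segment}" is not allowed`);
      case "omit":
        break; // drop the segment and keep everything else
      case "resolve":
        if (isDotDot) out.pop(); // ".." removes the previous segment, if any
        break;
      default:
        throw new TypeError(`unknown behaviour "${behaviour}"`);
    }
  }
  return out;
}

console.log(applyRelativeSegmentBehaviour(["foo", "..", "bar"], "omit"));    // ["foo", "bar"]
console.log(applyRelativeSegmentBehaviour(["foo", "..", "bar"], "resolve")); // ["bar"]
```

With "throw" as the default, a caller who passes untrusted segments gets an error instead of a silently rewritten path.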
It could be simplified, but it doesn't need to be simplified. Servers normally handle this in one of four ways:
1 also implies there must be a way to allow paths to be passed in raw as well, without modification. |
Currently URLs can only be created from a string, and then modified after that. If you have an object of URL parts you have to make a dummy URL and then modify each part in turn. However, the various bits of URLs have some special rules preventing arbitrary modification; for instance, when setting protocol it runs the URL parser in a special way, and that restricts the protocol from being altered in certain ways. (See the "scheme state" state, step 2.1.)
I'm happy to assume that those restrictions are necessary to ensure a consistent data model for some reason, but it does mean that if you're wanting to construct a URL that you have entirely in parts, then your choice of dummy URL to initialize the URL object with affects whether or not you'll be able to create your desired final URL, even if a URL made of said parts would be perfectly valid and correctly parsable if originally presented in string form.
It would be helpful to have a way to construct a URL directly from parts, to avoid issues like this. Suggestion: a static URL.from(object) method, that takes a dictionary matching the modifiable bits of the URL class and returns a fresh URL set up appropriately.
This probably isn't quite trivial, since right now the only way to create a URL is by invoking the parser and then invoking the parser more for each bit, so there's a degree of statefulness in this, but hiding this complexity from authors seems worthwhile.
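The dummy-URL problem described above can be reproduced with the current API. This is a minimal sketch, assuming a runtime with the standard global `URL` class; the `parts` object and the `dummy.invalid` host are made up for illustration.

```javascript
// Parts that would form a perfectly valid URL in string form.
const parts = { protocol: "mailto:", pathname: "someone@example.com" };

// "Make a dummy URL and Object.assign": each property goes through its setter.
const viaDummy = Object.assign(new URL("https://dummy.invalid/"), parts);

// The protocol setter refuses to switch from a special scheme (https)
// to a non-special one (mailto), so the assignment is silently ignored.
console.log(viaDummy.protocol); // "https:"

// The same parts written as a string parse without trouble.
const direct = new URL("mailto:someone@example.com");
console.log(direct.protocol); // "mailto:"
console.log(direct.pathname); // "someone@example.com"
```

Picking a non-special dummy such as `foo:` would let the `mailto:` assignment through, but would in turn block building an `https:` URL, which is the dependence on the dummy that the issue objects to.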
/cc @bakkot @ljharb, who asked about this in the WHATWG chat room