Re: A canonical URL host name dilemma

From: Daniel Stenberg via curl-library <curl-library_at_lists.haxx.se>
Date: Sat, 9 Oct 2021 13:04:34 +0200 (CEST)

On Sat, 9 Oct 2021, Henrik Holst wrote:

Thanks for your thoughts!

> #D would most likely be the preferred way if it's possible, however it
> sounds both brittle and the "works differently if not built with IDN
> support" gives it the kind of "it depends" quality that one perhaps not want
> from this API?

Yeah, I think that would be a very unfortunate property of an API for a format
you'd think would be rather fixed and established.

> In essence I think it boils down to the use case of extracting the URL,
> since it was given by the caller in the first place so it should already
> know the URL and most likely also in a format preferred by the user
> (thinking that the caller got the URL from the user in some way or form so
> it should most likely be in the preferred form already), but then there is
> of course the need to see it due to a redirect.

Since it is an API, we don't know all the use cases. For example, the API
allows users to set the host name independently of the original URL.

A user can for example parse "https://curl.se", replace the host name of that
URL with "räksmörgås.se" and then extract the newly constructed URL...

Another use case could be getting two separate URLs from a user, then
canonicalizing them with the URL API (set then get) to compare them and see if
they are actually the same.

In either of those cases the exact method isn't important until the
application decides to show the retrieved URL to a user, as then it might want
the "beautified" version of the host name.

> So perhaps the better solution would be to always do #B and then also have
> #E - a new option for extracting "display friendly url" that tries to do #D
> if built with IDN but will fall back to #B or #A if not, since it will be
> used for display only then some inconsistency should be more tolerable.

I like this idea. It will push the decision making to the API user rather than
doing it ourselves.

The question is perhaps then if that new option should rather be A) "don't URL
encode host names" or B) "don't URL encode host names that are valid IDN
names".

Making it A) is way simpler and makes for more predictable behavior.

-- 
  / daniel.haxx.se


Received on 2021-10-09