curl-library

Re: encoding expectations

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 19 Apr 2018 10:14:50 +0200 (CEST)

On Wed, 18 Apr 2018, Michael Kilburn wrote:

> Apparently certain easy options expect data to be encoded in certain way.

Yes and no. The encoding situation is... weird. But it is also different for
different options, so we need to look at them separately.

> For example, it seems that CURLOPT_URL expects a string that is encoded
> according to current C locale(?). And if user encoding doesn't support
> certain symbol -- curl can't talk to a server who's name contains that
> symbol.

It's even worse than that. First, let me mention my recent talk about "the sorry
state of URLs" from curl up 2018:
https://curl.haxx.se/video/curlup-2018/2018-04-15_Daniel-Stenberg-urls.webm

URLs, by curl's definition and by the original RFC 3986 definition, are ASCII
only. You can use IDN domain names (if curl was built with that feature
enabled), and then curl will use the currently set locale when trying to
convert the name to punycode.

Outside of the domain name, anything that isn't ASCII is not RFC 3986
compliant. But curl uses a forgiving approach, as you might want to send
rubbish to your server, so as long as curl can figure out the individual parts
it will pass on what you passed in. So UTF-8 or any other encoding will be
accepted and passed on unchanged.

If you then happen to send in a URL with some funny encoding, and it is the
same funny encoding the server expects, it works. If you use another encoding
client side than the server wants, it doesn't work. This can be seen in cases
where users pass in, for example, Unicode letters from a Windows command line
and a Linux command line, but it only works from one of them, because they use
different encodings and curl doesn't recode the data.
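
To make the mismatch concrete, here is a minimal sketch in plain C (no libcurl
involved). The helper name hex_bytes is made up for this illustration. It dumps
the raw bytes of a string, which is all curl ever sees. The letter "é" is the
two bytes C3 A9 in UTF-8 but the single byte E9 in ISO-8859-1, so the same
visible URL can reach the server as two different byte sequences.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only: write the bytes of `s` into
 * `out` as two-digit uppercase hex values separated by single spaces.
 * Output is truncated if `out` is too small.
 */
void hex_bytes(const char *s, char *out, size_t outlen)
{
  size_t pos = 0;
  if(outlen)
    out[0] = '\0';
  /* each step needs room for at most " XX" plus the terminating zero */
  for(size_t i = 0; s[i] && pos + 4 <= outlen; i++) {
    pos += (size_t)snprintf(out + pos, outlen - pos, "%s%02X",
                            i ? " " : "", (unsigned char)s[i]);
  }
}
```

Calling it on "\xC3\xA9" (é as UTF-8) and on "\xE9" (é as ISO-8859-1) gives
different output, and since curl does not recode, the server receives exactly
those bytes.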

> It is no problem on Linux where "user encoding" apparently is UTF-8, but on
> Windows it is a problem.

This is presumably because your server end likes the encoding passed in on
Linux but not the one used on Windows.

> Questions: - what encoding libcurl assumes for data passed in
> CURLOPT_URL/etc? (on Windows/Linux/etc) - what other easy options make use
> of "user encoding"?

I hope I answered enough to make you start to understand that there's really
no right answer here other than: don't use non-ASCII in URLs. Percent-encode
them yourself properly before you pass them to curl.
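
As a rough sketch of what "encode them yourself" could look like: the function
below is a hypothetical helper, not part of libcurl. It assumes the host name
is already ASCII and that the non-ASCII bytes in the path or query are already
in the encoding your server expects (UTF-8 in the example). It only rewrites
bytes outside printable ASCII as %XX and leaves everything else alone. For
escaping a single component, libcurl's own curl_easy_escape() is the usual
tool, but note that it also escapes reserved characters such as "/" and ":",
so it is not meant for whole URLs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a libcurl function): return a malloc()ed copy of
 * `in` where every byte outside printable ASCII (0x21..0x7E) is written as
 * %XX. Existing '%' characters are left untouched, so the input is assumed
 * to be an otherwise well-formed URL.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *escape_non_ascii(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1); /* worst case: every byte becomes %XX */
  char *p = out;
  if(!out)
    return NULL;
  for(size_t i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(c < 0x21 || c > 0x7E) {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 0x0F];
    }
    else
      *p++ = (char)c;
  }
  *p = '\0';
  return out;
}
```

The returned string is pure ASCII and can be handed to CURLOPT_URL. Which
bytes end up behind the percent signs still depends on the encoding of the
input, so the recoding to what the server wants has to happen before this
step.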

The current state of URLs is disastrous.

> CURLOPT_HTTPHEADER?

I hope you don't try to send funny encodings in HTTP headers. That won't work
very reliably, and you had better stick to ASCII there as well.
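
One cheap guard, sketched here under the assumption that you build each header
line yourself before handing it to curl_slist_append(): check that the line is
plain ASCII first. The function name header_is_ascii is invented for this
example, and the rule it applies (printable ASCII plus horizontal tab) is a
conservative choice of mine, not something libcurl enforces.

```c
#include <stdbool.h>

/*
 * Hypothetical check (not a libcurl function): return true if `line`
 * contains only printable ASCII (0x20..0x7E) or horizontal tab.
 * Anything else, including CR, LF and all bytes >= 0x80, is rejected.
 */
bool header_is_ascii(const char *line)
{
  for(; *line; line++) {
    unsigned char c = (unsigned char)*line;
    if(c != '\t' && (c < 0x20 || c > 0x7E))
      return false;
  }
  return true;
}
```

Rejecting CR and LF as a side effect also keeps a stray line break from
smuggling an extra header into the request.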

> I am specifically interested in information about rmt_lib_curl v7.51.0 (
> https://www.nuget.org/packages/rmt_curl_winssl/)

I have no insights into what that particular build is (it is not an official
product from this project), but it seems it hasn't been updated since the
release date so I feel I should mention this page:
https://curl.haxx.se/docs/vuln-7.51.0.html

> which doesn't look very good.

I'm always interested in feedback, suggestions and ideas on how to improve
libcurl. The subject of URLs, how to deal with them and how not to deal with
them, is one I struggle with frequently. Unfortunately, I have no easy way out
of this icky situation.

-- 
  / daniel.haxx.se
-------------------------------------------------------------------
Received on 2018-04-19