
curl-users

Re: Curl --data-urlencode posts broken non-English characters

From: Irvin Jacob <nivribocaj_at_gmail.com>
Date: Wed, 3 Feb 2016 12:40:52 +0800

> It's quite possible that the server doesn't support UTF-8, but assumes the
> input is always ISO-8859-1 or something. Are you sure the server can even do
> what you want it to? Does it work if you use a browser? If so, then you can
> see what headers and data the browser sends in this case and make curl match.

I have isolated the issue. It seems that the "@" functionality of the
-d/--data and --data-urlencode options always interprets characters with
code points 128 and higher as UTF-8, even if I set the charset parameter of
the Content-Type: application/x-www-form-urlencoded header that curl sends
to something else. When I pass the text directly, as in -d "Thére Àre sôme
spëcial charâcters ïn thìs têxt", or store the string in an environment
variable, the text comes out perfectly fine. But if I store the special
characters in a text file for curl to pick up via "@", garbled characters
always come out.
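To see what "garbled" looks like at the byte level, here is a small Python sketch. It assumes the text file was saved as UTF-8 and the server decodes the request as ISO-8859-1. It compares the same character percent-encoded from UTF-8 bytes and from ISO-8859-1 bytes, and shows what a server assuming ISO-8859-1 would make of the UTF-8 bytes. The sample string and variable names are illustrative only.

```python
from urllib.parse import quote_plus

text = "Thére"

# The same characters, percent-encoded from two different byte encodings.
utf8_form = quote_plus(text, encoding="utf-8")      # 'Th%C3%A9re'
latin1_form = quote_plus(text, encoding="latin-1")  # 'Th%E9re'

# A server that assumes ISO-8859-1 decodes the two UTF-8 bytes of 'é'
# (0xC3 0xA9) as two separate characters, producing the "rubbish" output.
mojibake = text.encode("utf-8").decode("latin-1")   # 'ThÃ©re'

print(utf8_form, latin1_form, mojibake)
```

If the request body contains `%C3%A9` where the server expects `%E9`, the file's bytes were UTF-8 rather than ISO-8859-1.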

To answer your question: the server I am sending POST requests to supports
ISO-8859-1, and characters outside ISO-8859-1 get encoded as numeric HTML
entities before they are sent. I confirmed this by viewing the headers that
Firefox sends.
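The browser behaviour described above can be approximated with a minimal Python sketch. It assumes the form is submitted as ISO-8859-1 and that characters outside that set are replaced by decimal numeric references, which is what Python's `xmlcharrefreplace` error handler produces. `browser_style_encode` is a hypothetical helper, not part of curl or Firefox.

```python
from urllib.parse import quote_plus


def browser_style_encode(value: str) -> str:
    """Approximate a form field submitted with charset ISO-8859-1.

    Characters outside ISO-8859-1 become numeric character references
    (e.g. '€' -> '&#8364;'). The resulting bytes are then percent-encoded,
    with '+' for spaces, as in application/x-www-form-urlencoded.
    """
    raw = value.encode("latin-1", errors="xmlcharrefreplace")
    return quote_plus(raw)


print(browser_style_encode("café €"))  # caf%E9+%26%238364%3B
```

The output is already URL-encoded, so it would be passed to curl with `-d "field=..."` (field name hypothetical) rather than through `--data-urlencode`, which would encode it a second time.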

I am almost convinced this is a long-standing bug or oversight in curl that
nobody has cared about. I've also tried posting the special characters with
curl from PowerShell to the same server/website, and the issue is present
there too, even when I set the charset parameter of curl's URL-encoding
header to ISO-8859-1 to match the server's encoding. curl should support
converting special Unicode characters (code points 128 and up) to numeric
HTML entities when it reads "@file" input, leave ASCII characters 0-127
alone, and URL-encode ONLY non-printing characters such as horizontal tab,
newline and carriage return. That way curl would work with all websites,
regardless of their native character encodings.
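Until something like this exists in curl, the entity conversion proposed above can be done as a preprocessing step outside curl. The minimal Python sketch below assumes the input file is UTF-8, with or without a BOM. It keeps ASCII characters as they are and turns every other character into a decimal numeric entity. `to_ascii_with_entities`, `convert_file` and the script name are hypothetical.

```python
import sys
from pathlib import Path


def to_ascii_with_entities(text: str) -> str:
    """Keep ASCII characters (code points 0-127) unchanged and replace every
    other character with a decimal numeric character reference,
    e.g. 'é' -> '&#233;'."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def convert_file(src: str, dst: str) -> None:
    """Rewrite a UTF-8 text file as pure ASCII with numeric entities.

    Bytes are read and written directly so line endings (LF or CRLF) are
    preserved exactly, since curl sends the file's newlines as well.
    """
    # Assumption: the source is UTF-8; 'utf-8-sig' also strips a BOM if present.
    text = Path(src).read_bytes().decode("utf-8-sig")
    Path(dst).write_bytes(to_ascii_with_entities(text).encode("ascii"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python to_entities.py input.txt output.txt
    convert_file(sys.argv[1], sys.argv[2])
```

The converted file could then be sent with curl's `name@filename` form, for example `curl --data-urlencode "message@output.txt" https://example.com/form` (field name and URL hypothetical). curl percent-encodes the `&`, `#` and `;` of each entity, as a browser does. Whether the server turns `&#233;` back into `é` is up to the server.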

-- 
*Irvin*

Received on 2016-02-03