curl-library

Re: libcurl - windows / unicode filenames support...

From: Ray Satiro via curl-library <curl-library_at_cool.haxx.se>
Date: Fri, 5 Aug 2016 15:27:47 -0400

On 8/5/2016 6:16 AM, Sergei Nikulov wrote:
> 2016-08-05 12:11 GMT+03:00 Rod Widdowson <rdw_at_steadingsoftware.com>:
>> >Aside, but curious minds need to know.
>> >
>> >As a newcomer here - can someone help me understand what "Unicode for windows" means? I have to assume it is in URL handling, not files? The word UTF8 has to be the give-away, since UTF8 is a pretty alien concept for windows at the k-mode interface (where I mostly hang out).
> +1
>
> UTF-16 (wide character) encoding, which is the most common encoding of
> Unicode and the one used for native Unicode encoding on Windows
> operating systems.
>
> So I am also wondering how it can encode UTF-8 in file names.
>

Supporting Unicode in Windows has been discussed in #345 [1]. While I
acknowledge UTF-16 is the native choice, I thought it would be easier to
pass UTF-8 around inside the library; that way we wouldn't have to
implement a bunch of sister libcurl functions for wide characters. The
problem is that the underlying MS C runtime (CRT) does not properly
support UTF-8 as a locale (except maybe under cygwin), so it won't do
the conversions automatically. For example, before calling a function
like fopen with a UTF-8 filename we'd have to convert the name to UTF-16
stored in wide chars and call _wfopen [2] instead, since there is no way
to set the locale to UTF-8. We'd have to do that for a lot of CRT
functions, basically building a layer over the CRT, which is painful in
its own way. It seems like either way we'd have to create a bunch of
functions, but I suspected the CRT layer would be easier to maintain
since its functions are essentially just wrappers. But how do we know,
in many of our library functions, whether a string we're passed is UTF-8
or just ANSI? That's another problem. Yet another is displaying Unicode
characters in the console, which didn't always work well, although with
Consolas it has gotten better.
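
To make the fopen/_wfopen example concrete, here is a minimal sketch of
what one such wrapper could look like. The names fopen_utf8 and
utf8_to_wide are hypothetical and do not exist in libcurl; the sketch
only assumes the Win32 MultiByteToWideChar call and the CRT's _wfopen.
On other platforms it simply falls through to fopen.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on invalid UTF-8 or allocation
   failure. The caller frees the result. */
static wchar_t *utf8_to_wide(const char *utf8)
{
  wchar_t *wide;
  /* First call: ask for the required length in wide chars, NUL included. */
  int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8, -1, NULL, 0);
  if(len <= 0)
    return NULL;
  wide = malloc((size_t)len * sizeof(wchar_t));
  /* Second call: perform the actual conversion into the buffer. */
  if(wide && !MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8, -1, wide, len)) {
    free(wide);
    wide = NULL;
  }
  return wide;
}
#endif

/* Hypothetical wrapper: open a file whose name is assumed to be UTF-8. */
FILE *fopen_utf8(const char *filename, const char *mode)
{
#ifdef _WIN32
  FILE *fp = NULL;
  wchar_t *wname = utf8_to_wide(filename);
  wchar_t *wmode = utf8_to_wide(mode);
  if(wname && wmode)
    fp = _wfopen(wname, wmode);
  free(wname);
  free(wmode);
  return fp;
#else
  /* Elsewhere the C runtime passes the filename bytes through unchanged. */
  return fopen(filename, mode);
#endif
}
```

A caller would use fopen_utf8(name, "rb") exactly like fopen. Note that
the wrapper has to assume its input is UTF-8; it cannot tell a UTF-8
string from an ANSI one, which is the open question raised above.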

A few people have shown interest in this, but it waned. Make no mistake,
it will take a lot of time to implement properly in a way that is
maintainable, which is very important. The issues have essentially been
abandoned because nobody has the time, but feel free to resurrect them
if you want to do the work. Going forward, I think it is important that
we reach some consensus on a design before any more work is put in. It
could be done piecemeal, for example for filenames only at first, but we
should agree on some sort of overall plan beforehand.

[1]: https://github.com/curl/curl/issues/345
[2]: https://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

Received on 2016-08-05