curl-library

Re: patch for file:// encoding on Windows

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Sun, 28 Sep 2014 10:44:03 +0200 (CEST)

On Sat, 27 Sep 2014, clinton_at_elemtech.com wrote:

>> It feels like something with a much larger scope than just file:// URLs
>> that I feel very scared of even considering. Please provide a proper
>> motivation for why we want this! URLs are not UTF-8, they're a sequence of
>> bytes/octets.
>
> The raw "sequence of bytes" idea doesn't work on Windows.

Sure it does. See below.

> From the current code page

That's not a very workable approach. What if you copy the URL from somewhere?
Assuming a "current code page" is asking for non-deterministic behaviors in
how the input is treated.
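
To illustrate the non-determinism, here is a minimal Python sketch (not curl
code). The byte 0xE9 and the three code pages are arbitrary picks for the
example; the point is only that the same raw byte maps to a different
character under each code page, while percent-encoded UTF-8 has one reading.

```python
from urllib.parse import unquote

# One raw byte, as it might arrive in a non-UTF-8 URL (arbitrary example value).
raw = b"\xe9"

# The character it stands for depends entirely on the code page assumed.
names = {cp: raw.decode(cp) for cp in ("cp1252", "cp1251", "cp437")}
for cp, ch in names.items():
    print(cp, ch)

# Percent-encoded UTF-8 decodes the same way regardless of the local code page.
fixed = unquote("%C3%A9")
print(fixed)
```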

> Not all files are accessible this way when you have an NTFS file system that
> supports file names that can't be represented with the default 8 bit
> encoding.

It is a mistake to think that you should be able to feed the "raw 8 bit
encoding" into the URL to start with. Also, a URL should work the same no
matter which OS you enter it on, so treating it differently on Windows than
on non-Windows is asking for trouble.

> This problem has been brought up before:

... and never properly dealt with in any of those situations.

"This problem" is at least two separate ones: 1 - what the URL should look
like to allow a Unicode file name to get opened and 2 - having the actual
file: code understand and work with a file name provided according to (1).

So a question that would help me form a better opinion on this: given a
Unicode file name like "ŕéüñíöñ", what does a file: URL that works with IE,
Firefox and Chrome look like? I don't mean what it looks like in the URL bar,
but if you copy it and paste it somewhere, what does that look like?

In both Firefox and Chrome on Linux, such a file name in my home directory
uses this URL:

   file:///home/daniel/%C5%95%C3%A9%C3%BC%C3%B1%C3%AD%C3%B6%C3%B1

Percent-encoded UTF-8 it looks like to me.

No "current code page" necessary. A single, defined way to decode it.
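
As a quick check of that reading, the following Python sketch (standard
library only, not curl code) percent-encodes the example file name as UTF-8
and decodes it back. The /home/daniel/ prefix is taken from the URL above;
the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

name = "ŕéüñíöñ"

# quote() encodes the string as UTF-8 by default, then percent-encodes each byte.
encoded = quote(name, safe="")
url = "file:///home/daniel/" + encoded
print(url)

# Decoding takes the last path segment and reverses the two steps.
decoded = unquote(url.rsplit("/", 1)[1])
print(decoded == name)
```

The result matches the URL the browsers produce, and the round trip never
consults the locale or a code page.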

-- 
  / daniel.haxx.se

Received on 2014-09-28