curl-users

Re: Japanese characters in URL

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Mon, 7 May 2001 08:21:37 +0200 (MET DST)

On Sat, 5 May 2001, Max WebMaster wrote:

> I want to download some pages with curl off a Japanese site. Via browser
> it is no problem (I have Japanese fonts installed) but curl naturally
> wants to send in the link the double-byte expression for each character.
> The web site does not understand it and refuses the connection.

RFC2396 (http://curl.haxx.se/rfc/rfc2396.txt) details how URLs are to be
written. Curl itself performs no magic on the input string but expects it to
be correct.

Now, I'm not really up to speed with how localized strings are supposed to
work in URLs (I figure they're UTF-8 encoded at some point to hide this fact
from lower layers).

You need to enter all special characters as '%XX', where XX is the two-digit
hexadecimal code of the byte. Section 2.1 of RFC2396 describes the situation
for non-ASCII characters:

   For original character sequences that contain non-ASCII characters,
   however, the situation is more difficult. Internet protocols that
   transmit octet sequences intended to represent character sequences
   are expected to provide some way of identifying the charset used, if
   there might be more than one [RFC2277]. However, there is currently
   no provision within the generic URI syntax to accomplish this
   identification. An individual URI scheme may require a single
   charset, define a default charset, or provide a way to indicate the
   charset used.

   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.
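
Since the RFC leaves the charset open, the same Japanese text can end up as
different '%XX' sequences depending on which charset the site expects. The
following is only a rough sketch in Python, not something curl does for you:
the helper name percent_encode_path and the sample path are made up for
illustration, and which charset a given server wants is an assumption you
would have to verify.

```python
from urllib.parse import quote, unquote


def percent_encode_path(text, charset):
    """Encode text to bytes in the given charset, then write every byte
    outside the unreserved ASCII set as %XX. '/' is left untouched."""
    return quote(text, safe="/", encoding=charset)


# Hypothetical path containing Japanese characters.
path = "/日本語/index.html"

# The same characters produce different escapes in each charset.
for charset in ("utf-8", "shift_jis", "euc_jp"):
    encoded = percent_encode_path(path, charset)
    # Decoding with the same charset gives back the original text.
    assert unquote(encoded, encoding=charset) == path
    print(charset, encoded)
```

The resulting ASCII-only string is what you would hand to curl as the path
part of the URL. If the server rejects one charset's form, trying another is
a guess at best, which is why looking at what the browser sends is more
reliable.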

> What now? How do I encode Japanese character into an URL string?

If no one else around can provide info on this subject, I'd recommend that
you "spy" on the request sent by your browser and clone that string to use
with curl!
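
As a rough illustration of the "clone that string" step: once you have the
raw request text the browser sent (from a proxy log or a packet capture),
the URL to give curl can be rebuilt from the request line and the Host
header. This is a sketch under assumptions: the function name, the host
www.example.jp and the captured request below are all invented for the
example.

```python
def url_from_captured_request(raw_request, scheme="http"):
    """Rebuild the URL from the raw text of a captured HTTP request."""
    lines = raw_request.splitlines()
    # Request line looks like: METHOD SP request-target SP HTTP-version
    _method, target, _version = lines[0].split(" ", 2)

    # Absolute-form targets (as sent to proxies) already are the full URL.
    if target.startswith(("http://", "https://")):
        return target

    # Otherwise combine the Host header with the already-escaped path.
    for line in lines[1:]:
        if not line:  # blank line ends the header section
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return f"{scheme}://{value.strip()}{target}"
    raise ValueError("no Host header found in captured request")


# Hypothetical capture; the escapes are copied exactly as the browser sent them.
captured = (
    "GET /%93%FA%96%7B%8C%EA/index.html HTTP/1.1\r\n"
    "Host: www.example.jp\r\n"
    "\r\n"
)
print(url_from_captured_request(captured))
```

The point of doing it this way is that the '%XX' escapes are never decoded
or re-encoded, so whatever charset the browser picked is preserved as-is.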

-- 
  Daniel Stenberg -- curl project maintainer -- http://curl.haxx.se/
Received on 2001-05-07