curl-library

Re: Cookies and URL legal characters

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Tue, 18 Mar 2003 21:31:50 +0100 (CET)

On Tue, 18 Mar 2003, Sigal Algranaty wrote:

> Does anyone know what is the group of legal characters for cookies in HTTP
> protocol?

This is not as easy to answer as you might wish.

First out (in my list, not when ordered in time), there is the HTTP spec
(RFC 2616 section 4.2 [1]) saying what response headers should look like and
what they may contain.

Then, there's the original cookie spec [2] written by Netscape once upon a
time, which defined how cookies were supposed to work.

Later came serious RFC documents describing how cookies should work when
done right, RFC 2109 [3] and RFC 2965 [4]. These latter two have never been
read, or at least not used much, judging from how cookies still work on most
sites.

The main problem, if you ask me, is that most cookie-using servers/sites are
controlled by scripts, programs and applications that are written by humans.
And these humans are not spec-obeying people. They don't follow any of these
documents very strictly. They send and expect cookies the way the main
browsers send and receive them.

So, in order to support cookies properly, we need to study the specs,
understand how ordinary cookie-using sites work, see how the browsers do it,
mix everything and stir slowly before we put everything in the oven and bake
the libcurl cookie support. Out comes something that usually works.

> Is there any functionality in cUrl that adapts strings to be a legal
> cookie?

No. libcurl doesn't adapt strings to cookies. When told to, it reads certain
headers and tries to parse and understand the cookies claimed to be there.

The code for this is all in lib/cookie.c. It's pretty straightforward.
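
To give a rough idea of what "reads certain headers and tries to parse" can
involve, here is a small Python sketch of lenient Set-Cookie parsing. It is
an illustration only and not the code in lib/cookie.c; the function name
parse_set_cookie and the choice to lower-case attribute names are assumptions
made for this example.

```python
def parse_set_cookie(header_line):
    """Split one Set-Cookie header line into (name, value, attributes).

    Hypothetical helper for illustration; returns None when the line is
    not a Set-Cookie header or has no name=value pair.
    """
    field, sep, rest = header_line.partition(":")
    if not sep or field.strip().lower() != "set-cookie":
        return None  # some other header, ignore it

    parts = [p.strip() for p in rest.split(";")]
    name, eq, value = parts[0].partition("=")
    if not eq or not name.strip():
        return None  # nothing that looks like name=value

    attributes = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate stray or trailing semicolons
        key, _, val = part.partition("=")
        attributes[key.strip().lower()] = val.strip()
    return name.strip(), value.strip(), attributes


print(parse_set_cookie("Set-Cookie: sid=abc123; Path=/; Secure"))
# ('sid', 'abc123', {'path': '/', 'secure': ''})
```

The sketch deliberately accepts sloppy input (extra spaces, trailing
semicolons, any attribute name) rather than rejecting it, since that is
closer to what real sites send than to what the specs describe.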

> Same question for URLs. I found out that spaces in URL, for example, should
> be replaced by %20.

No. Strictly speaking, a URL cannot contain spaces, so a string with a space
in it isn't a URL at all! But I guess that's basically what you meant.

> Does anyone know what the character set is that should be replaced by its
> ascii code?

It is ordinary URL-encoding. You convert the byte into its two-digit
hexadecimal value, prefixed with '%'. So an ASCII space is 32 decimal, 20
hex, so it should be %20. The character '+' is 43 decimal and 2B hex, so
putting that in a URL could be done with %2B.
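
The arithmetic above can be sketched in a few lines of Python. This is a
minimal illustration, not libcurl code: the function name percent_encode is
made up for the example, and the set of characters left unescaped (letters,
digits and "-._~", the "unreserved" set of the later RFC 3986) is an
assumption, since the question of exactly which characters to escape is not
settled in this mail.

```python
# Characters copied through unchanged. Assumption: the "unreserved" set
# from RFC 3986; everything else is escaped.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text):
    """Return text with every byte outside UNRESERVED written as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Two uppercase hex digits, prefixed with '%'.
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(percent_encode(" "))      # %20  (32 decimal = 20 hex)
print(percent_encode("+"))      # %2B  (43 decimal = 2B hex)
print(percent_encode("a b+c"))  # a%20b%2Bc
```

In real Python code, urllib.parse.quote(text, safe="") from the standard
library does much the same job.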

I hope this helped.

[1] = http://curl.haxx.se/rfc/rfc2616.txt
[2] = http://curl.haxx.se/rfc/cookie_spec.html
[3] = http://curl.haxx.se/rfc/rfc2109.txt
[4] = http://curl.haxx.se/rfc/rfc2965.txt

-- 
 Daniel Stenberg -- curl, cURL, Curl, CURL. Groks URLs.
Received on 2003-03-18