Re: End of Line Handling

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 24 Mar 2006 08:57:41 +0100 (CET)

On Fri, 24 Mar 2006, David McCreedy wrote:

> In accordance with the NVT standard, the <CRLF> sequence
> should be used where necessary to denote the end of a line
> of text. (See the discussion of file structure at the end
> of the Section on Data Representation and Storage.)"

And in order to "denote the end of a line" we need a way to actually find
it... The current code has no notion of "lines" when it sends or receives
files.

> 1) On uploads (puts) libcurl makes line end conversion optional based on the
> data->set.crlf flag (in transfer.c's Curl_readwrite function). If that flag
> is set, all LFs in the data being sent are converted to CRLFs. Should
> transfer.c be changed so that ASCII-mode FTP transfers unconditionally
> convert to CRLF? If so, we'll have to identify which platforms internally
> use CRLF so we leave them alone (Windows only? There are probably others).

I find this a very fuzzy area and I certainly don't know how to behave here.
I'm the kind of person who *NEVER* actually used FTP ASCII mode intentionally
(but many times unintentionally - causing me grief), and then I truly mean
never. And I've used ftp since the early 1990s.

I figure we could do some tests with a reliably compliant client to see what
it does for a few different CRLF/CR/LF combos when sent between for example a
windows and a unix box.

But, instead of enlarging this problem for you and putting this unwanted
burden on you I think you can focus on getting this to work for your EBCDIC
case and then let people with other platforms try out and possibly add
corrections to this later on.

Also, the crlf option of today (before your work) is really just a work-around
for the "real thing" so we should really reconsider if that option is needed
anymore when things are done "the right way". Also, custom crlf replacing etc
could easily be done by an app in whatever way it wants. There's no real
reason for libcurl to provide odd features like this one that isn't truly
protocol related. I also doubt anyone actually uses this crlf option these
days.

> 2) Should Curl_readwrite be changed to leave existing CRLF sequences alone?
> That's the friendly thing to do but it deviates from a strict interpretation
> of RFC959.

I figure a CRLF in unix land is a line ending with some cruft just before it,
and in Windows land it is a "normal" line ending.

> Take the case of a Windows file that was originally transferred as binary to
> a system like Unix. That file will already have CRLF line ends, so
> transfer.c's existing code converts the CRLFs to CRCRLFs when the file is
> sent (if set.crlf is on). I've seen this quite a bit (the annoying ^M).

But to fix that the unix side would need to treat the CR as part of the line
end, which it by no means actually is. It would also make the CRLF get
translated to plain LF if you send such a file unix to unix.
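
To make the two behaviours concrete, here's a minimal sketch of the upload-side
conversion. It is not the code in transfer.c: lf_to_crlf() and its keep_crlf
argument are names made up for this illustration. With keep_crlf set to 0 it
does the strict thing (every LF gets a CR in front, so an existing CRLF turns
into CRCRLF); with keep_crlf set to 1 it does the "friendly" thing and leaves
existing CRLF pairs alone.

```c
#include <stddef.h>

/* Hypothetical helper, not libcurl code. Copies len bytes from in to out,
   inserting a CR before each LF. If keep_crlf is nonzero, an LF that is
   already preceded by a CR is copied unchanged. out must have room for
   2 * len bytes. Returns the number of bytes written to out. */
size_t lf_to_crlf(const char *in, size_t len, char *out, int keep_crlf)
{
  size_t i;
  size_t o = 0;

  for(i = 0; i < len; i++) {
    if(in[i] == '\n') {
      int has_cr = (i > 0 && in[i - 1] == '\r');
      if(!(keep_crlf && has_cr))
        out[o++] = '\r'; /* strict mode, or a bare LF */
    }
    out[o++] = in[i];
  }
  return o;
}
```

Note that this only looks within a single buffer. A real implementation would
have to remember whether the previous buffer ended with a CR, since a CRLF pair
can be split across two reads.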

> I tried out the scenario with various FTP servers and some leave CRLFs alone
> when sending data while others change them to CRCRLFs. What should libcurl
> do?

When trying to decide whether to be strict or to be fancy, I think the better
choice is to start out strict since it seems to be the easier and more
reliable route here.

> And should that be done across the board or just for FTP?

FTP (and FTPS of course) is the only protocol we support that has a notion of
ASCII and system-specific line endings. HTTP for example explicitly says how
line endings should be encoded (CRLF).

> 3) On the inbound (get) side, my code in Curl_readwrite will do the reverse:
> turn CRLFs into LFs.
> Many of the same questions apply:
> Do it for some or all platforms?

I doubt a unix user doing unix to unix ASCII transfer would expect CRLF to get
translated into plain LF... But I'm certainly not sure.
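
For reference, the download-side conversion being discussed could look roughly
like the sketch below. Again this is only an illustration and not the proposed
patch: crlf_to_lf() is a made-up name. It rewrites the buffer in place, drops
the CR of every CRLF pair and leaves a lone CR untouched.

```c
#include <stddef.h>

/* Hypothetical helper, not libcurl code. Converts CRLF to LF in place in
   the first len bytes of buf. A CR that is not followed by an LF is kept.
   Returns the new length, which is never larger than len. */
size_t crlf_to_lf(char *buf, size_t len)
{
  size_t i;
  size_t o = 0;

  for(i = 0; i < len; i++) {
    if(buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
      continue; /* drop the CR of a CRLF pair */
    buf[o++] = buf[i];
  }
  return o;
}
```

As with uploads, a CR at the very end of one buffer followed by an LF at the
start of the next is not handled here, so real code would need to carry that
state between calls.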

> Do it unconditionally or conditionally based on a flag (data->set.crlf or a
> new one)?

I think we could first pick a default that we think the RFC mandates and go
with that, and only if we find out we get users arguing for both camps we
should add an option.

> Do it for FTP only or everyone going through transfer.c's Curl_readwrite?

Only for line ending-converting protocols, but FTP is the only such one we
use.

-- 
  Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
Received on 2006-03-24