Re: End of Line Handling (affects ALL platforms)

From: David McCreedy <mccreedytpf_at_msn.com>
Date: Wed, 12 Apr 2006 18:47:28 +0000

Here's my patch for end-of-line handling for FTP in ASCII mode.
If the mode is ASCII/text, the changes will automatically convert LFs to
CRLFs on FTP uploads and automatically convert CRLFs to LFs on FTP
downloads.
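
For readers who want to see the upload direction concretely, here is a minimal sketch of a strict LF-to-CRLF pass. It is not the code from the patch: the function name lf_to_crlf, its parameters, and the "destination must be twice the source size" contract are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): copy src into dst, writing
 * CRLF wherever src has an LF. The worst case doubles the data (every
 * byte is an LF), so dstcap must be at least 2 * srclen.
 * Returns the number of bytes written to dst. */
static size_t lf_to_crlf(const char *src, size_t srclen,
                         char *dst, size_t dstcap)
{
  size_t i;
  size_t out = 0;

  assert(src != NULL && dst != NULL);
  assert(dstcap >= 2 * srclen);

  for(i = 0; i < srclen; i++) {
    if(src[i] == '\n')
      dst[out++] = '\r';   /* insert the CR in front of every LF */
    dst[out++] = src[i];
  }
  return out;
}
```

Because this sketch is strict, data that already contains CRLF comes out as CRCRLF, which is the behavior debated in the quoted thread.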

This affects ALL platforms, per Daniel Stenberg's request.
(I've included the original thread below.)

My code in urldata.h #defines CURL_DO_LINEEND_CONV unless WIN32 is defined.
I'm guessing other platforms may need to be excluded - definitely any that
use a two byte CRLF sequence for native files.
I'm not sure how Mac OS will be affected. I've read it uses CRs for
end-of-line marks but don't know if a C language '\n' equates to a CR (0x0d)
or a NL (0x0a) on Macs.

I would be more comfortable with an opt-in for those platforms that
want automatic FTP lineend conversion instead of an opt-out, but I'll defer
to Daniel.

There are two items to point out in the code:

1) Almost all FTP servers I've tried give the same SIZE regardless of the
mode (bin vs ascii) of the download.
(Some servers do go to the extra trouble of calculating the SIZE accurately
for ascii mode transfers.)
For example, let's use an FTP server on a platform that uses LF for the
end-of-line marker.
If the file is 50 bytes long (natively) with 10 lineends, the SIZE will
usually be given as 50 bytes.
That's correct for bin mode.
But for ASCII mode those LFs become CRLFs and we get 60 bytes from the
server.
So we get a "partial" file with -10 bytes left to read.
I've tracked the number of CRLFs Curl converted so we can try to determine
if the size difference was really a partial file or just lineend related.
This could be wrong for some error conditions, but the alternative is to
treat it as a partial file and terminate the connection.
That certainly is undesirable,
and I think it outweighs the risk of mistaking a partial file for a complete one.

2) Out of necessity, and for consistency, my code converts bare CRs to LFs
in ascii mode FTP downloads.
A block ending with a CR might be a final, bare CR or just the CR portion of
a CRLF sequence that is split between two blocks (a very real possibility).
So the best I came up with was to always convert bare CRs to LFs, with code
to keep split CRLF sequences from erroneously becoming LFLF sequences.
The alternatives were uglier.
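
The two items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: struct lineend_state, crlf_to_lf, and transfer_size_ok are hypothetical names, and the size comparison is only one plausible way to use the conversion count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-transfer state; the names are not libcurl's. */
struct lineend_state {
  int prev_block_ended_cr;  /* last byte of the previous block was a CR */
  long crlf_conversions;    /* CRLF pairs collapsed into one LF so far */
};

/* Convert one downloaded block in place: CRLF -> LF, bare CR -> LF.
 * Returns the new length of the block. */
static size_t crlf_to_lf(struct lineend_state *st, char *buf, size_t len)
{
  size_t in = 0;
  size_t out = 0;

  assert(st != NULL && buf != NULL);
  if(len == 0)
    return 0;               /* nothing to do; keep the pending CR state */

  /* The previous block ended in a CR that was already written as an LF.
   * If this block starts with an LF, the two were a split CRLF: drop
   * the LF so the pair does not turn into LFLF. */
  if(st->prev_block_ended_cr && buf[0] == '\n') {
    in = 1;
    st->crlf_conversions++;
  }
  st->prev_block_ended_cr = 0;

  for(; in < len; in++) {
    if(buf[in] == '\r') {
      if(in + 1 == len)
        st->prev_block_ended_cr = 1;  /* may be the first half of a CRLF */
      else if(buf[in + 1] == '\n') {
        in++;                         /* skip the LF of a CRLF pair */
        st->crlf_conversions++;
      }
      buf[out++] = '\n';              /* CRLF and bare CR both become LF */
    }
    else
      buf[out++] = buf[in];
  }
  return out;
}

/* Assumed check: accept the transfer if the bytes received on the wire
 * match the reported SIZE exactly, or exceed it by exactly the number of
 * CRLF pairs that were collapsed. */
static int transfer_size_ok(long reported_size, long wire_bytes,
                            long crlf_conversions)
{
  return wire_bytes == reported_size ||
         wire_bytes - crlf_conversions == reported_size;
}
```

With the 50-byte example from item 1, a server that reports SIZE 50 but sends 60 bytes containing 10 CRLFs passes this check, as does a server that reports 60. A bare CR changes no byte count, so it is not counted as a conversion.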

-David

>From: Daniel Stenberg <daniel_at_haxx.se>
>Reply-To: libcurl development <curl-library_at_cool.haxx.se>
>To: libcurl development <curl-library_at_cool.haxx.se>
>Subject: Re: End of Line Handling
>Date: Fri, 24 Mar 2006 08:57:41 +0100 (CET)
>
>On Fri, 24 Mar 2006, David McCreedy wrote:
>
>> In accordance with the NVT standard, the <CRLF> sequence
>> should be used where necessary to denote the end of a line
>> of text. (See the discussion of file structure at the end
>> of the Section on Data Representation and Storage.)"
>
>And in order to "denote the end of a line" we need a way to actually find
>it... The current code has no notion of "lines" when it sends or receives
>files.
>
>>1) On uploads (puts) libcurl makes line end conversion optional based on
>>the data->set.crlf flag (in transfer.c's Curl_readwrite function). If
>>that flag is set, all LFs in the data being sent are converted to CRLFs.
>>Should transfer.c be changed so that ASCII-mode FTP transfers
>>unconditionally convert to CRLF? If so, we'll have to identify which
>>platforms internally use CRLF so we leave them alone (Windows only? There
>>are probably others).
>
>I find this a very fuzzy area and I certainly don't know how to behave
>here. I'm the kind of person who *NEVER* actually used FTP ASCII mode
>intentionally (but many times unintentionally - causing me grief), and then
>I truly mean never. And I've used ftp since the early 1990s.
>
>I figure we could do some tests with a reliably compliant client to see
>what it does for a few different CRLF/CR/LF combos when sent between for
>example a windows and a unix box.
>
>But, instead of enlarging this problem for you and putting this unwanted
>burden on you I think you can focus on getting this to work for your EBCDIC
>case and then let people with other platforms try out and possibly add
>corrections to this later on.
>
>Also, the crlf option of today (before your work) is really just a
>work-around for the "real thing" so we should really reconsider if that
>option is needed anymore when things are done "the right way". Also, custom
>crlf replacing etc could easily be done by an app in whatever way it wants.
>There's no real reason for libcurl to provide odd features like this one
>that isn't truly protocol related. I also doubt anyone actually uses this
>crlf option these days.
>
>>2) Should Curl_readwrite be changed to leave existing CRLF sequences
>>alone? That's the friendly thing to do but it deviates from a strict
>>interpretation of RFC959.
>
>I figure a CRLF in unix land is a line ending with some cruft just before
>it, and in Windows land it is a "normal" line ending.
>
>>Take the case of a Windows file that was originally transferred as binary
>>to a system like Unix. That file will already have CRLF line ends, so
>>transfer.c's existing code converts the CRLFs to CRCRLFs when the file is
>>sent (if set.crlf is on). I've seen this quite a bit (the annoying ^M).
>
>But to fix that the unix side would need to treat the CR as part of the
>line end, which it by no means actually is. It would also make the CRLF get
>translated to plain LF if you send such a file unix to unix.
>
>>I tried out the scenario with various FTP servers and some leave CRLFs
>>alone when sending data while others change them to CRCRLFs. What should
>>libcurl do?
>
>When trying to decide whether to be strict or to be fancy, I think the
>better choice is to start as strict since it seems to be the easiest and
>more reliable route here.
>
>>And should that be done across the board or just for FTP?
>
>FTP (and FTPS of course) is the only protocol we support that has a notion
>of ASCII and system-specific line endings. HTTP for example explicitly says
>how line endings should be encoded (CRLF).
>
>>3) On the inbound (get) side, my code in Curl_readwrite will do the
>>reverse: turn CRLFs into LFs.
>>Many of the same questions apply:
>>Do it for some or all platforms?
>
>I doubt a unix user doing unix to unix ASCII transfer would expect CRLF to
>get translated into plain LF... But I'm certainly not sure.
>
>>Do it unconditionally or conditionally based on a flag (data->set.crlf or
>>a new one)?
>
>I think we could first pick a default that we think the RFC mandates and go
>with that, and only if we find out we get users arguing for both camps we
>should add an option.
>
>>Do it for FTP only or everyone going through transfer.c's Curl_readwrite.
>
>Only for line ending-converting protocols, but FTP is the only such one we
>use.
>
>--
> Commercial curl and libcurl Technical Support: http://haxx.se/curl.html

Received on 2006-04-12