curl-library

Re: Bug: cURL should check Content-Length before assuming the server does not support resume

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Mon, 25 Aug 2014 12:00:39 +0200 (CEST)

On Fri, 22 Aug 2014, Robert Xiao wrote:

> cURL may erroneously produce this error:
>
>> curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.
>
> when asked to continue downloading a file that is already completed. This
> happens with servers that do not send Content-Range headers when replying
> with the HTTP 416 error ("Requested Range Not Satisfiable"), even if the
> server ordinarily supports Content-Range. In particular, when attempting to
> continue a completed download, cURL will request a byte range past the end
> of the file, which triggers a 416 error.
>
> Test URL which triggers this bug:
> http://www.ngdc.noaa.gov/mgg/global/relief/ETOPO1/data/ice_surface/grid_registered/georeferenced_tiff/ETOPO1_Ice_g_geotiff.zip
>
> Note that the omission of Content-Range in the 416 response is permitted by
> RFC 2616 (which notes only that the response SHOULD include a
> Content-Range header).

I agree that the 416 error should probably override the error message here.
The problem is not that the server doesn't support range, it is that the range
wasn't to the server's satisfaction.
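
To illustrate that distinction, here is a minimal sketch of how a reply to a
resume (range) request could be classified. This is not libcurl's actual code:
the function name and the returned labels are hypothetical, and the headers
are assumed to arrive as a plain dict.

```python
def classify_resume_response(status: int, headers: dict) -> str:
    """Classify the reply to a range request (hypothetical helper)."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    names = {name.lower() for name in headers}
    if status == 416:
        # The server understood the range but could not satisfy it.
        # A Content-Range header is only a SHOULD here, so its absence
        # says nothing about whether the server supports ranges.
        return "range-not-satisfiable"
    if status == 206 and "content-range" in names:
        return "partial-content"
    if 200 <= status < 300:
        # A success reply without partial content: the range was ignored.
        return "range-ignored"
    return "http-error"
```

Under this sketch, only the "range-ignored" case would justify saying that the
server doesn't seem to support byte ranges.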

> aria2c, another download utility, handles this by performing an ordinary GET
> request and checking the Content-Length header, instead of relying on the
> server to send Content-Range. I think a similar approach (maybe using HEAD
> instead of GET to avoid overhead) would work for cURL.

That's quite a drastic change to what libcurl does now and I would prefer if
we do not mix in what a command line tool can or cannot do for this particular
use case. Let's discuss what libcurl should do.

You ask libcurl to resume a transfer from a particular URL at a specific byte
offset. You do that by requesting a byte range from the HTTP server. The byte
range cannot be delivered by the server, and libcurl detects that and returns
an error.
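
A small sketch of why resuming an already complete download ends up with a
416, assuming the resume offset equals the size of the local file. Both helper
names are hypothetical and this is not how libcurl is implemented; it only
shows the open-ended `bytes=N-` range form and the rule that a range is
satisfiable only if its first byte position is less than the resource length.

```python
def resume_range_header(offset: int) -> str:
    """Build the value of a Range header that resumes at `offset`."""
    if offset < 0:
        raise ValueError("resume offset must be non-negative")
    # Open-ended range: from `offset` to the end of the resource.
    return f"bytes={offset}-"


def range_is_satisfiable(offset: int, resource_length: int) -> bool:
    """A byte range is satisfiable only if it starts inside the resource."""
    return offset < resource_length
```

For a fully downloaded 1024-byte file the offset is 1024, so the request is
`Range: bytes=1024-`, which starts one byte past the last valid position
(1023) and cannot be satisfied.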

You seem to propose a solution that would force libcurl to first make a
non-range request to the URL just to figure out what the size/state of the
server is and then make a subsequent range request? To me, that feels like
quite a long way away from the standard libcurl behaviors and concepts so I'm
hesitant to go to such extremes. Especially since I'm quite sure it would
break some applications: HEAD is not working widely enough, and doing a GET
would basically force us to break the connection whenever the content is
bigger than just very small.

I wouldn't mind seeing curl the command line tool do something like that, as
it has the full freedom to do multiple requests or whatever we think is
necessary to do what's asked.
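
As a rough sketch of what such tool-level logic might look like, assuming a
first request has already produced a Content-Length for the full resource (or
None if the server sent none). The function name and the returned labels are
hypothetical, and nothing here describes existing curl behavior.

```python
from typing import Optional


def plan_resume(local_size: int, remote_length: Optional[int]) -> str:
    """Decide what to do before sending a range request (hypothetical)."""
    if remote_length is None:
        # No size information: fall back to a plain range request.
        return "try-range"
    if local_size == remote_length:
        # Nothing left to fetch, so no range request (and no 416).
        return "complete"
    if local_size < remote_length:
        return "resume"
    # The local file is larger than the remote one: it cannot be a prefix.
    return "restart"
```

With this kind of check, the already completed download from the report would
be detected as "complete" instead of triggering a range request past the end
of the file.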

Or did I misunderstand your problem and your proposition?

> Bug reported on StackOverflow: http://stackoverflow.com/q/23586214/1204143

I ignore all bugs reported on stackoverflow and so should you. We do ourselves
a BIG disservice if we start treating bugs everywhere instead of within the
project itself.

-- 
  / daniel.haxx.se
Received on 2014-08-25