curl-library

[PATCH/RFC] More flexible output filename (based on HTTP reply)

From: Leon Winter <winter-curl_at_bfw-online.de>
Date: Mon, 16 Mar 2015 14:00:54 +0100

Hi,

So I recently ran into the known bug #81 [0], which happens often in
certain use cases, such as downloading from some large download sites /
one-click hosters.
The basic problem is that curl (the binary) opens the file very early,
without any knowledge of the server response. This also leads to another
bug/inconsistency: the effective filename does not necessarily match the
effective URL, since the filename is computed statically from the input
URL rather than from the effective URL (which can differ due to 3xx HTTP
redirections).
This patch therefore attempts to delay opening the output file until it
is actually needed (i.e. when writing to the file).
However, this patch is still a work in progress, as it does not solve the
key problem of #81, which is to actually resume the download: it picks
the "right" file, but does not resume into it. This is because when we
make the initial request we do not know the output file name and thus
cannot determine the size of our local file.
Later, when we learn the "dynamic" filename (from either a redirect or
the Content-Disposition filename), we are already reading the response
(since that is where the information came from).
One way to solve this would be to check whether the file exists. If not,
one can happily continue. If it does exist, however, one would need to
abort the connection and launch a new request with a byte range.
To avoid aborting the connection like this, one could do a HEAD request
first just to get the filename.

Regarding the code in the patch, I am not too sure about my check for the
"Location" response header field, or whether this check can be done in a
better way.

Since I developed this patch against curl-7.38 (the version in Debian
testing/unstable), it might not compile against git [I still attached an
applicable patch for the git version, though].

Also, while testing I just noticed there are problems when doing a HEAD
request (curl -I):
Warning: Remote filename has no length!
* Failed writing body (0 != 33)
* Closing connection 0
curl: (23) Failed writing body (0 != 33)

I am not sure why there is an attempt to write to the file when I am
just doing a HEAD request though.

My test case (for location redirect):
curl -v -LOC- http://goo.gl/WQcAFw

Current behavior:
Saves into WQcAFw

Patch behavior:
Saves into ubuntu-14.04.2-server-amd64.iso

Let me know what you think,
Leon

[0] http://sourceforge.net/p/curl/bugs/1169/

Received on 2015-03-16