curl-library

Re: [PATCH/RFC] More flexible output filename (based on HTTP reply)

From: Leon Winter <lwi_at_ring0.de>
Date: Mon, 16 Mar 2015 18:55:53 +0100

Hi,

> My first gut reaction: I don't think we are "allowed" to change what
> file name -O uses unless you also use -J. I mean, your logic and work
> so far is great but I think -O should get the file name from the
> initial URL while you're allowed or even encouraged to figure out the
> "right" name with -J.

While the "-J" parameter is described as "based on the header", its
description then says explicitly that the information comes from the
"Content-Disposition" header field. So we could "enhance" the -J
functionality, of course. Or maybe one would even want to add a new
option to enable this "follow redirection file name" behavior. So yes,
I am with you: if anywhere, this belongs to the "magic" -J parameter.
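For reference, what -J reads today is the filename parameter of a
header such as: Content-Disposition: attachment; filename="report.pdf".
A minimal C sketch of that extraction follows. disposition_filename is
a hypothetical helper, not curl's actual code; it ignores the RFC 6266
"filename*=" form and backslash escapes, and as a precaution it drops
any directory part the server sends.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive "does s start with prefix?" (HTTP header names and
 * parameter names are case-insensitive). */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: copy the filename parameter of a
 * "Content-Disposition:" header line into out. Returns 1 if a non-empty
 * name was found, else 0. Simplified: no "filename*=" (RFC 6266), no
 * backslash escapes, overlong names are silently truncated. */
static int disposition_filename(const char *line, char *out, size_t outlen)
{
    static const char header[] = "content-disposition:";
    static const char param[] = "filename=";
    const char *p;
    const char *base;
    size_t n = 0;

    assert(line != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    if (!starts_with_ci(line, header))
        return 0;

    /* Find the first "filename=" parameter after the header name. */
    for (p = line + strlen(header); *p != '\0'; p++) {
        if (starts_with_ci(p, param))
            break;
    }
    if (*p == '\0')
        return 0;
    p += strlen(param);

    if (*p == '"') {
        /* Quoted value: copy up to the closing quote. */
        for (p++; *p != '\0' && *p != '"' && n + 1 < outlen; p++)
            out[n++] = *p;
    } else {
        /* Bare token: stop at ';', whitespace or the trailing CRLF. */
        for (; *p != '\0' && *p != ';' && !isspace((unsigned char)*p) &&
               n + 1 < outlen; p++)
            out[n++] = *p;
    }
    out[n] = '\0';

    /* Keep only what follows the last '/' or '\\' so the server
     * cannot pick a directory. */
    base = out;
    for (p = out; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            base = p + 1;
    }
    memmove(out, base, strlen(base) + 1);
    return out[0] != '\0';
}
```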
>
> This is because with -J a user cannot fully know which file name
> will be used, but with -O (without -J) the URL clearly tells which
> file name will end up on the disk after a successful transfer, and
> there are bound to be many scripts out there relying on that
> behavior. Even with redirects.
>
> Or am I being unreasonable?

Yes and no. If the user wants a predictable filename he would use "-o".
"-O" is just for lazy people who for some reason do not want to grep
the "filename" part out of the URL first.
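To make the "grep" comparison concrete: -O takes the last path segment
of the URL exactly as given on the command line, so a redirect never
changes it. The C sketch below approximates that. remote_name_from_url
is a hypothetical helper, and cutting off a query string or fragment is
a simplifying assumption of the sketch, not a claim about what any
particular curl version does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper approximating -O: the remote name is the last path
 * segment of the URL as given, so redirects never change it.
 * Assumption of this sketch: any "?query" or "#fragment" is cut off.
 * Returns 1 and fills out on success, 0 if there is no usable name. */
static int remote_name_from_url(const char *url, char *out, size_t outlen)
{
    const char *start = url;
    const char *scheme = strstr(url, "://");
    const char *end;
    const char *path;
    const char *name;
    size_t n;

    assert(url != NULL && out != NULL && outlen > 0);

    if (scheme != NULL)
        start = scheme + 3;                 /* skip "http://" etc. */
    end = start + strcspn(start, "?#");     /* stop before query/fragment */

    /* The path begins at the first '/' after the host; no path, no name. */
    path = memchr(start, '/', (size_t)(end - start));
    if (path == NULL)
        return 0;

    /* The name is whatever follows the last '/' of the path. */
    name = path + 1;
    for (const char *q = path; q < end; q++) {
        if (*q == '/')
            name = q + 1;
    }

    n = (size_t)(end - name);
    if (n == 0 || n >= outlen)
        return 0;                           /* empty, or does not fit */
    memcpy(out, name, n);
    out[n] = '\0';
    return 1;
}
```

With a short URL such as http://short.example/aB3x the name comes out
as "aB3x" no matter where the redirect leads: that is the predictable
behavior scripts may rely on, and also what surprises a short-URL user.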
Also, don't we have -w exactly for this?
FILENAME=$(curl -LOJ -w "%{filename_effective}" <url>)

However, I cannot say how many people use the -O feature like a grep. I
would always assume that a feature called "remote name" would figure
out the current remote name no matter what. Also, as I mentioned
earlier, the "effective" filename and the effective URL do not
correspond after redirects, which is probably also not what one would
expect.

> I'm pretty sure we should just document this as a known limitation
> somewhere (probably documented nearby -J) as I believe chasing after
> a solution to this is going to lure us into dark and scary places of
> guesses and assumptions.

Well, it is in the "known bugs" file. But an automagic "do the right
thing" behavior would probably be nice after all (maybe with a new
parameter).
From a semantics perspective alone, a user would do:
$ curl <url> ^C
$ curl -C- <url>

The user would expect the download to continue. Short URLs are very
common these days, and the "Content-Disposition" header is heavily
used. So why should curl fail for this use case?
Of course, given the HTTP standard, this is not as easy as curl makes
it look, but isn't hiding the complexity the whole point of the
library/tool? ;)
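The file name matters for "-C -" because the resume offset is derived
from the local output file. A rough sketch of that step, assuming a
POSIX system (stat()); resume_offset and describe_resume are
hypothetical helpers. With libcurl the offset would be passed via
CURLOPT_RESUME_FROM_LARGE, which for HTTP results in a
"Range: bytes=N-" request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (POSIX stat()): the offset to resume from is the
 * size of the partial file already on disk, or 0 if there is none.
 * With libcurl the value would go to CURLOPT_RESUME_FROM_LARGE. */
static long long resume_offset(const char *local_name)
{
    struct stat st;

    assert(local_name != NULL);
    if (stat(local_name, &st) != 0 || !S_ISREG(st.st_mode))
        return 0;                   /* nothing to resume: start at byte 0 */
    return (long long)st.st_size;
}

/* Prints what an HTTP resume would request for this local file. */
static void describe_resume(const char *local_name)
{
    long long off = resume_offset(local_name);

    if (off > 0)
        printf("%s: resume with \"Range: bytes=%lld-\"\n", local_name, off);
    else
        printf("%s: no partial file, starting from scratch\n", local_name);
}
```

If the name is only learned from a redirect or a Content-Disposition
header, it arrives after the request has already gone out, which is
too late to choose which local file to measure.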
 
> I think that's a decent way. You can also just opt to parse incoming
> headers and check the HTTP response code for relevant 3xx codes.

I had problems gathering the relevant information from the CURL handle
inside the header callback function. Some information (like the
effective URL) is only available after the callback.
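Since the effective URL is not yet available inside the callback, one
way around it is for the callback to keep its own state. A minimal
sketch, where struct hdr_state and hdr_feed are hypothetical: it
records the status code of each response (when following redirects,
libcurl hands the header callback the headers of every response in the
chain) and remembers the last Location seen on a 3xx. It does not
resolve relative Location values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state a header callback could keep for itself, instead of
 * asking the handle for the effective URL in the middle of a transfer. */
struct hdr_state {
    int status;             /* status code of the response being read */
    char location[512];     /* last Location seen on a 3xx response */
};

/* Case-insensitive check that a (not NUL-terminated) line starts with name. */
static int header_is(const char *line, size_t len, const char *name)
{
    size_t n = strlen(name);

    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return 1;
}

/* Feed one header line as a libcurl header callback receives it: not
 * NUL-terminated, CRLF included. From a CURLOPT_HEADERFUNCTION callback,
 *   size_t cb(char *buf, size_t size, size_t nitems, void *userdata)
 * one would call hdr_feed(userdata, buf, size * nitems) and return
 * size * nitems. */
static void hdr_feed(struct hdr_state *st, const char *line, size_t len)
{
    size_t i;

    assert(st != NULL && line != NULL);

    if (header_is(line, len, "HTTP/")) {
        /* Status line such as "HTTP/1.1 302 Found": a new response. */
        int code = 0, digits = 0;

        for (i = 5; i < len && line[i] != ' '; i++)
            ;                                   /* skip the version */
        for (; i < len && line[i] == ' '; i++)
            ;                                   /* skip blanks */
        for (; i < len && digits < 3 && isdigit((unsigned char)line[i]); i++) {
            code = code * 10 + (line[i] - '0');
            digits++;
        }
        st->status = (digits == 3) ? code : 0;
        return;
    }

    if (st->status >= 300 && st->status < 400 &&
        header_is(line, len, "location:")) {
        size_t end = len;

        for (i = strlen("location:"); i < len &&
             (line[i] == ' ' || line[i] == '\t'); i++)
            ;                                   /* skip leading blanks */
        while (end > i && isspace((unsigned char)line[end - 1]))
            end--;                              /* drop trailing CRLF */
        if (end - i < sizeof st->location) {    /* ignore overlong values */
            memcpy(st->location, line + i, end - i);
            st->location[end - i] = '\0';
        }
    }
}
```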

> > Also while testing I just noticed there are problems when doing a
> > HEAD request (curl -I):
>
> > I am not sure why there is an attempt to write to the file when I
> > am just doing a HEAD request though.
>
> Hm. I'll try to give that a closer look soon.

I am not sure I was clear enough: the error only occurs with the
patch applied. For some reason curl tries to call the write function
before the header with the filename has been received, and then bails
out with "empty remote name". I have yet to debug this.

Regards,
Leon
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-03-16