curl-users

Re: Proposal to change default handling for content-encoded responses

From: Jan Stary <hans_at_stare.cz>
Date: Wed, 31 Oct 2018 12:11:46 +0100

(Please wrap your lines; these long paragraphs make it
hard to reply to a specific portion of your text.)

On Oct 30 00:34:28, lists.curl_at_staktrace.com wrote:
> Hello curl users,
>
> I was recently writing a script to download a JSON file with curl, and discovered that the server was sending the file with 'Content-Encoding: gzip'. The downloaded file therefore had to be gunzip'd before it was usable. Other similar JSON files from the same server were *not* being similarly encoded, so I couldn't just pipe the result through gunzip unconditionally. After some searching online, I found [1] which said to use the --compressed argument, and sure enough that resolved my problem.
>
> The documentation for --compressed says that it makes curl *request* a compressed response, which is not quite the same as just decompressing the received response. So --compressed actually does both - request a compressed response, and automatically decompress the response if needed.

Yes, and the manpage says so:

        --compressed

        (HTTP) Request a compressed response using one of the
        algorithms curl supports, and save the uncompressed document.
        If this option is used and the server sends an unsupported
        encoding, curl will report an error.

> I only need the latter,

That is, you only need to uncompress, IF the response is compressed,
without asking for the compression yourself. Right?
That seems like a sensible default behaviour.
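
To make "uncompress IF the response is compressed" concrete, here is
a minimal sketch in Python. It is not curl code; the function name
decode_body and the choice to handle only gzip and deflate are
assumptions made for illustration. It decides purely from the
response headers and never touches the request.

```python
import gzip
import zlib


def decode_body(headers, body):
    """Return body with a single gzip/deflate content coding removed.

    Hypothetical helper for illustration only; it ignores stacked
    codings such as "gzip, deflate" and treats them as unsupported.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "identity").strip().lower()

    if encoding in ("", "identity"):
        return body  # not encoded: pass through untouched
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # Assumes the zlib-wrapped form; raw deflate streams would fail here.
        return zlib.decompress(body)
    # Mirrors the manpage wording: an unsupported encoding is an error.
    raise ValueError("unsupported Content-Encoding: " + encoding)
```

With this, a script gets the same JSON whether or not the server
chose to compress that particular file.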

> I also looked at the relevant HTTP spec [2], which says (paraphrasing) that a request without any Accept-Encoding headers means the server can send any Content-Encoding in response. Personally, I think that if the client side is capable of decoding the encoding, it should attempt to do so, as that provides the most useful default.

Yes.

> Otherwise it's up to the user of curl to check for encodings and explicitly decompress them. It just seems like a not-so-great pitfall.
>
> Does anybody have examples where turning on automatic content-decoding would adversely impact the use case? Any other comments on changing the default behaviour here? I'm curious to know also if other people have run into this problem before.

On Oct 30 09:51:30, daniel_at_haxx.se wrote:
> On Tue, 30 Oct 2018, Kartikaya Gupta wrote:
>
> > Does anybody have examples where turning on automatic content-decoding
> > would adversely impact the use case? Any other comments on changing the
> > default behaviour here?
>
> I'm in favor of this change unless someone can present a use case indicating
> a risk for serious user discomfort.
>
> Changing the default to use '--compressed' by default might also improve
> life for a bunch of users since it might save bandwidth and can reduce
> transfer waits (since it'll need to transfer less data to get the job done).

Please don't change the default to --compressed, as that also *requests*
compression. For a slow machine with a fast connection (not that rare
when serving big static files), this would actually be counterproductive.
For example, my slow home server will happily serve a 700MB *.iso file
down its fast line, but it would take ages if it were to compress it.

> It could also be noted that --no-compressed would then be necessary to use
> in order to *not* ask for a compressed (and auto-decompress) resource.

Not asking for anything extra should imho be the default.
If the user wants the server to compress the content, let them say so.

On Oct 30 10:14:06, dan_at_coneharvesters.com wrote:
> On Tue, Oct 30, 2018 at 09:51:30AM +0100, Daniel Stenberg wrote:
> > On Tue, 30 Oct 2018, Kartikaya Gupta wrote:
> > >Does anybody have examples where turning on automatic content-decoding
> > >would adversely impact the use case? Any other comments on changing the
> > >default behaviour here?
> >
> > I'm in favor of this change unless someone can present a use case indicating
> > a risk for serious user discomfort.
>
> Any script that downloads compressed tar balls will suddenly start breaking on
> many servers if curl starts silently decompressing them. Many servers mark
> these with Content-Encoding: gzip so this is a very common case

Can you please give an example?
I haven't found any server that does that.

On Oct 30 12:06:15, lists.curl_at_staktrace.com wrote:
> On Tue, Oct 30, 2018 at 10:14:06AM +0100, Dan Fandrich wrote:
> > Any script that downloads compressed tar balls will suddenly start breaking on
> > many servers if curl starts silently decompressing them. Many servers mark
> > these with Content-Encoding: gzip so this is a very common case, I would guess.
> > I'd say that having it on by default would probably make more sense if it
> > weren't for this.
>
> Are you referring to tarballs that are already compressed (e.g. foo.tgz
> or foo.tar.gz)? If so, the server should only be marking it
> 'Content-Encoding: gzip' if the tarball is double-compressed.

Exactly. Responding with 'Content-Encoding: gzip' means that the content
has been compressed. The fact that the content itself is a gzipped
tarball is none of the server's business.

> If it is sending the raw foo.tgz bytes with a 'Content-Encoding: gzip' that
> sounds like a server bug.

Yes.

On Oct 30 13:20:34, dan_at_coneharvesters.com wrote:
> On Tue, Oct 30, 2018 at 12:06:15PM +0000, Kartikaya Gupta wrote:
> > On Tue, Oct 30, 2018 at 10:14:06AM +0100, Dan Fandrich wrote:
> > > Any script that downloads compressed tar balls will suddenly start breaking on
> > > many servers if curl starts silently decompressing them. Many servers mark
> > > these with Content-Encoding: gzip so this is a very common case, I would guess.
> > > I'd say that having it on by default would probably make more sense if it
> > > weren't for this.
> >
> > Are you referring to tarballs that are already compressed (e.g. foo.tgz
>
> Yes.
>
> > or foo.tar.gz)? If so, the server should only be marking it
> > 'Content-Encoding: gzip' if the tarball is double-compressed. If it
> > is sending the raw foo.tgz bytes with a 'Content-Encoding: gzip' that
> > sounds like a server bug.
>
> I wouldn't call it necessarily a bug—the file is gzip compressed and the
> contents need to be uncompressed to make use of it,

That's none of the server's business. Nor is it the server's job to
speculate on what the content is and what the client wants to do with it.

> so Content-Encoding: gzip isn't prima facie wrong.

Yes it is. If the server sends the file raw, as is,
it's plainly wrong to say "Content-Encoding: gzip".

If there is any need to advise the client about what the content is,
there is the Content-Type header.
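
To illustrate the difference, here is a small Python sketch (again
not curl code). The helper names make_tgz and saved_bytes, the file
name hello.txt and the application/gzip media type are assumptions
chosen for the example. It builds a tiny gzipped tarball in memory
and shows what a client that always decodes "Content-Encoding: gzip"
would save under each labelling.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def make_tgz(name, data):
    """Build a small gzip-compressed tarball in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def saved_bytes(headers, body):
    """What a client that always decodes gzip content codings would save."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


tgz = make_tgz("hello.txt", b"hello\n")

# The file sent raw, described only by its type: saved intact.
correct = {"Content-Type": "application/gzip"}
# The same raw bytes, but labelled as if the server had compressed them.
mislabelled = {"Content-Type": "application/x-tar", "Content-Encoding": "gzip"}

intact = saved_bytes(correct, tgz)        # still a .tar.gz
unwrapped = saved_bytes(mislabelled, tgz)  # a plain .tar, gzip layer gone
```

Under the first labelling the saved file still starts with the gzip
magic bytes; under the second the client ends up with a bare tar
archive, which is exactly the breakage being discussed.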

> It's just that many clients would prefer to keep the
> file compressed in order to save it to disk intact. But consider a web
> application that, for example, lets you list the directories of tar balls
> given the URL of one on the web.

I don't know what you mean: what application?
Do you mean an FTP directory listing?

> For such a client,

What client? curl is the client here.
(And the server doesn't care who the client is.)

> Content-Encoding: gzip makes
> sense as the data it needs is in tar format, not gzip format.
>
> Regardless, many servers do this

Please show a URL of a gzipped tarball on some such server.

On Oct 31 09:56:01, dan_at_coneharvesters.com wrote:
> On Wed, Oct 31, 2018 at 12:56:37AM +0000, Kartikaya Gupta wrote:
> > If there are many servers that do this, then silent decompression is likely
> > to break things. But maybe sending 'Accept-Encoding: identity' with the
> > default request (i.e. when --compressed is not specified) would work
> > here?
>
> I doubt that would change anything. Most such servers have a dumb rule to just
> add Content-Encoding: gzip to any file ending in .gz and nothing the client sends
> would change that.

Which http server does that, for example?

On Oct 30 12:01:51, lists.curl_at_staktrace.com wrote:
> For the record, I was originally thinking that curl should just do the
> decompression part by default, not necessarily *request* the compression
> by default. (i.e. it would keep the current default of not sending an
> Accept-Encoding header, but would decode any responses that were
> content-encoded anyway).

That seems like the obvious thing to do.

        Jan

Received on 2018-10-31