Re: feature request: expected payload size command-line flag

From: Danny McClanahan via curl-users <curl-users_at_lists.haxx.se>
Date: Wed, 02 Nov 2022 07:23:04 +0000

> On 11/1/2022 3:27 PM, Danny McClanahan via curl-users wrote:
>
> > I was recently looking to download my twitter user data archive via curl since my browser was shorting out. The file size was quite large, and twitter fails to provide an exact Content-Length for some reason, except in their own custom header e.g. "x-ton-expected-size: 8274859056", which means the default curl progress output was unable to estimate the remaining time for the download. This of course looks like:
> >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> >                                  Dload  Upload   Total   Spent    Left  Speed
> > 100  9.9M    0  9.9M    0     0  1820k      0 --:--:--  0:00:05 --:--:--  1981k
> >
> > As it turns out, even when the download completes successfully (in either the browser or curl), the zip file twitter provides for my account is corrupt, but that's not curl's problem. I'm mostly interested in whether someone has already considered adding a way to provide an expected Content-Length to curl in order to obtain the benefits of the progress bar, such as estimating remaining time.
> >
> > I have tried setting --max-filesize, but that doesn't work for my purposes for two reasons:
> > 1. It doesn't affect the progress output ("Time Left" remains at "--:--:--"), so it does not solve the problem.
> > 2. It would cut off the download after that many bytes, whereas my use case does not expect to know the precise number of bytes in advance, and I need to ensure I download the complete file (instead, --max-filesize would complement this proposed feature by setting an upper bound for payload size so I can avoid downloading more than I have space for).
> >
> > In searching archives of this mailing list, I found this issue (https://github.com/curl/curl/issues/2158), which provides an easier repro case of a download missing a Content-Length: "https://github.com/torvalds/linux/archive/v4.14-rc1.tar.gz", but wasn't immediately able to find discussion about hard-coding an expected payload length when not provided.
> >
> > I'd like to know whether this feature has already been considered, or whether there are likely to be any blockers. I'm not yet too familiar with how curl communicates with libcurl, but if libcurl produces the progress output, and libcurl requires a precise (instead of estimated) Content-Length to produce the progress estimate, I could see this requiring a change to libcurl. But I'm hoping this can be implemented purely in the curl command-line tool.
> >
> > I'm planning to take a stab at implementing this change now from my checkout of the curl git repo, but would love to hear any objections to this feature as well. I was thinking this would be a command-line flag that accepts the same type of size specification that --max-filesize does. I was also planning to print out a warning and ignore the value of this flag if the response provides its own Content-Length, in cases such as described in https://github.com/curl/curl/issues/2158 above, where the Content-Length may or may not be set.
>
>
>
> I think an expected content length option is too niche to add to the
> curl tool. I would likely vote against it. If the server chooses chunked
> encoding or otherwise does not supply the length then there's no
> accepted way to measure the length, so I think working off something
> like x-ton-expected-size (which AFAICT is specific to twitter) is too
> niche as well.

Thank you so much for your helpful framing! I mention below a few use cases outside of the twitter download that I learned about after spending some time looking into the codebase today. Also, I have tentatively named the flag --expected-filesize for easier discussion.

First: yes, the twitter download is served from a "ton.twitter.com" domain, so I believe that x-ton-expected-size will not be available elsewhere.

However, https://github.com/curl/curl/issues/2158 demonstrates another case where Content-Length may not be available (depending on whether github has recently served that tarball). After reviewing other options, I also believe the existing --ignore-content-length flag covers a further case, where the Content-Length exists but is incorrect (for specific versions of the Apache server). In fact, I believe /every/ command line using --ignore-content-length would be able to make use of the --expected-filesize flag I'm proposing here. Since that flag was deemed useful enough to add, and --expected-filesize seems usable for strictly /more/ use cases than --ignore-content-length, I think there is reasonable precedent that this use case is broader than it may seem at first. That is, unless --ignore-content-length is itself considered a mistake (although since it's not deprecated, I assume it's still considered useful).
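
As an illustration only, the precedence described in this thread could be sketched as follows. This is Python rather than curl's C, and it is not code from the curl tree: the function name resolve_total_size and its parameters are hypothetical, and --expected-filesize is only the proposed flag, not an existing curl option. The sketch assumes that a server-supplied Content-Length wins (with a warning) unless --ignore-content-length is given.

```python
from typing import Optional, Tuple


def resolve_total_size(
    content_length: Optional[int],
    expected_filesize: Optional[int],
    ignore_content_length: bool = False,
) -> Tuple[Optional[int], Optional[str]]:
    """Pick the total size a progress meter should use.

    Returns (total, warning). Hypothetical helper, not part of curl.
    """
    # With --ignore-content-length the server's value is treated as absent.
    if ignore_content_length:
        content_length = None

    if content_length is not None:
        warning = None
        if expected_filesize is not None:
            # Proposed behaviour: warn and ignore the flag when the
            # response carries its own Content-Length.
            warning = "--expected-filesize ignored: response has Content-Length"
        return content_length, warning

    # No usable Content-Length: fall back to the user's estimate,
    # which may itself be None if the flag was not given.
    return expected_filesize, None
```

Under these assumptions, every invocation that passes --ignore-content-length ends up in the fallback branch, which is why the proposed flag would be usable alongside it.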

Also, after diving into the code today, I believe that this change would probably necessitate a libcurl modification, since I see that libcurl also provides a progress bar. However, since I see that src/tool_progress.c:progress_meter() appears to calculate progress separately from libcurl, that may not yet be necessary. I will probably try to get together a proof of concept before abandoning this.
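
For discussion, here is a rough sketch of how a "Time Left" column could be derived from an estimated total. It is written in Python for brevity and does not reproduce the logic in src/tool_progress.c; the function name format_time_left and its parameters are hypothetical, and it assumes sizes in bytes and speed in bytes per second.

```python
from typing import Optional


def format_time_left(received: int, expected_total: Optional[int], speed: float) -> str:
    """Render a "Time Left" value from an estimated total size.

    Hypothetical helper; not the logic used by curl's progress meter.
    """
    # Without a total or a positive speed there is nothing to estimate,
    # which corresponds to the "--:--:--" shown in the progress output.
    if expected_total is None or speed <= 0:
        return "--:--:--"

    # The total is only an estimate, so the download may overshoot it;
    # clamp the remaining byte count at zero in that case.
    remaining = max(expected_total - received, 0)
    seconds = int(remaining / speed)

    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Because the expected size is not guaranteed to be exact, the sketch clamps the estimate instead of treating an overshoot as an error, which matches the requirement that the complete file still be downloaded.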
Received on 2022-11-02