curl-users
[STATUS UPDATE] Parallelism
Date: Fri, 26 Apr 2019 23:27:09 +0200 (CEST)
Hi again,
Since I got even further today, I just wanted to share with you all how this
new functionality works right now, and possibly trick one or two of you into
trying it out or providing some thoughts and ideas!
Status:
- Parallel transfers work really well. I've repeatedly done several hundred
concurrent requests and they just... work!
- I'm spending the -Z single-letter option on enabling this! (The long form
is --parallel.) We only have two single letters left, W and Z, and for some
reason I think -Z is the least bad option.
- I'm limiting the concurrency to 50 by default. That means that even if
you add more transfers, curl will only do 50 transfers simultaneously, and as
soon as one of those transfers completes, it will start the next one in the
queue. There's no limit to the total number of transfers it can do. Is 50
a good enough default?
- I've added support for --parallel-max to change the concurrency amount, and
right now it has a hard maximum of 400, simply because above that a typical
Linux machine runs into problems with too many file descriptors in use. I
think we'll have reasons to work on exactly where this max should be and how
to figure it out.
- The status meter alone feels like something I could write a lot about. The
first point is perhaps that we can't use the "normal" progress meter/bars
when doing parallel transfers. I've implemented a completely new one that
is designed to handle any number of transfers. It's a little tricky. It
currently shows:
o percent download (if known, which means *all* transfers need to have a
known size)
o percent upload (if known, with the same caveat as for download)
o total amount of downloaded data
o total amount of uploaded data
o number of transfers to perform
o number of transfers currently in progress
o number of transfers queued up waiting to start
o total time all transfers are expected to take (if sizes are known)
o time the transfers have spent so far
o estimated time left (if sizes are known)
o current transfer speed (the faster of UL/DL speeds measured over the last
few seconds)
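To make the aggregate fields above more concrete, here is a rough Python sketch (not curl's C code) of how a few of them could be derived: a batch percentage that stays unknown unless every transfer's size is known, the expected total and remaining time by linear extrapolation, and the current speed as the faster of the UL and DL rates over a sliding window. The names `fraction`, `percent`, `estimate` and `SpeedWindow`, the 5-second window and the truncation to a whole percent are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def fraction(sizes: List[Optional[int]], done: List[int]) -> Optional[float]:
    """Share of the whole batch completed, or None unless *every* size is known."""
    if not sizes or any(s is None for s in sizes):
        return None
    total = sum(sizes)
    return sum(done) / total if total else None


def percent(frac: Optional[float]) -> str:
    # Rendered like the DL%/UL% columns: "--" when unknown.
    # Assumption: the value is truncated to a whole percent.
    return "--" if frac is None else str(int(frac * 100))


def estimate(elapsed: float, frac: Optional[float]) -> Tuple[Optional[float], Optional[float]]:
    """(expected total time, time left) in seconds, by linear extrapolation."""
    if not frac:  # sizes unknown, or nothing transferred yet
        return None, None
    total = elapsed / frac
    return total, max(0.0, total - elapsed)


class SpeedWindow:
    """Current speed: the faster of the UL and DL rates over a sliding window."""

    def __init__(self, window: float = 5.0) -> None:
        # Assumed window length; the mail only says "the last few seconds".
        self.window = window
        # Samples of (timestamp, total bytes downloaded, total bytes uploaded).
        self.samples: Deque[Tuple[float, int, int]] = deque()

    def add(self, t: float, dl: int, ul: int) -> None:
        self.samples.append((t, dl, ul))
        # Drop samples older than the window (the newest one always stays).
        while t - self.samples[0][0] > self.window:
            self.samples.popleft()

    def speed(self) -> float:
        """Bytes per second of the faster direction, 0.0 if not measurable yet."""
        if len(self.samples) < 2:
            return 0.0
        t0, dl0, ul0 = self.samples[0]
        t1, dl1, ul1 = self.samples[-1]
        if t1 <= t0:
            return 0.0
        return max(dl1 - dl0, ul1 - ul0) / (t1 - t0)
```

A single transfer with an unknown size is enough to turn the percentage, total and time-left columns into "unknown", which matches the caveat in the list above.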
Here's an example progress meter snapshot. It's from me asking for 101
transfers, 52.7GB of data in total. Asking for '--parallel-max 30' means that
there will never be more than 30 "live" transfers. At this point, 48
transfers are already completed:
  DL% UL%   Dled   Uled  Xfers  Live  Qd    Total  Current     Left  Speed
   72  --  37.9G      0    101    30  23  0:00:55  0:00:34  0:00:22  2752M
Good enough? What's missing here that we need?
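The snapshot's columns fit the queueing described in the status list: 101 transfers with 48 completed and 30 live leave 101 - 48 - 30 = 23 queued, and 37.9G of 52.7G is roughly 72%. That queueing (never more than N live transfers, with the next queued one starting as soon as a slot frees up) can be modelled as a fixed pool of N workers pulling from one queue. Below is a minimal asyncio sketch of that model, not curl's implementation: `fake_transfer`, `run_transfers` and `clamp_parallel_max` are hypothetical names, the "transfer" just sleeps, and clamping out-of-range values into 1..400 is an assumption about how the hard maximum might be enforced.

```python
import asyncio
from typing import Dict, Tuple

PARALLEL_MAX_DEFAULT = 50  # the default concurrency described above
PARALLEL_MAX_LIMIT = 400   # the current hard maximum


def clamp_parallel_max(requested: int) -> int:
    # Assumption: out-of-range requests are clamped into 1..PARALLEL_MAX_LIMIT.
    return max(1, min(requested, PARALLEL_MAX_LIMIT))


async def fake_transfer(n: int) -> str:
    # Hypothetical stand-in for one download; it just sleeps a little.
    await asyncio.sleep(0.001 * (n % 4))
    return f"transfer {n} done"


async def run_transfers(count: int, parallel_max: int = PARALLEL_MAX_DEFAULT) -> Tuple[Dict[int, str], int]:
    """Run `count` transfers with at most `parallel_max` live at once.

    Returns the results and the peak number of simultaneously live transfers.
    """
    queue: "asyncio.Queue[int]" = asyncio.Queue()
    for n in range(count):
        queue.put_nowait(n)

    results: Dict[int, str] = {}
    live = peak = 0

    async def worker() -> None:
        nonlocal live, peak
        while True:
            try:
                n = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to start: this slot retires
            live += 1
            peak = max(peak, live)
            try:
                results[n] = await fake_transfer(n)
            finally:
                live -= 1  # slot is free; the loop picks up the next queued one

    # One worker per slot, so no more than `slots` transfers are ever live.
    slots = min(clamp_parallel_max(parallel_max), count)
    await asyncio.gather(*(worker() for _ in range(slots)))
    return results, peak


if __name__ == "__main__":
    # Same shape as the snapshot run: 101 transfers with --parallel-max 30.
    results, peak = asyncio.run(run_transfers(101, parallel_max=30))
    print(f"{len(results)} transfers done, at most {peak} live at once")
```

A worker pool is just one way to get the "start the next one as soon as one completes" behaviour; wrapping each transfer in an `asyncio.Semaphore` would give the same concurrency bound.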
Work pending:
- Ponder if there's any way we can "report" transfer status/success for
individual transfers here that makes sense to users. Doing 101 transfers
is all fun and games, but if one of the transfers failed, surely a user
would like a way to figure this out?
- Tests. Tricky, and I haven't gotten to this yet.
- Make --retry work for parallel transfers.
- Consider a --parallel-host-max option to limit the number of connections
made to a single host, but I also think that is better saved for a separate
PR once the initial support lands.
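Two of the pending items can be sketched in Python under loud assumptions: a --parallel-host-max style check (a per-host cap applied alongside the global one) and an end-of-run report that lists failed transfers. `next_startable`, `summarize`, the URL-to-status mapping and the report wording are all hypothetical, not an existing curl API or output format.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def next_startable(queue: List[str], live_per_host: Counter,
                   parallel_max: int, host_max: int) -> Optional[int]:
    """Index of the first queued URL that may start now, or None.

    A transfer may start only if the total number of live transfers is below
    parallel_max AND its host has fewer than host_max live transfers.
    """
    if sum(live_per_host.values()) >= parallel_max:
        return None  # global cap reached, as with --parallel-max
    for i, url in enumerate(queue):
        # Counter returns 0 for hosts with no live transfers.
        if live_per_host[urlsplit(url).hostname] < host_max:
            return i
    return None  # every queued URL targets a host that is already at its cap


def summarize(results: Dict[str, int]) -> str:
    """Hypothetical end-of-run report; results maps URL -> status (0 = success)."""
    failed = sorted((url, code) for url, code in results.items() if code != 0)
    if not failed:
        return f"all transfers OK ({len(results)})"
    lines = [f"{len(failed)} of {len(results)} transfers failed:"]
    lines += [f"  {url}: status {code}" for url, code in failed]
    return "\n".join(lines)
```

This `next_startable` skips past a host that is at its cap instead of blocking the whole queue behind it; whether a real implementation should do that or keep strict queue order is an open design choice.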
--
 / daniel.haxx.se
Received on 2019-04-26