curl-users

[STATUS UPDATE] Parallelism

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 26 Apr 2019 23:27:09 +0200 (CEST)

Hi again,

Since I got even further today, I just wanted to share with you all how this
new functionality works right now and possibly trick one or two of you into
trying it out or providing some thoughts and ideas!

Status:

- Parallel transfers work really well. I've repeatedly done several hundred
   concurrent requests and they just... work!

- I'm spending the -Z single letter option for enabling this! (the long form
   is --parallel). We only have two single letters left, W and Z, and for some
   reason I think -Z is the least bad option.

- I'm limiting the concurrency to 50 by default. Meaning that even if you add
   more transfers, curl will only do 50 transfers simultaneously and as soon
   as one of those transfers completes, it will start the next one in the
   queue. There's no limit to the total number of transfers that it can do. Is
   50 a good enough default?

- I've added support for --parallel-max to change the concurrency amount, and
   right now it has a hard maximum of 400, simply because above that a typical
   Linux machine runs into problems with too many file descriptors in use. I
   think we'll have reasons to work on where exactly this max should be and
   how to figure it out.

- The status meter alone feels like something I could write a lot about. The
   first point perhaps being that we can't use the "normal" progress meter/bars
   when doing parallel transfers. I've implemented a completely new one that
   is designed to handle any amount of transfers. It's a little tricky. It
   currently shows:

   o percent download (if known, which means *all* transfers need to have a
     known size)
   o percent upload (if known, with the same caveat as for download)
   o total amount of downloaded data
   o total amount of uploaded data
   o number of transfers to perform
   o number of concurrent transfers being transferred right now
   o number of transfers queued up waiting to start
   o total time all transfers are expected to take (if sizes are known)
   o current time the transfers have spent so far
   o estimated time left (if sizes are known)
   o current transfer speed (the faster of UL/DL speeds measured over the last
     few seconds)

   Here's an example progress meter snapshot. It's me asking for 101 transfers,
   consisting of 52.7GB of data in total. Asking for '--parallel-max 30' means
   that there will never be more than 30 "live" transfers. In this case, 48
   transfers are already completed:

   DL%  UL%  Dled   Uled  Xfers  Live  Qd  Total    Current  Left     Speed
   72   --   37.9G  0     101    30    23  0:00:55  0:00:34  0:00:22  2752M

   Good enough? What's missing here that we need?
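
As an illustration of how the counters in the meter relate to each other, here
is a minimal Python sketch. It is not curl's implementation: the Transfer
class, its fields and summarize() are hypothetical names made up for this
example. It shows the caveat from the list above, namely that the download
percentage can only be produced when every transfer has a known size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transfer:
    # Hypothetical per-transfer record, invented for this sketch.
    state: str                      # "queued", "live" or "done"
    dl_now: int = 0                 # bytes downloaded so far
    dl_total: Optional[int] = None  # expected size, None if unknown


def summarize(transfers):
    """Aggregate per-transfer counters into a few meter-like fields."""
    dled = sum(t.dl_now for t in transfers)
    live = sum(1 for t in transfers if t.state == "live")
    queued = sum(1 for t in transfers if t.state == "queued")

    # A percentage only makes sense if *all* sizes are known.
    pct = None
    if transfers and all(t.dl_total is not None for t in transfers):
        total = sum(t.dl_total for t in transfers)
        if total:
            pct = round(100 * dled / total)

    return {"DL%": pct, "Dled": dled, "Xfers": len(transfers),
            "Live": live, "Qd": queued}
```

With 48 finished, 30 live and 23 queued transfers, summarize() reports 101
transfers in total, matching the Xfers/Live/Qd relationship in the snapshot
above. Adding a single transfer without a known size makes "DL%" come back as
None, which corresponds to the "--" placeholder.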

Work pending:

   - Ponder if there's any way we can "report" transfer status/success for
     individual transfers here that makes sense to users. Doing 101 transfers
     is all fun and games, but if one of the transfers failed, surely a user
     would like a way to figure this out?

   - Tests. Tricky, and I have not gotten to this yet.

   - Make --retry work for parallel transfers.

   - Consider a --parallel-host-max to limit the number of connections made to
     a single host. I think that is better saved for a separate PR once the
     initial support lands, though.
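
The queueing behaviour described under Status (a cap on live transfers, with
the next queued one starting when a slot frees up) and the per-host limit
considered above could be modelled roughly like this. It is a Python sketch
under assumptions, not how curl does it: start_ready() and its host_max
parameter are invented for illustration, and no per-host option exists at this
point. The default of 50 is the one mentioned above.

```python
from collections import Counter, deque
from urllib.parse import urlsplit


def start_ready(queue, live, parallel_max=50, host_max=None):
    """Move URLs from `queue` (a deque) to `live` (a list) while limits allow.

    Hypothetical helper: call it once up front and again every time a
    transfer has been removed from `live` because it completed.
    """
    per_host = Counter(urlsplit(u).hostname for u in live)
    skipped = deque()

    while queue and len(live) < parallel_max:
        url = queue.popleft()
        host = urlsplit(url).hostname
        if host_max is not None and per_host[host] >= host_max:
            skipped.append(url)  # this host is saturated, retry later
            continue
        live.append(url)
        per_host[host] += 1

    # Put the skipped URLs back at the front, keeping their original order.
    queue.extendleft(reversed(skipped))
    return live
```

A usage example: with five URLs each for two hosts queued, parallel_max=4 and
host_max=2, the first call starts two transfers per host and leaves six
queued. Removing one finished URL from `live` and calling start_ready() again
starts the next queued URL for that host.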

-- 
  / daniel.haxx.se
Received on 2019-04-26