Re: On the right way to use HTTP/2 multiplex

From: Kunal Ekawde via curl-library <curl-library_at_cool.haxx.se>
Date: Thu, 21 Mar 2019 12:13:46 +0530

>I also take this chance to ask a second question, out of curiosity: with
>HTTP/2 multiplex enabled, will libcurl also attempt to open concurrent
>connections, and do multiplex on all these connections? Or does it stick
>to one connection?

From what I've observed, libcurl will try to open a new connection to the
same host depending on CURLMOPT_MAX_HOST_CONNECTIONS and
CURLMOPT_MAX_TOTAL_CONNECTIONS.
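
For reference, a minimal sketch of how those two limits can be set on a
multi handle (the values here are just examples):

  CURLM *multi = curl_multi_init();
  /* cap concurrent connections to any single host */
  curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 8L);
  /* cap concurrent connections across all hosts */
  curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 16L);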

Here is the relevant piece of code from url.c:
  /* If we found a reusable connection that is now marked as in use, we may
     still want to open a new connection if we are pipelining. */
  if(reuse && !force_reuse && IsPipeliningPossible(data, conn_temp)) {
    size_t pipelen = conn_temp->send_pipe.size + conn_temp->recv_pipe.size;
    if(pipelen > 0) {
      infof(data, "Found connection %ld, with requests in the pipe (%zu) "
            "conn_bund:%u maxperhost:%u conncache:%u maxtotal:%u\n",
            conn_temp->connection_id, pipelen,
            Curl_conncache_bundle_size(conn_temp), max_host_connections,
            Curl_conncache_size(data), max_total_connections);

      if(Curl_conncache_bundle_size(conn_temp) < max_host_connections &&
         Curl_conncache_size(data) < max_total_connections) {
        /* We want a new connection anyway */
        reuse = FALSE;

        infof(data, "We can reuse, but we want a new connection anyway\n");
        Curl_conncache_return_conn(conn_temp);
      }
    }
  }

~Kunal

On Thu, Mar 21, 2019 at 11:49 AM Arnaud Rebillout via curl-library <curl-library_at_cool.haxx.se> wrote:

> Hi libcurl devs,
>
> I'm writing an application that uses libcurl, and I have no prior
> expertise with HTTP, so I'd like to make sure I got things right.
>
> I'm working on the internal HTTP client of casync [1]. This client is
> simple: it basically has a list of files to download, and its job is to
> download them efficiently. We're talking about small chunks of data
> (around 64KB), but the list is possibly huge (60,000 chunks is very
> possible). And we talk to only ONE server.
>
> Since we live in a modern world, I explicitly enable
> `CURL_HTTP_VERSION_2_0` and `CURLPIPE_MULTIPLEX`, and I assume that the
> server supports HTTP/2.
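>
> (A rough sketch of that setup, where `multi` and `easy` are my multi and
> easy handles:)
>
>   /* on the multi handle: allow multiplexing requests over one connection */
>   curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
>   /* on each easy handle: ask for HTTP/2 */
>   curl_easy_setopt(easy, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);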
>
> In a **first implementation**, I just create a curl easy handle for each
> chunk I need to download (so, possibly 60k easy handles), add them all to
> the curl multi, and then let curl deal with it. I also make sure to set
> `CURLMOPT_MAX_TOTAL_CONNECTIONS` so that the whole thing doesn't go
> crazy (I used 64 at first, but after more reading I wonder if I should
> lower that to 8).
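>
> As a rough sketch (chunk_url() stands in for however I build each
> chunk's URL):
>
>   CURLM *multi = curl_multi_init();
>   curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
>   curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 64L);
>
>   for(size_t i = 0; i < n_chunks; i++) {    /* possibly 60,000 handles */
>     CURL *easy = curl_easy_init();
>     curl_easy_setopt(easy, CURLOPT_URL, chunk_url(i));
>     curl_easy_setopt(easy, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
>     curl_multi_add_handle(multi, easy);
>   }
>   /* ...then the usual curl_multi_perform()/curl_multi_wait() loop */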
>
> It works well this way. Too well, even. My issue then, during local tests
> (with both client and server on my machine), is that the client isn't
> fast enough to handle all the incoming chunks. Indeed, the client needs
> to hand the chunks to another co-process through a custom IPC, and this
> proved to be the bottleneck. So what happened is that all my chunks were
> downloaded very quickly, and then sat in RAM until the client had time
> to forward them to its co-process. Even though it works, it can use a
> lot of RAM, and that's not nice.
>
> Of course, this doesn't happen in "real-life", when the server is away
> and the latency is higher. Then the client has time to handle the
> chunks, and everything works beautifully.
>
> I didn't find a way to tell libcurl to pause or slow down in case
> things go too fast, so I went for a **second implementation**, slightly
> different. I decided that instead of creating one easy handle per chunk
> request and feeding them all to the curl multi handle, I would only
> create a small number of easy handles (let's say 8) and give them to the
> curl multi. Only when a chunk has been downloaded and handled by the
> client do I re-use the easy handle (i.e. remove it from the multi
> handle, set a new URL, and give it back to the curl multi for
> processing).
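>
> The recycling step, as a sketch (next_chunk_url() stands in for my list
> of pending chunks):
>
>   /* when a transfer completes and its chunk has been handed off: */
>   curl_multi_remove_handle(multi, easy);
>   curl_easy_setopt(easy, CURLOPT_URL, next_chunk_url());
>   curl_multi_add_handle(multi, easy);   /* back into the pool of 8 */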
>
> This implementation works well too.
>
> Now, I take a bit of time to think, and I wonder if this second
> implementation is really the smart thing to do. More precisely: by
> feeding handles one by one (even though we might have 8 active handles
> in curl multi at the same time), do I prevent internal optimization
> within libcurl? How can libcurl multiplex efficiently if I don't tell it
> in advance the list of chunks I want to download?
>
> So basically, I think that my first implementation was better than the
> second one. Do you agree or disagree, based on your knowledge of
> libcurl internals?
>
> I also take this chance to ask a second question, out of curiosity: with
> HTTP/2 multiplex enabled, will libcurl also attempt to open concurrent
> connections, and do multiplex on all these connections? Or does it stick
> to one connection?
>
> Thanks!
>
> Arnaud
>
> ----
>
> [1]: http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
>

-- 
~Kunal

Received on 2019-03-21