
On the right way to use HTTP/2 multiplex

From: Arnaud Rebillout via curl-library <curl-library_at_cool.haxx.se>
Date: Thu, 21 Mar 2019 11:06:53 +0700

Hi libcurl devs,

I'm writing an application that uses libcurl, and I have no prior
expertise with HTTP, so I'd like to make sure I got things right.

I'm working on the internal HTTP client of casync [1]. This client is
simple: it basically has a list of files to download, and its job is to
download them efficiently. We're talking about small chunks of data
(around 64KB), but the list can be huge (60,000 chunks is very
possible). And we talk to only ONE server.

Since we live in a modern world, I explicitly enable
`CURL_HTTP_VERSION_2_0` and `CURLPIPE_MULTIPLEX`, and I assume that the
server supports them.
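
For reference, the relevant setup looks roughly like this (simplified,
no error checking; `make_easy()` and `make_multi()` are just names I use
in this mail, not casync's actual functions):

    #include <curl/curl.h>

    /* per-request handle: ask for HTTP/2 (the real code also installs a
     * write callback to collect the chunk data) */
    static CURL *make_easy(const char *url)
    {
      CURL *e = curl_easy_init();
      curl_easy_setopt(e, CURLOPT_URL, url);
      curl_easy_setopt(e, CURLOPT_HTTP_VERSION, (long)CURL_HTTP_VERSION_2_0);
      return e;
    }

    /* multi handle: allow multiplexing requests over one connection */
    static CURLM *make_multi(void)
    {
      CURLM *m = curl_multi_init();
      curl_multi_setopt(m, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
      return m;
    }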

In a **first implementation**, I just create a curl easy handle for each
chunk I need to download (so, possibly 60k easy handles), add them all to
the curl multi handle, and then let curl deal with it. I also make sure
to set `CURLMOPT_MAX_TOTAL_CONNECTIONS` so that the whole thing doesn't
go crazy (I used 64 at first, but after more reading I wonder if I
should lower that to 8).
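
In (equally simplified) code, the first implementation is roughly the
following, where `chunk_urls` and `n_chunks` stand in for casync's real
chunk list and the helpers are the ones sketched above:

    /* First implementation (sketch): one easy handle per chunk, all added
     * up front, then a single multi loop drains them. */
    static void download_all(char **chunk_urls, size_t n_chunks)
    {
      CURLM *multi = make_multi();
      curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 64L);

      /* possibly 60k easy handles added at once */
      for (size_t i = 0; i < n_chunks; i++)
        curl_multi_add_handle(multi, make_easy(chunk_urls[i]));

      int running = 1;
      while (running) {
        int numfds;
        curl_multi_perform(multi, &running);
        curl_multi_wait(multi, NULL, 0, 1000, &numfds);

        CURLMsg *msg;
        int left;
        while ((msg = curl_multi_info_read(multi, &left))) {
          if (msg->msg == CURLMSG_DONE) {
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
            /* the chunk data is forwarded to the co-process elsewhere */
          }
        }
      }
      curl_multi_cleanup(multi);
    }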

It works well this way. Even too well. My issue, during local tests
(with both client and server on my machine), is that the client isn't
fast enough to handle all the incoming chunks. Indeed, the client needs
to give the chunks to another co-process, through a custom IPC, and this
proved to be the bottleneck. So what happened is that all my chunks were
downloaded very quickly, and then sat in RAM until the client had time
to forward them to its co-process. Even though it works, it can use
a lot of RAM, and that's not nice.

Of course, this doesn't happen in "real life", when the server is remote
and the latency is higher. Then the client has time to handle the
chunks, and everything works beautifully.

I didn't find a way to tell libcurl to pause or slow down in case
things go too fast, so I went for a **second implementation**, slightly
different. Instead of creating one easy handle per chunk and feeding
them all to the curl multi handle at once, I only create a small number
of easy handles (let's say 8) and give those to the curl multi.
Only when a chunk has been downloaded and handled by the client do I
re-use the easy handle (i.e. remove it from the multi handle, set a new
URL, and give it back to the curl multi for processing).
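
Sketched out with the same placeholder names as above (and still
without error handling), the second implementation is roughly:

    /* Second implementation (sketch): a fixed pool of easy handles that
     * get re-armed with the next URL once a chunk has been handled. */
    static void download_pooled(char **chunk_urls, size_t n_chunks)
    {
      CURLM *multi = make_multi();
      size_t next = 0;

      /* prime the pool with (at most) 8 in-flight requests */
      while (next < 8 && next < n_chunks)
        curl_multi_add_handle(multi, make_easy(chunk_urls[next++]));

      int running = 1;
      while (running) {
        int numfds;
        curl_multi_perform(multi, &running);
        curl_multi_wait(multi, NULL, 0, 1000, &numfds);

        CURLMsg *msg;
        int left;
        while ((msg = curl_multi_info_read(multi, &left))) {
          if (msg->msg != CURLMSG_DONE)
            continue;
          CURL *e = msg->easy_handle;
          curl_multi_remove_handle(multi, e);

          /* hand the finished chunk to the co-process here; this is the
           * slow part, and it naturally throttles the downloads */

          if (next < n_chunks) {
            /* re-use the handle: new URL, back into the multi */
            curl_easy_setopt(e, CURLOPT_URL, chunk_urls[next++]);
            curl_multi_add_handle(multi, e);
            running = 1;   /* more work was just queued */
          } else {
            curl_easy_cleanup(e);
          }
        }
      }
      curl_multi_cleanup(multi);
    }

The hand-over to the co-process happens between a transfer completing
and the handle being re-added, which is what keeps memory use bounded.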

This implementation works well too.

Now that I've taken a bit of time to think, I wonder if this second
implementation is really the smart thing to do. More precisely: by
feeding handles one by one (even though we might have 8 active handles
in the curl multi at the same time), do I prevent internal optimizations
within libcurl? How can libcurl multiplex efficiently if I don't tell it
in advance the full list of chunks I want to download?

So basically, I now think that my first implementation was better than
the second one. Do you agree or disagree, based on your knowledge of
libcurl internals?

I'll also take this chance to ask a second question, out of curiosity:
with HTTP/2 multiplexing enabled, will libcurl also attempt to open
concurrent connections, and multiplex on all of them? Or does it stick
to a single connection?

Thanks!

  Arnaud

----
[1]:
http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
Received on 2019-03-21