Re: HTTP Pipelining Contributions

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Fri, 27 Jul 2012 23:43:39 +0200

On Thu, Jul 26, 2012 at 05:30:26PM +0000, Joe Mason wrote:
> Can they? curl today doesn't expose the number of connections and the
> mapping of curl handles to connections. The only options I can find

For pipelined connections, this is implicitly controlled by the
application as it adds pipelined easy handles to a multi handle.

> to control how a request is mapped to a connection are
> CURLOPT_FRESH_CONNECT and
> CURLOPT_FORBID_REUSE, which force requests to use new connections (and
> disable both pipelining and regular connection reuse).
>
> I don't see any way to implement this proposal outside curl without
> adding functions to assign requests to connections explicitly, and I
> thought that Daniel was strongly against that.

I'll admit I'm not completely familiar with the existing pipelining
code, but my understanding is that libcurl will pipeline all it can on a
single connection within a multi handle. An application that wants two
connections would use two multi handles. Within each handle, the
application can control (and in some cases, even reorder) requests and
the pipeline depth by controlling when the easy handles are added to the
multi handle. This could become a bit hairy, which is why I suggest
delegating it to a libcurlapp which would only have to be written once,
not for each app.
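
A rough, untested sketch of that scheme (error handling omitted, and
the URLs are just placeholders):

#include <curl/curl.h>

int main(void)
{
  CURLM *conn_a, *conn_b;
  CURL *req1, *req2, *req3;
  int running_a = 0, running_b = 0;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  conn_a = curl_multi_init();
  conn_b = curl_multi_init();

  /* one multi handle per desired connection; let libcurl pipeline
     requests within each one */
  curl_multi_setopt(conn_a, CURLMOPT_PIPELINING, 1L);
  curl_multi_setopt(conn_b, CURLMOPT_PIPELINING, 1L);

  req1 = curl_easy_init();
  req2 = curl_easy_init();
  req3 = curl_easy_init();
  curl_easy_setopt(req1, CURLOPT_URL, "http://example.com/a");
  curl_easy_setopt(req2, CURLOPT_URL, "http://example.com/b");
  curl_easy_setopt(req3, CURLOPT_URL, "http://example.com/c");

  /* two requests pipelined on the first connection, one on the
     second; the pipeline depth is controlled simply by when (and
     where) the easy handles are added */
  curl_multi_add_handle(conn_a, req1);
  curl_multi_add_handle(conn_a, req2);
  curl_multi_add_handle(conn_b, req3);

  do {
    curl_multi_perform(conn_a, &running_a);
    curl_multi_perform(conn_b, &running_b);
    /* a real app would wait on the curl_multi_fdset() descriptors
       here instead of busy-looping */
  } while(running_a || running_b);

  curl_multi_remove_handle(conn_a, req1);
  curl_multi_remove_handle(conn_a, req2);
  curl_multi_remove_handle(conn_b, req3);
  curl_easy_cleanup(req1);
  curl_easy_cleanup(req2);
  curl_easy_cleanup(req3);
  curl_multi_cleanup(conn_a);
  curl_multi_cleanup(conn_b);
  curl_global_cleanup();
  return 0;
}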

> What would the interface to curl look like?

I'm thinking of something along the lines of the multi interface, where
the application sets up easy handles then hands them to a pipelining
controller of some sort. The pipelining controller then creates a
multi handle (or handles) internally and assigns easy handles to them
as best befits the desired pipelining characteristics. If a new
high-priority, low-latency request (a.k.a. easy handle) comes in, for
example, the controller could create a new multi handle and connection to
serve it. Or, if an existing multi handle and connection are available
but are currently being used for a request that is almost complete, it
could just pipeline the new request there. If too many existing
connections are idle, it would close them.
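
As a strawman, the interface could look something like this (every
curlapp_* name here is invented purely for illustration):

#include <curl/curl.h>

/* hypothetical libcurlapp pipelining controller */
typedef struct curlapp_ctl curlapp_ctl;

curlapp_ctl *curlapp_ctl_init(void);
void curlapp_ctl_cleanup(curlapp_ctl *ctl);

/* hand over a prepared easy handle; the controller decides which
   internal multi handle (and thus which connection and pipeline
   position) will serve it */
#define CURLAPP_PRIO_NORMAL 0L
#define CURLAPP_PRIO_HIGH   1L /* low-latency: may get its own connection */
int curlapp_ctl_add(curlapp_ctl *ctl, CURL *easy, long priority);

/* drive all the internal multi handles, analogous to
   curl_multi_perform() */
int curlapp_ctl_perform(curlapp_ctl *ctl, int *still_running);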

A memory or disk caching feature would fit in nicely here as well. If
an easy handle requests a URL that is already cached, libcurlapp would
bypass the network completely and return the cached data instead
(through the normal callbacks). To the app, it looks like a normal
transfer has occurred.
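
Internally, that short-circuit might look something like this (the
cache_lookup() helper and the bookkeeping struct are hypothetical):

#include <stddef.h>
#include <curl/curl.h>

/* hypothetical cache lookup: returns the cached body, or NULL on miss */
extern const char *cache_lookup(const char *url, size_t *len);

/* libcurlapp would record the app's write callback when the handle is
   handed over, so a cache hit can be replayed through it */
struct app_handle {
  CURL *easy;
  curl_write_callback write_cb;
  void *write_data;
};

/* returns 1 if served from cache, 0 if libcurl should do a transfer */
static int serve_from_cache(struct app_handle *h, const char *url)
{
  size_t len;
  const char *data = cache_lookup(url, &len);
  if(!data)
    return 0;
  /* feed the cached bytes through the normal callback; to the app
     this looks just like a network transfer */
  h->write_cb((char *)data, 1, len, h->write_data);
  return 1;
}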

This controller would chain into the callbacks to gain access to the
connection state, and would be responsible for setting options like
CURLOPT_FRESH_CONNECT, CURLOPT_FORBID_REUSE and CURLOPT_MAXCONNECTS, so
the application wouldn't set those directly. Instead, the application
would configure the controller separately with the desired number of
connections, maximum pipeline depth, cache directory & size, etc., which
the controller would use to set the lower-level easy handle options.
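
For example (the config struct and its mapping are hypothetical):

#include <curl/curl.h>

/* high-level knobs the app would hand to the controller */
struct curlapp_config {
  long max_connections;    /* how many multi handles/connections to use */
  long max_pipeline_depth; /* max easy handles queued per connection */
  const char *cache_dir;   /* handled entirely inside libcurlapp */
  curl_off_t cache_size;
};

/* the controller, not the app, translates those knobs into the
   low-level per-handle options */
static void apply_config(CURL *easy, const struct curlapp_config *cfg,
                         int want_own_connection)
{
  curl_easy_setopt(easy, CURLOPT_MAXCONNECTS, cfg->max_connections);
  /* a high-priority, low-latency request can be forced onto a fresh
     connection */
  curl_easy_setopt(easy, CURLOPT_FRESH_CONNECT,
                   want_own_connection ? 1L : 0L);
  curl_easy_setopt(easy, CURLOPT_FORBID_REUSE, 0L);
}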

The application then just needs to set up easy handles and let the
controller be responsible for making sure the requests are fulfilled in
the optimum manner.

> curl owns the concepts of
> connections and requests, and in fact generates some requests (such as
> HTTP auth followups or CONNECTs) internally. So it would need to call
> into the pipelining library for each request asking what to do with
> it. And when it wanted to open a socket, it would need to call into
> the pipelining library for permission (if the pipelining library does
> load balancing, it's in charge of enforcing connection limits) - and
> if it returned false, do what? When it's time to close a socket, curl
> would need to call into the pipelining library to decide which one to
> get rid of. That's a lot of callbacks, and curl would need to have a
> default path for people not using the pipelining library as well.

A pipelining controller can tell what libcurl is doing with the
requests and knows how it will handle new ones, so it can use that
knowledge to dispatch requests in a way that implements the desired
pipelining heuristics. I haven't thought through the corner cases, so
maybe it won't actually be sufficient, but the existing callbacks
already provide pretty good insight into the connection state.
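
For instance, just chaining into the existing socket callbacks is
enough to track connections as they come and go:

#include <sys/socket.h>
#include <unistd.h>
#include <curl/curl.h>

static int open_connections;

static curl_socket_t count_open(void *clientp, curlsocktype purpose,
                                struct curl_sockaddr *addr)
{
  curl_socket_t s = socket(addr->family, addr->socktype, addr->protocol);
  (void)clientp;
  (void)purpose;
  if(s != CURL_SOCKET_BAD)
    open_connections++;  /* the controller now knows about this connection */
  return s;
}

static int count_close(void *clientp, curl_socket_t s)
{
  (void)clientp;
  open_connections--;
  return close(s);       /* closesocket() on Windows */
}

/* the controller installs these on every easy handle it manages */
static void install_tracking(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_OPENSOCKETFUNCTION, count_open);
  curl_easy_setopt(easy, CURLOPT_CLOSESOCKETFUNCTION, count_close);
}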

> The hardest part of this proposal is adding a request queue to curl,
> and I think that would be needed even with a pipelining library, since
> if we add any limits on the number of simultaneous requests, curl is
> going to have to put them somewhere.

The request queue would be completely controlled by libcurlapp. The app
hands over an easy handle, and eventually gets back the data it wants.
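
A bare-bones sketch of such a queue (the depth limit and array size
are made up):

#include <curl/curl.h>

#define MAX_ACTIVE 4  /* arbitrary limit on simultaneous requests */
#define QSIZE 64

struct queue {
  CURL *pending[QSIZE];
  int head, tail; /* circular buffer indexes */
  int active;     /* easy handles currently added to the multi handle */
};

/* the app hands a handle to libcurlapp; it's queued if the multi
   handle is already full */
static void app_submit(struct queue *q, CURLM *multi, CURL *easy)
{
  if(q->active < MAX_ACTIVE) {
    curl_multi_add_handle(multi, easy);
    q->active++;
  }
  else
    q->pending[q->tail++ % QSIZE] = easy;
}

/* called when curl_multi_info_read() reports a finished transfer */
static void app_on_done(struct queue *q, CURLM *multi, CURL *done)
{
  curl_multi_remove_handle(multi, done);
  q->active--;
  if(q->head != q->tail) {  /* promote the next queued request */
    curl_multi_add_handle(multi, q->pending[q->head++ % QSIZE]);
    q->active++;
  }
}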

> Unless the idea is that the client code interfaces with the pipelining
> library instead of with curl directly, and the pipelining library
> queues requests and doesn't send them to curl until it's decided a
> connection exists to handle them. But this means a design where the
> pipelining library drives curl, which is sort of an inversion of the
> interface that curl uses today. It would mean that it's hard to add
> support for this library to an app that's already using curl, and
> probably a lot of redesign of curl's interface itself (for example,
> if the pipelining library is driving, that conflicts with curl's
> automatic generation of auth followups and CONNECTs). Sounds like a
> can of worms.

The pipelining library drives curl in exactly the same way that
applications drive curl today. It might be necessary to restrict some
libcurl features that aren't compatible with this approach, but I'm not
sure that there are any yet. Some, such as tunneling requests through
an HTTP/1.0 proxy, mean that there won't be any pipelining going on, but
those operations will still work. The auth issues would be
no worse than they are today using the existing pipelining interface,
and it might be possible to use the new auth callback interfaces being
discussed here recently to optimize that case as well.

> I think this sounds like a great idea, since the internals of curl
> are pretty hairy and so separating concerns into another layer with
> well-defined boundaries would help with complexity a lot. But some
> things which would logically be part of this are already in curl, so
> we'd want to refactor it - but if we move functionality out of curl
> into this new library, how do we maintain compatibility? I wouldn't
> know where to begin to start with this.

Most of those features (except for Metalink, which I'm still not
convinced belongs in curl :-) aren't already fully supported, at least
in a way that's convenient for applications. Existing functionality
can't be removed from libcurl, at least in the short term, to preserve
backward compatibility, but some features could be reimplemented in a
libcurlapp in preparation for the day when they are removed. And if
libcurlapp existed, then future feature requests that are turned down in
libcurl because they're not a good fit would have someplace to go.

>>> Dan