curl-library

Re: Avoiding creation of multiple connections to an http capable server before CURLMOPT_MAX_PIPELINE_LENGTH is reached.

From: Ray Satiro <raysatiro_at_yahoo.com>
Date: Fri, 31 Oct 2014 00:56:03 -0400

On 10/30/2014 9:32 PM, Carlo Wood wrote:
> Note that English is not my first language. If the text below sounds
> awkward at times, then that is the reason. I assure you that I'm a very
> capable coder with decades of networking experience, despite that
> fact and hope you are willing to over look this, and still take my post
> seriously.

You write well, I think. I'm no expert, but English is my first language.
Punctuation (commas especially) seems overused in some places and
underused in others. That you're capable was pretty obvious after the
first few paragraphs!

> 2) For several different reasons, it can happen that an existing
> connection, which libcurl knows supports pipelining, is closed.
> So far I know of three reasons why this happens: a) the connection
> is reset by the peer (resulting in a "retry"), b) a request times out
> (due to CURLOPT_TIMEOUT), or c) the connection cache is full and a
> pipelining connection is picked to be closed.
>
> In all cases, since there is only a single connection to that server
> (as desired), closing such a connection currently causes the bundle for
> that site to be deleted, and hence causes libcurl to forget that the
> site is HTTP pipeline capable. After this, since in all cases the "pipe
> broke", all not-timed-out requests in the pipe are retried internally,
> all instantly, before the server replies to the first. Each retry
> results in a new connection, so the length of the pipe
> (CURLMOPT_MAX_PIPELINE_LENGTH) is converted into
> CURLMOPT_MAX_PIPELINE_LENGTH parallel connections!

That sounds a lot like a bug, but I also wonder if converting to
parallel connections was done intentionally to work around possibly
buggy servers. If it's unintended, you could probably make an
independent cache table for hosts (or IPs, but maybe that's risky) so
that once libcurl knows whether or not it's getting HTTP/1.1, it could
store that information. A table might already exist that you can
piggyback onto; I don't know. Also, I'm assuming here that an HTTP/1.1
response implies pipelining support, but there's an interesting
discussion on Mozilla's bug tracker about that [1]. For example, a CDN
returning HTTP/1.0 for a gif and HTTP/1.1 for html on the same host, if
anyone is still doing that.

> ANALYSIS OF OPTIONS AND PROPOSAL
> --------------------------------
>
> When a server connection is made and it is not yet known whether that
> connection supports pipelining, there are the following
> possibilities:
>
> 1) Libcurl creates a new connection for every new request (this
> is the current behavior).
>
> 2) Libcurl creates a single connection, and queues all other
> requests until it has received all headers for that first request
> and decided whether the server supports pipelining.
> If it does, it uses the same connection for the queued
> requests. If it does not, it creates new connections
> for each request, as under 1).
>
> 3) Libcurl assumes every connection can do pipelining and just
> sends all requests over the first connection, until it finds
> out that this server can't do pipelining - in which case the
> pipe 'breaks' and already existing code will cause all requests
> to be retried - now knowing that the server does not support
> pipelining.
>
> 4) Libcurl doesn't wait for headers but uses information from
> the user to decide if the server is supposed to support
> http pipelining; meaning that it picks strategy 2) or 3) based
> on a flag set on the easy handle by the user.
>
> I think that option 4 is by far the best experience for the user;
> however, it is the hardest to implement because it requires
> implementing both 2) and 3), as well as adding the code for a new flag.

If the programmer has turned on CURLMOPT_PIPELINING, isn't it already
implied? What about this: if CURLMOPT_PIPELINING is on and the host's
HTTP version is not in the cache table, then before any requests do an
OPTIONS * HTTP/1.1 to get the HTTP protocol version. The * doesn't work
with a proxy, though. I guess I haven't thought this out; still, that's
the general idea. Basically I wonder if it would be faster to detect
the version first by doing something where no resource is requested (if
that's possible in every case), and then send the requests. OPTIONS
isn't working for me on Google's web server; I get a 1k 405 Method Not
Allowed, so I guess not everyone supports it.

> Note that just option 2) (instead of 4) isn't that bad: if pipelining
> is supported and the first request would cause a stall of, say,
> 10 seconds, then all requests added behind it in the pipeline
> would be stalled anyway ("queued" on the server, instead
> of client-side in the viewer).

That initial delay might not be bad for you, but it could be for
someone else, depending on how they're using the library.

> If there are no objections then I will continue to implement
> point 2), after which I'd decide if I want to go ahead and do 4 (and 3)
> as well. I'd prefer to get some feedback from the appropriate devs
> however before I do all this work and then it turns out that people
> have better ideas ;).

I'm hardly experienced with developing libcurl. These are only
suggestions and I cannot help you implement them. I have made only minor
contributions so take my comments for what they're worth. I hope someone
with experience will speak up and give you some feedback.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=264354

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2014-10-31