Re: [PATCH 2/7] pipelining: Fix connection handling under timeouts.

From: Carlo Wood <carlo_at_alinoe.com>
Date: Sun, 9 Nov 2014 21:33:20 +0100

On Sat, 8 Nov 2014 20:42:10 -0500 (EST)
m brandenberg <mcbinc_at_panix.com> wrote:

> I have also seen this behavior: connection made and broken
> without anything going out on the wire. Not certain that
> this was the cause but it certainly does happen.

I am certain that this was a cause. Also, I am no longer seeing
this behavior at ALL (in the heavy duty production environment).
The pipeline connection is a rock solid single connection.

Please note that I use extreme(tm) debugging code which I have
not pushed to git, so you don't know about it ;). But I really do
know about every single bit that is going on, even in the complex
production environment (and needed to know that in order to fix
the bugs). Therefore I'm fairly sure that this was also the reason
you saw this behavior - as it is gone now.

...
> > + /* Make sure this connection is no longer considered for
> > + pipelining. */
> > + data->easy_conn->bits.timedout = TRUE;
>
> Request has timed out, but the connection hasn't. Connection
> has been poisoned by the timedout request. Other scenarios
> might poison it for other reasons. (Trying to capture the
> core idea.)

The new 'bits.timedout' corresponds to your 'poisoned_1420', yes.
I don't think a connection can time out. I've also not run
into any other reason why I'd want or need to poison a connection:
the only reason to do this is when the pipe breaks; if that literally
happens due to a read or write error on the socket (ie, connection
reset by peer) then that was already handled and certainly doesn't
result in that connection still being used :p haha.
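
To make that concrete, this is the kind of guard the new bit enables
when a connection is considered for pipelining. It is only an
illustration of the idea, not part of the patch; the helper name is
made up and the real check lives in the connection-reuse logic:

  /* Illustration only: should this connection still be offered to new
     requests for pipelining? */
  static bool conn_usable_for_pipelining(const struct connectdata *conn)
  {
    if(conn->bits.close)
      return FALSE;   /* already marked to be closed */
    if(conn->bits.timedout)
      return FALSE;   /* a request on this pipe timed out, so incoming
                         data can no longer be matched to requests
                         reliably */
    return TRUE;
  }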

What makes timeouts so special is that they are the only reason
why the (incoming) data on the pipe can no longer be trusted
without the connection having been closed. The reason it wasn't
closed, however, was this:

> > data->result = CURLE_OPERATION_TIMEDOUT;
> > - multistate(data, CURLM_STATE_COMPLETED);
> > - break;
> > + multistate(data, CURLM_STATE_DONE);

After that state change, the connection IS closed, so it can
no longer be used anyway... Therefore, the new 'timedout' bit might
actually no longer be needed.
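
For reference, pieced together from the quoted hunks, the timeout
branch now looks roughly like this (surrounding code elided):

  data->result = CURLE_OPERATION_TIMEDOUT;
  /* Make sure this connection is no longer considered for pipelining. */
  data->easy_conn->bits.timedout = TRUE;
  /* Go through DONE instead of jumping straight to COMPLETED, so the
     normal post-transfer path runs and the connection gets closed. */
  multistate(data, CURLM_STATE_DONE);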

> This state change worries me and those more familiar with
> multi.c should look at this very carefully. This appears
> to open several possibilities:
>
> * Calling Curl_multi_process_pending_handles() which
> will activate queued requests on the now-poisoned
> connection.

How so? All that function does is remove requests from
the pending list and set their state to CURLM_STATE_CONNECT;
they are not actually added to any connection until the
next run, which is when the connection is already closed
and deleted. The opposite is the case: without this state
change the connection survives, and THEN pending requests
will be added to it sooner or later.
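
A rough paraphrase of what that function does (list plumbing
simplified, helper names may differ slightly from the real source):

  void Curl_multi_process_pending_handles(struct Curl_multi *multi)
  {
    struct curl_llist_element *e = multi->pending->head;

    while(e) {
      struct SessionHandle *data = e->ptr;
      struct curl_llist_element *next = e->next;

      if(data->mstate == CURLM_STATE_CONNECT_PEND) {
        /* back to the very start of the state machine; the handle is
           NOT attached to any connection here, that only happens on a
           later run of multi_runsingle() */
        multistate(data, CURLM_STATE_CONNECT);
        Curl_llist_remove(multi->pending, e, NULL);
      }
      e = next;
    }
  }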

> * Cascading down the state machine rather than skipping
> it. This would seem to invite early return on this
> request allowing other (now invalid) requests to operate
> on this connection.

This is most certainly not the case. The state machine handles
one request at a time, and we're talking about a very limited
part of the state machine that is added to the processing
of this request, namely the CURLM_STATE_DONE state.

Remember that conn->bits.close has been set at this point.
If the state machine were to add anything to the connection
then that would be a separate bug. However, it does not;
it does exactly what you'd expect when running the state
machine in state CURLM_STATE_DONE on a request with an error
status and a connection that is set to 'close'.
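
Abridged, the DONE handling for this request boils down to something
like this (my reading of multi.c; pipeline bookkeeping and error
handling elided):

  case CURLM_STATE_DONE:
    if(data->easy_conn) {
      /* post-transfer cleanup; because conn->bits.close is set, the
         connection is disconnected here rather than returned to the
         connection cache, so nothing can be pipelined on it afterwards */
      Curl_done(&data->easy_conn, data->result, FALSE);
    }
    multistate(data, CURLM_STATE_COMPLETED);
    break;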

> > + data->multi_do_connection_id = data->easy_conn->connection_id;
>
> If the meaning of 'multi_do_connection_id' is really
> that this request has had a side-effect on the connection,
> then this may be too late. Would the PROXY case need to
> be accounted for earlier (http or socks5)?

The meaning of 'multi_do_connection_id' is to recognize whether
this request has had enough of a side effect on the connection
that the connection cannot be reused by another request
while still being sure that the reuse will work, and that nothing
else will be received before the normal reply to that
new request.
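
In other words, the field allows a test like the following
(illustration only, the helper is hypothetical):

  /* Has this request already had Curl_do() side effects on the
     connection it is attached to? If so, the connection must not be
     handed to another request as if it were idle. */
  static bool request_touched_connection(const struct SessionHandle *data,
                                         const struct connectdata *conn)
  {
    return data->multi_do_connection_id == conn->connection_id;
  }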

I have not been able to test the use of a proxy and I'm not familiar
with how they work. Can you explain to me why a connection might
be corrupted prior to actually sending the request out, in the presence
of a proxy?

> > @@ -2519,6 +2526,8 @@ void Curl_multi_set_easy_connection(struct SessionHandle *handle, struct connectdata *conn)
> > {
> >   handle->easy_conn = conn;
> > + handle->multi_do_connection_id = -1; /* Nothing was sent over this connection
> > +                                         by this easy handle yet. */
> > }
>
> Hmm, re-used handles have stale connection IDs
> dangling in them?
>
> Also retried requests?

Actually, I can't find where Curl_multi_set_easy_connection is called.
I don't think it is used at all... So I can't comment on the possible
state of the passed handle. Perhaps a better patch would be to delete
this function altogether?
 
> > diff --git a/lib/urldata.h b/lib/urldata.h
> > index 62a2b80..78dd1a8 100644
> > --- a/lib/urldata.h
> > +++ b/lib/urldata.h
> > @@ -486,6 +486,7 @@ struct negotiatedata {
> > struct ConnectBits {
> >   /* always modify bits.close with the connclose() and connkeep()
> >      macros! */
> >   bool close;    /* if set, we close the connection after this request */
> > + bool timedout; /* if set, the transfer timedout */
>
> I don't know if this is a curl-library@ standard but I generally
> leave existing struct offsets alone when possible (adding new
> fields at the end).

That is an internal header, so it doesn't matter.

> >   bool reuse;     /* if set, this is a re-used connection */
> >   bool proxy;     /* if set, this transfer is done through a proxy - any type */
> >   bool httpproxy; /* if set, this transfer is done through a http proxy */
> > @@ -1659,6 +1660,8 @@ struct SessionHandle {
> >   struct SessionHandle *prev;
> >
> >   struct connectdata *easy_conn; /* the "unit's" connection */
> > + long multi_do_connection_id;   /* the value of easy_conn->connection_id
> > +                                   at the moment Curl_do was called */
>
> Comment/naming/purpose question. This appears to be
> duplicated information which would usually be a 'bad thing'
> to do. But that is not the actual intent here, is it?
> The idea that is trying to be expressed is: "This request
> has had a side-effect on an associated connection."
> Correct? While the connection_id might be useful for
> debug, perhaps the name and comment should reflect the
> actual semantics.

Yes, that is correct, see my comment above.
The original connection_id is only used for debugging purposes, I think.
By using it like this, that is no longer the case -- but I think the
usage is legit; if the same functionality can be achieved differently
then by all means make a suggestion. I agree that the name is
pretty awkward; I'm also open for suggestions regarding just that.

-- 
Carlo Wood <carlo_at_alinoe.com>
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html