
curl-tracker Archives

[curl:bugs] #1420 Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery

From: Monty Brandenberg <mbrandenberg_at_users.sf.net>
Date: Thu, 18 Sep 2014 18:23:20 +0000

What I've been seeing disagrees with the above description so I dug in with the debugger to investigate. There are two locations where timeouts are detected:

1. At the top of multi_runsingle() which handles all 'mstate' modes of the request handle, and
2. In Curl_readwrite() (called by multi_runsingle()) which implicitly supports only the PERFORM state.

The implementations of the two differ greatly. Detector 1 seems the more likely to fire, and it is the one that generates the issues I see. Differences:

* Detector 1 sets 'bits.close' using connclose() and sets mstate to COMPLETED without doing cleanup or disconnecting. Detector 2 does a more complete cleanup, signalling all requests bound to the connection.
* Once bits.close is set, the connection isn't really closed. The pipeline code continues to consider the connection when trying to move pending requests to the send queue. But after selecting the connection as the best fit, the binding fails with a 'We can reuse, but we want a new connection anyway' message. This should still work, but a connection that might have performed pipelining correctly will be passed over. Requests already bound to the connection can ask for data and will be given stale data from responses already committed to the wire.
* Connection disconnect appears not to happen under client direction. Instead, it looks like Curl_disconnect() gets called when the server eventually closes the socket and libcurl reacts to the EOF. This can happen several requests after the timed out request.

The state and connection management code looks a bit... sensitive. But some changes look to be in order:

* Single timeout detector or at least common code extracted into a function.
* Logic for responding to a timed-out request is going to depend on 'mstate'. Rather than collapsing all states into COMPLETED on timeout, mapping 'mstate' to one of a few states (pending, post-pending, post-completion) and then handling the timeout in the case statement might be better.
* Once bits.close is set on the connection, Curl_readwrite() will continue to deliver data to any caller. Is this correct or should this become an error condition with some defensive code?
* Curl_posttransfer() seems to be missing on some paths to Curl_done(). Haven't done a thorough survey, just a casual observation.
* I'm not entirely sure the non-pipelined case is unaffected by this. The semi-zombie state of the connection might permit some part of a later request reusing the connection with stale data in the socket.

I'm generally able to detect this problem by examining the HTTP status of a completed request. If data is delivered to the wrong request, I will get a '0' status back because the HTTP status line was never read as a header, only as data. But if a timeout is detected at just the wrong time, an HTTP status line for a preceding request *could* be fed to a later one leading to data corruption.

I'm going to work on moving the signalling logic to the top of multi_runsingle() and preventing connections that have 'bits.close' set from being considered for pipelining. That will probably be my local fix for 7.38.0.

---
** [bugs:#1420] Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery**
**Status:** open
**Labels:** pipelining 
**Created:** Tue Sep 02, 2014 10:36 PM UTC by Monty Brandenberg
**Last Updated:** Sat Sep 06, 2014 06:34 AM UTC
**Owner:** nobody
I've been tracking a data corruption/missing http status problem and I think I have enough data for a useful bug report.
The problem centers around the handling of queued requests in a pipeline when preceding requests are failed in libcurl after committing to a request/response transaction.

In the table below, I show six GET requests pipelined on one connection. 'Time' is relative seconds since the creation of the connection. The first three requests are processed normally. The fourth request times out while processing the response body. The fifth request times out waiting for the response header. The sixth request is allowed to proceed but appears to be out-of-sync with the response stream. I haven't dumped the data in verbose mode but I'd guess that the sixth request is consuming the remainder of the fourth request's response body in some demented fashion.
Request | Time | Event
------- | ---- | -----
0 | 0 | HEADEROUT issued
0 | 1 | First HEADERIN data
0 | 13 | Request completed, 200 status
1 | 0 | HEADEROUT issued
1 | 13 | First HEADERIN data
1 | 15 | Request completed, 200 status
2 | 0 | HEADEROUT issued
2 | 15 | First HEADERIN data
2 | 20 | Request completed, 200 status
3 | 0 | HEADEROUT issued
3 | 20 | First HEADERIN data
3 | 30 | Timeout declared (easy error 28)
3 | 30 | Request failed, easy 28 status
4 | 0 | HEADEROUT issued
4 | 30 | Timeout declared (easy error 28), no HEADERIN data seen
4 | 30 | Request failed, easy 28 status
5 | 13 | HEADEROUT issued
5 | 30 | First DATAIN received, NO HEADERIN data for this request
5 | 43 | Connection closed.  This may be in response to server socket close.
5 | 43 | Request appears to succeed but CURLINFO_RESPONSE_CODE returns 0.
The sixth request appears to succeed as far as multi_perform and multi_info_read are concerned.  But fetching CURLINFO_RESPONSE_CODE returns 0 for status.  
As a workaround, checking the status as above appears to be useful.  I'm not certain that's 100% reliable or that the connection will be broken immediately at that point.  This has the potential of going very wrong as far as data integrity goes.
If I understand this correctly, solutions would include:
* Canceling/failing a request that's active on a pipeline fails all requests farther down the pipeline.
* Canceling/failing a request results in 'passivation' of the request.  It no longer interacts with the caller but remains active sinking data from the response until satisfied.
---
