
curl-tracker Archives

[curl:bugs] #1420 Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery

From: Carlo Wood <libcw_at_users.sf.net>
Date: Tue, 28 Oct 2014 18:05:11 +0000

I did extensive testing of HTTP pipelining with respect to timeouts.
For this I wrote a test suite, which you can find here: https://github.com/CarloWood/curl_http_pipeline_tester

Running the testsuite (VERBOSE off) on vanilla libcurl
(5b12a4b4e404e5b5a75d3509cce21553edc9a974) I get the following
output:

2014-10-28 18:48:00.944097: : Request #0 added [now running: 1]
<html><body>Reply 1 on connection 1 for request #0</body></html>
2014-10-28 18:48:00.944979: : Request #0 finished [now running: 0]
2014-10-28 18:48:00.945011: : Request #1 added [now running: 1]
2014-10-28 18:48:00.945022: : Request #2 added [now running: 2]
2014-10-28 18:48:00.945032: : Request #3 added [now running: 3]
2014-10-28 18:48:00.945058: : Request #4 added [now running: 4]
<... 1.001327 seconds ...>
2014-10-28 18:48:01.946385: : Request #1 TIMED OUT! [now running: 3]
2014-10-28 18:48:01.946506: : Request #2 TIMED OUT! [now running: 2]
2014-10-28 18:48:01.946561: : Request #4 TIMED OUT! [now running: 1]
2014-10-28 18:48:01.946575: : Request #5 added [now running: 2]
2014-10-28 18:48:01.946585: : Request #6 added [now running: 3]
2014-10-28 18:48:01.946607: : Request #7 added [now running: 4]
<html><body>Reply 2 on connection 1 for request #1</body></html>
<... 0.099109 seconds ...>
2014-10-28 18:48:02.045716: : Request #3 finished [now running: 3]
2014-10-28 18:48:02.045758: : Request #8 added [now running: 4]
<html><body>Reply 1 on connection 2 for request #5</body></html>
<html><body>Reply 1 on connection 3 for request #6</body></html>
<html><body>Reply 1 on connection 4 for request #7</body></html>
<html><body>Reply 1 on connection 5 for request #8</body></html>
<... 0.100185 seconds ...>
2014-10-28 18:48:02.145943: : Request #5 finished [now running: 3]
2014-10-28 18:48:02.145979: : Request #6 finished [now running: 2]
2014-10-28 18:48:02.145991: : Request #7 finished [now running: 1]
2014-10-28 18:48:02.146004: : Request #8 finished [now running: 0]
2014-10-28 18:48:02.146017: : Request #9 added [now running: 1]
<html><body>Reply 2 on connection 2 for request #9</body></html>
<... 0.100499 seconds ...>
2014-10-28 18:48:02.246516: : Request #9 finished [now running: 0]

This is correct, except that in total 5 connections
have been used instead of 2 (not 1: we need to create a new connection
when the pipe breaks due to the timeout of #1).
Requests #5, #6, #7 and #8 each cause a new connection because
curl forgot that the server can do pipelining. After receiving
the reply on connection 2 for #5, it knows that this server does
pipelining again, so request #9 uses connection 2 too.

After applying Monty's patch, I get this result:

2014-10-28 18:57:48.487717: : Request #0 added [now running: 1]
<html><body>Reply 1 on connection 1 for request #0</body></html>
2014-10-28 18:57:48.488689: : Request #0 finished [now running: 0]
2014-10-28 18:57:48.488720: : Request #1 added [now running: 1]
2014-10-28 18:57:48.488740: : Request #2 added [now running: 2]
2014-10-28 18:57:48.488750: : Request #3 added [now running: 3]
2014-10-28 18:57:48.488760: : Request #4 added [now running: 4]
<... 1.001437 seconds ...>
2014-10-28 18:57:49.490197: : Request #1 TIMED OUT! [now running: 3]
2014-10-28 18:57:49.490252: : Request #2 TIMED OUT! [now running: 2]
2014-10-28 18:57:49.490266: : Request #4 TIMED OUT! [now running: 1]
2014-10-28 18:57:49.490281: : Request #5 added [now running: 2]
2014-10-28 18:57:49.490292: : Request #6 added [now running: 3]
2014-10-28 18:57:49.490302: : Request #7 added [now running: 4]
<html><body>Reply 1 on connection 3 for request #3</body></html>
<html><body>Reply 1 on connection 5 for request #5</body></html>
<... 0.100334 seconds ...>
2014-10-28 18:57:49.590636: : Request #3 finished [now running: 3]
2014-10-28 18:57:49.590673: : Request #5 finished [now running: 2]
2014-10-28 18:57:49.590688: : Request #8 added [now running: 3]
2014-10-28 18:57:49.590700: : Request #9 added [now running: 4]
<html><body>Reply 1 on connection 6 for request #6</body></html>
2014-10-28 18:57:49.590815: : Request #6 finished [now running: 3]
<html><body>Reply 1 on connection 7 for request #7</body></html>
2014-10-28 18:57:49.590855: : Request #7 finished [now running: 2]
<html><body>Reply 2 on connection 3 for request #8</body></html>
<html><body>Reply 2 on connection 5 for request #9</body></html>
<... 0.100399 seconds ...>
2014-10-28 18:57:49.691254: : Request #8 finished [now running: 1]
2014-10-28 18:57:49.691289: : Request #9 finished [now running: 0]

As you can see, we now even made 7 connections in total. The reason for
this is that although the timed-out #2 and #4 now attempt a new
connection, they time out again immediately afterwards and also close
those connections, as if the pipe broke (again)...

We end up, as before, with 4 connections: 3, 5, 6 and 7.

---
** [bugs:#1420] Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery**
**Status:** open
**Labels:** pipelining 
**Created:** Tue Sep 02, 2014 10:36 PM UTC by Monty Brandenberg
**Last Updated:** Mon Oct 20, 2014 03:36 PM UTC
**Owner:** nobody
I've been tracking a data corruption/missing http status problem and I think I have enough data for a useful bug report.
The problem centers on the handling of queued requests in a pipeline when preceding requests are failed in libcurl after committing to a request/response transaction. In the table below, I show six GET requests pipelined on one connection; 'Time' is relative seconds since the creation of the connection.

The first three requests are processed normally. The fourth request times out while processing the response body. The fifth request times out waiting for the response header. The sixth request is allowed to proceed but appears to be out of sync with the response stream. I haven't dumped the data in verbose mode, but I'd guess that the sixth request is consuming the remainder of the fourth request's response body in some demented fashion.

Request | Time | Event
------- | ---- | -----
0 | 0 | HEADEROUT issued
0 | 1 | First HEADERIN data
0 | 13 | Request completed, 200 status
1 | 0 | HEADEROUT issued
1 | 13 | First HEADERIN data
1 | 15 | Request completed, 200 status
2 | 0 | HEADEROUT issued
2 | 15 | First HEADERIN data
2 | 20 | Request completed, 200 status
3 | 0 | HEADEROUT issued
3 | 20 | First HEADERIN data
3 | 30 | Timeout declared (easy error 28)
3 | 30 | Request failed, easy 28 status
4 | 0 | HEADEROUT issued
4 | 30 | Timeout declared (easy error 28), no HEADERIN data seen
4 | 30 | Request failed, easy 28 status
5 | 13 | HEADEROUT issued
5 | 30 | First DATAIN received, NO HEADERIN data for this request
5 | 43 | Connection closed.  This may be in response to server socket close.
5 | 43 | Request appears to succeed but CURLINFO_RESPONSE_CODE returns 0.
The sixth request appears to succeed as far as multi_perform and multi_info_read are concerned, but fetching CURLINFO_RESPONSE_CODE returns 0 for the status.
As a workaround, checking the response code as above appears to be useful. I'm not certain that's 100% reliable, or that the connection will be broken immediately at that point. This has the potential to go very wrong as far as data integrity goes.
If I understand this correctly, solutions would include:
* Canceling/failing a request that's active on a pipeline results in failure of all requests farther down the pipeline.
* Canceling/failing a request results in 'passivation' of the request. It no longer interacts with the caller, but remains active, sinking data from the response until it is satisfied.
---
Received on 2014-10-28
