
curl-tracker Archives

[curl:bugs] Re: #1420 Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery

From: Carlo Wood <libcw_at_users.sf.net>
Date: Thu, 30 Oct 2014 18:36:42 +0000

Hi Daniel,

I'm still not satisfied with the result.

The objective is to have only a single connection (I know that
you can set a limit on the number of simultaneous requests in the
pipeline, and when THAT limit is reached then of course a new
connection should be created; but for now let's assume that that
limit is not set, or not reached; in that case we want just one
connection to be active -- the whole point of doing pipelining).
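Concretely, the kind of setup I mean is roughly the following; a
minimal sketch assuming libcurl 7.30.0 or later, with the limit
values picked only for illustration:

    #include <curl/curl.h>

    /* One multi handle with pipelining enabled, a cap on how many
     * requests may share one pipeline, and at most one connection
     * per host.  The numeric limits are illustrative only. */
    static CURLM *setup_multi(void)
    {
      CURLM *multi = curl_multi_init();
      /* enable HTTP/1.1 pipelining on this multi handle */
      curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);
      /* cap the number of requests queued on one pipeline */
      curl_multi_setopt(multi, CURLMOPT_MAX_PIPELINE_LENGTH, 5L);
      /* keep at most one connection per host */
      curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 1L);
      return multi;
    }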

If a bundle has only one connection, then the normal reaction of
libcurl is to destroy the bundle when that connection is closed.

The reasons why a connection can be closed are:
1) One (or more) requests time out.
2) The server closes the connection (connection reset by peer).
3) The connection cache runs full and the pipeline connection
   happens to be the oldest idle connection.

Perhaps there are more reasons that I haven't run into yet.

My first commits solved 1), and a commit that I haven't pushed yet
should fix 2). Then I ran into 3)... and I'm starting to think
that the attempt to keep the bundle alive in each of these (special)
cases is not the right approach.

A better approach is to look at it from the other side: when a
server connection is made and it is not yet known whether that
connection supports pipelining, then there are the following
possibilities:

1) Libcurl creates a new connection for every new request (this
   is the current behavior).
2) Libcurl creates a single connection, and queues all other
   requests until it has received all headers for that first request
   and decided whether the server supports pipelining.
   If it does, it uses the same connection for the queued
   requests. If it does not, then it creates new connections
   for each request as under 1).
3) Libcurl assumes every connection can do pipelining and just
   sends all requests over the first connection, until it finds
   out that this server can't do pipelining - in which case the
   pipe 'breaks' and already existing code will cause all requests
   to be retried - now knowing that the server does not support
   pipelining.
4) Libcurl doesn't wait for headers but uses information from
   the user to decide if the server is supposed to support
   HTTP pipelining; meaning that it picks strategy 2) or 3) based
   on a flag set on the easy handle by the user (see the sketch
   below).
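To make option 4) a bit more concrete, such a flag could be used
roughly as in the sketch below. Note that CURLOPT_ASSUME_PIPELINING
is an invented name; no such option exists in libcurl, so this only
illustrates how the preference could be expressed per easy handle:

    #include <curl/curl.h>

    /* Hypothetical sketch of option 4).  CURLOPT_ASSUME_PIPELINING is
     * an invented option name (it does not exist in libcurl); it only
     * shows how a per-easy-handle preference could look. */
    void add_request(CURLM *multi, const char *url, long assume_pipelining)
    {
      CURL *easy = curl_easy_init();
      curl_easy_setopt(easy, CURLOPT_URL, url);
      /* 1L: assume the server pipelines and send right away (strategy 3);
       * 0L: queue behind the first request until its response headers
       *     have been seen (strategy 2).  Invented option, see above. */
      curl_easy_setopt(easy, CURLOPT_ASSUME_PIPELINING, assume_pipelining);
      curl_multi_add_handle(multi, easy);
    }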

I think that option 4) gives by far the best experience for the
user; however, it is the hardest to implement because it requires
implementing both 2) and 3) as well as adding the code for a new
flag. The easiest solution is probably to always pick 3), but that
is almost unacceptably 'dirty' for those connections that do not
support pipelining. My current approach (in the user application)
is to add only a single request and let that finish, so the
application itself can detect whether pipelining is supported, and
when it is not, add that site to the blacklist. Done that way,
option 3) would not lead to 'dirty' failures, but I can hardly call
that a good general solution; it would be far more favorable to add
a flag for easy handles. Hence, if the choice is not to add a new
flag to easy handles (which would let the user specify preferences
per easy handle, as opposed to being forced to use a dedicated
multi handle for pipelining), then option 2) seems the only
reasonable choice.
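For completeness, feeding such a blacklist to the multi handle could
look roughly like this; a minimal sketch assuming libcurl 7.30.0 or
later, where the host name is a made-up example and the logic that
decides a site does not pipeline lives in the application:

    #include <curl/curl.h>

    /* Hosts the application has found not to support pipelining;
     * the entry here is a made-up example.  The array must end
     * with a NULL entry. */
    static char *site_blacklist[] = {
      "www.example.com",
      NULL
    };

    static void apply_blacklist(CURLM *multi)
    {
      curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);
      curl_multi_setopt(multi, CURLMOPT_PIPELINING_SITE_BL, site_blacklist);
    }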

Note that just option 2) (instead of 4) isn't that bad: if pipelining
is supported and the first request would cause a stall of -say-
10 seconds, then also in that case all requests added behind it
in the pipeline would be stalled ("queued" on the server, instead
of client-side in the viewer). The main difference is that if
pipelining is NOT supported, then you don't create a lot of parallel
connections right away, and hence there is an extra delay, more or
less equal to the response time of the first request (typically less
than a second), for the additional requests; that delay subsequently
occurs every time libcurl destroys the bundle structure (i.e., when
ALL connections are closed). I don't see that as something bad: if all
connections are closed then apparently we're done with downloading
the bulk; getting an extra sub-second delay on the next burst seems
very acceptable to me. Of course we only need to do this when the
multi handle is flagged to support pipelining in the first place.
The benefit is far larger!
 

---
** [bugs:#1420] Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery**
**Status:** open
**Labels:** pipelining 
**Created:** Tue Sep 02, 2014 10:36 PM UTC by Monty Brandenberg
**Last Updated:** Thu Oct 30, 2014 08:41 AM UTC
**Owner:** nobody
I've been tracking a data corruption/missing http status problem and I think I have enough data for a useful bug report.
The problem centers around the handling of queued requests in a pipeline when preceding requests are failed in libcurl after committing to a request/response transaction.

In the table below, I show six GET requests pipelined on one connection.  'Time' is relative seconds since the creation of the connection.  The first three requests are processed normally.  The fourth request times out while processing the response body.  The fifth request times out waiting for the response header.  The sixth request is allowed to proceed but appears to be out-of-sync with the response stream.

I haven't dumped the data in verbose mode but I'd guess that the sixth request is consuming the remainder of the fourth request's response body in some demented fashion.
Request | Time | Event
------- | ---- | -----
0 | 0 | HEADEROUT issued
0 | 1 | First HEADERIN data
0 | 13 | Request completed, 200 status
1 | 0 | HEADEROUT issued
1 | 13 | First HEADERIN data
1 | 15 | Request completed, 200 status
2 | 0 | HEADEROUT issued
2 | 15 | First HEADERIN data
2 | 20 | Request completed, 200 status
3 | 0 | HEADEROUT issued
3 | 20 | First HEADERIN data
3 | 30 | Timeout declared (easy error 28)
3 | 30 | Request failed, easy 28 status
4 | 0 | HEADEROUT issued
4 | 30 | Timeout declared (easy error 28), no HEADERIN data seen
4 | 30 | Request failed, easy 28 status
5 | 13 | HEADEROUT issued
5 | 30 | First DATAIN received, NO HEADERIN data for this request
5 | 43 | Connection closed.  This may be in response to server socket close.
5 | 43 | Request appears to succeed but CURLINFO_RESPONSE_CODE returns 0.
The sixth request appears to succeed as far as multi_perform and multi_info_read are concerned.  But fetching CURLINFO_RESPONSE_CODE returns 0 for status.  
As a workaround, checking the status as above appears to be useful.  I'm not certain that's 100% reliable or that the connection will be broken immediately at that point.  This has the potential of going very wrong as far as data integrity goes.
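In code, that workaround check amounts to something like the following; a minimal sketch (the retry handling is only indicated, not implemented):

    #include <curl/curl.h>

    /* After a transfer is reported done, treat CURLE_OK together with
     * an HTTP response code of 0 as suspect: no status line was ever
     * parsed for this transfer, so its data may really belong to an
     * earlier, failed request on the same pipeline. */
    void check_finished(CURLM *multi)
    {
      CURLMsg *msg;
      int msgs_left;
      while((msg = curl_multi_info_read(multi, &msgs_left))) {
        if(msg->msg == CURLMSG_DONE) {
          long http_code = 0;
          curl_easy_getinfo(msg->easy_handle, CURLINFO_RESPONSE_CODE, &http_code);
          if(msg->data.result == CURLE_OK && http_code == 0) {
            /* "successful" but no status line was seen: discard the
               data and retry the request, preferably on a fresh
               connection */
          }
        }
      }
    }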
If I understand this correctly, solutions would include:
* Canceling/failing a request that's active on a pipeline results in failure of all requests farther down the pipeline.
* Canceling/failing a request results in 'passivation' of the request.  It no longer interacts with the caller but remains active, sinking data from the response until it is satisfied.
---