Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP pipelining and reusing easy handles causes mismatched request/response #1380

Closed
ahatala opened this issue Apr 3, 2017 · 7 comments
Closed
Labels

Comments

@ahatala
Copy link

ahatala commented Apr 3, 2017

I've narrowed down an issue in our use of libcurl to this repro case. Getting the repro to trigger seems to be timing-dependent so it might take a couple of runs to hit.

This is CURL verbose output for a failed test run: fail.txt

The user-visible problem is that the response data to a simple http request occasionally is associated to another http request in flight (to the same host), when multiple requests are pipelined over the same connection. The curl pipelining maintains a queue of easy handles for feeding the results to, under some conditions it seems that the queue gets sorted into incorrect order. A necessary condition for triggering the error seems to be that there are request both in sending and receiving states, the repro case accomplishes this by pushing in new requests in response to reading responses.

I'm using current github master (commit 55f4aba) but the issue triggers with stable versions as well (tried 7_39_0 and 7_53_1). The issue seems platform independent, FWIW I'm running on Win10 with configuration "LIB Debug - DLL Windows SSPI".

I can get the issue to repro against a number of different HTTP servers and I've verified that there's nothing suspicious in the data returned by the server on the wire.

I'm happy to provide any further data that may be useful for tracking down the root cause.

@ahatala
Copy link
Author

ahatala commented Apr 3, 2017

I spent some time root causing this and I believe I know what is happening.

The culprit seems to be USE_RECV_BEFORE_SEND_WORKAROUND. Received data may be buffered in connectdata.postponed and it might get silently discarded on connection reuse in the call to conn_reset_all_postponed_data() in url.c:6232

conn_reset_all_postponed_data(conn); /* reset unprocessed data */

This causes the next read to read the response of a more recent request, I suppose with larger payloads the buffered data might not be limited to full response boundaries either so garbage would be read in place of valid HTTP headers.

Since this requires USE_RECV_BEFORE_SEND_WORKAROUND my original statement of being platform independent is clearly wrong, the issue is Windows specific.

I don't understand what the motivation behind discarding buffered data upon connection reuse is, but just commenting out that offending line does seem to resolve the issue.

@jay jay added the HTTP label Apr 3, 2017
@jay
Copy link
Member

jay commented Apr 3, 2017

Yeah that looks strange, it probably needs a move/copy_all_postponed_data instead of freeing it.

I suppose with larger payloads the buffered data might not be limited to full response boundaries either so garbage would be read in place of valid HTTP headers.

Can you explain how that could happen?

@bagder
Copy link
Member

bagder commented Apr 3, 2017

That code section comes from commit 72d5e14, issue #657 and PR #668 ... Maybe we just need to add a condition to not do that if pipelining is used?

@jay
Copy link
Member

jay commented Apr 3, 2017

I think it would be better to just keep it regardless so it doesn't come back to bite some other way. I don't understand why it's being freed there.

jay added a commit to jay/curl that referenced this issue Apr 4, 2017
- Don't free the postponed struct on conn reuse; transfer it to the
  replacement conn instead.

- Encapsulate postponed struct initialization.

Bug: curl#1380
Reported-by: Antti Hätälä
@jay
Copy link
Member

jay commented Apr 4, 2017

@ahatala
Copy link
Author

ahatala commented Apr 4, 2017

@jay I can test later, but I don't think that patch is going to work. The original problem is that there is postponed data in "conn" that gets reset. There is no postponed data in "old_conn" and how could there be, since that connection was just created. Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?

@jay jay closed this as completed in 36e604f Apr 4, 2017
@jay
Copy link
Member

jay commented Apr 4, 2017

Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?

Yes I had it backwards, old_conn is a pointer to the temporary connection struct created after conn and conn holds the connection, so there's no reason to transfer. Your fix is fine then. Thanks for the great analysis and repro case.

@lock lock bot locked as resolved and limited conversation to collaborators May 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

No branches or pull requests

3 participants