#1465 Easy handle might hang in CURLM_STATE_CONNECT_PEND state after error reply

closed-fixed
None
5
2015-03-29
2014-12-29
Jiri Dvorak
No

Easy handle might hang in CURLM_STATE_CONNECT_PEND state after error reply.

Version: libcurl 7.39.0; the problem also appeared with 7.37.0.

It seems that there is a problem when using the CURLMOPT_MAX_HOST_CONNECTIONS or CURLMOPT_MAX_TOTAL_CONNECTIONS options: handles that transitioned to CURLM_STATE_CONNECT_PEND because of the limit may never be woken if the active request fails with an error. The problem appears fairly randomly.

As part of testing my program I have CURLMOPT_MAX_HOST_CONNECTIONS set to 1, CURLMOPT_MAX_TOTAL_CONNECTIONS set to 4, and pipelining left in its default disabled state. I create 5 simultaneous easy handles on a single multi handle, each doing an HTTP GET of a non-existent URL (HTTP 404) from Nginx running on localhost. Most of the time the operations complete without problems; however, sometimes, depending on timing and system load, some of them complete and the remaining ones end up stuck in the CURLM_STATE_CONNECT_PEND state.

It looks like the following happens when the hang occurs:
1. Handle X starts the download.
2. The remaining handles detect that no connection is available and enter the CONNECT_PEND state.
3. Some time later, handle X transitions DO => DO_DONE and switches all waiting handles to CONNECT.
4. Handle X switches to WAITPERFORM and then to PERFORM.
5. The remaining handles try to obtain the connection, see that it is still not available, and switch back to CONNECT_PEND.
6. Handle X enters the "The transfer phase returned error, we mark the connection to get...." (multi.c:1552) path and closes the connection.
7. Handle X enters the error handling path within "if(data->mstate < CURLM_STATE_COMPLETED) {" (multi.c:1699). data->result is CURLE_HTTP_RETURNED_ERROR, and as data->easy_conn is already NULL, the handle simply enters CURLM_STATE_COMPLETED without waking anyone.
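
The sequence above can be sketched as a small stand-alone model. This is only an illustration under stated assumptions, not libcurl's real code: the enum, the struct, and the functions wake_pending(), try_connect(), fail_transfer() and count_stuck() are hypothetical names invented for this sketch, and the state machine is cut down to the four states needed here. The wake_on_error flag models one plausible remedy (waking the pending handles on the error path); it is not taken from any actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the multi states named in the report.
   Hypothetical model only; not libcurl's real state machine. */
enum state { ST_CONNECT, ST_CONNECT_PEND, ST_PERFORM, ST_COMPLETED };

#define NHANDLES 5

struct model {
  enum state st[NHANDLES];
  bool conn_in_use; /* models a per-host connection limit of 1 */
};

/* Move every pending handle back to CONNECT so it retries. */
static void wake_pending(struct model *m)
{
  for(size_t i = 0; i < NHANDLES; i++)
    if(m->st[i] == ST_CONNECT_PEND)
      m->st[i] = ST_CONNECT;
}

/* Each handle in CONNECT takes the connection if it is free,
   otherwise it parks itself in CONNECT_PEND (steps 1, 2 and 5). */
static void try_connect(struct model *m)
{
  for(size_t i = 0; i < NHANDLES; i++) {
    if(m->st[i] != ST_CONNECT)
      continue;
    if(m->conn_in_use)
      m->st[i] = ST_CONNECT_PEND;
    else {
      m->conn_in_use = true;
      m->st[i] = ST_PERFORM;
    }
  }
}

/* Handle x fails in the transfer phase (steps 6 and 7).
   wake_on_error selects the assumed remedy. */
static void fail_transfer(struct model *m, size_t x, bool wake_on_error)
{
  m->conn_in_use = false;  /* step 6: the connection is closed */
  m->st[x] = ST_COMPLETED; /* step 7: straight to COMPLETED */
  if(wake_on_error)
    wake_pending(m);
}

/* Run the scenario and return how many handles end up stuck
   in CONNECT_PEND. */
int count_stuck(bool wake_on_error)
{
  struct model m;
  int stuck = 0;

  m.conn_in_use = false;
  for(size_t i = 0; i < NHANDLES; i++)
    m.st[i] = ST_CONNECT;

  for(;;) {
    try_connect(&m);      /* steps 1-2 */
    if(!m.conn_in_use)
      break;              /* nobody is transferring any more */
    wake_pending(&m);     /* step 3: wake at DO => DO_DONE */
    try_connect(&m);      /* step 5: connection still busy */
    for(size_t i = 0; i < NHANDLES; i++) {
      if(m.st[i] == ST_PERFORM) {
        fail_transfer(&m, i, wake_on_error);
        break;
      }
    }
  }

  for(size_t i = 0; i < NHANDLES; i++)
    if(m.st[i] == ST_CONNECT_PEND)
      stuck++;
  return stuck;
}
```

In this model, count_stuck(false) leaves the four waiting handles parked in CONNECT_PEND, while count_stuck(true) lets every handle run to COMPLETED. The real hang is timing dependent, which the model does not try to capture.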

Discussion

  • Daniel Stenberg

    Daniel Stenberg - 2015-03-20
    • assigned_to: Daniel Stenberg
     
  • Daniel Stenberg

    Daniel Stenberg - 2015-03-20

    What kind of "waking" do you mean? Doesn't it still return the final message fine that the transfer completed?

     
  • Jiri Dvorak

    Jiri Dvorak - 2015-03-20

    By waking I meant switching them from the CONNECT_PEND state to the CONNECT state. The handle which I marked as X completes as expected; however, all the other handles remain in the CONNECT_PEND state forever.

     
  • Daniel Stenberg

    Daniel Stenberg - 2015-03-21

    Ah yes, I somehow overlooked some details there. So what would you say about a patch like the one I attach here?

     
  • Jiri Dvorak

    Jiri Dvorak - 2015-03-22

    My knowledge of the inner workings of the library is very limited; however, the patch looks fine. Also, using a slower server I was able to get a more reliable test for the original problem, and the patch did fix it.

    While checking the code I found another situation which might trigger a similar problem. Between steps (5) and (6), control flow might leave curl_multi_perform() while waiting for the reply from the server. At that moment the application might call curl_multi_remove_handle() and remove handle X, for example in reaction to a user abort request. In that case the same problem would happen, as there is no call to Curl_multi_process_pending_handles() during the removal.

     
  • Daniel Stenberg

    Daniel Stenberg - 2015-03-26

    Thanks, I pushed that first patch now as commit 318ad8d7. Will give your follow-up comment a good thought too...

     
  • Daniel Stenberg

    Daniel Stenberg - 2015-03-29
    • status: open --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2015-03-29

    commit 787c2ae91b1 is now merged and addresses the follow-up concern. I think we're good now so I'm closing this.

    Thanks a lot for the report and all the details!