curl-library

Bug in curl multi DONE->COMPLETED state transition?

From: Albert Lee <trisk_at_acm.jhu.edu>
Date: Sat, 02 Aug 2008 00:50:25 -0400

Hi,

I've been experiencing a recurring problem with curl sockets being
"lost" and all of the curl handles being stuck afterward. The handles
with sockets already open are in state CURLM_STATE_DONE while the rest
are in CURLM_STATE_INIT, and remain there indefinitely. The application
will have the sockets stuck in CLOSE_WAIT. Annoyingly, this can occur
after correctly processing connections for anywhere from a few minutes
to several hours.

I think I've finally pinpointed the cause of the issue.

Example curl_multi_dump() output:
Multi status: 64 handles, 64 alive
handle b18a36c, state DONE, 0 sockets
...
handle bec17b4, state DONE, 0 sockets
handle be0abec, state INIT, 0 sockets
...

Here's the multi handle:
(gdb) p *((struct Curl_multi *)(control->m_core->m_httpStack->m_handle))
$7 = {type = 764702, easy = {next = 0xb20ae1c, prev = 0xb37f5e4,
    easy_handle = 0x0, easy_conn = 0x0, state = CURLM_STATE_INIT,
    result = CURLE_OK, msg = 0x0, msg_num = 0, sockets = {0, 0, 0, 0, 0},
    numsocks = 0}, num_easy = 64, num_msgs = 0, num_alive = 64,
  socket_cb = 0x81508f0 <core::CurlSocket::receive_socket(void*, int, int, void*, void*)>, socket_userp = 0x83236e0, hostcache = 0x8324b14,
  timetree = 0xbf37cfc, sockhash = 0x8324b74, pipelining_enabled = false,
  connc = 0x832220c, maxconnects = 0, closure = 0x0,
  timer_cb = 0x814f960 <core::CurlStack::set_timeout(void*, long, void*)>,
  timer_userp = 0x83236e0, timer_lastcall = {tv_sec = 4442190, tv_usec = 238}}

And here's one of the easy handles:
(gdb) p *((struct Curl_multi *)(control->m_core->m_httpStack->m_handle))->easy.next
$9 = {next = 0xb587b24, prev = 0x8324a88, easy_handle = 0xbb3372c,
  easy_conn = 0xbade924, state = CURLM_STATE_DONE, result = CURLE_OK,
  msg = 0x0, msg_num = 0, sockets = {4824, 0, 0, 0, 0}, numsocks = 0}

Notice how numsocks is 0 but the sockets array has the correct socket
fd. The only place numsocks is set is in singlesocket() (usually
called from multi_socket()). If the handle is in one of several states
at that time, including CURLM_STATE_DONE, singlesocket()'s
multi_getsock() call will not populate the socket array that the
handle's sockets are checked against. In this case, singlesocket()
deletes the sockets from the hash table and uses the CURL_POLL_REMOVE
callback to tell the application's poll/event system to stop watching
those sockets. This is the correct behaviour if the sockets are no
longer active.
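
To make the removal step concrete, here is a minimal sketch of the kind
of comparison described above. It is not the libcurl source: struct
watched, diff_sockets() and MAX_SOCKS are hypothetical names invented
for this illustration. It only shows that when the "current" socket
list comes back empty, every previously watched socket ends up on the
removal list, whether or not the fd is really closed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKS 5   /* assumption: matches the 5-entry sockets array above */

/* Hypothetical stand-in for the per-handle socket bookkeeping. */
struct watched {
    int socks[MAX_SOCKS];
    size_t count;
};

/* Compare the previously watched sockets with the set the handle
 * reports now. Every socket missing from `current` is written to
 * `removed`; the return value is how many were dropped. */
size_t diff_sockets(const struct watched *previous,
                    const struct watched *current,
                    int removed[MAX_SOCKS])
{
    size_t nremoved = 0;

    assert(previous != NULL && current != NULL && removed != NULL);
    assert(previous->count <= MAX_SOCKS && current->count <= MAX_SOCKS);

    for (size_t i = 0; i < previous->count; i++) {
        int still_wanted = 0;
        for (size_t j = 0; j < current->count; j++) {
            if (previous->socks[i] == current->socks[j]) {
                still_wanted = 1;
                break;
            }
        }
        if (!still_wanted)
            removed[nremoved++] = previous->socks[i];
    }
    return nremoved;
}
```

In this model, a handle that previously watched fd 4824 and now reports
an empty list yields exactly one removal, which is the point where a
real application would receive CURL_POLL_REMOVE.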

(gdb) p ((struct Curl_multi *)(control->m_core->m_httpStack->m_handle))->easy.next->easy_conn->sockfd
$11 = 4824

(gdb) p ((struct Curl_multi *)(control->m_core->m_httpStack->m_handle))->easy.next->easy_conn->data->req.keepon
$13 = 0

(gdb) p ((struct Curl_multi *)(control->m_core->m_httpStack->m_handle))->easy.next->easy_conn->bits
$17 = {close = true, reuse = false, proxy = false, httpproxy = false,
  user_passwd = false, proxy_user_passwd = false, ipv6_ip = false,
  ipv6 = false, do_more = false, tcpconnect = false, protoconnstart = true,
  retry = false, tunnel_proxy = false, tunnel_connecting = false,
  authneg = false, rewindaftersend = false, ftp_use_epsv = true,
  ftp_use_eprt = true, netrc = false, done = false,
  stream_was_rewound = false, proxy_connect_closed = false, bound = false}

Notice bits.close is true but bits.done is false - this means
Curl_done() has not yet been called.

multi_socket() calls multi_runsingle() on the specified handle and if
the handle is in CURLM_STATE_PERFORM it will transition to
CURLM_STATE_DONE, but multi_runsingle() needs to be called again to
reach CURLM_STATE_COMPLETED. Reaching CURLM_STATE_COMPLETED will also
call Curl_done() and because we are not reusing sockets they will be
closed.

However, singlesocket() is called immediately after multi_runsingle().
If the handle state was just changed to CURLM_STATE_DONE,
singlesocket() discards the sockets as described above - and since
Curl_done() has not yet been called, the cleanup for those sockets
never happens.

Making singlesocket() not modify the socket hash table if the handle
is in CURLM_STATE_DONE seems to fix the problem - the sockets can be
removed in a subsequent multi_socket() call, at which point the handle
will be in CURLM_STATE_COMPLETED and the sockets will actually have
been closed.
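
The sequence can be sketched with a toy model. Everything here is an
assumption made for illustration: the toy_* names are hypothetical,
the three states are a simplification of the real state list, and the
model assumes the handle only advances when a socket event arrives
(timeouts are ignored). It is not the actual patch, just a way to
check the reasoning with and without the proposed guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified states; the names echo the CURLM_STATE_* values
 * mentioned above, but this is not libcurl code. */
enum toy_state { TOY_PERFORM, TOY_DONE, TOY_COMPLETED };

struct toy_handle {
    enum toy_state state;
    bool socket_open;     /* the fd has not been closed yet */
    bool socket_watched;  /* the application still polls the fd */
};

/* One state-machine step, loosely modelled on multi_runsingle(). */
static void toy_run_single(struct toy_handle *h)
{
    assert(h != NULL);
    if (h->state == TOY_PERFORM) {
        h->state = TOY_DONE;
    } else if (h->state == TOY_DONE) {
        h->socket_open = false;  /* stands in for Curl_done() closing it */
        h->state = TOY_COMPLETED;
    }
}

/* Bookkeeping step, loosely modelled on singlesocket(). With
 * skip_in_done set, a handle in the DONE state is left alone. */
static void toy_single_socket(struct toy_handle *h, bool skip_in_done)
{
    assert(h != NULL);
    if (skip_in_done && h->state == TOY_DONE)
        return;
    if (h->state != TOY_PERFORM)
        h->socket_watched = false;  /* application told to stop polling */
}

/* Drive the handle with socket events for as long as the fd is
 * watched. Returns true if it ends up short of COMPLETED with the
 * socket still open, i.e. the stuck situation described above. */
bool toy_transfer_gets_stuck(bool skip_in_done)
{
    struct toy_handle h = { TOY_PERFORM, true, true };

    for (int events = 0; events < 4 && h.socket_watched; events++) {
        toy_run_single(&h);
        toy_single_socket(&h, skip_in_done);
    }
    return h.state != TOY_COMPLETED && h.socket_open;
}
```

Calling toy_transfer_gets_stuck(false) reproduces the hang in this
model: the handle stops in the DONE state with the socket open and
unwatched. With toy_transfer_gets_stuck(true) the second event arrives,
the handle completes, and the socket is closed before it is dropped
from the watch list.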

Let me know if this seems sane.

-Albert

Received on 2008-08-02