cURL
Haxx ad
libcurl

curl's project page on SourceForge.net

Sponsors:
Haxx

cURL > Mailing List > Monthly Index > Single Mail

curl-tracker mailing list Archives

[ curl-Bugs-1431750 ] crash when dns slow

From: SourceForge.net <noreply_at_sourceforge.net>
Date: Fri, 17 Feb 2006 06:14:56 -0800

Bugs item #1431750, was opened at 2006-02-14 20:59
Message generated for change (Comment added) made by bagder
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: libcurl
Group: crash
Status: Open
Resolution: None
Priority: 8
Submitted By: xbx (xbx___)
Assigned to: Daniel Stenberg (bagder)
Summary: crash when dns slow

Initial Comment:
I have a strange setup with the dns taking a long time
to answer.
I run my app that make periodic http(s) requests.
Everything goes fine, but after a while (that is, after
maybe 10 successfull requests to the same host) I have
this crash:

#0 0xb7fade01 in Curl_connecthost (conn=0x8a73094,
remotehost=0x0,
    sockconn=0x8a7312c, addr=0xbfaefdc4,
connected=0xbfaefe07 "")
    at connect.c:789
#1 0xb7f9ed04 in ConnectPlease (conn=0x8a73094,
hostaddr=0x0,
    connected=0xbfaefe07 "") at url.c:2017
#2 0xb7fa19f3 in SetupConnection (conn=0x8a73094,
hostaddr=0x0,
    protocol_done=0xbfaefe88 "") at url.c:3670
#3 0xb7fa1b08 in Curl_connect (data=0x8ab48bc,
in_connect=0x8b02798,
    asyncp=0xbfaefe89 "", protocol_done=0xbfaefe88 "")
at url.c:3729
#4 0xb7faf002 in curl_multi_perform
(multi_handle=0x8aeac1c,
    running_handles=0xbfaeff34) at multi.c:422
[....]

(notice the hostaddr=0x0 part)

I'm using libcurl 7.15.1, with this configure options:
./configure --prefix=/usr/ --disable-gopher
--disable-ldap --disable-dict --disable-telnet
--enable-nonblocking --enable-debug
--disable-crypto-auth --enable-ares --disable-ipv6

I could further investigate, but I'd have to know which
way to look..
More precisely,
  code = CreateConnection(data, in_connect, &dns, asyncp);
returns:
    code == OK
    dns == null
    asyncp == false
Is the problem that 'code' shouldn't be 'ok', or that
the program shouldn't crash when dns == Null?

----------------------------------------------------------------------

>Comment By: Daniel Stenberg (bagder)
Date: 2006-02-17 15:14

Message:
Logged In: YES
user_id=1110

Yes, it looks like the first transfer's ares resolve remains
"outstanding" when it stops, and the callback for it then
gets called when a subsequent transfer is running.

The only place in the code that stops outstanding ares
resolves is when a DNS timeout is handled, so I guess the
first transfer doesn't bail out because of this. Can you
figure out why/how it completes and why there seems to be a
outstanding resolve left pending?

I don't think this is a c-ares bug. I believe it is a
libcurl bug that is related to how libcurl uses c-ares.

Also, the ares_cancel() function get gets called in the DNS
timeout case (and only then) cancels _all_ outstanding ares
resolves which just can't be the right thing to do.

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-17 10:42

Message:
Logged In: YES
user_id=1452265

What I imagine is the following scenario:
- a request is made, a connection (1) is created (but with
no sockets yet), and a dns request is made via ares.
- some time after that, an other similar request is made.
curl reuses (1) because it seems to fit. But at that point
it the socket is invalid, so it either
    crashes (the initial bug report),
    or, with the "fix" in SocketIsDead(), it notices it,
closes the connection (1) and starts a new one (2).
- after some more time, the first dns querry completes
successfully, the callback is called, with the connection
(1) as parameter which has been destructed -> crash with
callstack from message 2006-02-16 15:38 .

So, according to me the bug isn't in ares, but in curl: when
it closes connections because SocketIsDead() (which, I think
is a network failure recovery code; It shouldn't happen in
normal cases), it should cancel ares request as well (so
that the callback with destructed pointer isn't used).
And it shouldn't even be in the case where it think it can
reuse the connection, as it is still being worked on by a
previous request. -> that's what I tired to fix (with
success) by saying "connections with uninitialised sockets
cannot be reused". but I don't know if it's the correct fix.

(yes I'm doing simultatenous transfers with mutli.)

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-17 09:06

Message:
Logged In: YES
user_id=1110

The most interesting in that back trace is this:

#2 0xb7ed72c9 in addrinfo_callback (arg=0x8ae1a2c,
status=0, addr=0x8ae1fd8)

Status zero means OK, so that was a successful resolve
callback! The question is why does this callback get sent if
the connection is no longer OK.

Are you doing several simultanoues transfers with the multi
interface? I can see a problem in the code involving c-ares,
DNS timeouts and simultaneous transfers...

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-16 15:38

Message:
Logged In: YES
user_id=1452265

yes that's what I said, it does crash in
addrinfo_callback(). With this callstack:

#0 0xb7ed35a5 in Curl_hash_add (h=0x21, key=0x8a824cc
"\001:-1209571840",
    key_len=14, p=0x8b159cc) at hash.c:153
#1 0xb7eb0443 in Curl_cache_addr (data=0x8ae1c68,
addr=0x8b1580c,
    hostname=0xb7e765d8 "\001", port=-1209571840) at
hostip.c:328
#2 0xb7ed72c9 in addrinfo_callback (arg=0x8ae1a2c,
status=0, addr=0x8ae1fd8)
    at hostasyn.c:129
#3 0xb7ed7342 in Curl_addrinfo4_callback (arg=0x8ae1a2c,
status=0,
    hostent=0x8ae1fd8) at hostasyn.c:161
#4 0xb7edaa73 in end_hquery (hquery=0x8b2c708,
status=Variable "status" is not available.
)
    at ares_gethostbyname.c:187
#5 0xb7edaf27 in host_callback (arg=0x8b2c708, status=0,
    abuf=0xbff164f7 ":v\201\200", alen=128) at
ares_gethostbyname.c:169
#6 0xb7ede63c in qcallback (arg=0x8a6b630, status=Variable
"status" is not available.
) at ares_query.c:111
#7 0xb7ed9cad in end_query (channel=0x8af0778,
query=0x8b22c20, status=Variable "status" is not available.
)
    at ares_process.c:693
#8 0xb7eda1bd in process_answer (channel=0x8af0778,
    abuf=0xbff164f7 ":v\201\200", alen=128, whichserver=1,
tcp=0,
    now=1140100006) at ares_process.c:373
#9 0xb7eda73f in ares_process (channel=0x8af0778,
read_fds=0xbff167b8,
    write_fds=0xbff16738) at ares_process.c:287
#10 0xb7ed6d0d in Curl_is_resolved (conn=0x8b27e6c,
dns=0xbff16890)
    at hostares.c:146
#11 0xb7ed40bd in curl_multi_perform (multi_handle=0x8ad8f4c,
    running_handles=0xbff16944) at multi.c:449
[...]

in order to reproduce it, I don't know if it's easy. (I'm
avoiding to change anything/reboot my computer because I'm
afraid to lose the situation..)
but I think the dns querries are very slow because the first
entry is 127.0.0.1 which is probaly running but not
configured correctly, and then a real dns ip. So I guess it
first try to ask 127.0.0.1 and after a timeout it gets the
answer from the other dns...

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 13:03

Message:
Logged In: YES
user_id=1110

The fact that you made it work by simply not re-using the
connection (leaving the struct around) but not when you
disconnect and kill the struct properly, certainly indicates
that "something" wrongly is trying to use that memory at a
later time.

I suspect it might be the c-ares callback that gets called
when the resolve worked or timed out, because it was somehow
not properly cancelled in a previous request.

Any chance you can temporarily remove your fix and add a
debug output or a break-point in
hostasync.c:addrinfo_callback() and check if this or a
related issue is perhaps the case?

Can you think of a way I could do repeat this problem in my end?

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-16 00:38

Message:
Logged In: YES
user_id=1110

Ah, I hadn't noticed you use c-ares!

I'll check your details closer tomorrow. Thanks for all the
hard work!

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-15 22:00

Message:
Logged In: YES
user_id=1452265

I tried something for:
> - the connection shouldn't be reused
> as it is being initialised rather than finished...

in ConnectionExists(), I added a check so that connections
with (check->sock[FIRSTSOCKET] == CURL_SOCKET_BAD) wouldn't
be reused.
And the program doesn't seem to crash any longer. (can't be
100% sure, because it is a random bug)

But I don't know if this condition is supposed to mean
"being initialised connection" ....

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-15 19:17

Message:
Logged In: YES
user_id=1452265

I think I understood what is going on!

I changed the SocketIsDead(), and it did remove the crash.
However the app crashed a bit later, in an ares callback,
because the connection data had been deleted and overwritten.

So I guess that what is happening is that a connection is
being created and waiting for a dns querry before being
fully initialized.
And before the dns querry is over, the connection is reused.
Here the fact that it is "dead" is discovered and the
connection is killed. And later when the callback from ares
is called, the connection has been destructed...

So I guess this is in fact 2 bugs:
- when a connection is destructed, ares calls should be aborted.
- the connection shouldn't be reused as it is being
initialised rather than finished...
 
but
1) I may be wrong
2) I have no idea how to correct any of these...

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 13:24

Message:
Logged In: YES
user_id=1110

Hm. Yes a first test for you would be to modify the
SocketIsDead() code to check for that situation.

I'm a bit reluctant to simply add that check and be happy,
since I consider it being a problem that the
sock[FIRSTSOCKET] value ended up like this in the first place...

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-15 13:17

Message:
Logged In: YES
user_id=1452265

The CURLOPT_VERSBOSE doesn't show anything strange, (expect
that the reuse case isn't triggered as often as I thought..).

however, in ConnectionExists() there is something strange:
      bool dead = SocketIsDead(check->sock[FIRSTSOCKET]);
returns False, although check->sock[FIRSTSOCKET]==-1
So I reckon that's why the (CURL_SOCKET_BAD ==
conn->sock[FIRST]) triggers in SetupConnection().

So, maybe SocketIsDead() should check for that case?

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 11:34

Message:
Logged In: YES
user_id=1110

Ah, right. That's NULL if the connection is re-used. So it
looks like it enters the SetupConnection() function and then
some 30 lines into the function the condition

  if(CURL_SOCKET_BAD == conn->sock[FIRST])

equals true. And then it wrongly tries to re-connect using
the host address that is now NULL.

Can you figure out how it comes it has marked the socket as
being bad, while still believing the connection is being
re-used fine?

What does a CURLOPT_VERSBOSE output reveal? Does the
connection just prior to the failing one look fine and claim
to be leaving the connection alive? Does the failing one
claim to retry the connection or just re-use it normally
like other re-used connections?

----------------------------------------------------------------------

Comment By: xbx (xbx___)
Date: 2006-02-15 11:10

Message:
Logged In: YES
user_id=1452265

looks like it's because, url.c::3530
  if(conn->bits.reuse) {
is true.

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2006-02-15 10:42

Message:
Logged In: YES
user_id=1110

Right, can you see if you can figure out why
CreateConnection() returns CURLE_OK while still not
returning any address and setting asnyncp to FALSE?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100976&aid=1431750&group_id=976
Received on 2006-02-17

These mail archives are generated by hypermail.

donate! Page updated November 12, 2010.
web site info

File upload with ASP.NET