cURL / Mailing Lists / curl-library / Single Mail

curl-library

RE: Crash during gethostbyname in libcurl

From: Sebastien Trottier <strot_at_paraxip.com>
Date: Mon, 19 Mar 2007 15:46:03 -0400

Thanks Daniel for your response, see below for some answers

> -----Original Message-----
> From: curl-library-bounces_at_cool.haxx.se
> [mailto:curl-library-bounces_at_cool.haxx.se]On Behalf Of Daniel Stenberg
> Sent: Monday, March 19, 2007 4:56 AM
> To: libcurl development
> Subject: RE: Crash during gethostbyname in libcurl
>
>
> On Sat, 17 Mar 2007, Cyril Picat wrote:
>
> > We have further investigated the crash related to name
> resolution and for us
> > it is because there is a threading problem with the use of a shared DNS
> > cache (data->hostcache) in the multi interface. Indeed, even if
> the multi
> > interface is only accessed by one thread, its DNS cache could
> be accessed at
> > the same time by several.
>
> By several... what?

Several resolver threads. In fact, I came up with problematic case 3, which
I was able to reproduce with a carefully inserted sleep() in the curl
library, and yields the exact crash reported at the start of this thread,
reported by many users of the BOINC middleware
(http://www.google.ca/search?q=boinc+Curl_llist_insert_next)

CASE 3. Win32 threaded resolver, multi interface, no share, 2 "simultaneous"
requests for URLs using the same hostname, which is not in the shared multi
hostname cache.

thread1 easy_handle1 request for http://hostname
thread1 "hostname" not in cache, launching threaded resolver
thread2 first resolver thread calling gethostbyname("hostname")
thread1 easy_handle2 request for http://hostname
thread1 "hostname" not in cache, launching threaded resolver
thread3 second resolver thread calling gethostbyname("hostname")
[...]
thread2 first resolver thread returns from gethostbyname("hostname") with
result
thread2 shared llist "l" of "hostname" key in cache is empty,
Curl_llist_insert_next() is called with second arg. e==null (position to
insert to, l->tail, is currently null)
- context switch -
thread3 second resolver thread returns from gethostbyname("hostname") with
result
thread3 shared llist "l" of "hostname" key in cache is still empty
thread3 "hostname" resol. result stored in llist for "hostname" key, now
llist "l" size is 1
[...]
- context switch -
thread 2 while inserting entry, size of "l" is checked: it is now 1, in
this case, the position to insert was passed as second argument (e), but the
value of e is null; it is dereferenced to access "next" field -> kaboom

>
> And just to reiterate my previous suggestion: if you change the
> resolve to use
> the standard synchronous one, you get no problems? You can even
> try the c-ares
> one if you want asynchronous that avoids the threaded one.
>

I'm sure the problems we reported don't apply to the synchronous or c-ares
resolvers, but we need
a) an asynchronous interface to be able to trigger multiple transfers at the
same time (ie without the calling thread being blocked by gethostbyname())
and
b) to resolve hostnames from the same sources as the win32 gethostbyname(),
ie including WINS name resolution, as we have a lot of customers that don't
maintain local DNS servers, and I understand c-ares deals with DNS only.

> > There are two cases where it can happen (imagine we have two
> easy handles in
> > a multi interface) :

[...]

>
> First, I must say that this is a Windows-specific problem and for
> *really*
> sensible code I would recommend using a non-threaded resolver. For multi
> interface purposes, and especially when using many easy handles,
> the threaded
> resolver will add quite a few threads and most probably won't be
> as effciciant
> as using c-ares in the first place.
>

I'm sure that together we can make the threaded resolver rock-solid. As for
optimization, there are ways to manage the amount of threads in use and all,
but that's not an issue for us in our usage of the library.

> It being windows-specific also means not only that I cannot test
> it nor do I
> know much about the windows magic stuff, but also that I don't
> want to pollute
> the generic code too much due to peculiarities in this part.
>

We're win32 magicians, so don't worry... ;)

> The generic approach for protecting a resource that is or might
> be accessed by
> more than one thread simultaneously is of course to introduce
> some kind of
> locking/mutex scheme that prevents the resource to be removed as
> long as it is
> in use by another thread.
>

There are already locks around the cache using a share is used: the problem
is the sharing of the multi cache between easy handles coupled with the
threaded resolver. I saw that there are changes in terms of cache sharing
in 7.16.1 compared to the 7.15.4 version we're using. Are easy handles still
sharing their cache via the multi by default?

In my mind, a share should be explicitly used, and mutexes provided, by a
client that wants to share its hostname resolution results between easy
handles when using the threaded resolver; no sharing should be performed
when using default compile flags (threaded resolver) and the multi interface
without a share on win32.

Sébastien Trottier
Paraxip Technologies
Received on 2007-03-19