curl-library

Re: perf with libcurl over time

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 14 Jan 2010 22:52:20 +0100 (CET)

On Thu, 14 Jan 2010, Nick Gerner wrote:

> I'm seeing some perf issues out of libcurl (7.19.3)

I think it would be interesting if you could upgrade to the latest and see how
things run. To me it makes little sense to chase problems that may well not be
present in the current version.

> curl_easy_setopt(curl[i]->curl, CURLOPT_DNS_CACHE_TIMEOUT, (long)0);
> curl_easy_setopt(curl[i]->curl, CURLOPT_FRESH_CONNECT, (long)1);
> curl_easy_setopt(curl[i]->curl, CURLOPT_FORBID_REUSE, (long)1);
>
> We've got a caching DNS server running locally, so we don't need any more
> DNS cache (our perf is much worse with any of the above reversed).

I'm sorry but that doesn't make sense at all. If it is indeed true, it would
rather indicate bugs in your testing or in libcurl more than anything else.

CURLOPT_DNS_CACHE_TIMEOUT limits how long DNS entries are cached within
libcurl, and the default is a mere 60 seconds. I very much doubt any external
cache will do the lookup (noticeably) faster than libcurl does itself without asking
any outsider. It's not that I think libcurl's lookup is so amazingly
brilliant, but simply because it only knows the hosts it has resolved itself
and has a decent hash lookup.
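
To illustrate why an in-process lookup is hard to beat, here is a minimal
sketch of a host cache with a timeout. It is not libcurl's actual
implementation: the names (cache_put, cache_get, SLOTS, HOST_LEN), the fixed
table size and the overwrite-on-collision policy are all assumptions made for
the example. The timeout handling mirrors the documented option semantics,
where 0 disables caching and -1 keeps entries forever.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SLOTS 64      /* hypothetical table size */
#define HOST_LEN 256  /* hypothetical maximum host name length */

struct entry {
  char host[HOST_LEN]; /* empty string marks an unused slot */
  char addr[64];       /* resolved address as text */
  time_t stored;       /* when the entry was added */
};

static struct entry table[SLOTS];

/* djb2-style string hash, reduced to a slot index */
static unsigned slot_for(const char *host)
{
  unsigned long h = 5381;
  while(*host)
    h = h * 33 + (unsigned char)*host++;
  return (unsigned)(h % SLOTS);
}

/* Store a resolved address; a colliding entry is simply overwritten. */
static void cache_put(const char *host, const char *addr, time_t now)
{
  struct entry *e = &table[slot_for(host)];
  snprintf(e->host, sizeof(e->host), "%s", host);
  snprintf(e->addr, sizeof(e->addr), "%s", addr);
  e->stored = now;
}

/* Return the cached address, or NULL if it is missing or too old.
   timeout is in seconds: 0 never returns a hit, -1 never expires. */
static const char *cache_get(const char *host, time_t now, long timeout)
{
  struct entry *e = &table[slot_for(host)];
  if(e->host[0] == '\0' || strcmp(e->host, host) != 0)
    return NULL;
  if(timeout >= 0 && (now - e->stored) >= timeout)
    return NULL;
  return e->addr;
}
```

A hit costs one hash computation and one string compare, with no system call
and no network round trip, which is the point being made above.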

CURLOPT_FRESH_CONNECT and CURLOPT_FORBID_REUSE force each subsequent transfer
to do a fresh TCP connect. In cases where you never re-use the same host name
that won't make any difference, but if you _do_ transfers against the same
host again, re-using the connection makes a _significant_ speed difference
that you cannot make up for using any external means.

We have however recently tracked down and fixed a bug in the DNS cache (present
only in CVS and thus in the next release, if I'm not mistaken) that made
entries in the cache get kept too long while connections to the hosts were
still in use. That shouldn't affect lookup speed though, it should only make
DNS entries stay in the cache longer than specified.

> If I'm not mistaken, this is telling me that about 70% of our CPU time is
> spent in libcurl using the DNS cache, which I tried to disable.

I really can't tell based on that little information, but if that is truly the
case then I can only say even more that it looks like a bug. It's been a while
since I did any proper benchmarking but I really cannot recall any numbers
that looked like the ones you present.

> * Am I right that hostip.c, that hostcache_timestamp_remove,
> Curl_hash_clean_with_criterium, Curl_hash_pick are all related to the DNS
> cache?

No, they aren't, but for the curl_multi_perform() case I think that's the only
use. The same set of functions is also used for the socket hash, but that's
only used for the multi_socket API.

> * Is there some other option to disable the DNS cache more completely?

No.

> * Is there a compile time option to disable it so we don't even have to dive
> into the functions at run-time?

No.

-- 
  / daniel.haxx.se
Received on 2010-01-14