
curl-library

perf with libcurl over time

From: Nick Gerner <nick_at_seomoz.org>
Date: Thu, 14 Jan 2010 12:56:18 -0800

I'm seeing performance problems in libcurl (7.19.3) that build up over time
and relate to the host cache, even though I've disabled it.

I'm using the curl multi interface with a ton of easy handles (900, which
seems to maximize throughput for us).

Here's the relevant config:
curl_easy_setopt(curl[i]->curl, CURLOPT_DNS_CACHE_TIMEOUT, (long)0);
curl_easy_setopt(curl[i]->curl, CURLOPT_FRESH_CONNECT, (long)1);
curl_easy_setopt(curl[i]->curl, CURLOPT_FORBID_REUSE, (long)1);

We've got a caching DNS server running locally, so we don't need another
layer of DNS caching (our performance is much worse if any of the options
above is reversed).

And here's some interesting oprofile output from near the end of a run of our
app (a web crawler), after we've pulled ~8 million pages from more than 50k
domains:
samples  %        linenr info                image name        app name          symbol name
1086204  44.1576  hostip.c:0                 libcurl.so.4.1.1  libcurl.so.4.1.1  hostcache_timestamp_remove
664021   26.9946  (no location information)  libcurl.so.4.1.1  libcurl.so.4.1.1  Curl_hash_clean_with_criterium
207088    8.4188  (no location information)  libcurl.so.4.1.1  libcurl.so.4.1.1  Curl_hash_pick
92158     3.7465  charset.h:204              retrieve          retrieve          CleanNulls(char*, char*, unsigned long, unsigned long, StatisticSet*)

For comparison I included the relative time spent in "CleanNulls", our own
function that makes one pass over each response body (a for loop), rewriting
any zero bytes to the UTF-8 sequence for "?".
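To show what a single pass like that costs, here is a hypothetical stand-in for CleanNulls (not the actual function); the replacement bytes are passed in as a parameter rather than tied to a particular encoding. It does one comparison per input byte, which is what makes it a useful yardstick against the libcurl entries in the profile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CleanNulls-style pass: copy `in` to `out`,
 * replacing every zero byte with the `repl` byte sequence.
 * Returns the number of bytes written, or (size_t)-1 if `out` is too small. */
static size_t clean_nulls(const char *in, size_t in_len,
                          char *out, size_t out_cap,
                          const char *repl, size_t repl_len)
{
    size_t o = 0; /* invariant: o <= out_cap */
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] == '\0') {
            if (out_cap - o < repl_len)
                return (size_t)-1;
            memcpy(out + o, repl, repl_len);
            o += repl_len;
        } else {
            if (o == out_cap)
                return (size_t)-1;
            out[o++] = in[i];
        }
    }
    return o;
}
```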

If I'm not mistaken, this is telling me that about 70% of our CPU time
(44.2% + 27.0% = 71.2% for the top two symbols alone, close to 80% counting
Curl_hash_pick) is spent in libcurl's DNS cache code, which I tried to
disable.

 * Am I right that hostcache_timestamp_remove (in hostip.c),
   Curl_hash_clean_with_criterium and Curl_hash_pick are all related to the
   DNS cache?
 * Is there some other option to disable the DNS cache more completely?
 * Is there a compile-time option to disable it, so we don't even enter
   those functions at run time?
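Judging only from the symbol names (an assumption, not a reading of libcurl's source), Curl_hash_clean_with_criterium looks like a generic walk over a hash table that calls a criterion callback such as hostcache_timestamp_remove on each entry. The simplified model below uses hypothetical struct and function names. It also assumes entries carry an "in use" flag. It shows why such a pass costs time proportional to the number of live entries, and why in-use entries would survive even with a timeout of 0.

```c
#include <stddef.h>
#include <time.h>

/* Simplified model, not libcurl code: one cache slot. */
struct entry {
    time_t timestamp; /* when the entry was created */
    int inuse;        /* assumed flag: nonzero while a transfer uses it */
    int live;         /* nonzero if the slot holds an entry */
};

struct prune_data {
    time_t now;
    long timeout; /* seconds; 0 means "expire immediately" */
};

/* Criterion callback: return 1 if the entry should be removed. */
static int timestamp_remove(const struct prune_data *d, const struct entry *e)
{
    return !e->inuse && (d->now - e->timestamp >= d->timeout);
}

/* Visit every live slot and apply the criterion. Returns the number of
 * callback calls (the cost of one pass) and stores removals in *removed. */
static size_t clean_with_criterium(struct entry *tab, size_t n,
                                   const struct prune_data *d, size_t *removed)
{
    size_t calls = 0;
    *removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!tab[i].live)
            continue;
        calls++;
        if (timestamp_remove(d, &tab[i])) {
            tab[i].live = 0;
            (*removed)++;
        }
    }
    return calls;
}
```

If a pass like this ran once per name lookup, total work would scale with lookups times cache size, which would fit a slowdown that builds up over a long crawl.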

--Nick

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2010-01-14