
curl-library

Re: DNS Cache

From: brubelsabs <brubelsabs_at_googlemail.com>
Date: Thu, 07 Aug 2008 11:21:02 +0200

Daniel Stenberg wrote:
> On Wed, 6 Aug 2008, brubelsabs wrote:
>
>> With libcurl I want to check many URLs, so I do that in waves (each
>> wave consists only of distinct URLs) so that no server gets
>> overloaded. Each wave has its own multi handle with its easy handles.
>> At the end of each wave, all handles are cleaned up.
>
> The easiest way is probably to just re-use the multi handle all the time
> instead of creating new ones all the time.

Hmm, I run into some strange problems with that. The reuse works as
follows (a sketch in C follows the list):

0. for each request, create its easy handle and set the proper options
1. create the multi handle
2. in each wave, do (until there are no requests left):
2.1 register at most 1000 easy handles in the multi handle
2.2 perform on the multi handle
2.3 when a request finishes, remove its easy handle from the multi handle
3. for each request, clean up its easy handle
4. clean up the multi handle
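
Roughly, in C, the loop looks like this (a minimal sketch only;
more_requests() and get_next_url() stand in for my real bookkeeping,
and all error checking is left out):

  CURLM *multi = curl_multi_init();              /* step 1 */

  while (more_requests()) {                      /* step 2 */
    CURL *wave[1000];
    int n = 0, running = 0;

    /* 2.1: register at most 1000 easy handles */
    while (n < 1000 && more_requests()) {
      wave[n] = curl_easy_init();                /* step 0, per request */
      curl_easy_setopt(wave[n], CURLOPT_URL, get_next_url());
      curl_multi_add_handle(multi, wave[n]);
      n++;
    }

    /* 2.2: drive the transfers until the wave is done */
    do {
      while (curl_multi_perform(multi, &running) == CURLM_CALL_MULTI_PERFORM)
        ;
      /* ... wait on curl_multi_fdset() with select() here ... */
    } while (running > 0);

    /* 2.3 + 3: remove and clean up this wave's easy handles */
    while (n-- > 0) {
      curl_multi_remove_handle(multi, wave[n]);
      curl_easy_cleanup(wave[n]);
    }
  }

  curl_multi_cleanup(multi);                     /* step 4 */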

The problem is that already in the second wave, more than 300 requests
fail with "couldn't resolve host name" (CURLE_COULDNT_RESOLVE_HOST,
error 6), even though the host names actually are resolvable!

I tried to get to the bottom of the problem. First I thought the DNS
server was refusing to answer (because it suspected a DoS). And indeed,
when I fill the multi handle with 7000 URLs, up to 60% of them are
erroneously marked as not resolvable. But then I tried waiting 60 and
even 120 seconds between the transfer waves, which had no effect at
all. (BTW: if I create a brand-new multi handle for every wave, with
1000 easy handles that die after the wave, everything works fine.)
Another interesting detail: if I rearranged the order of the requests,
it sometimes worked, which really finished me off. Surely it's my own
fault that this doesn't work, but I have no idea how it could happen.
Hence I now use a separate multi handle each time.

I even counted the DNS packets going over my eth0:
1st wave: ~1172 DNS packets (for 1000 requests)
2nd wave: ~1458 DNS packets in total (i.e. only ~286 new packets)
and I took care that no other application was using DNS at the time.
So it apparently does not need to resolve most host names again, either
because my Ubuntu has a DNS cache as well, or (which I consider much
more likely) because the multi handle's DNS cache worked here, since
most of the domain names in the second wave also appear in the first.
If I rerun this experiment several times, the number of transmitted
packets keeps shrinking. (System DNS cache? No idea.)

>> Does it suffice to set curl_easy_setopt(handle,
>> CURLOPT_DNS_USE_GLOBAL_CACHE, 1L);
>
> The global cache is evil and shouldn't be used by any new applications
> at all.

Thanks, I should have read more carefully.

>> or do I have to use additionally a share handle?
>
> If you insist on killing the multi handles all the time, then yes.

It seems that until I find my bug, I'll have to...
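
If I understand the share interface correctly, that would look roughly
like this (a sketch only; the lock callbacks via CURLSHOPT_LOCKFUNC /
CURLSHOPT_UNLOCKFUNC would additionally be needed if the handles are
used from more than one thread):

  CURLSH *share = curl_share_init();
  curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);

  /* every easy handle points at the shared DNS cache, so the
     resolves survive the short-lived per-wave multi handles */
  CURL *easy = curl_easy_init();
  curl_easy_setopt(easy, CURLOPT_SHARE, share);

  /* ... add it to the wave's multi handle, perform, clean up ... */

  curl_share_cleanup(share);                     /* after the last wave */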

>> What does the DNS cache do if it cannot resolve a domain name again
>> and again? Does it remember that this domain is not resolveable?
>
> No, it only stores successful resolves and never remembers any failures.

So the best way is probably to store those domain names in a blacklist
and to check, before each transmission, that the request doesn't try to
fetch from a blacklisted domain.
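
Something like this, I imagine (host_of(), blacklist_add() and
is_blacklisted() are hypothetical helpers around whatever lookup
structure I end up using):

  /* after a wave: remember every host that failed to resolve */
  CURLMsg *msg;
  int left;
  while ((msg = curl_multi_info_read(multi, &left)) != NULL) {
    if (msg->msg == CURLMSG_DONE &&
        msg->data.result == CURLE_COULDNT_RESOLVE_HOST)
      blacklist_add(host_of(msg->easy_handle));
  }

  /* ... and before queueing a request in the next wave: */
  if (!is_blacklisted(host_of(easy)))
    curl_multi_add_handle(multi, easy);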

Kind regards, and thank you very much for your help
Mathias
Received on 2008-08-07