Re: Changes to connection timeout policy when multiple DNS records are present

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Tue, 11 Nov 2014 11:09:40 +0100 (CET)

On Mon, 10 Nov 2014, Ryan Braud wrote:

> After doing some testing with libcurl lately I have noticed curl's connect
> behavior on timeouts has changed sometime between 7.22 and 7.39.

You're covering three years of development there. More than 4200 commits.

[7.22]
> In this version, curl tries to connect to each IP and declares it as timed
> out if there was no response in 2 seconds.

Right, we used [timeout]/N seconds for each of the N addresses. That turned
out to be a rather silly algorithm: if a site suddenly added more addresses to
its name, curl would spend less time on each connect attempt. Not really what
users expect.
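
To make the old policy concrete, here is a minimal sketch of the equal split
described above. The function name is made up for illustration and this is not
the actual libcurl source; it only shows the [timeout]/N arithmetic.

```c
#include <assert.h>

/* Hypothetical helper (not libcurl code): the old equal-split policy,
 * where each of the N resolved addresses gets total/N milliseconds. */
long equal_split_timeout_ms(long total_timeout_ms, int num_addresses)
{
  assert(num_addresses > 0);
  return total_timeout_ms / num_addresses;
}
```

With a 10000 ms timeout, five addresses get 2000 ms each, and the same name
with ten addresses gets only 1000 ms each, which is the surprise mentioned
above.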

Also, if you ask for a 10-second timeout, it isn't fair to assume that you
want to try every possible IP address within that time. More likely you're
just willing to spend up to 10 seconds getting connected. I'm not totally
convinced the current algorithm is perfect either, since it cuts the
per-address time down too much after a few attempts.

> * Rebuilt URL to: www.google.com:45/
> * Hostname was NOT found in DNS cache
> * Trying 74.125.239.51...
> * After 4998ms connect time, move on!

> Now curl seems to allocate half its time to the first connection, half of
> that time to the second, and so on.

Yes, that's the new algorithm. It is meant to favour the first addresses more
rather than splitting the total time into equal fractions.

> CURLINFO_PRIMARY_IP now returns the empty string after these fetches.

Yes, but is that really wrong? What do you think the primary IP is on a failed
connect attempt to N different IP addresses? I see that it used to report the
last tried IP in the past, but that's not exactly how it is documented.

I can see how we could restore the former behavior, but then we should also
update the docs accordingly.

> 7.39:

> * After 196ms connect time, move on!
> * connect to 74.125.239.51 port 45 failed: Connection timed out
> * Trying 74.125.239.50...
> * Connection timed out after 10000 milliseconds
> * Closing connection 0
>
> The strategy here seems mostly the same as in 7.36, except the values don't
> make as much sense. If you add up the times it spent on each individual
> connection, you end up well short of 10000 ms, even though the wallclock
> time of the program is very close to 10 seconds. CURLINFO_PRIMARY_IP is
> also missing here.

The times allowed seem to be roughly the same as those used in 7.36. Each IP
tried gets half of the time that remains. So 5 seconds for the first, 2.5 for
the next and so on, which gives the fifth IP a mere 312 milliseconds (adjusted
somewhat since time gets wasted here and there, so the last one actually only
got 196 ms).
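
The halving can be sketched like this. It is an idealized illustration with a
made-up function name, not the libcurl implementation, and it assumes each
earlier attempt uses up its full share and that no time is lost between
attempts.

```c
#include <assert.h>

/* Hypothetical helper (not libcurl code): idealized budget for the
 * attempt with the given 0-based index when every attempt is allowed
 * half of the time that still remains. Overhead is ignored. */
long halving_timeout_ms(long total_timeout_ms, int attempt_index)
{
  long remaining = total_timeout_ms;
  int i;

  assert(attempt_index >= 0);

  /* Each earlier attempt is assumed to consume its whole share. */
  for(i = 0; i < attempt_index; i++)
    remaining -= remaining / 2;

  return remaining / 2;
}
```

For a 10000 ms timeout this gives 5000, 2500, 1250, 625 and 312 ms for the
first five addresses. Real runs come in a bit lower because of the time lost
between attempts.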

> So I have a few questions:
> 1) When did the retry behavior change between 7.22 and 7.36? I don't see
> anything in the changelog relating to retries or timeouts on connections.

I couldn't find it right now.

> 2) Was it intentional to remove CURLINFO_PRIMARY_IP when a connection was
> not established? I was relying on this value before as long as the DNS
> resolution was successful and now it is mysteriously not there.

I can't recall that we removed it intentionally, but I also think that it was
kind of there unintentionally to begin with as I mentioned already.

> 3) What happened between 7.36 and 7.39 to make the timings "strange" in the
> current version?

I don't think they're that strange, it just looks like something eats up some
more time before the next address is used. Could of course be worth
investigating.

-- 
  / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2014-11-11