Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS_CACHE_TIMEOUT == 0 but address might still be used for second transfer! #2169

Closed
p opened this issue Dec 9, 2017 · 12 comments
Closed

DNS_CACHE_TIMEOUT == 0 but address might still be used for second transfer! #2169

p opened this issue Dec 9, 2017 · 12 comments

Comments

@p
Copy link
Contributor

p commented Dec 9, 2017

Consider this:

>>> c=pycurl.Curl()
# disable dns cache
>>> c.setopt(c.DNS_CACHE_TIMEOUT,0)
>>> c.setopt(c.VERBOSE,1)
# set ip to .102
>>> c.setopt(c.RESOLVE,['ya.ru:80:127.0.0.102'])
>>> c.setopt(c.URL,'http://ya.ru')
>>> c.perform()
* Added ya.ru:80:127.0.0.102 to DNS cache
# ^ ok
* Rebuilt URL to: http://ya.ru/
* Hostname ya.ru was found in DNS cache
*   Trying 127.0.0.102...
# ^ ok
* TCP_NODELAY set
* connect to 127.0.0.102 port 80 failed: Connection refused
* Failed to connect to ya.ru port 80: Connection refused
* Closing connection 0
Traceback (most recent call last):
  File "", line 1, in 
pycurl.error: (7, 'Failed to connect to ya.ru port 80: Connection refused')
# clear fake dns
>>> c.setopt(c.RESOLVE,['-ya.ru:80'])
# set fake dns to another ip, .103
>>> c.setopt(c.RESOLVE,['ya.ru:80:127.0.0.103'])
>>> c.perform()
* Added ya.ru:80:127.0.0.103 to DNS cache
# ^ ok
* Rebuilt URL to: http://ya.ru/
* Hostname ya.ru was found in DNS cache
*   Trying 127.0.0.102...
# ^ ??? uses old ip
* TCP_NODELAY set
* connect to 127.0.0.102 port 80 failed: Connection refused
* Failed to connect to ya.ru port 80: Connection refused
* Closing connection 1
Traceback (most recent call last):
  File "", line 1, in 
pycurl.error: (7, 'Failed to connect to ya.ru port 80: Connection refused')
>>> 

If I remove fake dns entry a request is made to the real ip:

# continuing with the same curl handle
>>> c.setopt(c.RESOLVE,['-ya.ru:80'])
>>> c.perform()
* Rebuilt URL to: http://ya.ru/
*   Trying 87.250.250.242...
* TCP_NODELAY set
* Connected to ya.ru (87.250.250.242) port 80 (#2)
> GET / HTTP/1.1
Host: ya.ru
User-Agent: PycURL/7.43.0.1 libcurl/7.56.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) libssh2/1.8.0 nghttp2/1.26.0 librtmp/2.3
Accept: */*

< HTTP/1.1 302 Found
< Date: Sat, 09 Dec 2017 05:52:48 GMT
< Cache-Control: no-cache,no-store,max-age=0,must-revalidate
< Location: https://ya.ru/
< Expires: Sat, 09 Dec 2017 05:52:49 GMT
< Last-Modified: Sat, 09 Dec 2017 05:52:49 GMT
< P3P: policyref="/w3c/p3p.xml", CP="NON DSP ADM DEV PSD IVDo OUR IND STP PHY PRE NAV UNI"
< Set-Cookie: yandexuid=4809395281512798769; Expires=Tue, 07-Dec-2027 05:52:48 GMT; Domain=.ya.ru; Path=/
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Length: 0
< 
* Connection #2 to host ya.ru left intact
>>> 

But now if I set another fake ip, curl makes a connection to the real ip:

# still using the same handle
>>> c.setopt(c.RESOLVE,['ya.ru:80:127.0.0.104'])
>>> c.perform()
* Added ya.ru:80:127.0.0.104 to DNS cache
# ^ ok
* Rebuilt URL to: http://ya.ru/
* Found bundle for host ya.ru: 0x559567269060 [can pipeline]
* Re-using existing connection! (#2) with host ya.ru
* Connected to ya.ru (87.250.250.242) port 80 (#2)
> GET / HTTP/1.1
Host: ya.ru
User-Agent: PycURL/7.43.0.1 libcurl/7.56.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) libssh2/1.8.0 nghttp2/1.26.0 librtmp/2.3
Accept: */*

< HTTP/1.1 302 Found
< Date: Sat, 09 Dec 2017 05:53:51 GMT
< Cache-Control: no-cache,no-store,max-age=0,must-revalidate
< Location: https://ya.ru/
< Expires: Sat, 09 Dec 2017 05:53:52 GMT
< Last-Modified: Sat, 09 Dec 2017 05:53:52 GMT
< P3P: policyref="/w3c/p3p.xml", CP="NON DSP ADM DEV PSD IVDo OUR IND STP PHY PRE NAV UNI"
< Set-Cookie: yandexuid=5327304491512798831; Expires=Tue, 07-Dec-2027 05:53:51 GMT; Domain=.ya.ru; Path=/
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Length: 0
< 
* Connection #2 to host ya.ru left intact
>>> 

OK, maybe I need to also set FORBID_REUSE. Let's try that.

# still on the same curl handle
>>> c.setopt(c.FORBID_REUSE,1)
>>> c.perform()
* Rebuilt URL to: http://ya.ru/
* Found bundle for host ya.ru: 0x559567269060 [can pipeline]
* Re-using existing connection! (#2) with host ya.ru
# ^ ??? Didn't I forbid reuse?
* Connected to ya.ru (87.250.250.242) port 80 (#2)
> GET / HTTP/1.1
Host: ya.ru
User-Agent: PycURL/7.43.0.1 libcurl/7.56.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) libssh2/1.8.0 nghttp2/1.26.0 librtmp/2.3
Accept: */*

< HTTP/1.1 302 Found
< Date: Sat, 09 Dec 2017 05:55:05 GMT
< Cache-Control: no-cache,no-store,max-age=0,must-revalidate
< Location: https://ya.ru/
< Expires: Sat, 09 Dec 2017 05:55:05 GMT
< Last-Modified: Sat, 09 Dec 2017 05:55:05 GMT
< P3P: policyref="/w3c/p3p.xml", CP="NON DSP ADM DEV PSD IVDo OUR IND STP PHY PRE NAV UNI"
< Set-Cookie: yandexuid=3103926351512798905; Expires=Tue, 07-Dec-2027 05:55:05 GMT; Domain=.ya.ru; Path=/
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Length: 0
< 
* Closing connection 2
>>> 

Let's try this all again:

>>> c=pycurl.Curl()
>>> c.setopt(c.VERBOSE,1)
>>> c.setopt(c.FORBID_REUSE,1)
>>> c.setopt(c.DNS_CACHE_TIMEOUT,0)
>>> c.setopt(c.RESOLVE,['ya.ru:80:127.0.0.104'])
>>> c.setopt(c.URL,'http://ya.ru')
>>> c.perform()
* Added ya.ru:80:127.0.0.104 to DNS cache
* Rebuilt URL to: http://ya.ru/
* Hostname ya.ru was found in DNS cache
*   Trying 127.0.0.104...
* TCP_NODELAY set
* connect to 127.0.0.104 port 80 failed: Connection refused
* Failed to connect to ya.ru port 80: Connection refused
* Closing connection 0
Traceback (most recent call last):
  File "", line 1, in 
pycurl.error: (7, 'Failed to connect to ya.ru port 80: Connection refused')
>>> c.setopt(c.RESOLVE,['-ya.ru:80'])
>>> c.setopt(c.RESOLVE,['ya.ru:80:127.0.0.105'])
>>> c.perform()
* Added ya.ru:80:127.0.0.105 to DNS cache
* Rebuilt URL to: http://ya.ru/
* Hostname ya.ru was found in DNS cache
*   Trying 127.0.0.104...
#  ^ shouldn't it try to connect to .105?
* TCP_NODELAY set
* connect to 127.0.0.104 port 80 failed: Connection refused
* Failed to connect to ya.ru port 80: Connection refused
* Closing connection 1
Traceback (most recent call last):
  File "", line 1, in 
pycurl.error: (7, 'Failed to connect to ya.ru port 80: Connection refused')
>>> 

I think something in the DNS cache/connection reuse does not interact properly with RESOLVE option.

This came out of me investigating pycurl/pycurl#443 which, strictly speaking, appears to be a user error as the user did not disable DNS cache and connection reuse.

@bagder
Copy link
Member

bagder commented Dec 9, 2017

To reduce some of the confusion and complexity:

  1. FORBID_REUSE marks the current connection to not get re-used after this one. If you want to force the new transfer you're about to make to use a new connection you want FRESH_CONNECT.
  2. Connection re-use is always considered before DNS (cache/resolve) based on host names. So a resolve has no effect if there's an existing connection in the pool already using that name.

@bagder
Copy link
Member

bagder commented Dec 9, 2017

I wrote up the little example below in an attempt to reproduce, but I can't see it fail. Can you figure out what needs to be changed to reproduce a problem?

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl;
  CURLcode res;

  struct curl_slist *dns;

  dns = curl_slist_append(NULL, "example.com:80:127.0.0.102");

  curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_HEADER, 1L);
    curl_easy_setopt(curl, CURLOPT_RESOLVE, dns);
    curl_easy_setopt(curl, CURLOPT_DNS_CACHE_TIMEOUT, 0L);

    res = curl_easy_perform(curl);
    curl_slist_free_all(dns);

    /* remove the old mapping, add a new */
    dns = curl_slist_append(NULL, "-example.com:80");
    dns = curl_slist_append(dns, "example.com:80:127.0.0.99");
    curl_easy_setopt(curl, CURLOPT_RESOLVE, dns);

    curl_easy_setopt(curl, CURLOPT_FRESH_CONNECT, 1L);

    res = curl_easy_perform(curl);

    curl_easy_cleanup(curl);

    curl_slist_free_all(dns);
  }
  return 0;
}

@p
Copy link
Contributor Author

p commented Dec 9, 2017

Your example matches my expectations, and I figured DNS cache was separate from connection reuse. The question I still have is what is the point of FORBID_REUSE if 1) I set it before any transfers and 2) second transfer uses the first transfer's connection.

@jay
Copy link
Member

jay commented Dec 9, 2017

FORBID_REUSE should do exactly what it says. The second transfer should not reuse the connection from the first transfer. Can you give an example in C?

edit: after reading your example it occurs to me maybe you misunderstand.

It looks like you are doing this:

  res = curl_easy_perform(curl); // this transfer's connection will be reused
  curl_easy_setopt(curl, CURLOPT_FORBID_REUSE, 1L);
  res = curl_easy_perform(curl); // this transfer's connection won't be reused

but really you want this:

  curl_easy_setopt(curl, CURLOPT_FORBID_REUSE, 1L);
  res = curl_easy_perform(curl); // this transfer's connection won't be reused
  res = curl_easy_perform(curl); // this transfer's connection won't be reused

@p
Copy link
Contributor Author

p commented Dec 9, 2017

@jay The last example I posted issues FORBID_REUSE before the first transfer.

@jay
Copy link
Member

jay commented Dec 9, 2017

Ok. I cannot reproduce that here. Can you give us an example in C?

@p
Copy link
Contributor Author

p commented Dec 9, 2017

Investigating, looks like the C code works as I expect and as you stated here.

@p
Copy link
Contributor Author

p commented Dec 9, 2017

My previous example was a user error, but I think this is legit. The following code connects to the real ip the second time.

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl;
  CURLcode res;

  struct curl_slist *dns;

  //dns = curl_slist_append(NULL, "example.com:80:127.0.0.102");

  curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_HEADER, 1L);
    //curl_easy_setopt(curl, CURLOPT_RESOLVE, dns);
    curl_easy_setopt(curl, CURLOPT_DNS_CACHE_TIMEOUT, 0L);

    res = curl_easy_perform(curl);
    //curl_slist_free_all(dns);

    /* remove the old mapping, add a new */
    //dns = curl_slist_append(NULL, "-example.com:80");
    dns = curl_slist_append(NULL, "example.com:80:127.0.0.99");
    curl_easy_setopt(curl, CURLOPT_RESOLVE, dns);

    curl_easy_setopt(curl, CURLOPT_FRESH_CONNECT, 1L);

    res = curl_easy_perform(curl);

    curl_easy_cleanup(curl);

    curl_slist_free_all(dns);
  }
  return 0;
}

@jay
Copy link
Member

jay commented Dec 9, 2017

That is different from FORBID_REUSE. Your example does not reuse the connection, however it appears to be skipping the 127.0.0.99 entry or zapping the wrong one maybe, which is why you get two connections to the same ip. That looks wrong I'll investigate

@p
Copy link
Contributor Author

p commented Dec 9, 2017

Yep thanks.

@bagder
Copy link
Member

bagder commented Dec 9, 2017

libcurl always puts entries into the dns cache, even when the timeout is zero, since the logic everywhere assumes it. And it remains in there as long as it is in use by the current transfer. The problem here is that we don't prune the cache immediately when the transfer is done but only when it disconnects, so in this case the first host name resolve remains in the cache when the second is tried and the RESOLVE instruction doesn't overwrite exiting entries.

This little patch that prunes the dns cache in multi_done fixes it for me:

diff --git a/lib/multi.c b/lib/multi.c
index 1a4618eb2..e35b8fa19 100644
--- a/lib/multi.c
+++ b/lib/multi.c
@@ -550,10 +550,11 @@ static CURLcode multi_done(struct connectdata **connp,
 
   if(conn->dns_entry) {
     Curl_resolv_unlock(data, conn->dns_entry); /* done with this */
     conn->dns_entry = NULL;
   }
+  Curl_hostcache_prune(data);
 
   /* if the transfer was completed in a paused state there can be buffered
      data left to free */
   for(i = 0; i < data->state.tempcount; i++) {
     free(data->state.tempwrite[i].buf);

bagder added a commit that referenced this issue Dec 9, 2017
Prune the DNS cache immediately after the dns entry is unlocked in
multi_done. Timed out entries will then get discarded in a more orderly
fashion.

Reported-by: Oleg Pudeyev

Fixes #2169
@bagder bagder changed the title RESOLVE & DNS_CACHE_TIMEOUT & FORBID_REUSE: curl uses old IP/reuses connections DNS_CACHE_TIMEOUT == 0 but address might still be used for second transfer! Dec 9, 2017
@p
Copy link
Contributor Author

p commented Dec 10, 2017

👍 sounds good.

bagder added a commit that referenced this issue Dec 10, 2017
Prune the DNS cache immediately after the dns entry is unlocked in
multi_done. Timed out entries will then get discarded in a more orderly
fashion.

Test506 is updated

Reported-by: Oleg Pudeyev

Fixes #2169
@bagder bagder closed this as completed in e959f16 Dec 10, 2017
@lock lock bot locked as resolved and limited conversation to collaborators May 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

3 participants