
curl-library

Some Sites Don't Time Out

From: Jonathan A. Zdziarski <jonathan_at_zdziarski.com>
Date: Tue, 4 Mar 2008 15:46:32 -0500

Hey there,

I've been using libcurl successfully for about a week in some new
code, but I'm having a strange problem: some sites just don't time
out. Here's a gdb backtrace of where it gets stuck (it would run all
day if I let it).

One domain my crawler is trying to fetch is abroeiu.com, which is
timing out on connect.

(gdb) info threads
   3 Thread 1094719840 (LWP 10069) 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
   2 Thread 1084229984 (LWP 10068) 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
* 1 Thread 182900061440 (LWP 10065) 0x0000003cb1e8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 1084229984 (LWP 10068))]#0 0x0000003cb1ebd9a2 in poll ()
    from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000003cb1ebd9a2 in poll () from /lib64/tls/libc.so.6
#1 0x0000002a958968ea in Curl_socket_ready (readfd=-1, writefd=10, timeout_ms=1896113078) at select.c:218
#2 0x0000002a9588f2f2 in waitconnect (sockfd=Variable "sockfd" is not available.
) at connect.c:200
#3 0x0000002a9588fab8 in singleipconnect (conn=0x5415d0, ai=Variable "ai" is not available.
) at connect.c:766
#4 0x0000002a9588ff63 in Curl_connecthost (conn=0x5415d0, remotehost=0x55de80, sockconn=0x5416e8,
    addr=0x40a00040, connected=0x40a0004f) at connect.c:894
#5 0x0000002a95883042 in SetupConnection (conn=0x5415d0, hostaddr=0x55de80, protocol_done=0x40a000bf)
    at url.c:2633
#6 0x0000002a95884d23 in Curl_async_resolved (conn=0x5415d0, protocol_done=Variable "protocol_done" is not available.
) at url.c:4390
#7 0x0000002a9588dea7 in Curl_perform (data=0x526180) at transfer.c:2279
#8 0x0000000000403b55 in process_url (CTX=0x514d90, url=0x514df0) at phishd.c:536
#9 0x0000000000403f74 in process_site (ptr=Variable "ptr" is not available.
) at phishd.c:432
#10 0x0000003cb290610a in start_thread () from /lib64/tls/libpthread.so.0
#11 0x0000003cb1ec68c3 in clone () from /lib64/tls/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)

Here is the code I am using to invoke libcurl. Any suggestions to get
this working would be appreciated:

     CURL *curl;
     CURLcode res;
     struct curl_slist *slist = NULL;
     long one = 1;
     long max_redirs = 10;
     long curl_timeout = 30;
     long curl_connect_timeout = 15;

     curl = curl_easy_init();
     if (!curl) {
         LOG(LOG_CRIT, ERR_CURL_INIT_FAIL, strerror(errno));
         return EINVAL;
     }

     slist = curl_slist_append(slist,
         "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)");
     slist = curl_slist_append(slist, "Cache-Control: max-age=0");
     slist = curl_slist_append(slist, "Accept-Language: en-us,en;q=0.5");
     slist = curl_slist_append(slist, "Accept-Encoding: ");
     slist = curl_slist_append(slist,
         "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7");
     slist = curl_slist_append(slist,
        "Accept: text/xml,application/xml,application/xhtml+xml,text/html;"
        "q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5");

     curl_easy_setopt(curl, CURLOPT_URL, url->url);
     curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, &one);
     curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, &one);
     curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
     curl_easy_setopt(curl, CURLOPT_MAXREDIRS, &max_redirs);
     curl_easy_setopt(curl, CURLOPT_NOSIGNAL, &one);
     curl_easy_setopt(curl, CURLOPT_TCP_NODELAY, &one);
     curl_easy_setopt(curl, CURLOPT_AUTOREFERER, 1);
     curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
     curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
     curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
     curl_easy_setopt(curl, CURLOPT_TIMEOUT, &curl_timeout);
     curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, &curl_connect_timeout);

     res = curl_easy_perform(curl);

Jonathan
Received on 2008-03-04