curl-library

Re: PHP/cURL: intensive calls send exception without caching connection resource

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 17 Jan 2014 10:08:45 +0100 (CET)

On Mon, 13 Jan 2014, Stéphane HULARD wrote:

Hi, before you continue please read this:

   http://curl.haxx.se/mail/etiquette.html#Do_Not_Top_Post

> Reading your questions, I think my description of the problem was unclear
> about the environment and how I use curl.

Exactly. I ask questions to perhaps help us narrow down the problem!

> First I need to clarify some points. I did not know that your bug tracker is
> on SourceForge.

If you had googled for anything like "curl bugs" or "libcurl bug tracker" or
similar phrases you would have found that.

> The tool makes between 500 and 10000 requests per second during our
> crawl. The request count tends to grow quickly. We make 90% of the HTTP calls
> to a local server (127.0.0.1) and 10% to various URLs on 1-10 remote servers
> (most are HTTP but some can be FTP or HTTPS).
>
> We do not use simultaneous requests; we work in a synchronous way.

Wow. You do 10000 requests/second serially, without connection re-use? I'm
impressed. I didn't think that was possible.

> I don't understand what you mean by "file descriptors".

I think that's unfortunate and it shows there's a knowledge gap here. A socket
is a file descriptor and so are other things. A process is only allowed to
have N open file descriptors, where N is configurable in your system. 1024 is
a common default on *nix systems.
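
To make the descriptor limit concrete, here is a minimal sketch (Python
standard library on a *nix system, not libcurl or PHP; the function name
fd_limits is made up for illustration) that reads N for the current process
and shows that a freshly created socket is just another descriptor number:

```python
import resource
import socket


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")

    # A socket occupies one file descriptor, just like an open file does.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        print(f"new socket uses file descriptor number {sock.fileno()}")
```

The soft limit is the one a process actually runs into; "ulimit -n" in a
shell reports the same number.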

But since you do synchronous transfers I don't think this is a problem for
you.

> Is it the curl handles that are initialized inside the process? If so, we
> use a different curl handle for each request (this is how the Guzzle library
> does it) but we rewrite an HTTP client which caches handles to maximise
> reuse.

A curl handle is a door knob leading to activities like sockets, sure. Of
course a connection is at least one socket/file descriptor but libcurl has a
connection pool that keeps connections open so subsequent handles shouldn't
have to create new sockets/connections. Also, libcurl does things like DNS
resolving and more so it'll often use more than one file descriptor.

If you don't re-use handles, libcurl will always close and re-create
connections between requests, and it'll go through MANY more file descriptors
with less performance and more system load.
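
The effect of re-use can be observed without libcurl at all. The following is
only a sketch of the general idea, using Python's standard http.client
against a throwaway local server; it is not how libcurl or PHP/CURL implement
their connection pool, and all names in it (CountingHandler,
fetch_without_reuse, fetch_with_reuse, count_connections) are invented for
this illustration. It counts how many TCP connections the server sees for the
same number of requests:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Tiny local server handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_without_reuse(port, n):
    """Open and close a fresh connection for every request."""
    for _ in range(n):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()


def fetch_with_reuse(port, n):
    """Send every request over one kept-alive connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(n):
        conn.request("GET", "/")
        conn.getresponse().read()
    conn.close()


def count_connections(fetch, n=5):
    """Run fetch against a temporary server; return how many connections it saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        fetch(server.server_address[1], n)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("without reuse:", count_connections(fetch_without_reuse))
    print("with reuse:   ", count_connections(fetch_with_reuse))
```

Without re-use the server sees one connection per request; with re-use it
sees a single connection for all of them. Every extra connection costs a
socket (a file descriptor) and a local port on the client side.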

> I searched on the web about my error and I found that it can be encountered
> when the server can't allow a new php curl_init call because there isn't
> enough available “process” for a network call…

I don't know what that means. I assume that's a slightly confused way of
saying you ran out of file descriptors (or similar), as I've just mentioned.

Anyway, to get closer to your actual problem I think you need to shave off a
couple of layers of cruft here. This is the libcurl mailing list, we talk
libcurl and the libcurl API, and you're using an API on top of PHP/CURL, which
seems to be about two layers away from "us".

> Failed to connect to 127.0.0.1: Cannot assign requested address

When you get the failure, how many file descriptors are in use then? It looks
from the log text that connect() is what fails here, is that true? If it is,
the server is on 127.0.0.1 - what is the server's view of life at this point?
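
One way to gather that information is sketched below, under some
assumptions: it is plain Python rather than PHP/CURL, it counts descriptors
by listing /proc/self/fd (Linux-specific, with /dev/fd as a fallback), and
the helper names open_fd_count and try_connect as well as port 8080 are
hypothetical. On Linux, "Cannot assign requested address" is the message for
errno EADDRNOTAVAIL, so recording the errno of a failed connect() shows
whether connect() is really the call that fails:

```python
import errno
import os
import socket


def open_fd_count():
    """Count this process's open file descriptors (Linux /proc, else /dev/fd)."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # Listing the directory briefly opens one descriptor itself, so the
    # result may be one higher than the steady-state count.
    return len(os.listdir(fd_dir))


def try_connect(host, port, timeout=1.0):
    """Attempt a TCP connect; return None on success or failure details."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return {
            "errno": exc.errno,
            "name": errno.errorcode.get(exc.errno, "unknown"),
            "message": os.strerror(exc.errno) if exc.errno else str(exc),
            "open_fds": open_fd_count(),
        }


if __name__ == "__main__":
    # Port 8080 is a placeholder; use the port of the local server under test.
    failure = try_connect("127.0.0.1", 8080)
    if failure is None:
        print("connect() succeeded; open descriptors:", open_fd_count())
    else:
        print("connect() failed:", failure)
```

Comparing open_fds at the moment of failure with the process limit tells
whether descriptor exhaustion is plausible at all. If the count is nowhere
near the limit, the server side and the state of the local sockets are the
next places to look.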

-- 
  / daniel.haxx.se

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2014-01-17