Re: parallel transfer techniques

From: Nick Gerner <nick_at_seomoz.org>
Date: Mon, 15 Jun 2009 09:40:50 -0700

On Mon, Jun 15, 2009 at 6:44 AM, <curl-library-request_at_cool.haxx.se> wrote:
> Looking for advice here. Let me start with a little description of the
> situation: there's a main control program which forks off a large number
> of children (which may have their own children, etc). Each of these
> descendants creates a data file and, having done so, reports back to the
> control process, which takes responsibility for uploading all data files
> to a server using libcurl.
>
> 3. Use the multi API. I'm leaning this way because it seems the most
> "curl-ish" solution. The problem I fear here is that I already have a
> select loop with a lot of file descriptors in play for the incoming
> data. The idea of managing two select loops in parallel feels painfully
> tricky. I'm not sure if it's possible to have just one loop and
> distinguish between 'input' and 'output' sockets.
>
> An additional point is that this must work on both Unix and Windows.
> Solutions #1 (process creation) and #2 (thread creation) would have to
> be implemented differently for each, so that's another argument for the
> multi API.
>
> Does anyone have a happy experience to report with any of these methods,
> or preferably even sample code? I'd be especially grateful for guidance
> on #3, using the multi API in the presence of an existing select loop.
>
> Thanks,
> MB

I would go with option 3. In fact, the easy interface is now powered
under the covers by the multi interface, so things seem to be
headed in that direction.

I've used the multi interface in many different situations
and I think it works well. I had a situation with a couple of
blocking I/O procedures similar in some ways to what you describe: I
was using curl to get data over HTTP as well as memcache to get and
put data. Because my memcache interface doesn't expose its sockets, I
couldn't put everything in one select or poll loop, so I had to break
that off into a separate thread which loops over memcache requests. I
ended up with the main program thread doing work and interacting with
requests processed by either my memcache thread or my curl thread.
But if you've got sockets for both curl and your other processes,
you could use the same select/poll loop for both. You'd need to keep
some state around to track which socket is which, but sockets
are just file descriptors (on Unix at least; on Windows they are
SOCKET handles, which work just as well as keys) and so make excellent
keys into maps or hashes. libcurl provides great support for working
with sockets on the multi interface (check out CURLMOPT_SOCKETFUNCTION).

I would caution that having everything in one loop does make for
somewhat complicated logic, as you need one code path for some
sockets (subprocesses) and another for the others (curl).
That can lead to bugs. But so can having separate threads :)

Let me know if you need more encouragement or code samples.

--Nick
Received on 2009-06-15