curl-library

parallel transfer techniques

From: Mohun Biswas <m_biswas_at_mailinator.com>
Date: Mon, 15 Jun 2009 09:43:10 -0400

Looking for advice here. Let me start with a little description of the
situation: there's a main control program which forks off a large number
of children (which may have their own children, etc). Each of these
descendants creates a data file and, having done so, reports back to the
control process, which takes responsibility for uploading all data files
to a server using libcurl.

Only the controller process uses libcurl. Subprocesses communicate with
the controller via sockets but these do not involve curl, just regular
connect/send/close system calls. These "incoming" sockets are
multiplexed with a select loop. The controller uses the synchronous easy
API to upload files, and this creates a choke point: files can arrive
asynchronously faster than the controller can upload them one at a time.
I'm trying to fix that bottleneck, and as far as I can tell I have three
options:

1. Fork a new process to do each upload, still using the easy API. This
is kind of heavyweight but could work. The problem is that I use SIGCHLD
to keep track of child processes and spurious SIGCHLDs from transfer
processes confuse the bookkeeping. This could probably be worked around
at the cost of perhaps making some already complex code painfully complex.

2. Use threads with easy handles. As I understand it, each thread would
need a dedicated curl handle and I'd need to maintain a pool of worker
threads. I have little experience with threaded programming, so I don't
know how good or how hard this option is.

3. Use the multi API. I'm leaning this way because it seems the most
"curl-ish" solution. The problem I fear here is that I already have a
select loop with a lot of file descriptors in play for the incoming
data. The idea of managing two select loops in parallel feels painfully
tricky. I'm not sure if it's possible to have just one loop and
distinguish between 'input' and 'output' sockets.

An additional point is that this must work on both Unix and Windows.
Options #1 (process creation) and #2 (thread creation) would have to
be implemented differently for each platform, so that's another argument
for the multi API.

Does anyone have a happy experience to report with any of these methods,
or preferably even sample code? I'd be especially grateful for guidance
on #3, using the multi API in the presence of an existing select loop.

Thanks,
MB
Received on 2009-06-15