curl-library

Re: libcurl and async I/O

From: Cory Nelson <phrosty_at_gmail.com>
Date: Sat, 16 Aug 2008 22:53:48 -0700

On Sat, Aug 16, 2008 at 1:12 PM, Daniel Stenberg <daniel_at_haxx.se> wrote:
> On Fri, 15 Aug 2008, Andrew Barnert wrote:
>
>> I've been investigating incorporating HTTP tunneling into a
>> boost-asio-centered tool. Writing all the HTTP stuff myself is no fun; I'd
>> much rather use libcurl. But I need the same asio engine to manage all of
>> the sockets, concurrency, etc., whether they're curl-tunneled or native.
>> After looking over the libcurl APIs, there appears to be no way to do what I
>> want.
>
> I have no idea what you want. I don't know boost nor boost-asio.
>
>> I found a thread from 2005
>> (http://curl.haxx.se/mail/lib-2005-11/0011.html) where Cory Nelson tried to
>> explain exactly this problem to Daniel, but they never quite connected; the
>> conversation went onto a side-track about threading and cancellation, but
>> the real problem was never mentioned-- ready notifications (or non-blocking
>> sync I/O) vs. completion notifications (or async I/O, aka "overlapped" in MS
>> terminology).
>
> Well, at the time I was planning the socket API and what Cory then suggested
> and talked about didn't match with what I was going to write. I must admit I
> still don't know what IOCP is or how it works.

I agree, I got sidetracked a bit and didn't present IOCP's case very
clearly. Sorry about that.

asio's design is almost an exact wrapper over IOCP, so I will use it
to explain IOCP:

/////
demuxer d;
socket s(d);

// Returns immediately; callback is invoked on completion.
s.begin_recv(buf, len, callback);
d.run();
/////

Now to explain a little:

d.run() blocks until a completion event occurs. Callbacks are run
from within this function. No background threads are created (except
when emulation is needed, as for DNS); you can call d.run() on a single
thread or on several, it's your choice. You can also create
"strands" that ensure callbacks within the strand are synchronized if
you use more than one thread.

Callbacks can start new I/O operations, so you essentially get a big
chain of callbacks. This sometimes sounds intimidating to
first-timers, but it can actually decrease code complexity a great deal
by splitting up linear synchronous code into smaller, easier-to-read
functions.
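
To make the callback chain concrete, here is a rough, self-contained
sketch of the idea. It is not asio or IOCP: Demuxer, FakeSocket, Reader
and read_all are made-up names, the "network" is just an in-memory
string, and everything runs on one thread. It only shows the shape:
begin_recv() returns at once, the callback fires from inside run(), and
the callback starts the next operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy, single-threaded stand-in for a completion-based demuxer.
// Hypothetical names throughout; this is not asio's or IOCP's real API.
class Demuxer {
public:
    using Task = std::function<void()>;

    void post(Task t) { pending_.push_back(std::move(t)); }

    // Runs queued completions until none are left; returns how many ran.
    std::size_t run() {
        std::size_t count = 0;
        while (!pending_.empty()) {
            Task t = std::move(pending_.front());
            pending_.pop_front();
            t();  // callbacks execute here, inside run()
            ++count;
        }
        return count;
    }

private:
    std::deque<Task> pending_;
};

// Fake socket whose incoming data is an in-memory string.
class FakeSocket {
public:
    using Callback = std::function<void(std::size_t)>;

    FakeSocket(Demuxer& d, std::string data) : d_(d), data_(std::move(data)) {}

    // Returns immediately. The buffer is filled and the callback invoked
    // later, from Demuxer::run(). A byte count of 0 means end of stream.
    void begin_recv(char* buf, std::size_t len, Callback cb) {
        d_.post([this, buf, len, cb] {
            assert(pos_ <= data_.size());
            std::size_t n = std::min(len, data_.size() - pos_);
            std::memcpy(buf, data_.data() + pos_, n);
            pos_ += n;
            cb(n);
        });
    }

private:
    Demuxer& d_;
    std::string data_;
    std::size_t pos_ = 0;
};

// Each completion handler starts the next receive: the "chain".
struct Reader {
    FakeSocket& sock;
    char buf[4];
    std::string received;

    void start() {
        sock.begin_recv(buf, sizeof buf, [this](std::size_t n) { on_recv(n); });
    }

    void on_recv(std::size_t n) {
        if (n == 0) return;       // end of stream: the chain stops
        received.append(buf, n);
        start();                  // queue the next operation
    }
};

std::string read_all(const std::string& input) {
    Demuxer d;
    FakeSocket s(d, input);
    Reader r{s, {}, {}};
    r.start();   // nothing has been read yet
    d.run();     // all callbacks run in here
    return r.received;
}
```

Calling read_all("hello, world") drives four completions (three
4-byte reads and the final zero-byte one) and returns the whole string.
Note that the buffer is handed over when the operation starts, not when
the socket becomes readable; that is the difference from a select-style
loop.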

>> Ultimately, if your API is designed right, the difference is just where
>> the buffers go. So, here's what would need to be changed: Add a new callback
>> curl_socket_async_callback (and a new CURLMOPT). This takes an extra (void
>> *)buffer and (size_t)len. It does the async, and it's expected to do an
>> async read, write, or both, then call curl_multi_socket_async_action on
>> completion. This takes (size_t)bytes (actually read or written), and uses
>> the existing buffer instead of reading or writing, but is otherwise the
>> same.
>
> I don't quite understand this brief description. Can you add some more
> pseudo code for a client using this suggested API?

The good thing is that curl's public API doesn't need to change one
bit to use IOCP - curl_multi_perform() is the equivalent of the
d.run() above. How much the internal code would need to change is
another question though!

> In my view, asynchronous is mostly just another word for running the stuff
> in another thread until it has something, and then have a means of telling
> the first thread when it is done. And you can use libcurl fine already for
> doing exactly that.

Well, the current way of doing it (select) supports that model too,
but it comes nowhere close to the efficiency of IOCP, which scales to
tens of thousands of concurrent operations.

>> There's only one problem: SSL, SSH, and Kerberos.
>
> That sounds like three problems to me! ;-)
>
>> These are all wrapped by using their send/recv replacements, and you
>> obviously can't just tell Windows or boost.asio to do an overlapped OpenSSL
>> SSL_send call.
>
> Now you lost me again. Are you saying that you need to base this
> functionality on some particular magic functions of the OS to make it
> working? Or why can't these other protocols be made to work the same way?

asio does some magic to make a truly async OpenSSL socket (see
asio::ssl::stream), but OpenSSL's API, much like that of almost every
other protocol library, is not really designed for this kind of async I/O.

Ideally libraries would accommodate both, by providing a low-level API
that doesn't assume anything about I/O, only takes buffers of data,
and doesn't expect you to block:

ssl_context ctx;

for(;;) {
   recv(buf, len); // get data
   ctx.push_input_buffer(buf, len); // push data onto context

   // Parsing happens here. If a complete chunk (like an HTTP header
   // line) is parsed, pop_result() returns true and lets the app handle
   // it. Otherwise it returns false, meaning more input is needed.
   while(ctx.pop_result(&res))
      handle_result(&res);
}

This pseudocode shows a blocking method for parsing a protocol using
input from recv(), but it allows it to be made async in a very
straightforward way. By decoupling the protocol parsing from I/O, we
get a lot more freedom. This way, very little would need to be done
to integrate it into pretty much any I/O architecture.
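
As a rough illustration of that decoupling, here is a tiny push-style
parser sketched in C++. It splits CRLF-terminated lines rather than
doing anything as involved as SSL, and LineParser and parse_in_chunks
are hypothetical names, not part of any real library. The parser never
touches a socket, so the bytes could just as well come from a blocking
recv(), a select loop, or a completion callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical push-style parser. It only ever sees buffers of data;
// where they came from is the caller's business.
class LineParser {
public:
    // Hand the parser whatever bytes have arrived so far.
    void push_input_buffer(const char* buf, std::size_t len) {
        pending_.append(buf, len);
    }

    // Returns true and fills *line when a complete CRLF-terminated line
    // is available; returns false when more input is needed.
    bool pop_result(std::string* line) {
        assert(line != nullptr);
        std::size_t end = pending_.find("\r\n");
        if (end == std::string::npos) return false;
        line->assign(pending_, 0, end);
        pending_.erase(0, end + 2);  // drop the line and its CRLF
        return true;
    }

private:
    std::string pending_;  // bytes received but not yet parsed
};

// Feeds the input in pieces of at most `chunk` bytes, as if each piece
// had arrived from a separate recv() or completion callback.
std::vector<std::string> parse_in_chunks(const std::string& input,
                                         std::size_t chunk) {
    assert(chunk > 0);
    LineParser parser;
    std::vector<std::string> lines;
    std::string line;
    for (std::size_t i = 0; i < input.size(); i += chunk) {
        std::size_t n = std::min(chunk, input.size() - i);
        parser.push_input_buffer(input.data() + i, n);
        while (parser.pop_result(&line)) lines.push_back(line);
    }
    return lines;
}
```

The result does not depend on how the input is split up: a line
terminator that straddles two chunks is handled the same as one that
arrives whole, because the parser keeps unparsed bytes between calls.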

-- 
Cory Nelson
Received on 2008-08-17