libcurl and async I/O
Date: Fri, 15 Aug 2008 15:06:34 -0700
I've been investigating incorporating HTTP tunneling into a boost-asio-centered tool. Writing all the HTTP stuff myself is no fun; I'd much rather use libcurl. But I need the same asio engine to manage all of the sockets, concurrency, etc., whether they're curl-tunneled or native. After looking over the libcurl APIs, there appears to be no way to do what I want. The same would be true with native Windows (or Solaris) I/O completion ports, or the ACE proactor classes, or anything else that works the same way; the libcurl-hiper stuff only works with a select/poll/kqueue-style ("reactor") API.
Neither Necko nor neon seems to be even close. I'm not sure about libwww even after spending a few hours with the docs and the code. If anyone has any other suggestions, I'd love to hear them. Otherwise, it's back to either writing it myself or hacking on libcurl.
I found a thread from 2005 (http://curl.haxx.se/mail/lib-2005-11/0011.html) where Cory Nelson tried to explain exactly this problem to Daniel, but they never quite connected; the conversation went off on a side-track about threading and cancellation, and the real problem was never mentioned: ready notifications (or non-blocking sync I/O) vs. completion notifications (or async I/O, aka "overlapped" in MS terminology).
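To make the distinction concrete, here is a minimal sketch in C, assuming a POSIX system and using a pipe in place of a socket. The names read_when_ready, toy_async_read, and compare_styles are made up for illustration, and the "async" half is only a stand-in: a real completion engine (IOCP, asio) would return immediately and call back later, while this toy version completes inline.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Readiness ("reactor") style: the engine only reports that fd is readable;
 * the caller then supplies a buffer and performs the read itself. */
static ssize_t read_when_ready(int fd, char *buf, size_t len)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, 1000) <= 0)      /* timeout or error */
        return -1;
    return read(fd, buf, len);
}

/* Completion ("proactor") style: the caller hands over the buffer up front
 * and hears back only once the data is already in it. */
typedef void (*completion_fn)(void *ctx, ssize_t bytes);

/* Hypothetical stand-in for an async engine; it completes inline. */
static void toy_async_read(int fd, char *buf, size_t len,
                           completion_fn done, void *ctx)
{
    done(ctx, read(fd, buf, len));
}

/* Completion handler: just records how many bytes arrived. */
static void record_bytes(void *ctx, ssize_t bytes)
{
    *(ssize_t *)ctx = bytes;
}

/* Returns 0 if both styles deliver the expected bytes. */
int compare_styles(void)
{
    int fds[2];
    char a[8] = {0}, b[8] = {0};
    ssize_t n = -1, completed = -1;
    int ok = 0;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "ping", 4) == 4)
        n = read_when_ready(fds[0], a, sizeof a);
    if (write(fds[1], "pong", 4) == 4)
        toy_async_read(fds[0], b, sizeof b, record_bytes, &completed);
    ok = (n == 4 && completed == 4 &&
          memcmp(a, "ping", 4) == 0 && memcmp(b, "pong", 4) == 0);
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : 1;
}
```

In the first style the buffer shows up only after the notification; in the second it has to exist, and stay valid, from the moment the operation is started.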
Ultimately, if your API is designed right, the difference is just where the buffers go. So, here's what would need to be changed: Add a new callback curl_socket_async_callback (and a new CURLMOPT to set it). This takes an extra (void *)buffer and (size_t)len. The callback is expected to start an async read, write, or both, then call a new function, curl_multi_socket_async_action, on completion. That function takes (size_t)bytes (the number actually read or written), and uses the already-filled buffer instead of reading or writing itself, but otherwise behaves like its existing counterpart.
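A rough sketch of how that proposal might look, in C. Nothing here exists in libcurl: every sketch_ name is a mock I made up to mirror the proposed curl_socket_async_callback and curl_multi_socket_async_action, the signatures are guesses, and the toy engine completes inline where a real one would call back later from its own thread or event loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mocks only; none of these types or functions are libcurl's. */
typedef int sketch_socket_t;
enum { SKETCH_POLL_IN = 1, SKETCH_POLL_OUT = 2 };

/* Proposed callback shape: a socket callback plus the buffer the library
 * wants filled (read) or sent (write), and its length. */
typedef int (*sketch_socket_async_callback)(sketch_socket_t s, int what,
                                            void *buffer, size_t len,
                                            void *userp);

struct sketch_multi {
    sketch_socket_async_callback async_cb;  /* set via the new option */
    void *userp;
    char inbuf[64];                         /* library-owned receive buffer */
    size_t received;
};

/* Library side: ask the application to start an async read into inbuf. */
static int sketch_want_read(struct sketch_multi *m, sketch_socket_t s)
{
    return m->async_cb(s, SKETCH_POLL_IN, m->inbuf, sizeof m->inbuf, m->userp);
}

/* Proposed completion entry point: the data is already in the buffer, so
 * the library only records the byte count; it never calls recv() itself. */
int sketch_multi_socket_async_action(struct sketch_multi *m, sketch_socket_t s,
                                     int what, size_t bytes)
{
    (void)s;
    if (what == SKETCH_POLL_IN)
        m->received = bytes;
    return 0;
}

/* Application side: a toy engine with a canned response. */
struct toy_engine {
    struct sketch_multi *multi;
    const char *canned;
};

static int toy_start_io(sketch_socket_t s, int what, void *buffer, size_t len,
                        void *userp)
{
    struct toy_engine *e = userp;
    size_t n = strlen(e->canned);
    if (n > len)
        n = len;
    memcpy(buffer, e->canned, n);  /* stands in for the overlapped read */
    return sketch_multi_socket_async_action(e->multi, s, what, n);
}

/* Drives one read through the mock and returns the bytes the library saw. */
size_t sketch_demo(void)
{
    struct sketch_multi m = {0};
    struct toy_engine e = { &m, "HTTP/1.1 200 OK\r\n" };
    m.async_cb = toy_start_io;
    m.userp = &e;
    sketch_want_read(&m, 3);
    return m.received;
}
```

The point of the shape is that the library still owns the buffer and the protocol state; the application's engine only moves bytes and reports how many.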
Most async engines are inherently threaded, so what about locking? I think the existing code already works fine as long as you don't make two simultaneous calls on the same socket, and most async engines already guarantee this much serialization.
There's only one problem: SSL, SSH, and Kerberos. These are all handled by calling the libraries' own send/recv replacements, and you obviously can't just tell Windows or boost.asio to do an overlapped OpenSSL SSL_write call.
The right answer is to wrap up all of the various SSL, SSH, and Kerberos libraries so they call out, in an identical way, to the application's async I/O engine instead of doing direct socket calls. The application gives libcurl pointers to its recv_async (and similar) functions, and gets back recv_async_complete (and similar) functions, one set per security library, to call on completion; curl massages things as needed. But this is not a trivial undertaking. It may not even be possible for all of the libraries; some are just not designed to be hooked in that way.
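As a very rough illustration of that hook arrangement, here is a sketch in C. All of it is hypothetical: io_hooks, toy_layer, and the other names are invented, XOR with a fixed key stands in for a real security library's decryption, and the engine completes inline instead of asynchronously. No real SSL, SSH, or Kerberos library is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table the application would hand to the library. */
struct io_hooks {
    /* Start an async receive into buf; the engine reports completion by
     * calling the layer's recv-complete function. */
    void (*recv_async)(void *engine, void *buf, size_t len);
    void *engine;
};

/* Toy "security layer": it never touches a socket, only the hooks. */
struct toy_layer {
    struct io_hooks io;
    unsigned char cipher[32];   /* raw bytes as delivered by the engine */
    unsigned char plain[32];    /* bytes after the toy "decryption" */
    size_t plain_len;
};

/* The layer wants data: it calls out instead of calling recv() directly. */
static void toy_layer_start_recv(struct toy_layer *l)
{
    l->io.recv_async(l->io.engine, l->cipher, sizeof l->cipher);
}

/* Completion function the application's engine calls back into. */
void toy_layer_recv_complete(struct toy_layer *l, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        l->plain[i] = l->cipher[i] ^ 0x5a;
    l->plain_len = bytes;
}

/* Toy engine holding four "encrypted" bytes that are on the wire. */
struct toy_io_engine {
    struct toy_layer *layer;
    unsigned char wire[4];
};

static void toy_recv_async(void *engine, void *buf, size_t len)
{
    struct toy_io_engine *e = engine;
    size_t n = sizeof e->wire < len ? sizeof e->wire : len;
    memcpy(buf, e->wire, n);              /* the "async" read finishing */
    toy_layer_recv_complete(e->layer, n);
}

/* Returns 0 if the layer recovers "ping" without doing any I/O itself. */
int hooks_demo(void)
{
    struct toy_layer l = {0};
    struct toy_io_engine e = { &l, { 'p' ^ 0x5a, 'i' ^ 0x5a,
                                     'n' ^ 0x5a, 'g' ^ 0x5a } };
    l.io.recv_async = toy_recv_async;
    l.io.engine = &e;
    toy_layer_start_recv(&l);
    return (l.plain_len == 4 && memcmp(l.plain, "ping", 4) == 0) ? 0 : 1;
}
```

A real library is much harder to turn inside out like this, since its handshake and record logic usually assume they can call the socket whenever they want.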
Another alternative is just to say that SSL, SSH, and Kerberos can't be used with the async calls. That would actually be fine for my purposes--I can do the SSL at the asio layer instead of the curl layer, and I don't need SSH and Kerberos--but it's obviously a serious limitation for many uses.
Anyway, I think it may make sense for me to hack up libcurl myself to do what I want, and just break the SSL, SSH, and Kerberos support. But I doubt that has any use to the rest of the community, except maybe as a proof of concept for how things could be done properly.
Received on 2008-08-16