Re: Zero-copy interface

From: G Drukier <gdrukier_at_afrinc.com>
Date: Mon, 19 Jul 2010 14:47:53 -0400

On 12 Jul 2010, at 12:18 PM, Daniel Stenberg wrote:

> On Thu, 8 Jul 2010, G Drukier wrote:
>
> (Please don't reply to a subject as a shortcut to start a new
> thread, you'll end up as a reply in the existing thread in clients
> and web archives and more, and it is messy.)

My apologies. I've put this in a new thread, which I hope won't mess
things up too much more as it's still the start of the discussion.

>
>> We're working on an application where we're taking a video feed off
>> of an IP camera via HTTP or RTSP and then processing the video in
>> memory without writing to disk. In order to maximize performance
>> we'd prefer to not have to copy the incoming data from the libcurl
>> internal buffer.
>>
>> So, has there actually been any work in implementing the zero-copy
>> interface?
>
> No.
>
>> I've been poking around the code-base, but I'm not that familiar
>> with it yet. Does anyone have any guidance they could give in
>> implementing zero-copy?
>
> My vision of a zero copy interface would be that you provide buffers
> in advance to libcurl, and as it goes about and stores data into the
> buffers it'll ask for more and use those accordingly.
>
> That way, it would only be a matter of updating the main "buffer
> pointer" at the suitable place in the code to not point to the
> internal buffer all the time but instead point to the correct new one.
>
> Possibly it could be made as another callback: getbuffer()
>
> getbuffer() gets called when libcurl needs a new buffer, and the
> buffer you provide to libcurl with that callback must be able to
> hold at least CURL_MAX_WRITE_SIZE bytes. When libcurl calls the
> write callback, it will pass on a pointer to within that buffer and
> a length. Note that it MAY not point to the first byte of the passed-
> in buffer.
>
> What would you say about that concept?

The way I've done this in the past is to have the code acquiring the
data, which in this case would be libcurl, assign an appropriate block
of memory itself for each incoming piece of data and pass that on. The
problem with that approach is knowing what allocator is being used so
that the buffer can be subsequently deallocated properly. This problem
is alleviated by your approach in which the user allocates and
provides the buffer.

If I understand the rest of your proposal, the user would set the
zero-copy option, and then, when libcurl reaches the point where it
needs to store incoming data, it calls the getbuffer callback to get
the buffer. When it is done reading, it calls the write callback. This
seemed inefficient to me until I thought about it further in the
context of how libcurl works.
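
To make sure I have the sequence right, here is a rough sketch in C of
the flow as I understand it. Nothing in it is libcurl API: the
callback types, the demo_ names and the little buffer pool are all
made up for illustration, and the memcpy() only stands in for the
library reading from the socket straight into the user's buffer.
DEMO_MAX_WRITE_SIZE copies the 16384 that curl.h uses for
CURL_MAX_WRITE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_WRITE_SIZE 16384 /* same value as CURL_MAX_WRITE_SIZE */
#define DEMO_POOL_BUFS 4

/* Hypothetical callback types; neither exists in libcurl. */
typedef void *(*demo_getbuffer_cb)(size_t min_size, void *userdata);
typedef size_t (*demo_write_cb)(char *ptr, size_t len, void *userdata);

/* User-side state: a small pool of buffers handed out in turn. */
struct demo_pool {
  char bufs[DEMO_POOL_BUFS][DEMO_MAX_WRITE_SIZE];
  size_t next;      /* index of the next buffer to hand out */
  size_t delivered; /* total bytes seen by the write callback */
  char *last_ptr;   /* pointer passed to the last write callback */
};

/* User's getbuffer: returns the next free buffer, or NULL if none. */
static void *demo_getbuffer(size_t min_size, void *userdata)
{
  struct demo_pool *pool = userdata;
  if(min_size > DEMO_MAX_WRITE_SIZE || pool->next >= DEMO_POOL_BUFS)
    return NULL;
  return pool->bufs[pool->next++];
}

/* User's write callback: the data already sits in the user's buffer. */
static size_t demo_write(char *ptr, size_t len, void *userdata)
{
  struct demo_pool *pool = userdata;
  assert(len <= DEMO_MAX_WRITE_SIZE);
  pool->last_ptr = ptr;
  pool->delivered += len;
  return len;
}

/* Stand-in for the library's receive loop; "src" plays the socket. */
static int demo_transfer(const char *src, size_t src_len,
                         demo_getbuffer_cb getbuf,
                         demo_write_cb write_cb, void *userdata)
{
  size_t off = 0;
  while(off < src_len) {
    size_t chunk = src_len - off;
    char *buf = getbuf(DEMO_MAX_WRITE_SIZE, userdata);
    if(!buf)
      return -1; /* user could not supply a buffer */
    if(chunk > DEMO_MAX_WRITE_SIZE)
      chunk = DEMO_MAX_WRITE_SIZE;
    memcpy(buf, src + off, chunk); /* stands in for recv() into buf */
    if(write_cb(buf, chunk, userdata) != chunk)
      return -1; /* user aborted the transfer */
    off += chunk;
  }
  return 0;
}
```

Calling demo_transfer() with demo_getbuffer and demo_write on a block
of data larger than one buffer hands the write callback one pointer
per pool buffer, and the user code never copies the payload again.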

I would have suggested instead that the setopt mechanism be used to
set the location of the buffer to write to. The buffer would be
subject to the CURL_MAX_WRITE_SIZE minimum. Then the data gets read
and the write callback gets called as usual. The user then does what
he likes with the memory and, if desired, allocates new memory. The
problem, as I then realized, is that the write callback doesn't have
the handle, and so can't run setopt. Further, although I've only used
the easy interface until now, I imagine that having only one buffer is
going to cause a problem for multi.

So your proposal makes sense in that it minimizes the amount of
modification needed, and leaves the memory allocation problems in the
user's hands. It would require one or two new options: one to signify
that zero-copy is to be used, and one to set the getbuffer callback.
Alternatively, and more economically, setting the latter would imply
the former; the default state would be a NULL callback, which bypasses
the zero-copy code.
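
The single-option variant could look roughly like the following C
sketch. Again this is only an illustration under my own assumptions:
struct demo_handle, demo_receive_target() and the callback type are
hypothetical names, not anything in the libcurl sources.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUFSIZE 16384 /* same value as CURL_MAX_WRITE_SIZE */

/* Hypothetical callback type; it does not exist in libcurl. */
typedef void *(*demo_getbuffer_cb)(size_t min_size, void *userdata);

/* Stand-in for the parts of a handle that matter here. */
struct demo_handle {
  char internal[DEMO_BUFSIZE];  /* the usual internal receive buffer */
  demo_getbuffer_cb getbuffer;  /* NULL (the default) means no zero-copy */
  void *getbuffer_data;         /* passed back to the callback */
};

/* Example user callback: hands back the buffer stored in userdata. */
static void *demo_user_getbuffer(size_t min_size, void *userdata)
{
  (void)min_size; /* the caller guarantees the buffer is big enough */
  return userdata;
}

/* Pick where the next received data should land. */
static char *demo_receive_target(struct demo_handle *h)
{
  assert(h != NULL);
  if(!h->getbuffer)
    return h->internal; /* zero-copy code path bypassed */
  return h->getbuffer(DEMO_BUFSIZE, h->getbuffer_data);
}
```

With the callback left at NULL the receive path is unchanged, so
existing applications would not notice the new option at all.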

A question though. If the buffer is larger than CURL_MAX_WRITE_SIZE,
would the getbuffer callback need to notify libcurl that this is the
case? Or is CURL_MAX_WRITE_SIZE a hard(ish) limit, so that libcurl
shouldn't be called upon to do more than this?

At the other limit, if the user currently wants a smaller buffer and
more frequent calls to the write callback, the CURLOPT_BUFFERSIZE
option is available, but it is not guaranteed to be honored. What I'm
concerned about is excessive demands on memory in the zero-copy case
where the incoming data chunks are small with respect to
CURL_MAX_WRITE_SIZE.
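
To put a rough number on that worry, here is a small C sketch of the
arithmetic. It assumes the worst case, where the application holds on
to every buffer while it processes the data, so each incoming chunk
occupies its own full-size buffer. The 1400-byte chunk size in the
note after the code is just an illustrative figure of mine, and the
demo_ names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_WRITE_SIZE 16384 /* same value as CURL_MAX_WRITE_SIZE */

/* Bytes reserved when every chunk gets its own full-size buffer. */
static size_t demo_reserved(size_t chunks)
{
  return chunks * DEMO_MAX_WRITE_SIZE;
}

/* Bytes of actual payload held in those buffers. */
static size_t demo_used(size_t chunks, size_t chunk_len)
{
  assert(chunk_len <= DEMO_MAX_WRITE_SIZE);
  return chunks * chunk_len;
}

/* Share of each buffer that holds payload, in percent, rounded down. */
static unsigned demo_fill_percent(size_t chunk_len)
{
  assert(chunk_len <= DEMO_MAX_WRITE_SIZE);
  return (unsigned)(chunk_len * 100 / DEMO_MAX_WRITE_SIZE);
}
```

For 100 retained chunks of 1400 bytes each, demo_reserved(100) is
1638400 bytes against demo_used(100, 1400) of 140000 bytes, and
demo_fill_percent(1400) comes out at 8. If the application can hand
buffers back quickly the picture is much better, of course.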

Gordon

Received on 2010-07-19