curl-library

Re: Info request about the zero copy interface (2)

From: Legolas <legolas558_at_email.it>
Date: Mon, 05 Dec 2005 17:51:56 +0100

Jamie Lokier wrote:

>Legolas wrote:
>
>>>>MainLoop:
>>>> received_size = recv(yoursocket, internal_buffer,
>>>> internal_buffer_size, yourflags);
>>>> buffer_size = forecast_size(received_size);
>>>> /* forecast someway by libcurl */
>>>> buffer = write_buffer(custom_data, &buffer_size);
>>>> /* application may return a bigger buffer */
>>>> ... (decode SSL, join chunks...)
>>>> /* work on received data putting final result data into 'buffer'
>>>> */
>>>> write_callback(custom_data, final_size);
>>>> /* previous code must set 'final_size' to the size of data
>>>> written to 'buffer' */
>>>>
>>>>The line followed by the comment "work on received data putting final
>>>>result data into 'buffer'" copies data into 'buffer'. That copy is
>>>>not necessary. In what way is this doing "zero-copy"?
>>>>
>>>>
>>It does not copy! I used the verb _putting_, i.e. each final result
>>byte or block is written directly to the 'buffer' by the algorithm
>>intended to be in place of the ellipsis (...). Overall, there is no copy
>>at the end of the work process (otherwise I would have expressly
>>pointed it out).
>>
>
>Ah, I think "each final result byte or block is written" is sometimes
>an unnecessary copy :)
>
>Sometimes, it's unavoidable. If we use zlib, or the openssl library,
>then they will always write their data to a caller-specified buffer,
>so the above does not cause any extra copying in that case.
>
OK, until now I had assumed that every possible algorithm lets the
caller specify the destination buffer (as zlib and openssl do).
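
To make the case we agree on concrete, here is a minimal sketch of one
loop iteration in which the decoder writes straight into the
application's buffer. Everything in it is an assumption made for
illustration: 'struct app_state', 'toy_decode' (a stand-in for a real
decoder such as zlib or openssl that accepts a caller-specified
destination) and the exact callback signatures are hypothetical, and
recv() is replaced by an in-memory input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application state: the application owns the storage. */
struct app_state {
    unsigned char storage[256];
    size_t committed;   /* bytes the library reported as final */
};

/* Hypothetical 'write_buffer': the application hands out a destination.
   On input *size is the forecast; on output it is the room offered,
   which may be bigger than the forecast. */
unsigned char *write_buffer(struct app_state *app, size_t *size)
{
    assert(*size <= sizeof app->storage);
    *size = sizeof app->storage;
    return app->storage;
}

/* Hypothetical 'write_callback': the library reports the final size. */
void write_callback(struct app_state *app, size_t final_size)
{
    app->committed = final_size;
}

/* Stand-in decoder (not a real library call): it flips bit 0x20 of each
   byte and, like zlib or openssl, writes to a caller-specified buffer. */
size_t toy_decode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t i;
    if (n > cap)
        n = cap;
    for (i = 0; i < n; i++)
        dst[i] = (unsigned char)(src[i] ^ 0x20);
    return n;
}

/* One loop iteration: the decoded bytes land in the application's
   buffer directly, with no extra copy after decoding. */
size_t deliver(struct app_state *app,
               const unsigned char *received, size_t received_size)
{
    size_t buffer_size = received_size;          /* the forecast */
    unsigned char *buffer = write_buffer(app, &buffer_size);
    size_t final_size = toy_decode(received, received_size,
                                   buffer, buffer_size);
    write_callback(app, final_size);
    return final_size;
}
```

The only write to 'storage' is the decoder's own output, which is the
sense in which I say that nothing is copied at the end.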

>However, for chunked decoding, then putting "each final result byte"
>in 'buffer' means copying the bytes from 'internal_buffer'. For small
>runs of bytes, that is not significant (because of other overheads).
>But for large runs, such as large HTTP chunks, then it is a notable
>extra copy. The same applies to recv() blocks that contain part HTTP
>headers and part data.
>
libcurl should *smartly* choose to use the 'internal_buffer' (when
handling overheads for example) and then switch to the direct recv()
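
A rough sketch of how that choice could be made for chunked decoding,
under my own assumptions: the chunk-size line is parsed out of
'internal_buffer', and the result says how much payload is already
there and how much is still on the wire and could therefore be
received directly into the application's buffer. 'struct chunk_plan'
and 'plan_chunk' are hypothetical names, and chunk extensions and size
overflow are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of inspecting a chunk start in the internal buffer. */
struct chunk_plan {
    size_t header_len;   /* bytes of "<hex-size>\r\n" framing */
    size_t in_internal;  /* payload bytes already in the internal buffer */
    size_t direct;       /* payload bytes not yet received: candidates
                            for a recv() straight into the app buffer */
};

/* Parse "<hex-size>\r\n" at the start of buf (len bytes available).
   Returns 0 on success, -1 if the line is incomplete or malformed. */
int plan_chunk(const char *buf, size_t len, struct chunk_plan *plan)
{
    size_t i = 0, size = 0, avail;

    assert(plan != NULL);
    while (i < len) {
        char c = buf[i];
        unsigned digit;
        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A') + 10;
        else
            break;
        size = size * 16 + digit;
        i++;
    }
    /* Need at least one hex digit followed by a complete CRLF. */
    if (i == 0 || i + 2 > len || buf[i] != '\r' || buf[i + 1] != '\n')
        return -1;

    plan->header_len = i + 2;
    avail = len - plan->header_len;
    plan->in_internal = avail < size ? avail : size;
    plan->direct = size - plan->in_internal;
    return 0;
}
```

For a large chunk, 'direct' is nearly the whole payload, so only the
few bytes that arrived together with the size line would need to be
copied out of 'internal_buffer'.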

>I'm sure you understand, from my points a, b, c, d and e, what I mean
>though, so I won't argue more about this.
>
Right. I am currently writing that pseudocode snippet and I'll
post it within a few hours (I'm in a hurry at the moment).
May I quote part of your summary email in the snippet?

>Unfortunately, I wrote two (e)s - one should have been labelled (f). :)
>Do you mean the one about contiguous vs non-contiguous regions, or the
>one about negotiating the release of library-supplied buffers?
>
Yeah, I realized it after replying, and I should not have said that I
had understood that point as well. I mean the first (e): what kind of
data processing leads to multiple buffers? My idea is to have multiple
calls to the 'write_buffer' callback, allowing the application to flush
data into files, for example.
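
A minimal sketch of that idea, with assumptions stated up front:
'struct file_sink', 'file_write_buffer', 'file_write_callback' and
'deliver_all' are hypothetical names, the application keeps a single
small buffer, and memcpy() merely stands in for the decoder or recv()
that would write into the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define APP_BUF_SIZE 8   /* deliberately tiny to force several calls */

/* Hypothetical application state: one small buffer backed by a file. */
struct file_sink {
    FILE *out;
    unsigned char buf[APP_BUF_SIZE];
    size_t pending;   /* bytes committed but not yet flushed */
    size_t flushed;   /* total bytes written to 'out' so far */
};

/* Hypothetical 'write_buffer': flush what was committed by the previous
   round, then hand the same buffer out again. */
unsigned char *file_write_buffer(struct file_sink *s, size_t *size)
{
    if (s->pending > 0) {
        s->flushed += fwrite(s->buf, 1, s->pending, s->out);
        s->pending = 0;
    }
    *size = APP_BUF_SIZE;
    return s->buf;
}

/* Hypothetical 'write_callback': the library reports the final size. */
void file_write_callback(struct file_sink *s, size_t final_size)
{
    assert(final_size <= APP_BUF_SIZE);
    s->pending = final_size;
}

/* Library side: deliver 'len' bytes through as many buffer requests as
   needed. Returns the number of 'write_buffer' calls made. */
size_t deliver_all(struct file_sink *s,
                   const unsigned char *data, size_t len)
{
    size_t calls = 0;
    while (len > 0) {
        size_t room = len;                      /* the forecast */
        unsigned char *dst = file_write_buffer(s, &room);
        size_t n = len < room ? len : room;
        memcpy(dst, data, n);   /* stand-in for decode/recv into dst */
        file_write_callback(s, n);
        data += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

In this sketch the last block stays pending until the next
'write_buffer' call, so an end-of-transfer notification (not shown)
would be needed for the final flush.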

-- Giuseppe
Received on 2005-12-05