curl-library

Re: Info request about the zero copy interface (2)

From: Jamie Lokier <jamie_at_shareable.org>
Date: Mon, 5 Dec 2005 11:20:42 +0000

Legolas wrote:
> >That is not the case.
> >
> >We know there is data available to read (but we cannot know how much).
> >We must already have a known buffer area to store data in, then call
> >recv() to get the data and then store it in the buffer (possibly after
> >some "decoding" like SSL, chunked HTTP and similar).
> >
> >Therefore, we must already have been given a buffer pointer (and size)
> >from the application layer where we can store the received data BEFORE
> >we call the write callback.

I have thought about zero-copy for many years, so I'll add my 2p:

All this talk of storing data to files, further processing in memory,
etc., assumes that the incoming data comes from a network socket, is
not filtered, and is not surrounded by protocol of unpredictable
length (such as HTTP headers). That is not true for some protocol
states.

If you want sinks and sources to handle: files, network streams,
further processing in memory, SSL, compression, HTTP chunk
encoding, etc., and you want them all to use the minimum number of
copies, then you _cannot_ use a fixed pattern such as "library
calls application to request buffer; library stores data into
application buffer".

That's the pattern Legolas suggests, and it works fine when data is
simply being read from a socket.

But, for example, if the library needs to decompress the data (zlib)
or decrypt it (SSL), then that interface actually causes more copies
than necessary. For example, the zlib decompression algorithm
_requires_ the most recently decompressed data to be kept in a 32k
sliding window (typically a circular buffer) - that's part of the
algorithm.

Legolas' pattern requires each decompressed data chunk to be then
_copied_, from the algorithm's 32k buffer, to the application's
supplied buffer. That is more copies than necessary.

If the library supplies the buffer, then in the case of zlib
decompression, (and chunked encoding, etc.) it's possible to use fewer
copies. (I'm not saying the libraries we have available make this
practical, btw. - This discussion is more about an optimal API than
about what's practical to implement).
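
To make the library-supplied case concrete, here is a minimal sketch in
C. It assumes a hypothetical decoder that keeps its output in a 32k
circular window; the names zc_window, zc_span and zc_window_spans are
invented for illustration and are not part of zlib or libcurl. Rather
than copying out of the window, it describes the decoded bytes as
pointers into it, and a region that wraps around the end of the window
comes back as two spans.

```c
#include <assert.h>
#include <stddef.h>

#define ZC_WINDOW_SIZE 32768u  /* 32k window, as in the zlib example */

/* Hypothetical decoder state: decoded bytes live in a circular window. */
struct zc_window {
    unsigned char data[ZC_WINDOW_SIZE];
};

/* A contiguous run of bytes inside the window (no ownership transfer). */
struct zc_span {
    const unsigned char *ptr;
    size_t len;
};

/*
 * Describe `len` bytes starting at window offset `start` as one or two
 * spans pointing directly into the window. Returns the number of spans
 * written to `out` (0, 1 or 2). No data is copied.
 */
static size_t zc_window_spans(const struct zc_window *w, size_t start,
                              size_t len, struct zc_span out[2])
{
    size_t first;

    assert(w && out);
    assert(start < ZC_WINDOW_SIZE && len <= ZC_WINDOW_SIZE);
    if (len == 0)
        return 0;

    /* Bytes available before the window wraps around. */
    first = ZC_WINDOW_SIZE - start;
    if (first > len)
        first = len;

    out[0].ptr = w->data + start;
    out[0].len = first;
    if (first == len)
        return 1;

    /* The remainder continues at the beginning of the window. */
    out[1].ptr = w->data;
    out[1].len = len - first;
    return 2;
}
```

An application-supplied buffer would force a copy of those bytes out of
the window; handing out the spans does not, at the price of the
application having to cope with two regions instead of one.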

That's just an example. In _general_, a zero-copy interface should
look broadly like this:

   a. The application has an "allocate_write_buffer" function, but it is
      _optional_ for the library to use it.

   b. The library will call the "allocate_write_buffer" function
      _only_ when it does not already have the data in a fixed
      location due to algorithms such as chunked decoding, zlib and
      SSL decryption. So, for example, a direct recv() from the
      socket, after reading headers, or in the middle of a large chunk
      of chunked encoding, would let the application select the
      buffer. The library would do the same when it uses a
      decompression/decryption library whose API forces a copy
      anyway, to avoid a second copy.

   c. The application's "write_callback" function must accept any
      combination of buffers that were allocated by the application and
      buffers provided by the library.

   d. The library should be able to specify whether any buffer that
      _it_ provides to "write_callback" is writable in place by
      "write_callback" or not. This is needed for minimal copying by
      sinks that further filter the data.

   e. When data is available only in non-contiguous memory regions,
      the data must not be copied to make it contiguous. Instead,
      "write_callback" should be called more than once, or it should
      accept a list of buffers.

   f. The library and application should be able to negotiate how long
      they can retain library-allocated buffers after the callback
      returns. This is so that the write callback can gather multiple
      buffers without copying the data, if (for example) the
      application is intending to write the data using sendmsg() in
      chunks of a certain minimum size, or otherwise process more data
      than is available in one contiguous memory region from the
      library.
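
The list above could be sketched in C roughly as follows. This is only
an illustration under assumptions: none of these types, flags or
functions (zc_buf, zc_sink, zc_recv_target, the demo_* helpers) exist in
libcurl, and using the callback's return value to say how many buffers
the application has finished with is just one possible way to negotiate
retention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags describing one buffer passed to write_callback. */
#define ZC_BUF_APP_OWNED 0x1u  /* came from allocate_write_buffer     */
#define ZC_BUF_WRITABLE  0x2u  /* callback may modify the data in place */

struct zc_buf {
    unsigned char *data;
    size_t len;
    unsigned flags;
};

struct zc_sink {
    void *ctx;
    /* Optional: may be NULL, and the library may choose not to call it.
     * `size` carries a size hint in and the actual size out. */
    unsigned char *(*allocate_write_buffer)(void *ctx, size_t *size);
    /* Receives a list of buffers of mixed origin. Returns how many
     * leading buffers the application is finished with; it asks to
     * retain the rest after the callback returns. */
    size_t (*write_callback)(void *ctx, const struct zc_buf *bufs,
                             size_t nbufs);
};

/* Pick where a plain recv() should store data: the application's buffer
 * if it offers one, otherwise the library's own scratch space. */
static struct zc_buf zc_recv_target(const struct zc_sink *sink,
                                    unsigned char *scratch,
                                    size_t scratch_len)
{
    struct zc_buf b = { scratch, scratch_len, ZC_BUF_WRITABLE };

    assert(sink && scratch);
    if (sink->allocate_write_buffer) {
        size_t size = scratch_len;
        unsigned char *p = sink->allocate_write_buffer(sink->ctx, &size);
        if (p && size > 0) {
            b.data = p;
            b.len = size;
            b.flags = ZC_BUF_APP_OWNED | ZC_BUF_WRITABLE;
        }
    }
    return b;
}

/* Example application sink: totals the bytes it sees, copies nothing. */
struct demo_app {
    unsigned char storage[64];
    size_t total;
};

static unsigned char *demo_alloc(void *ctx, size_t *size)
{
    struct demo_app *app = ctx;

    if (*size > sizeof app->storage)
        *size = sizeof app->storage;
    return app->storage;
}

static size_t demo_write(void *ctx, const struct zc_buf *bufs, size_t nbufs)
{
    struct demo_app *app = ctx;
    size_t i;

    for (i = 0; i < nbufs; i++)
        app->total += bufs[i].len;
    return nbufs;  /* finished with every buffer; retain none */
}
```

A sink that gathers data for sendmsg() would instead return fewer than
nbufs, keeping the remaining library buffers referenced until it has
enough to send.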

> I understand, I'm sorry but I thought you were using something similar to
> ioctlsocket(yoursocket, FIONREAD, &available_data_size);
> To determine it, but probably this interface is not available on all
> socket layers.

That's correct. It is not always available.

But more importantly: even that does not provide zero-copy in general.

Think about this:

   In one call to recv(), the library reads HTTP headers, plus the
   first 1000 bytes of the data. The library _cannot_ know how long
   the headers are, until it parses them. So it will always read some
   of the data in the recv().

   If you're serious about avoiding copies, that 1000 bytes of the
   data would not be copied. But your interface forces those bytes to
   be copied...

   FIONREAD doesn't change this.
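
A small C sketch of that situation, with invented names (zc_split and
zc_split_headers are not libcurl functions): after a single recv() has
filled a library buffer, the body bytes that arrived along with the
headers are described by a pointer into that same buffer, so they can be
handed to the write callback where they already are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting one recv() buffer. */
struct zc_split {
    size_t header_len;          /* bytes up to and including CRLF CRLF  */
    const unsigned char *body;  /* points INTO the recv buffer, or NULL */
    size_t body_len;
};

/*
 * Locate the end of the HTTP headers in `buf` and describe the body
 * bytes that arrived in the same recv(). Returns 1 if the header
 * terminator was found, 0 if more data is needed. Nothing is copied.
 */
static int zc_split_headers(const unsigned char *buf, size_t len,
                            struct zc_split *out)
{
    size_t i;

    assert(buf && out);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
            out->header_len = i + 4;
            out->body_len = len - out->header_len;
            out->body = out->body_len ? buf + out->header_len : NULL;
            return 1;
        }
    }
    return 0;
}
```

With an interface where the application always supplies the destination
buffer, those body bytes would have to be copied out of the library's
buffer into the application's one.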

-- Jamie
Received on 2005-12-05