
curl-library

Re: Info request about the zero copy interface (2)

From: Legolas <legolas558_at_email.it>
Date: Mon, 05 Dec 2005 23:32:52 +0100

Daniel Stenberg wrote:

> On Mon, 5 Dec 2005, Legolas wrote:
>
>> A great idea would be instead to provide an almost-zero copy
>> interface. I will attach A.S.A.P. a pseudo source snippet, but don't
>> try to take it apart looking for a zero copy interface: for a *real*
>> zero copy interface a major effort is needed.
>
>
> I think we can design an interface now that allows for a pretty good
> zero-copy interface, but it doesn't have to mean that libcurl would
> take full advantage of every aspect of the zero-copy from day 1. I
> agree that we don't have to overdo it: just start with a simple plain
> approach and expand it later if/when we feel the need and have the
> energy for it.
>
> Given the nature of libcurl, as very portable, on top of the transport
> layer and using a whole range of 3rd party libraries, we will of
> course have to live with a number of copies no matter how hard we try.
>
As a rough (and admittedly bad) example, have a look at the client pseudo
code I have written, taking into account what I have read so far. I have
also worked my original idea into it in a soft way. However, don't take it
as my proposal, it's just an idea :)

/*
        zcopycli.c - Pseudo code for a theoretical client
                                 application able to handle zero-copy

        (c) legolas558 _at_ email.it

        Read more at:
        http://curl.haxx.se/mail/lib-2005-12/0000.html
          - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        This pseudo code covers two general types of applications,
        the first with APPLICATION_ONE defined, the second with
        APPLICATION_TWO (messy messy messy!)

        1. applications that only need to 'look into' a buffer of
                downloaded data (file downloading for example)

        2. applications that need to stream the entire data into a larger
                ordered buffer (various purposes)

        A third, important type of application is not schematized here:
        the case of an application able to handle multiple buffers.

        Excuse the general code chaos, I am exploiting the fact that this is
        just pseudo code...

        I was also thinking about a possible usage of specific structs
        shared between the library and the application to allow a better
        exchange of information about the buffers in use; this approach is
        more likely to be used when a multi-buffer design takes the place of
        this one (a purely hypothetical sketch of such a struct is appended
        after the code below).
          - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Please use the mailing list to offer new advice, and also to point
        out corrections where needed!
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memcpy(), memset() */
#include <curl/curl.h>

/* enable exactly one of the two application types */
/**/
#define APPLICATION_ONE
/**/
/*
#define APPLICATION_TWO
*/

/* see the main() code for an explanation of the following */
#ifdef APPLICATION_ONE
FILE *output_file;
#else /* APPLICATION_TWO */
#define BUFFER_SIZE (1024*32)
#endif

typedef struct _simple_buf {
        char *base;
        char *here; /* used by application 2 */
        char *top;
        struct _simple_buf *next; /* just a place holder,
                                                                        and it also pads well... */
} simple_buf;

#define SB_SIZE(sb) ((int)((sb)->top - (sb)->base))
#define SB_AVAIL(sb) ((int)((sb)->top - (sb)->here))
#define SB_DELTA(sb) ((int)((sb)->here - (sb)->base))
#define SB_IN_RANGE(sb, ptr) (((char *)(ptr) >= (sb)->base) && ((char *)(ptr) < (sb)->top))

/* round 'amount' up to a (non-zero) multiple of 'granularity', so that the
        caller always allocates buffers with a minimum fixed granularity */
int granularity_fix(int amount, int granularity)
{
        div_t dv;
        dv = div(amount, granularity);
        amount = (dv.quot + (dv.rem>0));
        if (!amount) amount++;
        return amount*granularity;
}
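
/* Just to illustrate the rounding behaviour described above, a few worked
        values (not from the original mail, purely illustrative):
        granularity_fix(1, 1024)    -> 1024
        granularity_fix(1500, 1024) -> 2048
        granularity_fix(2048, 1024) -> 2048
        granularity_fix(0, 1024)    -> 1024  (never returns zero) */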

/* grow 'sb' (reallocating if needed) so it can hold at least 'desired_size'
        bytes; returns the (possibly new) total size, or 0 on allocation failure */
int sb_assert_size(simple_buf *sb, int desired_size) {
        int delta, new_size;
        char *new_base;
        if (desired_size > SB_SIZE(sb)) {
                new_size = granularity_fix(desired_size, 1024);
                delta = SB_DELTA(sb);
                new_base = realloc(sb->base, new_size);
                if (new_base == NULL)
                        return 0; /* the old buffer is still valid in sb->base */
                sb->base = new_base;
                sb->here = new_base + delta;
                sb->top = new_base + new_size;
                return new_size;
        }
        return SB_SIZE(sb);
}

typedef void * (awb_prototype(void *custom_data, int *desired_size));
/* (from here on, AWB stands for Allocate Write Buffer)
        This function is defined by the application and handed to libcurl
        through a hypothetical CURLOPT_AWB parameter, forcing the library to
        use it instead of the internal one. A quote from J.Loker:
        "The library will call it when data is not already available in a
        fixed location due to algorithms such as chunked decoding, zlib and
        SSL decryption."
        Basically, this function must return a buffer sized AT LEAST as large
        as the value specified in '*desired_size'. The function may adjust
        that value to the actually allocated size of the buffer (at this
        point I can't yet figure out whether the library needs this
        information, though).
        An application following behaviour (1) will try to re-use the same
        buffer for any single-use operation it needs.
        In case (2) the application will instead reallocate its buffer and
        pass the new pointer to the library.
        A similar behaviour is expected from the 'write_callback' function
        (see below).
*/

void *allocate_write_buffer(void *custom_data, int *desired_size) {
        simple_buf *sb;
        sb = (simple_buf *)custom_data;
#ifdef APPLICATION_ONE
        /* (1) re-use the single scratch buffer, growing it only if the
                library asks for more room than we currently have */
        *desired_size = sb_assert_size(sb, *desired_size);
#else /* APPLICATION_TWO */
        /* (2) grow the big ordered buffer so that the new data fits after
                the data already streamed into it */
        *desired_size = sb_assert_size(sb, SB_DELTA(sb) + *desired_size);
#endif
        return sb->here;
}

typedef int (wcb_prototype(void *custom_data, void *real_buffer,
                int data_length, int writeable));
/* WCB stands for Write Call Back; this function is called when a discrete
        amount of data has been prepared for the client application.
        Note: if the library has called 'allocate_write_buffer' and is handing
        back a buffer obtained that way, 'data_length' is expected to be less
        than or equal to the '*desired_size' value.
        The application can tell whether it owns 'real_buffer' through the
        SB_IN_RANGE macro.
        (a rough sketch of how the library side might drive these two
        callbacks is placed after the write_callback implementation below)
*/

int write_callback(void *custom_data, void *real_buffer,
                                           int data_length, int writeable) {
/* Note: 'writeable' is ignored in this example */
#ifdef APPLICATION_ONE
        return fwrite(real_buffer, 1, data_length, output_file);
#else
        simple_buf *sb;
        sb = (simple_buf *) custom_data;
        if (!SB_IN_RANGE(sb, real_buffer)) {
                if (!sb_assert_size(sb, SB_DELTA(sb) + data_length))
                        return 0; /* out of memory, make the transfer fail */
        /* the library is providing a private buffer, so all our work here
                is to copy from that into our big streamed one.
                Again, a quote from J.Loker:
                "When receiver use its own buffer, and sender already has the
                 data in its own buffer, then and only then do we have to memcpy()"
        */
                memcpy(sb->here, real_buffer, data_length);
        }
        /* else: the library has written directly into our buffer, that's ok */
        sb->here += data_length; /* advance past the new data in both cases */
        return data_length;
#endif
}
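
/* Purely illustrative and NOT part of the proposal above: a rough sketch of
        how the library side might drive the two callbacks, assuming it holds
        the awb/wcb function pointers and the 'custom_data' value set through
        CURLOPT_WRITEDATA. The name 'library_deliver_sketch' and the idea of
        receiving already-decoded bytes are inventions of this sketch only. */
static int library_deliver_sketch(awb_prototype *awb, wcb_prototype *wcb,
                                                                  void *custom_data,
                                                                  const char *decoded, int decoded_len)
{
        int room = decoded_len;
        void *dst = awb(custom_data, &room);
        if (dst == NULL || room < decoded_len)
                return 0; /* the application could not provide enough room */
        /* a real library would decode/decrypt straight into 'dst' whenever
                possible; this sketch simply copies already-decoded bytes */
        memcpy(dst, decoded, decoded_len);
        /* tell the application how much data now sits in 'dst' */
        return wcb(custom_data, dst, decoded_len, 1);
}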

int main(void)
{
  simple_buf buffer;
  CURL *curl;
  CURLcode res;

  curl = curl_easy_init();
  if(!curl) return -1;

  memset(&buffer, 0, sizeof(simple_buf));
#ifdef APPLICATION_ONE
  output_file = fopen("index.htm", "wb");
  if (!output_file) {
          curl_easy_cleanup(curl);
          return -1;
  }
/* since output will be flushed to the file system,
        we do not use any starting memory buffer.
        Note that, if needed (i.e. when the library calls
        'allocate_write_buffer'), one will be dynamically
        allocated anyway. */
#else /* APPLICATION_TWO */
  sb_assert_size(&buffer, BUFFER_SIZE);
/* since we need the entire downloaded file into
        an ordered memory stream, we allocate the huge
        memory block before everything begins */
#endif
  
  curl_easy_setopt(curl, CURLOPT_URL, "curl.haxx.se");

/* set the new 'allocate_write_buffer' handler */
  curl_easy_setopt(curl, CURLOPT_AWB, &allocate_write_buffer);
/* Note: CURLOPT_WRITEFUNCTION would have a different meaning */
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_callback);
/* CURLOPT_WRITEDATA would be used (as now) to set a custom parameter
        ('custom_data') for calls to 'allocate_write_buffer' & 'write_callback'
*/
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);

  res = curl_easy_perform(curl);

  if (res != CURLE_OK) {
          fprintf(stderr, "Perform error: %s\n", curl_easy_strerror(res));
          curl_easy_cleanup(curl);
          free(buffer.base);
#ifdef APPLICATION_ONE
          fclose(output_file);
#endif
          return -2;
  }

  /* now, if using application (1), we have a file called 'index.htm'
        with the downloaded content. No redundant copies have been made, since
        the library should have passed its own buffers, or at worst a single
        buffer of the biggest chunk size has been allocated through the awb
        handler */

  /* in the 2nd case a memory stream starting at 'buffer.base' and ending
        at 'buffer.here' is available for post-processing.
        The usage of CURLOPT_AWB, CURLOPT_WRITEFUNCTION & CURLOPT_WRITEDATA was
        actually necessary only in this case.
        Implementing a zero copy interface is a very complex problem and this
        example is just a sketch of 'what it should look like'
  */

  free(buffer.base);

  curl_easy_cleanup(curl);
#ifdef APPLICATION_ONE
  fclose(output_file);
#endif
  return 0;
}
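
As a follow-up to the note in the header comment about exchanging specific
structs between the library and the application: below is a purely
hypothetical sketch of what such an exchange record might look like in a
multi-buffer design. Every name in it ('buffer_info' and its fields) is
invented here for illustration and is not part of libcurl or of the proposal
above.

/* hypothetical per-delivery record for a multi-buffer design */
struct buffer_info {
        char *base;                /* start of the buffer the data lives in    */
        int total_size;            /* total size of that buffer                */
        int offset;                /* where the freshly delivered data begins  */
        int length;                /* how many bytes were delivered            */
        int owned_by_app;          /* non-zero if the application allocated it */
        struct buffer_info *next;  /* chain of buffers still to be consumed    */
};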
Received on 2005-12-05