
curl-library

Re: sharing is caring

From: Sterling Hughes <sterling_at_bumblebury.com>
Date: Mon, 14 Jan 2002 06:57:56 +0100

> [ Warning: this mail contains numerous unfinished thoughts, brainstorming
> style ramblings and lots of questions! ]
>
> * Sharing
>
> With Sterling's newly introduced name resolve cache code, he provided an
> interesting approach to a problem, one that several other curl subsystems
> could actually benefit from as well.
>
> I'm talking about the way the DNS cache is shared between all easy-handles
> that are added to a multi-handle. As all handles within a multi-handle are
> used only one at a time, it is safe to do this without any mutexes or
> similar.
>
> I am of course thinking about the connection cache, SSL session cache and
> cookies (more?). They are all different lists of information that today is
> stored per easy-handle, but that would benefit from working per multi-handle
> too, shared between multiple easy-handles.
>
    yippy, gives new meaning to the phrase "Cash(che)-Crazy" :)

> Then the question arises: how? And I'm not talking about the actual
> implementation, as making the current code support this concept should be
> pretty straight-forward. I'm talking about how the interface that controls
> this sharing of various lists/caches/pools should work.
>
> Imagine that you want to transfer several simultaneous HTTP streams using
> the multi interface. By default it'll work as today, with all easy-handles
> having one individual cache of each kind.
>
> Somehow, you should be able to tell (in order of increasing complexity):
>
> A) All easy-handles added to a multi handle share all caches.
> B) Specified easy-handles share all caches, the rest have their own.
> C) Specified easy-handles share specifically mentioned caches.
>
> The question is, is level C necessary? Do we gain/lose anything significant
> by only allowing level B or A?
>
    Funny, I'm thinking the other way around: are A and B necessary?
    Unless I misunderstand, I see C as the abstracted version of those
    two options... Perhaps an interface like (pseudo-code):

    cache_id = curl_cache_id_get(CURL_MULTI_CACHE, CURL_TYPE_HTTP);
    curl_easy_setopt(ch, CURLOPT_CACHE_ID, cache_id);

    And then building any other functionality ontop of that.

> * Mutexing
>
> When we start thinking about sharing data between easy-handles while they're
> in a multi-stack, it is easy to let your thoughts drift off and yes, then
> someone will suggest being able to share the above mentioned lists between
> easy-handles that are NOT present in the same multi-handle. Multi-threaded
> applications could indeed benefit a lot from having all or some libcurl
> transfer threads share some or all of the information.
>
> Then we step right into the next pile of questions. How do we deal with the
> mutex problem? libcurl just cannot attempt to mutex sensitive parts itself,
> as there's no good enough standard for it. pthread might work for most
> systems, but there are just too many different flavors for it to make sense
> to do mutexing natively for all operating systems libcurl can run on.
>
> Instead, I suggest that we have libcurl call two application specified
> callbacks for retrieving and releasing mutexes, leaving the actual
> implementation for the outside to decide.
>
    yep, I've attached a rough implementation of this (very simple)
    functionality/api...

> * Resource Owners
>
> When suddenly several handles would share one or more resources, we face a
> minor dilemma. Who owns the resources and when are they removed?
>
> I could imagine a system where we remove the resource completely when the
> last handle involved in the sharing is removed. But is that the best possible
> system?
>
> Perhaps we should allow the resources to "live" outside the strict control
> of the handles? I mean, so that you can create a "resource" that continues
> to live without any specific handle being around... Would there be any point
> in supporting that kind of thing?
>

    yes, yes, yes :)

    Within Apache's architecture, a process can serve more than one
    request, but program execution has to complete for each request, so
    handles get cleaned up on a per-request basis; a resource that lives
    outside any specific handle could persist across those requests.

> It could possibly allow us to introduce a separate API for querying for
> resources, like asking if we have a particular cookie set or setting a
> particular cookie in the resource "pool" etc.
>

    This does make things considerably less "easy" however... Resources
    Pools, Caching, Mutexes, *ouch* My poor head :)

    I think a resource pool option would be a good idea, but sensible
    defaults should be established so that it stays a not-so-commonly
    used feature (at least directly) -- analogous to the sbrk() function:
    rarely touched unless people know exactly what they want :)

> * Pipeline
>
> While on the subject of sharing, I've come to think of another little feature
> we could think about for a while: pipelining. Pipelined HTTP requests are
> requests that are sent to the server before the previous request has been
> fulfilled, to minimize the gap between multiple responses from the same
> server.
>
> I came to think that we could in fact offer pipelined requests using the
> multi interface. We could offer an option on an easy-handle that makes it
> "hook" on to an already existing connection (on another easy-handle) if
> one exists.
>
> It would be a little like two easy-handles sharing a connection cache, and if
> one of them would like to use the *exact* same connection that is already in
> use by the other one, the second request would get pipelined and served after
> the first one is done...
>
> Enough for now. There are many things we could do. What do you think?
>
    Pipelining would be quite nice, is it generally supported? If so, I
    think this would be a real neat feature...
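    Just to sketch how that "hook on" option might look from the
    application side (pseudo-code again; CURLOPT_PIPELINE is made up):

```
easy1 = curl_easy_init();                     /* will open the connection  */
easy2 = curl_easy_init();
curl_easy_setopt(easy2, CURLOPT_PIPELINE, 1); /* may reuse a live
                                                 connection if one exists  */
curl_multi_add_handle(multi, easy1);
curl_multi_add_handle(multi, easy2);          /* request 2 goes out before
                                                 response 1 has arrived    */
```

    The library would then serve easy2's response after easy1's is done,
    as you describe.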

    -Sterling

Received on 2002-01-14