curl-library

Re: http headers free and multi

From: Mohun Biswas <m_biswas_at_mailinator.com>
Date: Fri, 13 Oct 2006 20:16:52 -0400

Daniel Stenberg wrote:
> On Tue, 10 Oct 2006, Mohun Biswas wrote:
>
>> This has come up before here, most recently in a similar request from
>> me a month or so back. At that time Daniel said he was open to
>> feedback from others who would like help from the API in retrieving
>> pointers in order to free them. I guess your message would count as a
>> vote in favor.
>
> "In favour of what?" is then my question.

I thought I stated it clearly above: In favor of having "help from the
API in retrieving pointers in order to free them". Or you could just say
in favor of curl_easy_getopt, though the name doesn't matter.

> Let's say we'd add a curl_easy_getopt() function that can return (some
> of the) options you can set to the easy handle.
> How would the application know what data to request from libcurl in
> order to free it after an operation?

I (truly, sincerely, and respectfully) do not understand the
counterargument here. It seems quite clear to me that the API could
encourage better programming practice by letting users retrieve the
pointers that need to be freed. I apologize if I'm just repeating prior
arguments, but that's because I haven't seen them convincingly refuted yet.

The libcurl API has a very strict, very clearly defined wall between
that data which is managed by the library and that which must be managed
by the application. This is a good thing.

If I malloc something and pass it to the library, I can do one of four
things: (1) forget about it and accept the leak (may be fine for a
short-lived program), (2) write some code to stash the pointer via
CURLOPT_PRIVATE, (3) write code to store it some other way, or (4) ask
the curl handle "what was that pointer again? I need to free it". Of
course (4) is not an option right now.

The standard advice is to use CURLOPT_PRIVATE. Yes, it can be done, and
it's quite easy if there's just one pointer to be tracked. But if you're
tracking and freeing multiple pointers per handle, or if (as in my case)
you're already using CURLOPT_PRIVATE for something else, you need to
allocate and populate an extra data structure. Now you're doing extra
allocations to track other allocations. At a minimum that costs extra
memory, extra complexity, and a (minor) performance hit.

But the bigger question is, in what universe is it good programming
practice to keep parallel databases with overlapping data? Isn't that
always a recipe for synchronization problems? Libcurl knows exactly what
pointers I've allocated and handed it. If I write my own code to track
them, the best that can happen is that I do extra work and use extra
memory to track the same data the handle already has. The worst is that
the two "databases" diverge. Not that I ever took a course called
"Programming 101", but this feels like something they'd teach in it.

I see that there have been a number of memory-leak issues raised on this
list lately. Of course they typically turn out to be application errors
rather than libcurl bugs, but to me that reinforces the point: memory
management is one of the trickiest aspects of application programming
and one of the most persistent sources of bugs.

As to your specific question "how would the application know what data
to request from libcurl in order to free it after an operation?", I'm
baffled as to why you'd ask. Perhaps I misunderstand. How does that
question differ from "how would the application know what URL to use?"
The application knows what data it has allocated and needs to free. The
distinction I'd make is this: the FACT that I malloc an error buffer is
known at compile time. The ADDRESS of the buffer is only known at
runtime. Cleanup code that relies only on the compile-time fact, and asks
the handle for the address, is easier to write, safer, and more robust
than code that must carry its own copy of the address around.

Thanks for listening,
MB
Received on 2006-10-14