curl-and-python

feature req (or bug) CurlMulti should keep reference to added Curl objects

From: Dima Tisnek <dimaqq_at_gmail.com>
Date: Mon, 7 May 2012 09:01:16 +0300

Hi,

I recently ran into a gotcha with pycurl that took me quite a while to
get my head around. In a nutshell, consider this code:

import pycurl

multi = pycurl.CurlMulti()
for url in alist:
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)  # plus write function, header function, post, etc
    multi.add_handle(c)

while True:
    _, active = multi.perform()
    if not active: break
    multi.select(1.0)  # wait for socket activity instead of busy-looping

Somehow this turned out to perform only the last request from the URL
list. What I figured out a week later was that the Curl objects were
being garbage collected, all except the last one, a reference to which
was kept in `c`. That is, CurlMulti doesn't keep references to added
Curl handles.
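The effect can be reproduced without pycurl or a network. The sketch below uses two hypothetical classes: `Handle`, a stand-in for `pycurl.Curl`, and `BorrowingMulti`, a stand-in for a multi object that stores handles without owning a Python reference to them (modelled here with a `weakref.WeakSet`). It assumes CPython, where an object is freed as soon as its last strong reference goes away, and it illustrates the failure mode only; it is not pycurl's actual implementation.

```python
import weakref


class Handle:
    """Hypothetical stand-in for pycurl.Curl; not the real class."""

    def __init__(self, url):
        self.url = url


class BorrowingMulti:
    """Hypothetical stand-in for a multi object that only borrows its handles.

    The WeakSet models an extension that stores a raw pointer without
    taking a Python reference, so a handle can be freed behind its back.
    """

    def __init__(self):
        self._handles = weakref.WeakSet()

    def add_handle(self, handle):
        self._handles.add(handle)

    def urls(self):
        return sorted(h.url for h in self._handles)


def demo(urls):
    multi = BorrowingMulti()
    for url in urls:
        # Rebinding c drops the only strong reference to the previous
        # handle; on CPython that handle is freed immediately.
        c = Handle(url)
        multi.add_handle(c)
    return multi.urls()  # only the handle still bound to c survives


def demo_with_list(urls):
    multi = BorrowingMulti()
    reqs = []  # list.append keeps a strong reference to every handle
    for url in urls:
        c = Handle(url)
        reqs.append(c)
        multi.add_handle(c)
    return multi.urls()


print(demo(["a", "b", "c"]))            # ['c'] on CPython
print(demo_with_list(["a", "b", "c"]))  # ['a', 'b', 'c']
```

Keeping a plain Python list, as `demo_with_list` does, restores the container-like semantics the original loop expected.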

I think the existing behaviour is not very pythonic; I instinctively
assumed CurlMulti.add_handle to have semantics similar to list.append.

I would rather CurlMulti kept references to added handles. I'm not
sure when it ought to release the references. A quick but
counter-intuitive hack would be to release them when a request
completes; a better solution would be to keep references until the
handles are explicitly removed, which allows querying error status per
handle and whatnot.
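The proposed keep-until-removed semantics can be approximated today, without changing pycurl, by a thin wrapper. `OwningMulti` below is a hypothetical name, not part of pycurl; the sketch assumes only that the wrapped object has `add_handle` and `remove_handle` methods, as `pycurl.CurlMulti` does, and forwards everything else (`perform`, `select`, `info_read`, ...) unchanged.

```python
class OwningMulti:
    """Hypothetical wrapper (not part of pycurl) that owns the handles added to it.

    `multi` is assumed to be a pycurl.CurlMulti or any object with the same
    add_handle/remove_handle methods; other attributes are forwarded to it.
    """

    def __init__(self, multi):
        self._multi = multi
        # id(handle) -> handle: these strong references keep handles alive
        # until they are explicitly removed.
        self._handles = {}

    def add_handle(self, handle):
        self._multi.add_handle(handle)  # may raise; only record on success
        self._handles[id(handle)] = handle

    def remove_handle(self, handle):
        self._multi.remove_handle(handle)
        self._handles.pop(id(handle), None)  # drop our reference

    def handles(self):
        """Handles added but not yet removed, e.g. for per-handle status checks."""
        return list(self._handles.values())

    def __getattr__(self, name):
        # Called only for attributes the wrapper itself lacks (perform, select, ...).
        return getattr(self._multi, name)
```

With such a wrapper, the loop in the original snippet could keep rebinding `c`, because `OwningMulti(pycurl.CurlMulti())` would hold a reference to every handle until `remove_handle` is called on it. Keying the dictionary by `id()` avoids relying on the handles being hashable.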

Of course, what I propose is a semantic change, and it might break
someone's code.

I hope it doesn't break much existing code, because those who kept
explicit references to Curl objects in a Python data structure can
still explicitly call Curl.close on those handles if they want the
magic CurlMulti auto-removal, or explicitly remove their handles from
the CurlMulti. I find it easier to discard and re-create a CurlMulti
object anyway.
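Explicit removal also gives a natural place to check per-handle status. The helper below is a sketch assuming pycurl's `CurlMulti.info_read()`, which returns `(num_queued, ok_list, err_list)`, with failures reported as `(curl, errno, errmsg)` tuples. `drain_finished` is a hypothetical name; it removes each finished handle from the multi object and leaves closing it (`c.close()`) to the caller, once any status has been read.

```python
def drain_finished(multi):
    """Remove every finished transfer from `multi` and report its outcome.

    Assumes `multi` is a pycurl.CurlMulti (or has the same info_read and
    remove_handle methods). Returns (succeeded, failed), where failed holds
    (handle, errno, errmsg) tuples. Handles are removed but not closed.
    """
    succeeded, failed = [], []
    while True:
        num_queued, ok_list, err_list = multi.info_read()
        for c in ok_list:
            multi.remove_handle(c)  # explicit removal of a finished handle
            succeeded.append(c)
        for c, errno, errmsg in err_list:
            multi.remove_handle(c)
            failed.append((c, errno, errmsg))
        if num_queued == 0:  # no more completion messages waiting
            break
    return succeeded, failed
```

A caller would typically invoke this after each `perform()` call, inspect `failed`, and then close the returned handles.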

Thoughts, comments?

For the time being, I added a workaround like this in Python:

+reqs = []
 for url in alist:
     c = pycurl.Curl()
+    reqs.append(c)
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2012-05-07