curl-and-python

RE: Function to retrieve multiple URLs asynchronously

From: Kjetil Jacobsen <kjetilja_at_gmail.com>
Date: Mon, 14 Mar 2005 13:29:04 +0100

The fix to your other problem seems simple: just remember to add the
successfully downloaded documents to the result set at the right
place:
....
        while 1:
            # Poll libcurl for transfers that have finished.
            num_q, ok_list, err_list = m.info_read()
            for c in ok_list:
                m.remove_handle(c)
                print "Success:", c.url, c.getinfo(pycurl.EFFECTIVE_URL)
                # Return the handle to the free list and record the response
                # now, before the handle gets reused for another URL.
                freelist.append(c)
                r.append((c.url, c.res.getvalue()))
...

The fix is the relocation of the 'r.append(...)' statement into this loop.

Previously you looped through the set of handles (whose size equals the
number of connections) only after everything was downloaded. Since each
handle is reused for many URLs and only holds its most recent response,
that gave you just the last <number of connections> documents back.

-----Original Message-----
From: gf gf [mailto:unknownsoldier93_at_yahoo.com]
Sent: 11 March 2005 16:26
To: Kjetil Jacobsen; curl-and-python_at_cool.haxx.se; Daniel Stenberg
Subject: RE: Function to retrieve multiple URLs asynchronously

Great.

What about the other problem - the fact that only the last batch of pages
was returned? I can't figure that one out either.

--- Kjetil Jacobsen <kjetilja_at_gmail.com> wrote:

> ok, then perhaps that's the source of the problem the original poster
> encountered (as the code that was posted had a select call with a
> timeout).
>
> i'll change the select call to require a timeout to be set, and update
> the example and test code in pycurl to use this.
>
> curl_multi_timeout will surely be handy; in the meantime, a 1.0 sounds
> like a good number to use :)
>
> - kjetil
>
> -----Original Message-----
> From: curl-and-python-bounces_at_cool.haxx.se
> [mailto:curl-and-python-bounces_at_cool.haxx.se] On Behalf Of Daniel
> Stenberg
> Sent: 11 March 2005 14:01
> To: curl stuff in python
> Cc: 'gf gf'
> Subject: RE: Function to retrieve multiple URLs asynchronously
>
> On Fri, 11 Mar 2005, Kjetil Jacobsen wrote:
>
> > that code is part of an outer loop which subsequently calls
> > info_read and select. this whole procedure (perform, info_read,
> > select) is done until there is no more work to do.
>
> Ah, ok. I missed that. info_read should, however, only be necessary
> when one or more transfers have actually completed.
>
> > one thing though -- the code does a select(..) without a timeout,
> > so unless there is activity on the file descriptors this will block
> > indefinitely. is this harmless or does the multi api assume that
> > select times out periodically?
>
> You need to call it periodically to allow it to do its timers and
> things.
>
> A future libcurl release will feature a curl_multi_timeout() that'll
> let you know the longest possible time you should wait before calling
> libcurl again (unless action was detected on the sockets, of course).
>
> --
> Daniel Stenberg -- http://curl.haxx.se -- http://daniel.haxx.se
> Dedicated custom curl help for hire:
> http://haxx.se/curl.html
> _______________________________________________
> http://cool.haxx.se/mailman/listinfo/curl-and-python
>
>
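
As a postscript to the quoted exchange: curl_multi_timeout() did arrive
in libcurl, and pycurl later exposed it as CurlMulti.timeout(), which
returns the recommended wait in milliseconds (-1 when no timer is
pending). A sketch of how it could replace the fixed 1.0-second
fallback, reusing the CurlMulti object m from the sketch above and
assuming a pycurl new enough to have timeout():

    # Assumes a later pycurl that wraps curl_multi_timeout() as
    # CurlMulti.timeout(); m is the CurlMulti object from above.
    timeout_ms = m.timeout()
    if timeout_ms < 0:
        # No timer pending; fall back to the 1.0s suggested above.
        m.select(1.0)
    else:
        # Wait no longer than libcurl says is safe.
        m.select(timeout_ms / 1000.0)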


_______________________________________________
http://cool.haxx.se/mailman/listinfo/curl-and-python
Received on 2005-03-14