curl-and-python

Program dies on call of multi.select...

From: Sam's Lists <samslists_at_gmail.com>
Date: Tue, 12 Aug 2014 15:09:47 -0700

I have a rather complicated crawler that seems to die often - but not
always at the same place.

What's exasperating is that there are no exceptions, stack traces, etc.,
printed. I was only able to find where it died by adding lots of print
statements, and seeing what was the last thing to be printed.
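
A process that dies with no Python traceback can be crashing in native code
rather than raising a Python exception. As a minimal sketch, assuming
Python 3.3 or later, the standard-library faulthandler module can dump the
Python stack when the process receives a fatal signal such as SIGSEGV. The
helper name and the log path below are hypothetical, not part of the crawler:

```python
import faulthandler


def enable_crash_traces(log_path="crawler_crash.log"):
    """Dump Python tracebacks to log_path on fatal signals (SIGSEGV, SIGABRT, ...).

    Hypothetical helper; log_path is an assumed location.
    """
    # The file must stay open for the life of the process, because
    # faulthandler writes to its file descriptor at crash time.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call once at startup, before the crawl begins, and keep the result alive:
#     crash_log = enable_crash_traces()
```

If the log then holds a traceback ending at the multi.select call, that would
suggest the crash happens inside pycurl or libcurl rather than in the Python
code.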

Here's a somewhat simplified version of the code:

    multi = pycurl.CurlMulti()
    print("ag2")
    now = datetime.datetime.utcnow()
    print("ag3")
    for counter, website in enumerate(websites, 1):
        print("ag4")
        assert website.crawl_type in ('standard', 'refresh', 'new')
        print("ag5")
        website.grabber = WebSite.Resource(website.next_page.original_url,
                                           anonymous=Options.anonymous)
        print("ag6")
        website.next_page.crawled_ts = now
        print("ag7")
        multi.add_handle(website.grabber._curl)
        print("ag8")

    print("ag9")
    # Number of seconds to wait for a timeout to happen
    if Options.test:
        SELECT_TIMEOUT = 30.0  # Set for longer cause blicker_pierce takes forever
                               # on the additional start page with all the wines
    else:
        SELECT_TIMEOUT = 10.0
    print("ag10")

    # To do: implement it this way:
    # http://www.josefassad.com/pycurl_curlmulti_mini_howto
    # Stir the state machine into action
    while 1:
        print("ag11")
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

    print("ag12")
    #CauseError
    # Keep going until all the connections have terminated
    while num_handles:
        # The select method uses fdset internally to determine which file
        # descriptors to check.

        # Todo: This code is looped a lot
        # Should there be a sleep here???? I got no idea

        print("ag12.5")
        print("calling multi.select with:", SELECT_TIMEOUT)
        print("Please don't die here!!!!")
        multi.select(SELECT_TIMEOUT)
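
The excerpt stops at the multi.select call, so the rest of the loop is not
shown. For comparison, here is a rough sketch of the usual perform/select
pattern, modeled on the examples that ship with pycurl. The function name
drain_multi and its parameters are hypothetical; it takes the multi object and
the "call again" code as arguments, so the caller is assumed to pass
pycurl.E_CALL_MULTI_PERFORM:

```python
def drain_multi(multi, select_timeout, call_again):
    """Drive a CurlMulti-like object until no transfers remain active.

    multi          -- object with perform() and select(), e.g. pycurl.CurlMulti
    select_timeout -- seconds to wait in each select() call
    call_again     -- the caller passes pycurl.E_CALL_MULTI_PERFORM
    Returns the number of select() calls made.
    """
    def perform_until_settled():
        # perform() returns call_again when it wants to be called again at once.
        while True:
            ret, num_handles = multi.perform()
            if ret != call_again:
                return num_handles

    select_calls = 0
    num_handles = perform_until_settled()
    while num_handles:
        select_calls += 1
        # select() returns -1 on timeout; wait again without calling perform().
        if multi.select(select_timeout) == -1:
            continue
        # Refresh num_handles here, otherwise the loop can never finish.
        num_handles = perform_until_settled()
    return select_calls
```

With the crawler above this would be called as
drain_multi(multi, SELECT_TIMEOUT, pycurl.E_CALL_MULTI_PERFORM) after the
handles have been added.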

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2014-08-13