curl-and-python

Re: Pycurl and the curl_multi_socket_action API

From: Rayene Ben Rayana <rayene.benrayana_at_gmail.com>
Date: Fri, 6 Apr 2012 17:04:45 +0200

Thank you Utsav,

Are you sure that the pycurl multi object does not accept additional easy
handles while it is performing?

Actually, I managed to get my virtual users to authenticate and start a web
browsing session with random URLs that are added to the multi in the main
loop, and it works perfectly. My only problem is that m.select() does not
block long enough, which leads to excessive CPU usage. I didn't try to
diagnose it, since I'm trying to avoid select() altogether.
The main loop is very similar to the one in the first example that you
gave.
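
For reference, a rough sketch of the kind of main loop described above: easy
handles are added to the multi between perform() calls, and a short sleep is
taken whenever select() comes back without any ready descriptor. That case may
be one reason for a busy loop (libcurl's documentation notes that
curl_multi_fdset can report no descriptors to wait on), but it was not
diagnosed here, so treat it as a guess. drive(), pending, select_timeout and
idle_sleep are made-up names; multi is expected to behave like a
pycurl.CurlMulti, and a libcurl of 7.20.0 or later is assumed, so perform()
is not retried on E_CALL_MULTI_PERFORM.

```python
import time
from collections import deque


def drive(multi, pending, select_timeout=1.0, idle_sleep=0.05):
    """Run transfers on `multi`, accepting new easy handles while others run.

    `multi` is assumed to offer the pycurl.CurlMulti methods add_handle,
    remove_handle, perform, info_read and select. `pending` is a deque of
    configured easy handles; other code may append to it at any time.
    Returns a list of (handle, error_message_or_None) in completion order.
    """
    in_flight = 0
    done = []
    while pending or in_flight:
        # Add whatever has been queued since the previous iteration.
        while pending:
            multi.add_handle(pending.popleft())
            in_flight += 1

        # Let libcurl move every transfer forward without blocking.
        multi.perform()

        # Collect finished transfers until the message queue is drained.
        while True:
            queued, ok, failed = multi.info_read()
            for handle in ok:
                multi.remove_handle(handle)
                done.append((handle, None))
                in_flight -= 1
            for handle, _errno, errmsg in failed:
                multi.remove_handle(handle)
                done.append((handle, errmsg))
                in_flight -= 1
            if queued == 0:
                break

        # Wait for socket activity. If select() reports nothing ready
        # (timeout, or no descriptor to wait on), sleep briefly so the
        # loop cannot spin at full speed.
        if in_flight and multi.select(select_timeout) <= 0:
            time.sleep(idle_sleep)
    return done
```

With pycurl installed, a call such as drive(pycurl.CurlMulti(), deque([c1, c2]))
would run it, where c1 and c2 are configured pycurl.Curl objects. Any code
holding the same deque can append further handles while transfers are in
flight.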

Threading is not a good alternative for me because this is not only about
crawling performance: the timing is also important for a realistic
simulation. The virtual users simulate real web traffic: they select a
random page, download it, parse it to download all the related media at
once (images, CSS, JS, etc.) and finally "sleep" for a defined duration
before requesting another page.

A thread pool with a task queue would delay requests. A thread (or more)
per user would lead to too much concurrency and would give poor results!
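
One way to keep the per-user "sleep" inside a single-threaded loop is a timer
heap: each virtual user is stored with the time of its next request, and the
main loop asks on every pass which users are due. The sketch below is only an
illustration of that idea, not the code used in this test. UserScheduler and
its method names are hypothetical, and times are plain floats such as those
returned by time.monotonic().

```python
import heapq
import itertools


class UserScheduler:
    """Timer queue for virtual users in a single-threaded loop (hypothetical)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so users due at the same time come out in insertion
        # order and user objects are never compared with each other.
        self._tie = itertools.count()

    def sleep_until(self, when, user):
        """Record that `user` should issue its next request at time `when`."""
        heapq.heappush(self._heap, (when, next(self._tie), user))

    def pop_due(self, now):
        """Remove and return every user whose wake-up time is <= `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

    def seconds_until_next(self, now, default=1.0):
        """Seconds until the next wake-up, or `default` if nobody is waiting."""
        if not self._heap:
            return default
        return max(0.0, self._heap[0][0] - now)
```

In a multi-based loop, the users returned by pop_due() would get a new easy
handle added to the multi, and seconds_until_next() could serve as an upper
bound for the select() timeout so that no user wakes up late.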

I really wanted to stick with Python because it makes it easier to parse
the HTML and because classes would make the code much more readable than
linear C (object inheritance is perfect for implementing different user
behaviors). Maybe I'll use pycurl to download the HTML pages and send the
related media (images, CSS, JS, etc.) to hiperfifo.c to take advantage of
the strengths of both languages.

Ideas are welcome :)

Cheers,

On Fri, Apr 6, 2012 at 2:36 PM, Daniel Stenberg <daniel_at_haxx.se> wrote:

> On Fri, 6 Apr 2012, Utsav Sabharwal wrote:
>
>> In general multi curl is non blocking so it could have provided us same
>> effects in a single thread if we keep adding urls even during the multi
>> curl run but then in pycurl trying to add while multi curl is performing is
>> not possible.
>>
>
> That seems like a really stupid restriction you should work on fixing...
>
>
> --
>
> / daniel.haxx.se
> _______________________________________________
> http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
>

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2012-04-06