curl-and-python

Re: Performance less than ideal, suggestions?

From: my name <gm41lu53r_at_gmail.com>
Date: Fri, 5 Feb 2010 17:07:29 -0500

Thanks for the response. I profiled the code last night but have made some
changes since, so I'm going to reprofile it tonight. I'm not quite sure what
I'm looking at, though. I'm using cProfile & gprof2dot.
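
For reference, a minimal sketch of a cProfile run that prints a readable
report sorted by cumulative time. It assumes a current Python 3, and
process_urls is a hypothetical stand-in for the real fetch/keyword/database
loop, not code from this thread.

```python
import cProfile
import io
import pstats


def process_urls(n):
    # Hypothetical workload standing in for the real crawler loop.
    total = 0
    for i in range(n):
        total += sum(ord(ch) for ch in "http://example.com/%d" % i)
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile_call(process_urls, 1000)
    print(report)
```

The "cumtime" column is usually the first place to look: it shows how much
time was spent in a function including everything it called. To get a graph
instead, the profile can be saved with profiler.dump_stats("crawl.pstats")
(the file name is only an example) and rendered with
"gprof2dot -f pstats crawl.pstats | dot -Tpng -o crawl.png".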

Today I was able to hit a peak of 11MB/s, but it's far from constant, and the
number of URLs processed per second didn't increase much: ~50 URLs a second,
which includes checking for keywords and updating the database.

Is it unrealistic to expect my bandwidth usage to stay at at least 80% 24/7
while my code is running? I was thinking I could accomplish this by setting up
connection objects for libcurl multi in advance and passing them in when it's
getting low on existing connections. This way I wouldn't have to set up new
connection objects halfway through and then start them.
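
A rough sketch of that idea: create the handles once, keep the idle ones in a
free list, and top the multi object back up whenever transfers finish. The
names HandlePool and fill are made up for illustration, and the pool takes a
factory so the sketch runs without pycurl installed.

```python
from collections import deque


class HandlePool:
    """Pre-create handles once and recycle them instead of rebuilding."""

    def __init__(self, factory, size):
        # All handles are built up front; factory is called size times.
        self._free = deque(factory() for _ in range(size))
        self.size = size

    def acquire(self):
        # Returns None when every handle is busy.
        return self._free.popleft() if self._free else None

    def release(self, handle):
        # Put a finished handle back so it can be reused for the next URL.
        self._free.append(handle)

    def idle(self):
        return len(self._free)


def fill(pool, pending_urls, start):
    """Start transfers until the pool or the URL queue is exhausted."""
    started = 0
    while pending_urls:
        handle = pool.acquire()
        if handle is None:
            break
        start(handle, pending_urls.popleft())
        started += 1
    return started
```

With pycurl, the factory would presumably be pycurl.Curl, start would call
handle.setopt(pycurl.URL, url) followed by multi.add_handle(handle), and a
handle would be released after info_read() reports it done and
remove_handle() has been called. Calling fill() after every batch of
completed transfers keeps the number of active connections near the pool
size.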

I guess all of these ideas are moot until I reprofile and figure out what's
going on.
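
One cheap check that can be done before a full profile is to compare CPU time
against wall-clock time for a batch of work. This is only a sketch, assuming
Python 3; cpu_fraction is a hypothetical helper name.

```python
import time


def cpu_fraction(func, *args):
    """Return (result, fraction of wall-clock time spent on the CPU)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # Guard against a zero-length interval on very fast calls.
    return result, (cpu / wall if wall > 0 else 0.0)


if __name__ == "__main__":
    # Sleeping stands in for waiting on the network or the disk.
    _, frac = cpu_fraction(time.sleep, 0.5)
    print("CPU fraction while waiting: %.2f" % frac)
```

A fraction close to 1 suggests the code is CPU-bound (parsing, keyword
checks), while a fraction close to 0 suggests it is mostly waiting on the
network, the disk, or the database. It doesn't say which of those it is
waiting on, so it only narrows down where to profile next.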

Thanks.

On Fri, Feb 5, 2010 at 4:13 PM, <johansen_at_sun.com> wrote:

> On Thu, Feb 04, 2010 at 12:46:19PM -0500, my name wrote:
> > I've modified retriever-multi.py to constantly fetch URLs from a database
> > and do some work on it. I'm able to push out roughly 5MB/s despite having
> > over 80mbps at my disposal. Is there any way to get better performance out
> > of this? I'm thinking I should implement cStringIO rather than writing to a
> > file and re-reading it in.
>
> Have you done any performance analysis on this code? Python 2.6 comes
> with a pretty decent C-based profiler. Using the profiler might give you
> some insight into where your program is spending its time. On a basic
> level, though, it would help to figure out the general area where your
> performance is poor. 80mbps is 10MB/s but unless your destination is on
> the same uncongested network as the source, it may be hard to get the
> max theoretical throughput. Is the program bound by the disk, CPU, or
> network? If you can figure out the answer to that question, it will
> give you a better idea of what part of your code to change.
>
> -j
>
> _______________________________________________
> http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
>
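
The in-memory buffering idea from the quoted message (cStringIO instead of
writing to a file and reading it back) can be sketched as follows. This
assumes Python 3, where io.BytesIO plays the role cStringIO had in Python 2;
collect and find_keywords are hypothetical names used only for illustration.

```python
import io


def collect(chunks):
    """Accumulate received chunks in memory, as a write callback would."""
    buf = io.BytesIO()
    for chunk in chunks:
        # With pycurl, buf.write would be passed as the WRITEFUNCTION callback.
        buf.write(chunk)
    return buf.getvalue()


def find_keywords(body, keywords):
    """Return the keywords (as bytes) that occur in the body."""
    return [kw for kw in keywords if kw in body]


if __name__ == "__main__":
    body = collect([b"hello ", b"wor", b"ld"])
    print(find_keywords(body, [b"world", b"curl"]))
```

This avoids the disk round trip entirely, at the cost of holding each
response in memory until it has been checked.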

Received on 2010-02-05