curl-and-python

Re: Tricks to optimizing PUT performance?

From: Dima Tisnek <dimaqq_at_gmail.com>
Date: Thu, 24 Jan 2013 11:43:11 +0100

I went ahead and tried to reproduce your workload: I sent 100MB of data in
10KB reads, first over plain HTTP and then over HTTPS (AES-128 with SHA-1),
all on localhost.

air:~ dima$ time openssl s_server -msg -debug -nocert -cipher 'ADH-AES128-SHA' -accept 8080 > somefile.ssl
^C

real 0m5.425s
user 0m1.316s
sys 0m0.429s

air:~ dima$ time ./test-pycurl-put.py
[snip]
real 0m4.078s
user 0m1.810s
sys 0m0.284s
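
The test script itself isn't shown above, but a minimal sketch of that kind
of pycurl PUT test could look like the following; the URL, port, and cipher
settings are assumptions matched to the s_server command, not the actual
test-pycurl-put.py:

#!/usr/bin/env python
# Minimal sketch of a pycurl PUT benchmark (assumptions: URL, port and
# cipher chosen to match the openssl s_server invocation above).
import io
import time

import pycurl

data = b" " * (100 * 1024 * 1024)   # 100MB of spaces, as in Mark's test
buf = io.BytesIO(data)

c = pycurl.Curl()
c.setopt(c.URL, "https://localhost:8080/")
c.setopt(c.UPLOAD, 1)                    # turns the request into a PUT
c.setopt(c.READFUNCTION, buf.read)       # libcurl pulls the body in chunks
c.setopt(c.INFILESIZE, len(data))
c.setopt(c.SSL_CIPHER_LIST, "ADH-AES128-SHA")  # match the s_server cipher
c.setopt(c.SSL_VERIFYPEER, 0)            # s_server runs with -nocert
c.setopt(c.SSL_VERIFYHOST, 0)

start = time.time()
c.perform()
print("perform() took %.3fs" % (time.time() - start))
c.close()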

Well, I do get a spike of 100% CPU usage for the individual processes, but
that's all in a good cause: according to openssl speed, aes-128-cbc crunches
up to 120MB/s and sha1 around 300MB/s on this machine, so the ~60MB/s I get
over HTTPS is not superb, but quite acceptable.
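
You can reproduce the cipher numbers with "openssl speed aes-128-cbc sha1";
the hashing half is also easy to sanity-check from Python itself (a
standalone micro-benchmark, not part of the test script, and the numbers of
course vary by machine):

import hashlib
import time

# Hash the same 100MB of spaces once and derive a rough MB/s figure.
data = b" " * (100 * 1024 * 1024)
start = time.time()
hashlib.sha1(data).digest()
print("sha1: ~%.0f MB/s" % (100 / (time.time() - start)))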

For comparison, here is the time output for the same pycurl run over plain HTTP:
real 0m0.946s
user 0m0.175s
sys 0m0.177s

Yes, it takes about a second to push 100MB through, but it hardly taxes the
processor: roughly a tenth of a single core.
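
If you want to separate wall-clock time from CPU time without wrapping the
whole script in the shell's time, something like this hypothetical helper
works (os.times() reports user and system CPU for the current process):

import os
import time

def timed(fn):
    # Report real/user/sys for one call, like the shell's time builtin.
    w0, t0 = time.time(), os.times()
    fn()
    w1, t1 = time.time(), os.times()
    print("real %.3fs user %.3fs sys %.3fs"
          % (w1 - w0, t1.user - t0.user, t1.system - t0.system))

# e.g. timed(c.perform) with the handle from the sketch above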

If you get much lower throughput than this, perhaps it's down to how you
process the data you send in Python: for example, if you keep reallocating
or "resizing" large strings, the repeated copying can turn the whole
transfer into O(N^2) work.

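A toy illustration of that pattern (not from your code, just the shape to
avoid):

import io

chunks = [b" " * 10240 for _ in range(1000)]   # 10MB in 10KB pieces

# Quadratic: every += copies the whole accumulated body, so N appends
# touch O(N^2) bytes in total.
body = b""
for chunk in chunks:
    body += chunk

# Linear alternatives: join once, or append into a single buffer.
body = b"".join(chunks)

buf = io.BytesIO()
for chunk in chunks:
    buf.write(chunk)
body = buf.getvalue()
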
d.

On 24 January 2013 01:35, Mark Seger <mjseger_at_gmail.com> wrote:

> I've managed to get to the point where I can now upload in-memory strings
> of data via a REST interface. Very cool stuff. In fact, the good news is
> that I can hit very high network rates with strings on the order of 100MB
> or more. The bad news is that smaller strings upload very slowly, and I
> have no idea why.
>
> To try to figure out what's going on, I surrounded the perform() call with
> time.time() calls to measure the delay. I'm finding that even with payloads
> on the order of 32KB it always takes over a second to execute the upload,
> whereas other interfaces go much faster, on the order of under 0.1
> sec/upload. Has anyone else ever observed this behavior?
>
> Digging a little deeper I've observed a few things:
> - When my callback is called for data, it is passed a chunk size of 16384,
> and I wonder if asking for bigger chunks would result in fewer calls, which
> in turn could speed things up.
> - Another thing I noticed is very high CPU load: not for the small
> strings, but for the larger ones I'm seeing close to 100% of a single CPU
> being saturated. Is this caused by encryption? Is there any way to speed
> it up or choose a faster algorithm? Or is it something totally different?
> - I'm also guessing the overhead is not caused by data compression: I'm
> intentionally sending a string of all spaces, which is highly compressible,
> and I see the full 100MB go over the network; if it were compressed I'd
> expect to see far less.
>
> I know pycurl is very heavily used everywhere and that this could simply
> be a case of operator error on my part. If anyone would like to see my
> code I'd be happy to send it along, but for now I thought I'd just keep it
> to a couple of simple questions in case the answer is an obvious one.
>
> -mark

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python

Received on 2013-01-24