curl-and-python

Re: Tricks to optimizing PUT performance?

From: Mark Seger <mjseger_at_gmail.com>
Date: Fri, 25 Jan 2013 12:58:43 -0500

So I finally got around to running oprofile on this, and it looks like it's
spending most of its time doing compression, which I don't understand. If I
copy a 100MB file of all spaces without curl, it compresses very nicely and
I see a lot less data going over the network. So I'd think that if
compression were being used the data would have shrunk, but if you look at
the data below you can see that the full 100MB went over the wire.

I guess my question becomes: why is compression using so much of the CPU
when no data is being compressed?

-mark
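Mark's reasoning can be checked directly: if anything along the path were compressing the request body, a payload of spaces would shrink dramatically. The minimal sketch below uses Python's standard zlib module (DEFLATE) to measure how small such a payload gets. The function name compression_ratio is hypothetical, and zlib is only a stand-in for whatever compressor the profiler attributed the time to.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size, using zlib (DEFLATE)."""
    return len(zlib.compress(payload, level)) / len(payload)


# 100 MiB of spaces, mirroring the test payload described in the thread
payload = b" " * (100 * 1024 * 1024)
print(f"compressed to {compression_ratio(payload):.3%} of original size")
```

A ratio far below 1 means the bytes on the wire would have dropped sharply had the body been compressed in transit, so seeing the full 100MB on the wire is consistent with Mark's conclusion that it was not.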

On Fri, Jan 25, 2013 at 9:30 AM, Mark Seger <mjseger_at_gmail.com> wrote:

> Dima - thanks for the reply. Sorry for not getting back to you yesterday; I
> was offline, and I don't want you to think this isn't important to me. I
> see you're getting a pycurl run in under 1 sec, so should I assume you're
> on the same system as the target of the PUT? I'm going over a wire...
>
> My issue with large data isn't so much the speed; it's the CPU load, which
> stays very high for the whole of a single upload. This is what a 200MB
> upload looks like when I monitor it with collectl:
>
>
> #<--------CPU--------><----------Disks-----------><----------Network---------->
> #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
>    6   0    36     18      0      0      0      0      1      6      1       6
>    3   0   129     51      0      0      0      0      4    107    790      53
>   20   0   627     95      0      0      0      0     30    771   7913     158
>   76   7  2557    116      0      0      0      0     67   1708  31656    1009
>   98  10  6757    127      0      0      0      0    178   4564  38086    2357
>   69   9  4715    107      0      0      0      0    122   3117  25116    1573
>    0   0    10     14      0      0      0      0      0      1      0       1
>
> As you can see, the load is quite high, which on a system with few cores
> means you can't get much else done if you want to multi-thread. What I'm
> trying to figure out is where all the CPU is being spent and whether it's
> possible to reduce it. It's certainly possible I'm doing something wrong
> in my code. Does this look OK?
>
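> import pycurl
>
> c = pycurl.Curl()
> c.setopt(c.URL, url)                     # url is built elsewhere
> c.setopt(c.HTTPHEADER, [auth_token])     # auth header string, built elsewhere
> c.setopt(c.UPLOAD, 1)                    # send the body as an HTTP PUT
>
> # read_callback is a user-defined class (not shown); it returns the payload in chunks
> c.setopt(pycurl.READFUNCTION, read_callback(1).callback)
> c.setopt(pycurl.INFILESIZE, objsize)     # total payload size in bytes
> c.perform()
> c.close()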
>
> where url and auth_token are built independently of this connection. My
> read_callback simply pulls data out of a big string and returns it in
> 16384-byte chunks. While I don't think it would do anything to improve the
> CPU load, is there a way to increase the size of the chunks? Maybe some
> other setopt call?
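For reference, a read callback that serves an in-memory payload might look like the sketch below. BytesReader and put_bytes are hypothetical names standing in for the poster's read_callback class, whose code isn't shown. pycurl calls the READFUNCTION with the maximum number of bytes libcurl will accept on that call (16384 in Mark's runs); the callback may return fewer, and returning an empty bytes object ends the upload. Keeping an offset into a memoryview means only the returned chunk is copied on each call.

```python
class BytesReader:
    """Hypothetical READFUNCTION helper that serves a payload held in memory."""

    def __init__(self, data: bytes) -> None:
        self._view = memoryview(data)  # zero-copy view of the payload
        self._offset = 0

    def callback(self, size: int) -> bytes:
        # `size` is the most libcurl will accept on this call.
        chunk = self._view[self._offset:self._offset + size]
        self._offset += len(chunk)
        # Only this chunk is copied; an empty result tells pycurl we're done.
        return chunk.tobytes()


def put_bytes(url: str, headers: list, data: bytes) -> int:
    """Sketch of a PUT using BytesReader; returns the HTTP status code."""
    import pycurl  # imported here so BytesReader can be used without pycurl

    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.HTTPHEADER, headers)
        c.setopt(pycurl.UPLOAD, 1)
        c.setopt(pycurl.READFUNCTION, BytesReader(data).callback)
        c.setopt(pycurl.INFILESIZE, len(data))
        c.perform()
        return c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()
```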
>
> But my other issue is that when I run with objects as small as 1KB, the PUT
> takes over 1 full second just to execute the perform() call, and that
> doesn't sound right either. I can do many more small-object uploads with
> other libraries, and I've gotta believe something is wrong in the way I've
> written the code.
>
> -mark
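One thing that may be worth ruling out for a fixed delay of about one second on small PUTs (a hypothesis, not something the thread confirms): libcurl has historically added an "Expect: 100-continue" header to HTTP/1.1 PUT uploads and then waited, by default up to one second, for the server's "100 Continue" before sending the body. If the server never sends it, every request pays that wait. Passing an "Expect:" header with no value tells libcurl to leave that header out. The helper names below are hypothetical.

```python
def upload_headers(auth_header: str, suppress_expect: bool = True) -> list:
    """Build the HTTPHEADER list for a PUT (hypothetical helper)."""
    headers = [auth_header]
    if suppress_expect:
        # A header name with nothing after the colon tells libcurl to drop
        # that header, so there is no "Expect: 100-continue" wait.
        headers.append("Expect:")
    return headers


def timed_put(url: str, auth_header: str, data: bytes,
              suppress_expect: bool = True) -> float:
    """PUT `data` and return libcurl's total transfer time in seconds."""
    import io
    import pycurl  # imported lazily; upload_headers works without it

    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.HTTPHEADER, upload_headers(auth_header, suppress_expect))
        c.setopt(pycurl.UPLOAD, 1)
        c.setopt(pycurl.READFUNCTION, io.BytesIO(data).read)
        c.setopt(pycurl.INFILESIZE, len(data))
        c.perform()
        return c.getinfo(pycurl.TOTAL_TIME)
    finally:
        c.close()
```

Timing a 1KB upload once with suppress_expect=False and once with the default would show whether this handshake accounts for the delay.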
>
>
>
> On Thu, Jan 24, 2013 at 5:43 AM, Dima Tisnek <dimaqq_at_gmail.com> wrote:
> >
> > I went ahead and tried to reproduce your workload: I sent 100M of data in
> > 10K reads over http, and then over https (aes128, sha1), on localhost
> >
> > air:~ dima$ time openssl s_server -msg -debug -nocert -cipher 'ADH-AES128-SHA' -accept 8080 > somefile.ssl
> > ^C
> >
> > real 0m5.425s
> > user 0m1.316s
> > sys 0m0.429s
> >
> > air:~ dima$ time ./test-pycurl-put.py
> > [snip]
> > real 0m4.078s
> > user 0m1.810s
> > sys 0m0.284s
> >
> > Well, I get a spike of 100% CPU usage for individual processes, but
> > that's all for a good cause: according to openssl speed, aes-128-cbc
> > crunches up to 120MB/s and sha1 some 300MB/s, so the ~60MB/s I get is
> > not superb, but quite acceptable.
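A rough check on Dima's numbers, assuming encryption and the SHA1 MAC run back to back on one core and nothing else costs anything: the seconds spent per megabyte add up across stages, so the combined rate is the reciprocal of the summed reciprocals. The function name is hypothetical.

```python
def serial_throughput(*rates_mb_s: float) -> float:
    """Combined MB/s when every byte passes through each stage in turn."""
    # Seconds per MB add up across stages; invert the sum to get MB/s.
    return 1.0 / sum(1.0 / rate for rate in rates_mb_s)


ceiling = serial_throughput(120.0, 300.0)  # aes-128-cbc, then sha1
print(f"{ceiling:.1f} MB/s")  # → 85.7 MB/s
```

On that crude model, ~60MB/s is about 70% of the ceiling, which fits "not superb, but quite acceptable".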
> >
> > For comparison, http pycurl time output:
> > real 0m0.946s
> > user 0m0.175s
> > sys 0m0.177s
> >
> > Yes, it takes 1 second to push 100MB through, but it hardly taxes the
> > processor: about a tenth of a single core.
> >
> > If you get much lower throughput than this, perhaps it's down to how you
> > process the data you send in Python: if you keep reallocating or
> > "resizing" large strings, for example, that could lead to O(N^2) behavior.
> >
> > d.
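To put a number on the O(N^2) concern, the sketch below counts the bytes copied if a read callback returns the front of an immutable string and then rebinds the string to the remainder (data = data[size:]); each such slice of a Python bytes object copies everything that is left. This models one possible coding pattern, not Mark's actual code, which isn't shown, and the function name is hypothetical.

```python
def bytes_copied_by_front_slicing(total: int, chunk: int) -> int:
    """Bytes copied by the remainder slices when each call does data = data[chunk:]."""
    copied = 0
    remaining = total
    while remaining > 0:
        remaining = max(remaining - chunk, 0)
        copied += remaining  # the new slice duplicates every remaining byte
    return copied


# 100 MiB handed out in 16 KiB requests
print(bytes_copied_by_front_slicing(100 * 1024 * 1024, 16384))  # → 335491891200
```

That is roughly 335 GB of copying for a 100 MiB upload, against about 100 MiB for a reader that keeps an offset and copies each chunk once.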
> >
> >
> >
> > On 24 January 2013 01:35, Mark Seger <mjseger_at_gmail.com> wrote:
> >>
> >> I've managed to get to the point where I can now upload in-memory
> >> strings of data via a REST interface. Very cool stuff. In fact, the good
> >> news is I can hit very high network rates with strings on the order of
> >> 100MB or more. The bad news is that smaller strings upload very slowly
> >> and I have no idea why.
> >>
> >> To try to figure out what's going on, I surrounded the perform() call
> >> with time.time() to measure the delay, and I'm finding that even with
> >> payloads on the order of 32KB it always takes over a second to execute
> >> the upload call, whereas other interfaces are much faster, under 0.1
> >> sec/upload. Has anyone else ever observed this behavior?
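Wrapping perform() in time.time() gives only the total. libcurl also records a cumulative timestamp for each phase of a transfer, which pycurl exposes through getinfo (NAMELOOKUP_TIME, CONNECT_TIME, APPCONNECT_TIME, PRETRANSFER_TIME, STARTTRANSFER_TIME, TOTAL_TIME). The sketch below turns those into per-phase durations, so a one-second stall can be pinned to DNS, connecting, the TLS handshake, or the stretch before the first response byte (which, for an upload, includes sending the body). The helper names are hypothetical; APPCONNECT_TIME is 0 for plain HTTP, which the code treats as a skipped phase.

```python
PHASES = ("namelookup", "connect", "appconnect",
          "pretransfer", "starttransfer", "total")


def phase_durations(cumulative: dict) -> dict:
    """Convert libcurl's cumulative phase timestamps into per-phase seconds."""
    durations = {}
    previous = 0.0
    for name in PHASES:
        stamp = cumulative[name]
        if stamp <= 0.0:
            # e.g. appconnect stays 0 when there is no TLS handshake
            durations[name] = 0.0
            continue
        durations[name] = stamp - previous
        previous = stamp
    return durations


def curl_timings(curl) -> dict:
    """Read the cumulative timestamps from a pycurl handle after perform()."""
    import pycurl  # imported lazily so phase_durations is usable on its own

    return {
        "namelookup": curl.getinfo(pycurl.NAMELOOKUP_TIME),
        "connect": curl.getinfo(pycurl.CONNECT_TIME),
        "appconnect": curl.getinfo(pycurl.APPCONNECT_TIME),
        "pretransfer": curl.getinfo(pycurl.PRETRANSFER_TIME),
        "starttransfer": curl.getinfo(pycurl.STARTTRANSFER_TIME),
        "total": curl.getinfo(pycurl.TOTAL_TIME),
    }
```

Call phase_durations(curl_timings(c)) after c.perform() and before c.close(), since getinfo needs a live handle.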
> >>
> >> Digging a little deeper I've observed a few things:
> >> - when my callback is called for data, it is passed a chunk size of
> >>   16384, and I wonder if asking for bigger chunks would result in fewer
> >>   calls, which in turn could speed things up
> >> - another thing I noticed is very high CPU load, not for the small
> >>   strings but for the larger ones, where I'm seeing close to 100% of a
> >>   single CPU saturated. Is this caused by encryption? Is there any way to
> >>   speed it up or choose a faster algorithm? Or is it something totally
> >>   different?
> >> - I'm also guessing the overhead is not caused by data compression,
> >>   because I'm intentionally sending a string of all spaces, which is
> >>   highly compressible, and I do see the full 100MB go over the network;
> >>   if it were compressed I'd expect to see far less.
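On the chunk-size question, a quick count of how many times the read callback has to supply data at 16384 bytes per call, assuming each call is filled completely. At a few thousand calls for 100MB and only two for 32KB, per-call overhead seems unlikely on its own to account for a full second on small uploads. The function name is hypothetical.

```python
def callback_count(payload_bytes: int, chunk_bytes: int = 16384) -> int:
    """Number of data-carrying read callbacks, if each one is filled completely."""
    return -(-payload_bytes // chunk_bytes)  # ceiling division


for label, size in (("32KB", 32 * 1024), ("100MB", 100 * 1024 * 1024)):
    print(label, callback_count(size))
# prints "32KB 2" then "100MB 6400"
```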
> >>
> >> I know pycurl is very heavily used everywhere and that this could
> >> simply be a case of operator error on my part. If anyone would like to
> >> see my code I'd be happy to send it along, but for now I thought I'd
> >> just keep it to a couple of simple questions in case the answer is an
> >> obvious one.
> >>
> >> -mark
> >>
> >>
> >> _______________________________________________
> >> http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
> >>
> >
> >
>
>

Received on 2013-01-25