cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: libcurl and Perl's WWW::Curl::Easy slower than LWP on small HTTP POSTs

From: Martin J. Evans <martin.evans_at_easysoft.com>
Date: Wed, 14 Sep 2011 15:15:16 +0100

On 14/09/11 12:27, Daniel Stenberg wrote:
> On Wed, 14 Sep 2011, Martin J. Evans wrote:
>
>> I am using libcurl under Perl's WWW::Curl::Easy module because it
>> works out a lot faster than the mostly pure Perl LWP module. For
>> HTTP GETs Curl works out a lot faster but with small POSTs Curl
>> works out a lot slower (in real time although faster in CPU time).
>
> Thanks a lot for these interesting results. Let's try to figure out
> why they look the way they do!
>
> I think you need to do some wiresharking of the TCP traffic during
> the tests to properly understand what's going on.

ok, I'll look into that but see below.

>> Through experimentation I have discovered setting either
>> CURLOPT_TCP_NODELAY or CURLOPT_FORBID_REUSE speeds up the POSTs so
>> Curl is faster than LWP.
>
> Have you tried with _only_ CURLOPT_TCP_NODELAY ? And in the other
> end, do you know if LWP disables the Nagle algorithm and does it
> re-use the same connection or not?

I did try CURLOPT_TCP_NODELAY and CURLOPT_FORBID_REUSE separately but did not include the numbers. Obviously these numbers are not exactly the same as before since I've re-run the tests. Here they are:

No special settings:
Benchmark: timing 1000 iterations of curl, lwp...
       curl: 56 wallclock secs ( 1.73 usr + 0.46 sys = 2.19 CPU) @ 456.62/s (n=1000)
        lwp: 22 wallclock secs ( 4.15 usr + 0.59 sys = 4.74 CPU) @ 210.97/s (n=1000)

CURLOPT_FORBID_REUSE:
Benchmark: timing 1000 iterations of curl, lwp...
       curl: 19 wallclock secs ( 0.35 usr + 0.31 sys = 0.66 CPU) @ 1515.15/s (n=1000)
        lwp: 22 wallclock secs ( 4.29 usr + 0.62 sys = 4.91 CPU) @ 203.67/s (n=1000)

CURLOPT_TCP_NODELAY:
Benchmark: timing 1000 iterations of curl, lwp...
       curl: 19 wallclock secs ( 0.43 usr + 0.09 sys = 0.52 CPU) @ 1923.08/s (n=1000)
        lwp: 22 wallclock secs ( 4.11 usr + 0.63 sys = 4.74 CPU) @ 210.97/s (n=1000)

LWP does not reuse the connection by default (and in my case) although it can be made to do so. It involves creating a shared object and adding the LWP objects to it. I didn't really go that far down that path because I saw curl was MUCH faster so LWP is only a fallback in my case for when we don't have curl available.

I don't think LWP sets TCP_NODELAY - at least I cannot see it does from the strace.

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfc77968) = -1 EINVAL (Invalid argument)
_llseek(3, 0, 0xbfc779b0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfc77968) = -1 EINVAL (Invalid argument)
_llseek(3, 0, 0xbfc779b0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = -1 EINPROGRESS (Operation now in progress)

select(8, NULL, [3], NULL, {180, 0}) = 1 (out [3], left {179, 999993})
connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = 0
fcntl64(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(3, F_SETFL, O_RDWR) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0

write(3, "POST /xx/xxxx HTTP/1.1\r\nTE: deflate,gzip;q=0.3\r\nConnection: TE, close\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nUser-Agent: libwww-perl/6.02\r\nContent-Length: 46\r\nContent-Type: application/x-www-form-urlencoded;charset=utf-8\r\nCookie: XXXXXXX=ACE6843B40A358E0E040007F01007083%3A634c10dc53fe309a299e4989e544fcea6d6e6f85155ba563c7d8c735f5a6938b\r\nCookie2: $Version=\"1\"\r\n\r\npassword=xxxxxxxx&method=login&username=xxxxxx", 406) = 406

I note post data written in one go and www-form-urlencoded so I agree that is a difference between the Curl test case and the LWP one - which I will rectify.

>> I was a little surprised setting CURLOPT_TCP_NOLDELAY made a
>> difference but as it does it sort of suggested to me that maybe the
>> POST headers are sent first then followed up by the form data
>> (i.e., at least 2 writes to the socket).
>
> Without NODELAY I would rather suspect that you'll see a short wait
> when a single request has been sent and the Nagle waits a short while
> to see if you send more data to merge into the same packet. When it
> doesn't come, it sends off the request anyway.

yeah, I know which is why I highlighted the double write.

>> Also, if CURLOPT_FORBID_REUSE is set I suppose it is possible
>> libcurl does a shutdown(write) on the socket after the last write
>> which would also expedite any unset data.
>
> Did you try with _only_ CURLOPT_FORBID_REUSE set and not NODELAY
> then? I can't recall any shutdown(write) done by libcurl...

I tried with CURLOPT_FORBID_REUSE and that was the strace from the original email.

The strace for CURPLOPT_TCP_NODELAY only is:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3

# NOTE TCP_NODELAY set:

setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLOUT}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT}])
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(82), sin_addr=inet_addr("172.20.0.102")}, [16]) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(49737), sin_addr=inet_addr("172.20.0.102")}, [16]) = 0
send(3, "POST /v1/user HTTP/1.1\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nAccept: */*\r\nAccept-Encoding: gzip\r\nCookie: XXXXXXX=ACE6AD3069DBBA12E040007F01007149%3A6d95293bd5e789f88e87a5f000ff9250a1a3cec81524612f8686e1b5618bbeda\r\nContent-Length: 354\r\nContent-Type: multipart/form-data; boundary=----------------------------30bbaa93aace\r\n\r\n", 320, MSG_NOSIGNAL) = 320
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 1000) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
recv(3, "HTTP/1.1 200 OK\r\nServer: nginx/1.0.4\r\nDate: Wed, 14 Sep 2011 13:55:57 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nConnection: keep-alive\r\nSet-Cookie: XXXXXXX=ACE690F10782A262E040007F010070C2%3Ad02e9863b75438bd469c60ffaa5bca0eb2d02020e56721e0b34ac77f467a81aa; path=/; expires=Thu, 15-Sep-2011 13:55:57 GMT\r\nContent-Length: 68\r\nAccess-Control-Allow-Headers: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS\r\nAccess-Control-Max-Age: 1728000\r\nAccess-Control-Allow-Origin: *\r\n\r\n[\"[XXX-XXXXXXX-XXXX]\",0,0,\"\",1,[\"ACE690F10782A262E040007F010070C2\"]]", 16384, 0) = 552

and for the second and subsequent posts all to the same url:

send(3, "POST /v1/user HTTP/1.1\r\nHost: xxxxxxxx.xxxxxxxx.xxxxx:82\r\nAccept: */*\r\nAccept-Encoding: gzip\r\nCookie: XXXXXXX=ACE690F10783A262E040007F010070C2%3A1f43327f8fe791f6ecdc056ce9ef34782f400ddd6077d5a46f5afe1beb44598e\r\nContent-Length: 156\r\nContent-Type: multipart/form-data; boundary=----------------------------cbb999035fb2\r\n\r\n", 320, MSG_NOSIGNAL) = 320
send(3, "------------------------------cbcf74d4560b\r\nContent-Disposition: form-data; name=\"method\"\r\n\r\nsend_keep_alive\r\n------------------------------cbcf74d4560b--\r\n", 156, MSG_NOSIGNAL) = 156
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 1000) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 1000) = 1 ([{fd=3, revents=POLLIN}])
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
recv(3, "HTTP/1.1 200 OK\r\nServer: nginx/1.0.4\r\nDate: Wed, 14 Sep 2011 13:59:22 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nConnection: keep-alive\r\nSet-Cookie: XXXXXXX=ACE690F10783A262E040007F010070C2%3A1f43327f8fe791f6ecdc056ce9ef34782f400ddd6077d5a46f5afe1beb44598e; path=/; expires=Thu, 15-Sep-2011 13:59:22 GMT\r\nContent-Length: 81\r\nAccess-Control-Allow-Headers: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS\r\nAccess-Control-Max-Age: 1728000\r\nAccess-Control-Allow-Origin: *\r\n\r\n[\"[XXX-XXXXXXX-XXXX]\",0,0,\"\",1,[\"1316012362\",\"ACE690F10783A262E040007F010070C2\"]]", 16384, 0) = 565

>> When I strace the code I do in fact see 2 separate writes to the
>> socket for the header and the form data.
>
> Oh right, because you use multipart formpost. If you'd change your
> test to instead use a "normal" small POST you'll see that libcurl
> won't do that. It might make sense to change the test to use that. It
> should make the speed very similar to a plain GET test.

You are of course right and I did not spot that.

> The reason multipart formpost causes two (or more) writes instead of
> a single one, even when the post is very small, is mostly due to the
> layout of the internal APIs and convenience/laziness. It could very
> well be optimized to do better...
>
>> Ideally I'd like to reuse the socket but I am wary of setting
>> CURLOPT_TCP_NOLDELAY.
>
> For this kind of use case, I'd say using CURLOPT_TCP_NODELAY makes
> sense. Of course, in a real life application you might not send so
> small posts and you might not fire them off that rapidly immediately
> next after each other I guess...

Unfortunately, our "real" life is not that far removed from this test case - a lot of posts are being fired off to the same url with relatively small amounts of data.

>> If libcurl breaks the post header and the form data into 2 writes
>> why?
>
> You'll have to see the lib/formdata.c and how the data is presented
> to the libcurl uploader for that to be clear. I wouldn't mind having
> libcurl optimized to prefer a single write() for small multipart
> formposts, but I'm not sure I'm willing to waste a lot of time or
> code readability for it. It's really a corner case.

Point taken - I think my first steps are to make the 2 cases the same - the same form encoding that is.

Thanks for your help.

I'll get back with results.

Martin

-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2011-09-14