curl-library

libcurl and Perl's WWW::Curl::Easy slower than LWP on small HTTP POSTs

From: Martin J. Evans <martin.evans_at_easysoft.com>
Date: Wed, 14 Sep 2011 10:44:37 +0100

I am using libcurl through Perl's WWW::Curl::Easy module because it is much faster than the mostly pure-Perl LWP module. For HTTP GETs curl is a lot faster, but for small POSTs it is a lot slower in real (wallclock) time, although it still uses less CPU time.

When I run a benchmark doing 1000 POSTs to a local HTTP server with very small form data, I get results like this:

Benchmark: timing 1000 iterations of curl, lwp...
       curl: 57 wallclock secs ( 2.08 usr + 0.33 sys = 2.41 CPU) @ 414.94/s (n=1000)
        lwp: 22 wallclock secs ( 4.41 usr + 0.67 sys = 5.08 CPU) @ 196.85/s (n=1000)

In other words, libcurl is slower in real time but uses far less CPU. It appears as though the curl library or something under it is waiting.
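The "@ N/s" column in the first run can be misleading, so here is a small sketch that recomputes the figures. The numbers are taken from the benchmark output above; the helper name `rates` is only illustrative. The results are consistent with the reported rate being iterations divided by CPU time (usr + sys) rather than by wallclock time.

```python
# Recompute the rates from the first benchmark run (1000 POSTs each).
# The reported "@ N/s" matches iterations / CPU time, not iterations / wallclock.

def rates(iterations, wall_secs, cpu_secs):
    """Return (rate by CPU time, rate by wallclock, wallclock ms per request)."""
    return (
        iterations / cpu_secs,
        iterations / wall_secs,
        1000.0 * wall_secs / iterations,
    )

curl_cpu_rate, curl_wall_rate, curl_ms = rates(1000, 57, 2.08 + 0.33)
lwp_cpu_rate, lwp_wall_rate, lwp_ms = rates(1000, 22, 4.41 + 0.67)

print(f"curl: {curl_cpu_rate:.2f}/s by CPU, {curl_wall_rate:.2f}/s by wallclock, {curl_ms:.0f} ms per POST")
print(f"lwp:  {lwp_cpu_rate:.2f}/s by CPU, {lwp_wall_rate:.2f}/s by wallclock, {lwp_ms:.0f} ms per POST")
```

Measured by wallclock, curl spends roughly 57 ms per POST against roughly 22 ms for LWP, while burning less than half the CPU. That gap is time spent waiting, not computing.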

Through experimentation I have discovered that setting either CURLOPT_TCP_NODELAY or CURLOPT_FORBID_REUSE speeds up the POSTs so that curl is faster than LWP. I was a little surprised that setting CURLOPT_TCP_NODELAY made a difference, but since it does, it suggests to me that the POST headers are sent first and then followed by the form data (i.e., at least 2 writes to the socket). Also, if CURLOPT_FORBID_REUSE is set, I suppose it is possible that libcurl does a shutdown(write) on the socket after the last write, which would also expedite any unsent data.
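For reference, CURLOPT_TCP_NODELAY corresponds to the TCP_NODELAY socket option, which disables Nagle's algorithm on the connection. The following is a minimal sketch of that option on a plain, unconnected TCP socket, using Python's standard `socket` module rather than libcurl. The helper name `make_nodelay_socket` is hypothetical.

```python
import socket

def make_nodelay_socket():
    """Create an unconnected TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the kernel to send small segments immediately
    # instead of holding them back until earlier data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_nodelay_socket()
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
sock.close()
```

With Nagle's algorithm left on, a small second write can be held back until the first write is acknowledged. If the peer delays its ACK, each request stalls for about the length of that delay. Whether that is what happens here is an inference from the timings, not something this sketch demonstrates.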

Benchmark with CURLOPT_FORBID_REUSE set:
Benchmark: timing 1000 iterations of curl, lwp...
       curl: 19 wallclock secs ( 0.33 usr + 0.34 sys = 0.67 CPU) @ 1492.54/s (n=1000)
        lwp: 22 wallclock secs ( 4.11 usr + 0.64 sys = 4.75 CPU) @ 210.53/s (n=1000)

When I strace the code I do in fact see 2 separate writes to the socket for the header and the form data. However, when I set CURLOPT_FORBID_REUSE it is still much faster but I don't see any shutdown on the socket before the reading of the response. For CURLOPT_FORBID_REUSE I see (slightly edited for strings and to remove clock_gettime calls):

send(3, "POST /v1/user HTTP/1.1\r\nHost: xxx.yyy.zzz:82\r\nAccept: */*\r\nAccept-Encoding: gzip\r\nCookie: XXXXXXX=ACE4697958E69149E040007F01003280%3A251023611bb7823d6d80d187bc9ca3a137137216e1ec57ca267666d5c680f009\r\nContent-Length: 156\r\nContent-Type: multipart/form-data; boundary=----------------------------931a9489e128\r\n\r\n", 320, MSG_NOSIGNAL) = 320
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "------------------------------931a9489e128\r\nContent-Disposition: form-data; name=\"method\"\r\n\r\nsend_keep_alive\r\n------------------------------931a9489e128--\r\n", 156, MSG_NOSIGNAL) = 156
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 1000) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}, {fd=3, events=POLLOUT}], 2, 0) = 1 ([{fd=3, revents=POLLOUT}])
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 1000) = 1 ([{fd=3, revents=POLLIN}])
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
recv(3, "HTTP/1.1 200 OK\r\nServer: nginx/1.0.4\r\nDate: Wed, 14 Sep 2011 09:30:34 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nConnection: keep-alive\r\nSet-Cookie: XXXXXXX=ACE4697958E69149E040007F01003280%3A251023611bb7823d6d80d187bc9ca3a137137216e1ec57ca267666d5c680f009; path=/; expires=Thu, 15-Sep-2011 09:30:34 GMT\r\nContent-Length: 81\r\nAccess-Control-Allow-Headers: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS\r\nAccess-Control-Max-Age: 1728000\r\nAccess-Control-Allow-Origin: *\r\n\r\n[\"[XXX-XXXXXXX-XXXX]\",0,0,\"\",1,[\"1315996234\",\"ACE4697958E69149E040007F01003280\"]]", 16384, 0) = 565
time(NULL) = 1315992634
close(3) = 0
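The trace above shows the headers and the multipart body going out in two separate send() calls. As an illustration only, the sketch below rebuilds a request of the same shape and hands headers and body to the kernel in a single call. It is not how libcurl works internally, and the helpers `build_multipart_post` and `send_coalesced` are hypothetical names. The header set is trimmed (no Cookie or Accept-Encoding), and `socket.socketpair()` stands in for a real client/server connection.

```python
import socket

# Boundary value as it appears in the trace (28 dashes plus the hex suffix).
BOUNDARY = "-" * 28 + "931a9489e128"

def build_multipart_post(host, path, name, value, boundary=BOUNDARY):
    """Return (headers, body) as bytes for a one-field multipart/form-data POST."""
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
        f"{value}\r\n"
        f"--{boundary}--\r\n"
    ).encode("ascii")
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: */*\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"Content-Type: multipart/form-data; boundary={boundary}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers, body

def send_coalesced(sock, headers, body):
    """Pass headers and body to the kernel in one call instead of two."""
    sock.sendall(headers + body)

headers, body = build_multipart_post(
    "xxx.yyy.zzz:82", "/v1/user", "method", "send_keep_alive"
)

# socketpair() gives two connected sockets without needing a server.
client, server = socket.socketpair()
send_coalesced(client, headers, body)
client.close()

# Read until EOF so the demo does not depend on how the data is chunked.
received = b""
while True:
    chunk = server.recv(4096)
    if not chunk:
        break
    received += chunk
server.close()

print(len(headers), len(body), len(received))
```

When the whole request reaches the kernel in one write, there is no small trailing segment for Nagle's algorithm to hold back, so this pattern avoids the stall without needing TCP_NODELAY.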

It has been some time since I seriously wrote C socket code, but does this sound right? Ideally I'd like to reuse the socket, but I am wary of setting CURLOPT_TCP_NODELAY. If libcurl breaks the POST headers and the form data into 2 writes, why does it do so, and is this the source of the problem I'm seeing?

Any ideas?

BTW, I first posted this on perl monks at http://www.perlmonks.org/?node_id=925760

Martin

-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com
Received on 2011-09-14