curl_multi_socket_action with large posts won't multiplex properly #3436

Closed
TvdW opened this issue Jan 4, 2019 · 7 comments

TvdW commented Jan 4, 2019

I did this

I'm performing multiple concurrent requests to an HTTP/2 server, each with a large POST body. Each request takes exactly one second on the server (because of sleep(1000)), and the server can run many in parallel. Using curl_multi_perform, the whole batch takes one second as expected. Using curl_multi_socket_action, it takes one second per request.

Code: https://gist.github.com/TvdW/c41cb1e74feb0c5f79be83f9d330a74b - argv[1] refers to the server to test with; I've been using https://home.xifon.eu/sleep/1000, which runs an HTTP/2 server that sleeps for exactly one second. Commenting out the "BAD" section (thus allowing the "GOOD" section to run) makes the difference visible.
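
For context, here is a minimal sketch of the pattern the gist follows: a callback installed with CURLMOPT_SOCKETFUNCTION keeps an epoll set in sync with what libcurl wants to wait for, and the loop feeds events back in via curl_multi_socket_action(). This is an illustration only, not the gist verbatim; error handling, the callback registration, and the CURLMOPT_TIMERFUNCTION plumbing are omitted, and epfd is assumed to be an epoll instance created elsewhere.

#include <curl/curl.h>
#include <sys/epoll.h>

static int epfd; /* assumed: created with epoll_create1(0) during setup */

/* libcurl tells us which events to wait for on each of its sockets */
static int socket_cb(CURL *easy, curl_socket_t s, int what,
                     void *userp, void *socketp)
{
  struct epoll_event ev = { 0 };
  ev.data.fd = (int)s;
  if(what == CURL_POLL_REMOVE) {
    epoll_ctl(epfd, EPOLL_CTL_DEL, (int)s, NULL);
    return 0;
  }
  if(what & CURL_POLL_IN)
    ev.events |= EPOLLIN;
  if(what & CURL_POLL_OUT)
    ev.events |= EPOLLOUT;
  if(epoll_ctl(epfd, EPOLL_CTL_MOD, (int)s, &ev))
    epoll_ctl(epfd, EPOLL_CTL_ADD, (int)s, &ev);
  return 0;
}

/* translate epoll events back into curl_multi_socket_action() calls */
static void event_loop(CURLM *multi)
{
  int running = 1;
  while(running) {
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, 100); /* real code should use the
                                                  timeout from the timer cb */
    if(n == 0)
      curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);
    for(int i = 0; i < n; i++) {
      int flags = 0;
      if(events[i].events & EPOLLIN)
        flags |= CURL_CSELECT_IN;
      if(events[i].events & EPOLLOUT)
        flags |= CURL_CSELECT_OUT;
      curl_multi_socket_action(multi, events[i].data.fd, flags, &running);
    }
  }
}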

I expected the following

Doing 10 concurrent requests that each take one second should take one second total.

curl/libcurl version

$ ./bin/curl -V
curl 7.63.0 (x86_64-pc-linux-gnu) libcurl/7.63.0 OpenSSL/1.0.2k zlib/1.2.7 c-ares/1.15.0 nghttp2/1.35.1
Release-Date: 2018-12-12
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy 

operating system

CentOS 7

@TvdW
Copy link
Author

TvdW commented Jan 4, 2019

Some further debugging shows that the fd_set I get from curl_multi_fdset differs from the flags passed to me via the socket callback: it asks for a write on the socket. Using those sets instead does not fix the problem, though. It seems curl_multi_perform does some additional processing that curl_multi_socket_action does not.
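
For reference, that comparison can be done with something along these lines (a hypothetical debugging sketch; last_callback_what is an assumed variable holding the most recent "what" value seen in the socket callback):

#include <stdio.h>
#include <sys/select.h>
#include <curl/curl.h>

extern int last_callback_what; /* saved in the socket callback (assumed) */

static void dump_fdset_vs_callback(CURLM *multi)
{
  fd_set rd, wr, ex;
  int maxfd = -1;
  FD_ZERO(&rd);
  FD_ZERO(&wr);
  FD_ZERO(&ex);
  curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
  for(int fd = 0; fd <= maxfd; fd++) {
    int r = FD_ISSET(fd, &rd) ? 1 : 0;
    int w = FD_ISSET(fd, &wr) ? 1 : 0;
    if(r || w)
      fprintf(stderr, "fdset: fd=%d read=%d write=%d, callback said what=%d\n",
              fd, r, w, last_callback_what);
  }
}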


TvdW commented Jan 4, 2019

Just to be sure this isn't a recent regression: it is reproducible on 7.55.1 on Ubuntu (and also on 7.63.0 on CentOS, as noted in the initial post).

bagder added the HTTP/2 label on Jan 4, 2019

bagder commented Jan 4, 2019

The code looks like it uses libcurl wrongly. It mixes event-based and non-event-based functions in a way I'm not convinced is fine. It should wait only for the sockets reported by the socket callback and should not use curl_multi_fdset() at all. The timeout logic also looks broken.


TvdW commented Jan 4, 2019

Thanks for checking. I'm not using curl_multi_fdset in the code, and the curl_multi_perform at the bottom of the file is no-opped by the loop above it, so there should be no mix of event-based and non-event-based code in the file (unless I'm missing something).

The timeout logic is indeed broken but should be sufficient for the purpose of demonstrating the issue.


bagder commented Jan 5, 2019

> runs an HTTP/2 server that sleeps for exactly one second

And exactly how sure are we that this sleep only affects one particular stream and not the whole connection?


TvdW commented Jan 6, 2019

> runs an HTTP/2 server that sleeps for exactly one second
>
> And exactly how sure are we that this sleep only affects one particular stream and not the whole connection?

That's a fair question. Here's the output of h2load:

$ dd if=/dev/zero of=./zero_data count=100000 bs=1
100000+0 records in
100000+0 records out
100000 bytes (100 kB) copied, 0.211724 s, 472 kB/s

$ h2load -d zero_data -m 100 -n 10 -c 1 https://home.xifon.eu/sleep/1000
starting benchmark...
spawning thread #0: 1 total client(s). 10 total requests
TLS Protocol: TLSv1.2
Cipher: ECDHE-RSA-AES128-GCM-SHA256
Server Temp Key: ECDH P-256 256 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 1.03s, 9.71 req/s, 1.47KB/s
requests: 10 total, 10 started, 10 done, 10 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.51KB (1550) total, 681B (681) headers (space savings 33.24%), 510B (510) data
                     min         max         mean         sd        +/- sd
time for request:      1.02s       1.02s       1.02s       260us    70.00%
time for connect:     6.29ms      6.29ms      6.29ms         0us   100.00%
time to 1st byte:      1.03s       1.03s       1.03s         0us   100.00%
req/s           :       9.71        9.71        9.71        0.00   100.00%

In other words, the server is able to handle 10 concurrent requests in 1 second, despite all being executed over the same connection. Each request takes one second.

The code I posted (test.c) should be doing the same thing (including the 100KB POST), but it takes about ten times as long; given the h2load output, I'm pretty convinced this is not a server-side issue.

Swapping the event loop code for curl_multi_perform changes the behavior of libcurl completely, and suddenly it will perform all ten requests concurrently.
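
For comparison, the non-event-based driver boils down to roughly this (a sketch, not the exact code from test.c):

#include <curl/curl.h>

static void drive_with_perform(CURLM *multi)
{
  int running = 0;
  do {
    CURLMcode mc = curl_multi_perform(multi, &running);
    if(mc == CURLM_OK && running)
      /* let libcurl pick the sockets to wait on, up to 1000 ms */
      mc = curl_multi_wait(multi, NULL, 0, 1000, NULL);
    if(mc != CURLM_OK)
      break;
  } while(running);
}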


bagder commented Jan 8, 2019

I've figured it out. Now I just need to fix it too ... =)

bagder added a commit that referenced this issue Jan 8, 2019
Fixes #3436

Problem 1

After LOTS of head-scratching, I eventually realized that even when doing 10 uploads in parallel, the socket callback that tells the application what to wait for on the socket sometimes reflected the status of only the single transfer that had just changed state.

Digging into the code revealed that this was indeed the truth. When multiple transfers use the same connection, the application did not get the *combined* flags for all transfers, which could make it switch to READ (only) when in fact most transfers wanted to be told when the socket was WRITEABLE.

Problem 1b

A separate but related regression was also introduced by me a while ago when I cleaned up the connection/transfer association: the logic could no longer find the connection to check whether it was marked as used by more transfers, so it would prematurely remove the socket from the socket hash table even while other transfers were still using it!

Fix 1

Make sure that each socket stored in the socket hash has a "combined" action field describing what to ask the application to wait for, which is potentially the ORed actions of multiple parallel transfers. Remove the socket hash entry only when no transfers are left using it.
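
Conceptually (an illustration only, not the actual libcurl source), the fix amounts to something like this:

#include <curl/curl.h>

/* sketch of a socket hash entry with a combined action field */
struct sock_entry {
  curl_socket_t fd;
  int action; /* ORed CURL_POLL_IN/CURL_POLL_OUT of every transfer on fd */
  int users;  /* number of transfers currently using this socket */
};

/* OR in one transfer's wishes instead of overwriting the previous ones */
static void add_transfer_action(struct sock_entry *e, int transfer_action)
{
  e->action |= transfer_action;
}

/* the entry (and a CURL_POLL_REMOVE to the application) only goes away
   once no transfer uses the socket anymore */
static int entry_removable(const struct sock_entry *e)
{
  return e->users == 0;
}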

Problem 2

The socket hash entry stored an association to a single transfer using that socket, and when curl_multi_socket_action() was called to tell libcurl about activity on that specific socket, only that one transfer was handled.

This was WRONG, as a single socket/connection can be used by numerous parallel transfers, not necessarily just one.

Fix 2

We now store a list of handles in the socket hash table entry, and when libcurl is told there's traffic on a particular socket, it iterates over all known transfers using that socket.
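
Again as an illustration rather than libcurl's real data structures, the change amounts to this:

#include <curl/curl.h>

/* sketch: the hash entry now tracks every transfer on the socket */
struct sock_users {
  curl_socket_t fd;
  CURL **transfers; /* all easy handles multiplexed over fd */
  int ntransfers;
};

/* process_transfer() stands in for libcurl's per-transfer state
   machine; the name is made up for this sketch */
extern void process_transfer(CURL *easy, int ev_bitmask);

/* on socket activity, handle every transfer, not just one */
static void dispatch_socket_event(struct sock_users *su, int ev_bitmask)
{
  for(int i = 0; i < su->ntransfers; i++)
    process_transfer(su->transfers[i], ev_bitmask);
}
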
bagder closed this as completed in 4c35574 on Jan 10, 2019