Segfaults with http2 on libcurl 7.60.0 #2674
Here is a backtrace from the above program on macOS:
Also sometimes it crashes here:
This commit might be the fix you want: 2c15693. It fixed basically the same symptoms.
I tried rebuilding against the master branch. That fixes the example but my R program still crashes. I haven't been able to translate it to a C example yet but here is the backtrace:
Try this additional patch, which really is a follow-up to the commit I mentioned above:
diff --git a/lib/url.c b/lib/url.c
index 0cab0a303..f1a6e8db4 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -1463,10 +1463,11 @@ ConnectionExists(struct Curl_easy *data,
}
if(chosen) {
/* mark it as used before releasing the lock */
chosen->inuse = TRUE;
+ chosen->data = data; /* own it! */
Curl_conncache_unlock(needle);
*usethis = chosen;
return TRUE; /* yes, we found one to use! */
}
Curl_conncache_unlock(needle);
Yes, this seems to resolve my immediate crash, so that is an improvement :-) I would need to investigate whether this also resolves my other issues.
Great! I'll hold off merging this for a while to see if you find something else as well...
Everything looks good on macOS now. The crashes were more frequent on Debian, but I don't have time to build a version of libcurl with nghttp2 from source on Linux. Perhaps once this fix is in master I can ask the Debian maintainer of libcurl to give it a try.
I'm still getting an occasional crash. Might be another problem. Here is a backtrace:
so which pointer is NULL there? http.c:1862 doesn't seem to be the same line in my code...
I think that is a byte offset, not a line number. I tried building libcurl with
I can reproduce (using multi-stress.cpp, on Linux) with 7.60. The failures I'm seeing appear similar to the ones discussed in #2669 and recently fixed on the master branch. When building against curl 7.60 plus the first fix on that PR, I can't seem to reproduce. I'm testing with address-sanitizer and debug configure flags. If you're able to reproduce on Linux using a patched curl (or latest git), please let me know. Any suggestions for ways to "encourage" this bug would help make it reproducible, which is important for being able to produce a fix.
@dtzWill Perhaps this is another unrelated problem. It happens randomly in a very complex web crawler application, so it's not trivial to produce a minimal C example. The good news is I was finally able to create a proper backtrace:
Ahh, okay. Great; it would have been difficult to proceed if the same test crashed for you but not for me (or perhaps others).
Hooray! Hopefully this helps identify the problem :).
It looks as if
I don't seem to be able to reproduce this on git master on my two primary Linux dev machines. I've run the multi-stress.c example code numerous times on both.
I don't think the
Indeed, the original problem was solved by your earlier fix. This is another crash that remains. It happens in my web crawler application after visiting a few hundred pages. I haven't been able to extract a small C example yet.
Ah, I actually have to read what you write as well! =)
The application is written in R. It uses one multi handle, and the user gives it a starting URL. Then it repeatedly downloads the HTML file, extracts hyperlinks, and randomly visits a few of those links. If those links return
Each request is done by a fresh easy handle, but they all share the same multi handle. So you end up with a multi handle that has hundreds of connections to random servers. The application still works fine with my OSX stock libcurl, but not with the 7.60+ versions.
I have created a C example now! Please see the example from my web crawler PR: #2706. For me the example is stable as-is, but the crash is triggered when you uncomment the line that does: curl_multi_setopt(multi_handle, CURLMOPT_MAX_HOST_CONNECTIONS, 6); It doesn't crash consistently every time, but with this parameter set it happens quite frequently.
Thanks. I'll dig in asap!
I tweaked the example in #2706 a bit more. This program is stable using the libcurl that ships with OSX High Sierra. However, using the curl master branch, there are several different places where it may crash. It's hard to narrow down why it happens, but it appears frequently when we have many connections to different HTTP servers. Thanks for looking into it!
I can reproduce, and I'm on to something...
Turned out to be a bit hairy. I figured out some problems but when tightening the bolts around those I stepped in some other goo that needs cleanup for this to work fine...
@tatsuhiro-t I'm facing an issue here that I can't really explain after having stared at it for a while, so I just wanted to bounce it off you to hear if you might have any thoughts on A) what might cause this, or perhaps B) what to check/debug next.

Here's what I observe: with this crawler example code, we do a lot of parallel transfers and quite a lot of them are multiplexed h2. The crawler has an aggressive time-out of 5 seconds, and every once in a while this time-out triggers, which makes curl stop that particular transfer. When curl stops a http2 transfer due to a time-out, it will (in the Curl_http2_done() function) send a RST_STREAM and then call

But here's the strange part: our test program appears to get such a "trailing" packet some time after the clearing of the association mentioned above, and nghttp2 seems to still associate it with the former handle, returning to curl a handle that is no longer in the game! This leads to a crash here where 'stream' is used, which is an invalid pointer since the handle is no longer used for a transfer. So I've specifically logged the stream id => easy handle association when it is set in nghttp2, and I get a log output for when it is cleared, and yet several lines below

Here's some sample output from a typical crash:
Searching backwards in the debug output I have, grepping for the specific handle involved in this crash
The last line is the output from the crawler.c application, when libcurl has returned a time-out error for that particular transfer. The
I've failed to reproduce this problem with just two connections where one gets timed out. I can reproduce it fairly easily, but sometimes it takes upwards of 20,000 transfers for it to trigger! :-O If anyone else has any bright ideas or suggestions, throw them my way!
Hey @bagder, as you mentioned, #2688 seems to have the same root cause. Here is some more information from my investigations. Sorry for the wall of text. I fear there is something wrong with the handling of this h2 session (nghttp2_session), and somehow the stream is still in its stream map?!
Backtrace
frame-5 0x0000007f84d3246c in http2_send (conn=..., sockindex=..., mem=..., len=..., err=...) at http2.c:1862
frame-4 0x0000007f84d3177c in h2_session_send (data=..., h2=...) at http2.c:1526
frame-3 0x0000007f84cc6044 in nghttp2_session_send (session=...) at nghttp2_session.c:3215
frame-2 0x0000007f84cc579c in nghttp2_session_mem_send_internal (session=..., data_ptr=..., fast_cb=...) at nghttp2_session.c:2945
frame-1 0x0000007f84cc1420 in nghttp2_session_close_stream (session=..., stream_id=..., error_code=...) at nghttp2_session.c:1162
frame-0 on_stream_close (session=..., stream_id=..., error_code=..., userp=...) at http2.c:882
The easy_handle times out and gets freed; while processing a second one, the already vanished stream is closed.
Log
Same easy_handle after init
The easy_handle where the http2_send starts with
Log
Conclusion
Should work, since calloc is used to allocate the memory (initializes with 0).
Is the easy_handle removed from the stream?
The easy_handle
Log
Conclusion
Should work, since nghttp2_session_set_stream_user_data is called.
I can reproduce the crash, and ASAN showed the same backtrace as #2674 (comment). I could not see on_stream_close being called and getting non-NULL stream user data after Curl_http2_done is called. What I saw was the opposite: on_stream_close is called, and then Curl_http2_done is called. I'm not sure that can cause a problem. conn->bundle was set to NULL well before the crash was observed; it was set to NULL by Curl_conncache_extract_oldest. Does curl use native threads?
@jeroen
/* HTTP/2 cannot blacklist multiplexing since it is a core
   functionality of the protocol */
conn->bundle->multiuse = BUNDLE_MULTIPLEX;   /* << !! crash */
Above
@gvanem we're using that example to repro this issue. @tatsuhiro-t yeah, I've actually fixed a few other problems in my local branch to end up where I am, the
My current dev branch is now available at https://github.com/curl/curl/tree/bagder/2674-http2-segfault. Note that it includes a few crude fprintf()s and asserts that will be removed before this gets merged anywhere.
Status update
I merged a few fixes (#2733) I've found so far, but that still just changed the nature of the problems. The crawler code still crashes reproducibly. The hard part is that it takes a whole lot of transfers before it crashes, so there's a lot going on at the moment, which makes it difficult to trace the origin of the problem and not just work on the results of a bug introduced "a while ago".
I think I've found the most problematic issues. It's (again) an issue where things are wrongly modified in the easy handle when a connection/stream is handled, so at the end of a stream it resets things in the handle that cause havoc for other streams on the same connection. I'm now giving my fixes an extra round of testing and making sure that all current tests keep passing. I hope to push another PR with this work early next week.
That's great news! Thanks for getting to the bottom of this.
They will be merged into master. I'll let you know when it's time! (I've been out and about a bit the past few days)
Hello, I'm from the same company as ThomasObenaus and helped debug the crash too. The crash resulted in accessing a formerly valid, but now invalid (deleted) Curl_easy handle. After a timeout, Thomas did a curl_multi_remove_handle and a curl_easy_cleanup. When curl wants to close the session, or just finds a frame with this now-invalid stream id in the session while handling another Curl_easy handle, it calls a callback with an invalid Curl_easy handle -> BOOM. My quick workaround is to track the session and stream_id in Curl_easy and clear the stream_userdata when removing the handle in curl_multi_remove_handle. It's just bell wire around the engine block, but it seems to work for now. But I'd prefer a proper bugfix too :-).
@ReneReusner I already landed a few fixes and there are more coming. Once I've landed these fixes I will be very interested in further debugging and testing to make sure we iron out all the last wrinkles.
Thanks, looking good. Are you still planning on a patch release soon?
@ThomasObenaus Did you try 81377ca?
Let me remind you that I can't reproduce your problem. With @jeroen's example I managed to find and fix a whole slew of issues, but as they've gradually been fixed, that sample program now works really well. I'm not entirely convinced it runs 100% fine all the time yet, but without a way to reproduce your case, @ThomasObenaus, we risk having to live with that until someone can reproduce it.
@bagder I totally understand this. But I have found no other way to reproduce than the one I described (which is not transferable to you). Maybe one last question:
Hello. I managed to get some hours at work to look into this problem again. What I found: on a curl time-out in multi.c (~1400):

timeout_ms = Curl_timeleft(data, &now,
                           (data->mstate <= CURLM_STATE_WAITDO)?
                           TRUE:FALSE);
if(timeout_ms < 0) {
  /* Handle timed out */
  ............
  /* Force connection closed if the connection has indeed been used */
  if(data->mstate > CURLM_STATE_DO) {
    streamclose(data->easy_conn, "Disconnected with pending data");
    stream_error = TRUE;
  }
  result = CURLE_OPERATION_TIMEDOUT;
  (void)multi_done(&data->easy_conn, result, TRUE);
  /* Skip the statemachine and go directly to error handling section. */
  goto statemachine_end;

The multi_done here:

static CURLcode multi_done(struct connectdata **connp,
                           CURLcode status, /* an error if this is called
                                               after an error was detected */
                           bool premature)

needs to clear the user data in the associated stream. If it doesn't, then sometimes when nghttp2 iterates over all streams in a session, it will find a "stale" user_data and may call a callback with a deleted Curl_easy -> BOOM. I added it in:

void Curl_http2_done(struct connectdata *conn, bool premature)
{

Please also take a look in multi.c ~1350:

static CURLMcode multi_runsingle(struct Curl_multi *multi,
                                 struct timeval now,
                                 struct Curl_easy *data)
...
data->state.pipe_broke = FALSE;
data->easy_conn = NULL;
/* My comment for a debug assertion
   clear http2 session infos, the http2 session is already gone,
   but the assert in http2.c:1979
   assert(!(conn->data->tmp1) || (conn->data->tmp1 == h2)); & co
   can still bite us, if we get a new connection to continue */

On a broken pipe,
Since I'm not familiar with curl, I don't know if the associated connection can still be used. If yes (maybe it's placed in the connection cache), we need to remove the stream_userdata here too.

One other thing. It's not an error, but it can break a backtrace in gdb on 64-bit machines:

if(data->set.fdebug) {
  return curlDebugCallback(data, type, ptr, size, data->set.debugdata);
  /*return (*data->set.fdebug)(data, type, ptr, size,

For example, on a stale Curl_easy, set.fdebug points to an invalid address; when the CPU jumps to it, it crashes, yes. BUT (because of some brain-damaged ABI design: red zone, frame pointer optional) gdb needs a "valid" PC to get the unwind information. If it's not in a known region -> no backtrace. Then you need a walk_stack macro. Callbacks for debugging on 64-bit machines are not always a good idea.
Recently my programs started to segfault when Debian switched to libcurl with openssl+nghttp2. I think there may be multiple issues.
Here is a simple program (link) that does the following:
The program works on my macOS with native libcurl (curl 7.54.0 (x86_64-apple-darwin17.0) libcurl/7.54.0 LibreSSL/2.0.20 zlib/1.2.11 nghttp2/1.24.0).
However, on Debian (buster) the program segfaults after a while, both when compiled with libcurl4-openssl-dev and with libcurl4-gnutls-dev. Both versions are libcurl 7.60.0.
I can also reproduce the problem on macOS if I build libcurl with nghttp2 support:
And now the program will segfault here as well:
And if I then
brew unlink curl
and recompile, everything is fine again.