New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
curl infinite loop calling CURLMOPT_TIMERFUNCTION callback #3537
Comments
Without a way to reproduce or figure out the internal state at the time of the crash this is really hard for us to handle. libcurl has a system to avoid calling the callback again for the same timeout. This rather looks like the time tree is damaged or something. Have you made sure that you're not accessing any libcurl handles simultaneously from more than one thread? |
Yes, I am aware of that. Unfortunately, this appears to happen randomly
I had the opportunity to look into curl code. From my understanding I can't see how that mechanism will stop the callback from being called multiple times due to this piece of code: else {
/* Asked to run due to time-out. Clear the 'lastcall' variable to force
update_timer() to trigger a callback to the app again even if the same
timeout is still the one to run after this call. That handles the case
when the application asks libcurl to run the timeout prematurely. */
memset(&multi->timer_lastcall, 0, sizeof(multi->timer_lastcall));
} maybe it is handled somewhere else?
Yes, libcurl is running in 1 thread only and on top of that it is stranded. This started to occur when we upgraded libcurl from version 7.58.0 to 7.63.0, not sure if it is related. |
Sure, but that logic is there to handle when libcurl is called to handle a timeout before the timeout has actually expired internally. Since before that time, the necessary actions won't be performed and thus libcurl wants to get told again when the timeout actually has expired. That shouldn't be possible to occur in a loop, unless I'm missing something. Additionally: even when this happens in a rapid loop (which of course it shouldn't), that still shouldn't be a reason for a crash. You seem to call libcurl back again in the callback that curl tells you about the new timeout in. What if you don't do that and instead call it using the regular timeout handling so that you don't get this recursive effect? Or as a test, consider "now" (the zero value) to be 1 millisecond and see if that changes anything. libcurl counts expiry based on milliseconds internally, could that loop happen within the same millisecond? Do we get at least millisecond accuracy on the timer in iOS?
Lots of things have changed in curl in that time period, but we haven't changed any behavior in this area - at least not on purpose. |
This is a copy of the CURLMOPT_TIMERFUNCTION callback: //CURLMOPT_TIMERFUNCTION
static int curlMultiTimerCB(CURLM *multi, long timeoutMS, NetworkManager* nmanager)
{
nmanager->m_asioTimer.cancel();
if (timeoutMS > 0)
{
nmanager->m_asioTimer.expires_from_now(boost::posix_time::millisec(timeoutMS));
nmanager->m_asioTimer.async_wait([nmanager](const boost::system::error_code& error)
{
if (error)
return;
int stillRunningHandles;
auto ccode = curl_multi_socket_action(nmanager->m_curlMulti, CURL_SOCKET_TIMEOUT, 0, &stillRunningHandles);
std::lock_guard<std::mutex> lock(nmanager->m_mutex);
nmanager->checkCURLHandles();
});
return 0;
}
if (timeoutMS == 0) //success
{
int stillRunningHandles;
auto ccode = curl_multi_socket_action(nmanager->m_curlMulti, CURL_SOCKET_TIMEOUT, 0, &stillRunningHandles);
nmanager->checkCURLHandles();
}
return 0;
} can you identify something wrong in the code? anything that can be improved?
Note that curl documentation tells that with a timeout_ms value of 0 we should call
I think it is up to curl to ensure that millisecond accuracy is guaranteed in iOS. This is the thing that bothers me the most, this loop is not happening in Android nor Windows for some reason. Reference: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html |
(Was digging into a different thing but I noticed this issue and know the answer so I may as well answer)
This may be a misinterpretation and/or miswording of the documentation. You're supposed to call the function from the next iteration of your event loop, instead of calling it immediately. The same thing would happen if you actually did set a 0 ms timer, and the documentation could be worded as "0 is not a special value: a timer should be set whenever timeout_ms is not -1". |
Maybe, but calling back like that is done by example already and should probably work. I asked @rgoomes to try the other approach just to see if that changes anything or not. @rgoomes wrote:
I don't know iOS and I don't know what time accuracy it provides. I was hoping someone who works on that platform could just confirm it for us. libcurl is highly portable and runs on way more platforms than any of us in the core team owns or masters ourselves... |
Yes, assuming a timeout of 0 as 1 millisecond apparently fixes the problem. However, I don't like this workaround of delaying the process by 1 millisecond. |
Just to be clear here you're saying basically you have asynchronous callback but they are set to run sequentially in the same thread (ie stranded)? Like what thread is the async call to curl_multi_socket_action and who's to say that's not also running in another thread and where is the synchronization? We don't guarantee accessing the same handle from more than one thread at a time, libcurl doesn't have internal thread synchronization you must provide that on your own. Review thread safety. This is the only report I've heard of this and so I think it may be your code. If this is entirely reproducible you can try bisecting it. |
Next attempt I suggest would be to follow @TvdW's advice and use the 0 as a normal timeout as well, so that it'll just be scheduled to run immediately next, but not called back directly from within the callback. That will avoid the recursive risk and at the same time avoid a 1 millisecond delay. |
That approach also seems to work. Apparently, any approach that avoids the recursive risk is enough to fix the loop. However, everyone seems to be doing exactly the opposite, at least implementations using boost asio with curl, that is, they call immediately |
There are several possible explanations to why you report this and not others:
We would need more debugging to figure out the truth. Personally I'm not a fan of calling libcurl back from within its own callbacks. I think I'll update the examples doing this to stop them from encouraging that. |
From within the timer callbacks. Recursive is problematic for several reasons. Discussed in #3537
To reiterate "you must never use a single handle from more than one thread at any given time." If your async call is in a different thread then that could cause this issue. AFAIK forcing consecutive execution of the calls (what I assume you mean by stranded) doesn't guarantee they're all in the same thread, that will depend on your code. If you were going to debug this further I'd record the thread id/. Like why do you need that guard in the async function but not otherwise |
I will suggest that we close this issue because we have not managed to reproduce it and thus nobody is working on a fix, there's a working work-around provided that makes it not trigger anymore and the examples are now modified to not encourage the pattern that triggered this issue in the first place. |
Yes, you can close it. No more issues were found with the suggested advice. |
When libcurl is built with --enable-threaded-resolver (which is default) and the version of the library is 7.60 or above, libcurl calls a timer callback with exponentially increasing timeout_ms value during DNS resolving. This behaviour was introduced in curl-7_59_0-36-g67636222f (see [1], [2]). During first ten milliseconds the library sets a timer to a passed time divided by three (see Curl_resolver_getsock()). It is possible that passed time is zero during at least several thousands of iterations. Before this commit we didn't set a libev timer in curl_multi_timer_cb() when a timeout_ms value is zero, but call curl_multi_process() immediately. Libcurl however can call curl_multi_timer_cb() again and here we're going into a recursion that stops only when timeous_ms becomes positive. Often we generate several thousands of stack frames within this recursion and exceed 512KiB of a fiber stack size. The fix is easy: always set a libev timer to call curl_multi_process() even when a timeout_ms value is zero. The reason why we did the call to curl_multi_process() immediately is the unclear wording in the CURLMOPT_TIMERFUNCTION option documentation. This documentation page was fixed in curl-7_64_0-88-g47e540df8 (see [3], [4], [5]). There is also the related change in curl-7_60_0-121-g3ef67c686 (see [6], [7]): after this commit libcurl calls a timer callback with zero timeout_ms during a first three milliseconds of asynchronous DNS resolving. Fixes #4179. [1]: curl/curl#2419 [2]: curl/curl@6763622 [3]: curl/curl#3537 [4]: curl/curl#3601 [5]: curl/curl@47e540d [6]: curl/curl#2685 [7]: curl/curl@3ef67c6
When libcurl is built with --enable-threaded-resolver (which is default) and the version of the library is 7.60 or above, libcurl calls a timer callback with exponentially increasing timeout_ms value during DNS resolving. This behaviour was introduced in curl-7_59_0-36-g67636222f (see [1], [2]). During first ten milliseconds the library sets a timer to a passed time divided by three (see Curl_resolver_getsock()). It is possible that passed time is zero during at least several thousands of iterations. Before this commit we didn't set a libev timer in curl_multi_timer_cb() when a timeout_ms value is zero, but call curl_multi_process() immediately. Libcurl however can call curl_multi_timer_cb() again and here we're going into a recursion that stops only when timeous_ms becomes positive. Often we generate several thousands of stack frames within this recursion and exceed 512KiB of a fiber stack size. The fix is easy: set a libev timer to call curl_multi_process() even when a timeout_ms value is zero. The reason why we did the call to curl_multi_process() immediately is the unclear wording in the CURLMOPT_TIMERFUNCTION option documentation. This documentation page was fixed in curl-7_64_0-88-g47e540df8 (see [3], [4], [5]). There is also the related change in curl-7_60_0-121-g3ef67c686 (see [6], [7]): after this commit libcurl calls a timer callback with zero timeout_ms during a first three milliseconds of asynchronous DNS resolving. Fixes #4179. [1]: curl/curl#2419 [2]: curl/curl@6763622 [3]: curl/curl#3537 [4]: curl/curl#3601 [5]: curl/curl@47e540d [6]: curl/curl#2685 [7]: curl/curl@3ef67c6
When libcurl is built with --enable-threaded-resolver (which is default) and the version of the library is 7.60 or above, libcurl calls a timer callback with exponentially increasing timeout_ms value during DNS resolving. This behaviour was introduced in curl-7_59_0-36-g67636222f (see [1], [2]). During first ten milliseconds the library sets a timer to a passed time divided by three (see Curl_resolver_getsock()). It is possible that passed time is zero during at least several thousands of iterations. Before this commit we didn't set a libev timer in curl_multi_timer_cb() when a timeout_ms value is zero, but call curl_multi_process() immediately. Libcurl however can call curl_multi_timer_cb() again and here we're going into a recursion that stops only when timeous_ms becomes positive. Often we generate several thousands of stack frames within this recursion and exceed 512KiB of a fiber stack size. The fix is easy: set a libev timer to call curl_multi_process() even when a timeout_ms value is zero. The reason why we did the call to curl_multi_process() immediately is the unclear wording in the CURLMOPT_TIMERFUNCTION option documentation. This documentation page was fixed in curl-7_64_0-88-g47e540df8 (see [3], [4], [5]). There is also the related change in curl-7_60_0-121-g3ef67c686 (see [6], [7]): after this commit libcurl calls a timer callback with zero timeout_ms during a first three milliseconds of asynchronous DNS resolving. Fixes #4179. [1]: curl/curl#2419 [2]: curl/curl@6763622 [3]: curl/curl#3537 [4]: curl/curl#3601 [5]: curl/curl@47e540d [6]: curl/curl#2685 [7]: curl/curl@3ef67c6
When libcurl is built with --enable-threaded-resolver (which is default) and the version of the library is 7.60 or above, libcurl calls a timer callback with exponentially increasing timeout_ms value during DNS resolving. This behaviour was introduced in curl-7_59_0-36-g67636222f (see [1], [2]). During first ten milliseconds the library sets a timer to a passed time divided by three (see Curl_resolver_getsock()). It is possible that passed time is zero during at least several thousands of iterations. Before this commit we didn't set a libev timer in curl_multi_timer_cb() when a timeout_ms value is zero, but call curl_multi_process() immediately. Libcurl however can call curl_multi_timer_cb() again and here we're going into a recursion that stops only when timeous_ms becomes positive. Often we generate several thousands of stack frames within this recursion and exceed 512KiB of a fiber stack size. The fix is easy: set a libev timer to call curl_multi_process() even when a timeout_ms value is zero. The reason why we did the call to curl_multi_process() immediately is the unclear wording in the CURLMOPT_TIMERFUNCTION option documentation. This documentation page was fixed in curl-7_64_0-88-g47e540df8 (see [3], [4], [5]). There is also the related change in curl-7_60_0-121-g3ef67c686 (see [6], [7]): after this commit libcurl calls a timer callback with zero timeout_ms during a first three milliseconds of asynchronous DNS resolving. Fixes #4179. [1]: curl/curl#2419 [2]: curl/curl@6763622 [3]: curl/curl#3537 [4]: curl/curl#3601 [5]: curl/curl@47e540d [6]: curl/curl#2685 [7]: curl/curl@3ef67c6 (cherry picked from commit 47bd51b)
When libcurl is built with --enable-threaded-resolver (which is default) and the version of the library is 7.60 or above, libcurl calls a timer callback with exponentially increasing timeout_ms value during DNS resolving. This behaviour was introduced in curl-7_59_0-36-g67636222f (see [1], [2]). During first ten milliseconds the library sets a timer to a passed time divided by three (see Curl_resolver_getsock()). It is possible that passed time is zero during at least several thousands of iterations. Before this commit we didn't set a libev timer in curl_multi_timer_cb() when a timeout_ms value is zero, but call curl_multi_process() immediately. Libcurl however can call curl_multi_timer_cb() again and here we're going into a recursion that stops only when timeous_ms becomes positive. Often we generate several thousands of stack frames within this recursion and exceed 512KiB of a fiber stack size. The fix is easy: set a libev timer to call curl_multi_process() even when a timeout_ms value is zero. The reason why we did the call to curl_multi_process() immediately is the unclear wording in the CURLMOPT_TIMERFUNCTION option documentation. This documentation page was fixed in curl-7_64_0-88-g47e540df8 (see [3], [4], [5]). There is also the related change in curl-7_60_0-121-g3ef67c686 (see [6], [7]): after this commit libcurl calls a timer callback with zero timeout_ms during a first three milliseconds of asynchronous DNS resolving. Fixes #4179. [1]: curl/curl#2419 [2]: curl/curl@6763622 [3]: curl/curl#3537 [4]: curl/curl#3601 [5]: curl/curl@47e540d [6]: curl/curl#2685 [7]: curl/curl@3ef67c6 (cherry picked from commit 47bd51b)
@bagder are you aware of the analysis in issue tarantool/tarantool#4179 ? According to the issue, this commit #2419 seems to be the culprit of this behavior. Can you please confirm? Full analysis is in tarantool/tarantool@70821c6 |
Yes, that seems to be entirely correct and a fine analysis. The reason is quite simply the "optmization" I did in #2419 that now makes it possible to call the timer function again with a zero millisecond timeout and for many systems calling libcurl back again recursively will then risk this kind of stack smash. I can't figure out a sensible or easy fix for this in another way than the way we already discussed and have documented. |
I did this
Recently, we got into a situation where curl is calling our CURLMOPT_TIMERFUNCTION callback in loop and then in crashes possible due to stack buffer overflow. The stack trace looks something like this:
and it goes forever until it crashes. Our timer callback is very similar to multi_timer_cb callback of asiohiper. The callback is being called in loop with
timeout_ms = 0
. We are unable to replicate the issue, but it already happened 3 times in a week and it is always in iOS.curl/libcurl version
7.63.0 release
operating system
iOS only, iPhone 7, iOS 12, 64-bit
additional information
it is not reproducible with libcurl 7.58.0
The text was updated successfully, but these errors were encountered: