
[regression] rtorrent breaks by commit "extract_if_dead: follow-up to 54b201b48c90a" #3542

Closed
tholin opened this issue Feb 8, 2019 · 9 comments


@tholin

tholin commented Feb 8, 2019

I don't know if this is a bug in curl or rtorrent.

The bittorrent client rtorrent uses libcurl for http(s) trackers. If commit bbae24c "extract_if_dead: follow-up to 54b201b" is present, rtorrent will loop over epoll_wait() and use 100% CPU. I found the commit by bisecting. The symptoms are identical to an older curl bug that also affected rtorrent: rakshasa/rtorrent#580 (comment). I'm not familiar with the rtorrent code or exactly how it uses libcurl.

To reproduce the problem, use a Linux system with curl 7.64.0 or above and rtorrent 0.9.7. As long as some torrents are running, rtorrent should start executing around 500,000 epoll_wait() calls/s after a few minutes.

$ curl -V
curl 7.64.0 (x86_64-pc-linux-gnu) libcurl/7.64.0 OpenSSL/1.0.2q zlib/1.2.11
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp
Features: Largefile NTLM SSL libz TLS-SRP UnixSockets HTTPS-proxy

@bagder
Member

bagder commented Feb 8, 2019

I would urge you to try to come up with example source code that uses only libcurl, to allow us to reproduce the problem and help with the debugging. It will not be productive for us to build and untangle a full rtorrent build for this purpose.

@tholin
Author

tholin commented Feb 9, 2019

I don't know anything about the libcurl API, how it's supposed to be used, or how rtorrent uses it. Investigating would take a lot of time, and since I already have a workaround it would not be productive for me.

I wanted to at least let you know that the commit broke a downstream project. I'll file a bug report with my distribution; perhaps the maintainer there wants to investigate it.

@jay
Member

jay commented Feb 10, 2019

@tholin can you try the patch at #3541 (comment)

@tholin
Author

tholin commented Feb 11, 2019

> @tholin can you try the patch at #3541 (comment)

I applied the patch to curl-7.64.0 but it had no effect. Rtorrent still started epolling like before.

@jay
Member

jay commented Feb 11, 2019

Ok. The patch worked for the other reporter and is in master now. I don't know what could be causing it. Are you using proxies by any chance? Can you try master plus this:

diff --git a/lib/url.c b/lib/url.c
index bc47685..f24a650 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -964,8 +964,10 @@ static bool extract_if_dead(struct connectdata *conn,
       /* The protocol has a special method for checking the state of the
          connection. Use it to check if the connection is dead. */
       unsigned int state;
+      struct Curl_easy *olddata = conn->data;
       conn->data = data; /* use this transfer for now */
       state = conn->handler->connection_check(conn, CONNCHECK_ISDEAD);
+      conn->data = olddata;
       dead = (state & CONNRESULT_DEAD);
     }
     else {

@tholin
Author

tholin commented Feb 11, 2019

I first confirmed that the problem still exists in master (cac0e4a), then applied the patch, but it had no effect.

I'm not using any proxy. Here is my rtorrent.rc config if it helps:

bind=0.0.0.0
throttle.min_peers.normal.set = 60
throttle.max_peers.normal.set = 150
network.port_range.set = 8000-8000
protocol.pex.set = yes
directory.default.set = /mnt/virtual/torrent
dht.mode.set = on
dht.port.set = 6881
schedule = watch_directory,10,10,load.start=/mnt/virtual/torrent/*.torrent
group.seeding.ratio.enable=
ratio.min.set=110
session.path.set = /home/tholin/.rtorrent
ipv4_filter.load = ~/.rtorrent_blocklist,unwanted
file.prioritize_toc.set=1
encoding.add = UTF-8

@jay
Member

jay commented Feb 11, 2019

Ok. How about this:

diff --git a/lib/url.c b/lib/url.c
index bc47685..46c8fb5 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -964,8 +964,10 @@ static bool extract_if_dead(struct connectdata *conn,
       /* The protocol has a special method for checking the state of the
          connection. Use it to check if the connection is dead. */
       unsigned int state;
+      struct Curl_easy *olddata = conn->data;
       conn->data = data; /* use this transfer for now */
       state = conn->handler->connection_check(conn, CONNCHECK_ISDEAD);
+      conn->data = olddata;
       dead = (state & CONNRESULT_DEAD);
     }
     else {
@@ -994,7 +996,6 @@ struct prunedead {
 static int call_extract_if_dead(struct connectdata *conn, void *param)
 {
   struct prunedead *p = (struct prunedead *)param;
-  conn->data = p->data; /* transfer to use for this check */
   if(extract_if_dead(conn, p->data)) {
     /* stop the iteration here, pass back the connection that was extracted */
     p->extracted = conn;

@tholin
Author

tholin commented Feb 11, 2019

This latest patch on master works fine. I ran it for 45 minutes to make sure, and there is no excessive epolling with it.

jay added a commit to jay/curl that referenced this issue Feb 12, 2019
Follow-up to 38d8e1b 2019-02-11.

Background:

It was discovered a month ago that, before checking whether to extract a
dead connection, that connection should be associated with a "live"
transfer for the check (i.e. the original conn->data is ignored and set
to the passed-in data). A fix landed in 54b201b which did that and also
cleared conn->data after the check. The original conn->data was not
restored, so presumably it was thought that a valid conn->data was no
longer needed.

Several days later it was discovered that a valid conn->data was needed
after the check, and a follow-up fix landed in bbae24c which partially
reverted the original fix and attempted to limit the scope of when
conn->data was changed to only when pruning dead connections. In that
case conn->data was not cleared and the original conn->data was not
restored.

A month later it was discovered that the original fix was somewhat
correct; a "live" transfer is needed for the check in all cases,
because the original conn->data could be null, which could cause a bad
dereference at arbitrary points in the check. A fix landed in 38d8e1b
which expanded the scope to all cases. conn->data was not cleared and
the original conn->data was not restored.

A day later it was discovered that not restoring the original conn->data
may lead to busy loops in applications that use the event interface, and
given this observation it's a pretty safe assumption that some code path
still needs the original conn->data. This commit is the follow-up fix
for that: it restores the original conn->data after the connection
check.

Assisted-by: tholin@users.noreply.github.com
Reported-by: tholin@users.noreply.github.com

Fixes curl#3542
Closes #xxxx
@jay
Member

jay commented Feb 12, 2019

Thanks for testing. I turned the patch into #3559 for review.
