Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

progress: Track total times following redirects #1602

Closed
wants to merge 1 commit into from

Conversation

rylwin
Copy link
Contributor

@rylwin rylwin commented Jun 21, 2017

Update the progress timers t_nslookup, t_connect, t_appconnect,
t_pretransfer, and t_starttransfer to track the total times for
these activities when a redirect is followed. Previously, only the times
for the most recent request would be tracked.

As a related change, rename Curl_pgrsResetTimesSizes to
Curl_pgrsResetTransferSizes now that the function only resets transfer
sizes and no longer modified any of the progress timers.

Added test case 1399.

Fixes #522 and Known Bug 1.8
Reported-by: joshhe

@coveralls
Copy link

Coverage Status

Coverage decreased (-73.8%) to 0.0% when pulling 6d4cac1 on rylwin:total-times-on-redirect into 8de8f4e on curl:master.

Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me!

static void complete_aggregate_timed_activities(struct Curl_easy *data)
{
for(unsigned int i = 0; i < sizeof(aggregate_timerids); i++) {
Curl_pgrsTime(data, aggregate_timerids[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment above the Curl_pgrsTime() function in progress.c to mention @unittest: 1399. We marked other unit-tested functions that way before and it could be worth keeping that tradition (to help us slowly test more and more functions).

@rylwin
Copy link
Contributor Author

rylwin commented Jun 21, 2017

Thanks for the review @bagder!

I've added the @unittest comment as you suggested. I also realized that I forgot to declare a few vars in the test as static, so I've also made this change to quiet warnings. Both of these changes have been squashed into the original commit.

A couple other items to flag in case they are issues:

  • One of the builds failed on test 0500 (https://travis-ci.org/curl/curl/jobs/245550898). I don't think this is related; hopefully this test passes on the next build now that I've pushed a couple minor changes.
  • I used 1399, which I think is the last test number available for unit tests per tests/README. I'd be happy to open another PR with changes to this doc if you know what other # range you'd like to use.

I'll drop another note in this conversation once the new builds finish to confirm the static warnings have been resolved.

@bagder
Copy link
Member

bagder commented Jun 21, 2017

I'd be happy to open another PR with changes to this doc if you know what other # range you'd like to use.

The ranges are only for convenience, but I see that we also have 1600-1699 marked for unit tests so I think we should be good a little longer!

@coveralls
Copy link

Coverage Status

Coverage decreased (-73.8%) to 0.0% when pulling 7852e3c on rylwin:total-times-on-redirect into 8de8f4e on curl:master.

@bagder
Copy link
Member

bagder commented Jun 21, 2017

I don't think this [test 500 failure] is related

I suspect it is. It ls the lib500.c and seems to detect that the starttransfer time is larger than the total time, which does sound odd:

starttransfer vs total: 0.006207 0.005473

@coveralls
Copy link

Coverage Status

Coverage decreased (-73.8%) to 0.0% when pulling d7ab12b on rylwin:total-times-on-redirect into 9b167fd on curl:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-73.8%) to 0.0% when pulling 46e6cce on rylwin:total-times-on-redirect into 9b167fd on curl:master.

@rylwin
Copy link
Contributor Author

rylwin commented Jun 22, 2017

I noticed something interesting when looking into the issue of starttransfer being greater than total, and I could use some input on what the correct behavior is and/or how to further debug.

What I've found is that sometimes Curl_pgrsTime is invoked with TIMER_STARTTRANSFER twice for what appears to be the same request. I've added some additional logging to help show this (46e6cce) and have a couple of examples that demonstrate what I'm seeing:

  • In the travic ci build where 0500 fails, you can see these new debug messages being printed twice (here and here).
  • I've also been able to consistently replicate this with a real world request (though similar requests to other domains don't replicate the issue).

Previously, capturing the time of TIMER_STARTTRANSFER wouldn't have caused 500 to fail because the time recorded would always be less than the total. However, now that we add to t_starttransfer each time (instead of overwriting the value), it's possible to exceed the total time.

Even more interesting is that I can apply the same debug/logging code to master and see TIMER_STARTTRANSFER get captured twice for a single request (though starttransfer doesn't exceed total since we're overwriting the value) just like in the linked gist above.

Conceptually, I wouldn't think it would be correct for TIMER_STARTTRANSFER to be recorded more than once for a single request. I've been trying to trace why this timer is being recorded twice on some requests, but haven't figured it out yet. Do you have any ideas what might be going on? Or any thoughts on where I should continue looking?

@bagder
Copy link
Member

bagder commented Jun 22, 2017

I agree that setting TIMER_STARTTRANSFER twice is a bug. We should fix that as an issue separate from this PR, probably even before this PR to make way for your work better.

I think the reason for the double setting is simply because the state in which it sets it doesn't change when set so it can detect a believed start situation again!

@rylwin
Copy link
Contributor Author

rylwin commented Jun 22, 2017

Makes sense to resolve before merging this PR. I'm happy to take a stab at it, though not being very familiar with the code I could use some guidance on what you think the best approach might be.

Do you think it's as simple as checking within Curl_pgrsTime whether we've already set t_starttransfer (which could be as simple as checking if the value is already non-zero)?

Or is the double invocation a symptom of a larger problem that needs to be solved? e.g., perhaps there's a bug in transfer.c#readwrite_data() that's causing Curl_pgrsTime(data, TIMER_STARTTRANSFER); to be invoked extra times when it shouldn't be? (though nothing jumps out immediately from looking through this code)

Let me know your thoughts and I'll start working on this in a new issue/PR. Thanks!

@bagder
Copy link
Member

bagder commented Jun 23, 2017

checking within Curl_pgrsTime whether we've already set t_starttransfer (which could be as simple as checking if the value is already non-zero)?

I think that should be enough. And at least an improvement from what we have now.

bagder pushed a commit that referenced this pull request Jun 30, 2017
Prevent `Curl_pgrsTime` from modifying `t_starttransfer` when invoked
with `TIMER_STARTTRANSFER` more than once during a single request.

When a redirect occurs, this is considered a new request and
`t_starttransfer` can be updated to reflect the `t_starttransfer` time
of the redirect request.

Closes #1616

Bug: #1602 (comment)
@rylwin rylwin force-pushed the total-times-on-redirect branch 3 times, most recently from 568e24b to 826ee42 Compare July 3, 2017 18:28
Update the progress timers `t_nslookup`, `t_connect`, `t_appconnect`,
`t_pretransfer`, and `t_starttransfer` to track the total times for
these activities when a redirect is followed. Previously, only the times
for the most recent request would be tracked.

Related changes:

  - Rename `Curl_pgrsResetTimesSizes` to `Curl_pgrsResetTransferSizes`
    now that the function only resets transfer sizes and no longer
    modifies any of the progress timers.

  - Add a bool to the `Progress` struct that is used to prevent
    double-counting `t_starttransfer` times.

Added test case 1399.

Fixes curl#522 and Known Bug 1.8
Reported-by: joshhe
Copy link
Contributor Author

@rylwin rylwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this PR now that f8f040e (progress: prevent resetting t_starttransfer) and related fixes for broken tests are in master.

The logic for handling multiple invocations of Curl_pgrsTime with TIMER_STARTTRANSFER had to change a bit: I've adding a bool to track whether we should be modifying t_starttransfer (or not if it is a repeated call for the same request).

Looking forward to your comments!

@@ -1255,6 +1255,8 @@ struct Progress {
struct timeval t_startop;
struct timeval t_acceptdata;

bool is_t_startransfer_set;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to modify the Progress struct? And to insert this variable in the middle of the struct? Is there any code (internal or external) that would depend on specific offsets accessing vars here?

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.004%) to 74.019% when pulling 6872ff8 on rylwin:total-times-on-redirect into c0cdc68 on curl:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.004%) to 74.019% when pulling 6872ff8 on rylwin:total-times-on-redirect into c0cdc68 on curl:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 74.011% when pulling 6872ff8 on rylwin:total-times-on-redirect into c0cdc68 on curl:master.

@bagder
Copy link
Member

bagder commented Jul 30, 2017

I'm on track for merging this now. Minor edit needed since 4dee50b but I fixed it locally.

@bagder bagder closed this in 43d036e Aug 15, 2017
@bagder
Copy link
Member

bagder commented Aug 15, 2017

Thank you, and sorry for taking so long to merge it!

@rylwin
Copy link
Contributor Author

rylwin commented Aug 15, 2017

Not a problem--thanks for the support and guidance on this and the related PRs!

@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants