[WIP] First steps to rebalance CI jobs with focus on macOS - squash me! #5124

Closed
mback2k wants to merge 14 commits from the rebalance-macos-ci branch

mback2k (Member, Author) commented Mar 19, 2020

My current plan is to lay out the CI jobs like this:

  • travis-ci: mostly Linux and non-Linux (macOS) builds with special dependencies
  • gh-actions: mostly macOS builds
  • azure-pipelines: mostly MSYS-based Windows builds
  • appveyor: only Windows builds using various build-systems
  • cirrus-ci: special builds, like FreeBSD

This is work-in-progress and subject to change with additional commits and CI load testing.

bagder (Member) commented Mar 19, 2020

We also use Cirrus-CI, right now only for two FreeBSD builds, which means it is underutilized.

Review comment on .github/workflows/macos.yml (outdated, resolved)
mback2k force-pushed the rebalance-macos-ci branch 2 times, most recently from 6557f0e to 7c0c7e6, on March 19, 2020 11:16

mback2k (Member, Author) commented Mar 19, 2020

Overview of our (con)current CI capacity before this PR (number of jobs / concurrent job limit):

  • travis-ci: 31/20 (big problem, despite the recent limit increase)
  • gh-actions: 2/20 (CIFuzz and the default Ubuntu build, which only ran on push, not for PRs)
  • azure-pipelines: 20/20
  • appveyor: 16/10 (not that big of a problem, because 8 jobs take less than 3 minutes)
  • cirrus-ci:
    • 0/8 Linux Containers
    • 0/2 Windows Containers
    • 2/2 FreeBSD
    • 0/1 macOS

@bagder maybe it is a good idea to include the limits in your stats script and have a check for them?
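
A minimal sketch of what such a limit check could look like, assuming the job counts and limits are simply hard-coded from the overview above. The names `JOBS`, `LIMITS` and `over_limit` are hypothetical and are not part of the actual stats script; cirrus-ci is left out because its limits are tracked per platform.

```python
# Job counts and concurrency limits copied from the overview above.
# These values are hard-coded assumptions; real limits change over time.
JOBS = {"travis-ci": 31, "gh-actions": 2, "azure-pipelines": 20, "appveyor": 16}
LIMITS = {"travis-ci": 20, "gh-actions": 20, "azure-pipelines": 20, "appveyor": 10}


def over_limit(jobs, limits):
    """Return {service: excess} for every service running more jobs than its limit."""
    excess = {}
    for service, count in jobs.items():
        limit = limits.get(service)
        # Services without a known limit are skipped rather than flagged.
        if limit is not None and count > limit:
            excess[service] = count - limit
    return excess


if __name__ == "__main__":
    for service, extra in over_limit(JOBS, LIMITS).items():
        print(f"{service}: {extra} job(s) above the concurrency limit")
```

With the numbers from before this PR, the check flags travis-ci and appveyor only.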

bagder (Member) commented Mar 19, 2020

> include the limits in your stats script and have a check for them?

I don't think they belong in there. The limits also vary over time, and the script can't figure out which limits were set for which service at what time.

I also don't think the limits themselves are the problem as much as the total time the jobs take to complete.

mback2k force-pushed the rebalance-macos-ci branch 2 times, most recently from 2a3e36a to 41efad7, on March 19, 2020 18:46

mback2k (Member, Author) commented Mar 19, 2020

Note to myself: turn build combinations into matrix strategy elements to reduce YAML duplication.

mback2k (Member, Author) commented Mar 19, 2020

@bagder can we drop cpp.yml from GH Actions? I think that is already covered by other CIs.

bagder (Member) commented Mar 20, 2020

> can we drop cpp.yml from GH Actions

Sure, if the same build runs elsewhere it isn't necessary to have it there!

mback2k force-pushed the rebalance-macos-ci branch 2 times, most recently from 24df551 to bc075ae, on March 20, 2020 20:59
TODO: test against real gcc instead of just gcc-aliased clang.
mback2k changed the title from "[WIP] First steps to rebalance CI jobs with focus on macOS" to "[WIP] First steps to rebalance CI jobs with focus on macOS - squash me!" on Mar 20, 2020

mback2k (Member, Author) commented Mar 20, 2020

@bagder do you think the number of macOS jobs is crazy or would it be okay to have those?

bagder (Member) commented Mar 20, 2020

> do you think the number of macOS jobs is crazy

How many are they and why do you suggest we need that amount?

mback2k (Member, Author) commented Mar 22, 2020

> > do you think the number of macOS jobs is crazy
>
> How many are they and why do you suggest we need that amount?

Previously it came to 29 (9 autotools builds, each with clang, gcc-8 and gcc-9, plus 2 cmake builds).
Now I have moved the compiler variations to the cmake builds, which leaves 15 builds (9 autotools and 6 cmake builds).

To form this list, I basically transferred and merged the build variants we had on travis-ci and azure-pipelines, and also logically combined and expanded upon them.
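
The arithmetic behind those two totals can be sketched as a matrix expansion. This is only an illustration: the variant names are placeholders, and reading the 6 cmake builds as 2 variants times 3 compilers is an assumption drawn from the numbers in this comment, not from the workflow file itself.

```python
from itertools import product

COMPILERS = ["clang", "gcc-8", "gcc-9"]
AUTOTOOLS = [f"autotools-{n}" for n in range(1, 10)]  # 9 placeholder variant names
CMAKE = ["cmake-1", "cmake-2"]                         # 2 placeholder variant names

# Before: every autotools variant was combined with every compiler,
# while each cmake variant was built only once.
before = len(list(product(AUTOTOOLS, COMPILERS))) + len(CMAKE)

# After: each autotools variant is built once, and the compiler
# matrix is applied to the cmake variants instead (assumed 2 x 3).
after = len(AUTOTOOLS) + len(list(product(CMAKE, COMPILERS)))

print(before, after)
```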

mback2k marked this pull request as ready for review on March 22, 2020 19:02

mback2k (Member, Author) commented Mar 22, 2020

@bagder please give a Go for merge. I will do the other CI changes in separate PRs.

mback2k requested a review from bagder on March 22, 2020 19:03

mback2k (Member, Author) commented Mar 22, 2020

BTW, the reason I am moving macOS builds away from travis-ci is the backlog there:
[image]
During peak hours, the number of macOS jobs on their platform results in delayed job execution.

bagder (Member) commented Mar 22, 2020

so the same build combos as before still run, just spread out differently over the services? And then some new combos are added? This is 14 commits and the yaml is impossible to understand. It's hard to review....

mback2k (Member, Author) commented Mar 23, 2020

> so the same build combos as before still run, just spread out differently over the services?

Yes, but instead of spreading them over travis-ci and azure-pipelines, macOS is for now only used with gh-actions.

> And then some new combos are added?

Yes, to compensate for some gcc to clang transitions. See below for details.

> This is 14 commits and the yaml is impossible to understand. It's hard to review....

I am sorry about that. I guess you can see the summarized changes here, but to make it easier to review the changes and get an overview of them, I created the following illustration (click for original/full size):
[image: curl-ci-macos]

  • on the left = before this PR: top = travis-ci & bottom = azure-pipelines
  • on the right = after this PR: gh-actions only
  • blue lines = moved and merged to other CI without build configuration change
  • orange lines = moved to other CI while switching from gcc to the macOS default clang
  • green dots = compensate for switch to clang by adding similar builds using gcc with cmake

mback2k (Member, Author) commented Mar 23, 2020

> I don't think they belong in there. The limits also vary over time, and the script can't figure out which limits were set for which service at what time.

@bagder Okay, but what about splitting the dashboard graph per CI platform and stacking the graph lines, so that it is possible to tell the number of jobs per CI platform?

bagder (Member) commented Mar 24, 2020

Once this is merged, I need to double-check that the stats script actually counts the jobs correctly still...

mback2k closed this in 840df8b on Mar 24, 2020

mback2k (Member, Author) commented Mar 24, 2020

Thanks, I just merged this PR and also did the other tidy-ups we talked about.

Regarding 7e8a1a0, I forgot to mention in the commit description that a similar job is also running on Azure Pipelines in addition to Travis CI.

mback2k (Member, Author) commented Mar 24, 2020

> Once this is merged, I need to double-check that the stats script actually counts the jobs correctly still...

I guess it won't, since I am now making use of the matrix feature and also plan to do the same for Azure Pipelines to reduce duplication clutter.

I guess you could adapt it to count the combinations of "- install" and "- CC" lines, though the latter is only relevant for the 2nd job, the one related to cmake. So the script will need to be aware of jobs.
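
A rough sketch of that job-aware counting idea, under loudly stated assumptions: `SAMPLE` is a made-up, shortened workflow layout (not the real macos.yml), `count_combinations` is a hypothetical helper, and the actual stats script may work quite differently. Per job, it multiplies the number of "- install" entries by the number of "- CC" entries, treating a job without any "- CC" entry as having one compiler.

```python
import re

# Hypothetical, shortened workflow text used only for illustration.
SAMPLE = """\
name: macOS
jobs:
  autotools:
    strategy:
      matrix:
        build:
        - install: libtool autoconf automake
        - install: libtool autoconf automake nghttp2
  cmake:
    strategy:
      matrix:
        compiler:
        - CC: clang
        - CC: gcc-8
        - CC: gcc-9
        build:
        - install: openssl
        - install: libressl
"""

# A job header is assumed to be a key indented by exactly two spaces under "jobs:".
JOB_HEADER = re.compile(r"^  ([A-Za-z0-9_-]+):\s*$")


def count_combinations(workflow_text):
    """Return {job name: number of build combinations} for a matrix-style workflow."""
    installs, compilers = {}, {}
    in_jobs, job = False, None
    for line in workflow_text.splitlines():
        if line and not line[0].isspace():
            # Any top-level key starts or ends the "jobs:" section.
            in_jobs, job = line.startswith("jobs:"), None
            continue
        if not in_jobs:
            continue
        header = JOB_HEADER.match(line)
        if header:
            job = header.group(1)
            installs[job], compilers[job] = 0, 0
            continue
        if job is None:
            continue
        entry = line.strip()
        if entry.startswith("- install:"):
            installs[job] += 1
        elif entry.startswith("- CC:"):
            compilers[job] += 1
    # Jobs without "- CC" entries count as a single compiler.
    return {name: installs[name] * max(compilers[name], 1) for name in installs}


if __name__ == "__main__":
    print(count_combinations(SAMPLE))
```

A line-based scan like this stays close to the "count the lines" suggestion; parsing the YAML properly would be more robust but needs a third-party parser.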

cc @bagder

mback2k (Member, Author) commented Mar 24, 2020

Overview of our (con)current CI capacity after this PR was merged:

  • travis-ci: 24/20 (Ubuntu variants only, less of a problem now due to job limit increase)
  • gh-actions: 16/20 (15 macOS + 1 CIFuzz)
  • azure-pipelines: 16/20 (6 Ubuntu + 10 Windows msys)
  • appveyor: 16/10 (Windows variants only, not a problem since 8 jobs take less than 3 minutes)
  • cirrus-ci:
    • 0/8 Linux Containers
    • 0/2 Windows Containers
    • 2/2 FreeBSD
    • 0/1 macOS

My next plan is to fix the randomly breaking AppVeyor jobs with #5034, move some Linux jobs to CirrusCI, and add some more Windows jobs (most likely WSL-based ones which are currently only possible on AppVeyor, because PRs for actions/runner-images#50 are still pending). I may also try to rebalance the Windows jobs if AppVeyor becomes too slow or overutilized.

cc @bagder @MarcelRaad

bagder added a commit to curl/stats that referenced this pull request Mar 24, 2020
Since it changes the github file to use "matrix" style.

PR 5124: curl/curl#5124
Closes #6

mback2k (Member, Author) commented May 28, 2020

Improving Azure Pipelines is in the works with #5468. Next up: move more builds to CirrusCI.

Vampire commented Sep 4, 2020

> most likely WSL-based ones which are currently only possible on AppVeyor, because PRs for actions/runner-images#50 are still pending

That issue got fixed.
To use WSL you first have to install a distribution.
You can do so using my shiny new setup-wsl action: https://github.com/marketplace/actions/setup-wsl if you like :-)

mback2k (Member, Author) commented Sep 4, 2020

Definitely on my todo list, thanks.
