author | David Spickett <david.spickett@linaro.org> | 2022-10-26 11:56:07 +0100 |
---|---|---|
committer | David Spickett <david.spickett@linaro.org> | 2022-10-27 09:43:38 +0100 |
commit | d90e09ac78ad7db816375ffb356016c6894c1804 (patch) | |
tree | 8c7be77c8149e9907913fb4acbe965574b3abda8 | |
parent | a4f23fe62b045c3e6db25c7245dac01e85d922e4 (diff) |
llvmbot monitor: Update README
This removes a lot of now incorrect information and dead links.
The description of the JSON format has been updated also.
Change-Id: I9001baa2e9af5f64ea031e8b4dacf64337f72e8c
-rw-r--r-- | monitor/README.txt | 195 |
1 files changed, 91 insertions, 104 deletions
diff --git a/monitor/README.txt b/monitor/README.txt
index 4faef1a..2ca5a4a 100644
--- a/monitor/README.txt
+++ b/monitor/README.txt
@@ -1,128 +1,115 @@
-Monitoring Tools for LLVM Development
-=====================================
+LLVM Buildbot Monitor
+=====================

-These tools are not meant to be used for development or testing, but to be
-left running on a server or desktop as monitoring for your buildbots. They
-are also meant to be used in conjunction, not as a replacement, to Nagios
-and other hardware-level monitoring tools.
+This is to be left running on a server or desktop as monitoring for your buildbots.
+It purely reports the status of the builds. If you want hardware monitoring, look
+elsewhere.

-Currently we only have one: bot-monitor, which I keep running on Linaro's
-public server (people.linaro.org) and keep it as a bookmark to quickly check
-the bot status. It's also a helpful bookmark for all bots we care.
+It supports Buildbot (as used by LLVM) and Buildkite (https://buildkite.com/, used by LLVM's
+libcxx project). It does not support LLVM Green Dragon (https://green.lab.llvm.org/green/).

-JSON Documentation
--------------------
+Buildkite does require an API key to be available, see the script for how to do that
+and contact your Builkite org admin to get one.

-The JSON file should be self-explanatory, but just in case, here's a few
-of the behaviours it exhibits when rendered by the current version of the
-bot-monitor.
+Currently we have one monitor running at http://llvm.validation.linaro.org/.
+Bookmark this if you have a need to check bot status at a glance.

-The base structure is a list of masters, which has a few properties and a list
-of builder groups, which in turn also have some properties and a list of slaves.
-
-Master properties:
-
- "name": "Name of the master, which will appear in bold big letters",
- "base_url": "http://SERVER:PORT/BASE",
- "builder_url": "part of the URL that refers to the list of builders",
- "build_url": "part of the URL that refers to the list of builds",
- "ignore" : "true | false, shows or hide the entire master from the page"
- "builders": [ ... ]
-
-Builder properties:
-
- "name": "Name of this group (fast bots, self-hosting, etc)",
- "ignore" : "true | false, shows or hide the entire builder from the page"
- "bots": [ ... ]
-
-Bots properties:
-
- "name": "Exact name of the buildbot (becomes part of the URL)",
- "ignore": "true | false, to ignore or not failures in this bot"
-
-Note that "ignore" has two different behaviour:
-
- * On masters and builders, it omits the entire class from the output
- * On bots, it still shows them, but ignores their status
+JSON Format
+-----------

-Note on bots:
+The JSON file describes the bots we want to monitor and which master/build service
+they connect to.

- * You can repeat bots across builders, if they belong to multiple classes, for
- example "self-hosting" and "test-suite". The script will cache the results
- and simply re-print them, so this is *only* for visualisation / organisation
- purposes.
- * Using the same bot name on different masters means *different* bots. It may
- be the same configuration on two different masters, or it may be completely
- different bots. Beware.
+Buildbot JSON Format
+--------------------

+The base structure is a list of masters, which has a few properties and a list
+of builder groups, which in turn also have some properties and a list of bots
+(which in Buildbot terms are actually called "Builders" but we ended up calling
+them bots here).

-HTML Page
----------
-
-For now, there's only HTML output, but there's nothing stopping we to develop
-more forms of communication (email, IRC bots, etc).
-
-The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
-has a date on the top, to make sure you're looking at an up-to-date page, and
-it changes the page icon from green to red if at least one (non-ignored) bot
-is broken.
-
-Bots offline are considered broken, as they may require attention. But when the
-admin restarts the master, that kills all buildslaves, and this show up as
-"slave lost". You don't need to do anything, just wait for the next successful
-build.
-
-Each buildbot has four columns:
+Master properties:

- * Name & link: The bot name with a link to its page on its master. Good for
- easy access to buildbots and masters.
- * Status: Can only be "PASS" or "FAIL", but contains additional information
- if it fails, ex. "slave lost" or "build stage 1" or "test-suite". These are
- the name of the stages that failed.
- * Time: the total time spent in build reported as HH:MM:SS.
- * Build number: The build number, to help identify if there is a change from
- a specific number. Not very useful, but there just for reference.
- * Commit range: The range of commits that were tested on that build. This is
- very helpful to identify if a slow bot is failing because it hasn't yet
- reached the commit range on a fast bot that is passing, or not.
- * Comments: The reported status string in the case of a failure.
+ "name": Name of the master, which will appear as the section title.
+ "base_url": The base URL of the master, which will be used to make API calls.
+ For example for LLVM this might be "https://lab.llvm.org/buildbot".
+ "builder_url": The part of the URL that refers to the list of builders.
+ Will be added to base_url when making API calls.
+ "build_url": Part of the URL that refers to the list of builds. Added to base_url
+ when making API calls.
+ "ignore" : Set to "true" to hide the master from the page.
+ "builders": [ ...a list of builder groups as detailed below... ]

+Builder group properties:

-LLVM Masters
-------------
+ "name": Name of this group. "fast bots", "self-hosting", etc. Used as the section title.
+ "ignore" : Set to "true" to hide this builder group from the page.
+ "bots": [ ...a list of bots as decribed below... ]

-There are a number of masters in the LLVM upstream infrastructure, and we may
-need to monitor bots in all of those, or switch between them, depending on the
-need.
+Bot properties:

-* LLVM Upstream main master: http://lab.llvm.org:8011/
+ "name": The exact name of the buildbot. This will be used to build URLs for API calls.
+ "ignore": Set to "true" to ignore the status of this bot.

-This is the main master that spams everyone every time one of the bots break.
-Unless there is any specific concern, bots should be in this master.
+Notes on bots:
+ * Bots may be repeated across builder groups if they fall into multiple categories
+ (this does not slow down the monitor as results are cached).
+ * The same bot name on 2 different masters refers to 2 different bots.

-* LLVM Upstream silent master: http://lab.llvm.org:8014/
+Note that "ignore" has two different behaviours:

-Exactly the same as above, but no emails are sent. This master is usually empty
-except for the bots that may be noise temporarily, in active development, or
-being a bot that doesn't track compiler regressions, but performance regressions
-which is monitored on another page (http://llvm.org/perf/)
+ * On masters and builder groups, it omits the entire section from the output.
+ * On bots it shows the bot but ignores their status. Meaning that an ignored bot failing
+ does not make the overall page status failed.

-* LLVM Japan master: http://bb.pgr.jp/
+Buildkite JSON Format
+---------------------

-A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
-rarely need to monitor anything there, but it's good to know it's there.
+The Buildkite format follows the Buildbot format closely with some differences.

-* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/
+ * Since Buildkite is a centralised service the "name" is always "Buildkite".
+ * The "base_url" is always "https://www.buildkite.com".
+ * There is a new key "buildkite_org". This is used to find our particular bots
+ via the API.
+ * The "builder_url" and "build_url" keys are used to form clickable links,
+ and are always "builders" and "builds".
+ * Bots have an extra key "buildkite_pipeline" (explained below).

-Our local master, that we use for development. Individual developers can have
-their own containers, in which case, the masters will be in different ports.
+When querying Buildkite we make a request like this:
+"show me the results of the pipeline <buildkite_pipeline> in the organisation <buildkite_org>"

-These bots should always be ignored for their global status, or we'll generate
-a lot of noise to ourselves. Unless, of course, they're in their way upstream
-and going through staging deployment.
+From there the script looks for the bot's name in the last finished build. After that
+all the processing is the same as the Buildbot results.

-* Green Dragon bots: http://lab.llvm.org:8080/green/
+HTML Page
+---------

-This is not a buildbot master, but Jenkins. We don't monitor those in our page
-but they do have IRC bots in the #llvm channel and are already quite good at
-displaying success and failures.
+The script will generate an HTML page. This page is separated into blocks:
+ * Masters which contain...
+ * Builder Groups which contain...
+ * Bots
+
+The date is printed at the top of the page so you know when the results were generated.
+The favicon will change from green to red when there is at least one failed bot, that
+has not itself been ignored.
+
+Bots that are offline or partially fail to read via the API will show up with a message
+along the lines of "<bot name> is offline!". The page should still update correctly
+for the rest of the bots.
+
+Each listed bot has these columns:
+
+ * "Buildbot": This shows the name and a link to the master's web interface for the bot.
+ * "Status": The status of the last finished build. PASS or FAIL (currently cancelled
+ is also treated as a failure).
+ * "T Since": The time since the last build finished. This is useful for spotting bots
+ that have gotten disconnected. If this time is greater than 24 hours, it will be shown
+ in red.
+ * "Duration": The length of the last build.
+ * "Build #": The build number of the last finished build, which itself will be a link
+ to the results page for that build.
+ * "Commits": The commit range (if known) of the build, if that build was a failure.
+ * "Failing steps": The failed build steps, if it was a failed build.
+
+Note: "finished" here refers to the build ending be that by success, cancellation or
+failure.
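
For illustration only, here is a minimal Python sketch of the Buildbot JSON layout and the two "ignore" behaviours that the updated README describes. It is not taken from the monitor script. The group and bot names, the "builder_url" and "build_url" values, the helper names (is_ignored, visible_bots, page_failed), and the choice to accept both JSON true and the string "true" for "ignore" are all assumptions made for the example.

```python
import json

# Hypothetical config following the master -> builder group -> bot layout
# described in the README. Names and URL fragments are placeholders.
CONFIG_TEXT = """
[
  {
    "name": "LLVM",
    "base_url": "https://lab.llvm.org/buildbot",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-bot-a", "ignore": false},
          {"name": "example-bot-b", "ignore": true}
        ]
      },
      {
        "name": "hidden group",
        "ignore": true,
        "bots": [{"name": "example-bot-c", "ignore": false}]
      }
    ]
  }
]
"""


def is_ignored(entry):
    # Assumption: treat both JSON true and the string "true" as "ignore".
    return entry.get("ignore", False) in (True, "true")


def visible_bots(config):
    """Yield (master name, group name, bot) for every bot that is shown.

    Ignored masters and ignored builder groups are omitted entirely.
    Ignored bots are still yielded, because the README says they are shown.
    """
    for master in config:
        if is_ignored(master):
            continue
        for group in master["builders"]:
            if is_ignored(group):
                continue
            for bot in group["bots"]:
                yield master["name"], group["name"], bot


def page_failed(config, failed_bots):
    """Return True if any shown, non-ignored bot is in failed_bots.

    failed_bots is a set of (master name, bot name) pairs, since the same
    bot name on two different masters refers to two different bots.
    """
    return any(
        not is_ignored(bot) and (master, bot["name"]) in failed_bots
        for master, _group, bot in visible_bots(config)
    )


config = json.loads(CONFIG_TEXT)
```

With this config, `page_failed(config, {("LLVM", "example-bot-b")})` is False because that bot is ignored, while a failure of `example-bot-a` on the same master turns the overall status to failed.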