From 662105b973c7eb557c97a9f82568f4180c584ed6 Mon Sep 17 00:00:00 2001
From: Renato Golin
Date: Fri, 10 Jun 2016 12:47:43 +0100
Subject: [monitor] JSON documentation

Change-Id: Ifc143115fa3f6387af27ac4fddbc4bdaa1fca917
---
 monitor/README.txt | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/monitor/README.txt b/monitor/README.txt
index c5cc6e8..652c4eb 100644
--- a/monitor/README.txt
+++ b/monitor/README.txt
@@ -9,3 +9,118 @@ and other hardware-level monitoring tools.
 Currently we only have one: bot-monitor, which I keep running on Linaro's public
 server (people.linaro.org) and keep it as a bookmark to quickly check the bot
 status. It's also a helpful bookmark for all bots we care.
+
+JSON Documentation
+------------------
+
+The JSON file should be self-explanatory, but just in case, here are a few
+of the behaviours it exhibits when rendered by the current version of the
+bot-monitor.
+
+The base structure is a list of masters, each with a few properties and a list
+of builder groups, which in turn have some properties and a list of bots.
+
+Master properties:
+
+  "name": "Name of the master, which will appear in bold big letters",
+  "base_url": "http://SERVER:PORT/BASE",
+  "builder_url": "part of the URL that refers to the list of builders",
+  "build_url": "part of the URL that refers to the list of builds",
+  "ignore" : "true | false, hides or shows the entire master on the page",
+  "builders": [ ... ]
+
+Builder properties:
+
+  "name": "Name of this group (fast bots, self-hosting, etc)",
+  "ignore" : "true | false, hides or shows the entire builder on the page",
+  "bots": [ ... ]
+
+Bot properties:
+
+  "name": "Exact name of the buildbot (becomes part of the URL)",
+  "ignore": "true | false, whether to ignore failures in this bot"
+
+Note that "ignore" has two different behaviours:
+
+ * On masters and builders, it omits the entire class from the output
+ * On bots, it still shows them, but ignores their status
+
+Notes on bots:
+
+ * You can repeat bots across builders, if they belong to multiple classes, for
+   example "self-hosting" and "test-suite". The script will cache the results
+   and simply re-print them, so this is *only* for visualisation / organisation
+   purposes.
+ * Using the same bot name on different masters means *different* bots. It may
+   be the same configuration on two different masters, or it may be completely
+   different bots. Beware.
+
+
+HTML Page
+---------
+
+For now, there's only HTML output, but there's nothing stopping us from developing
+more forms of communication (email, IRC bots, etc).
+
+The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
+has a date at the top, to make sure you're looking at an up-to-date page, and
+it changes the page icon from green to red if at least one (non-ignored) bot
+is broken.
+
+Bots offline are considered broken, as they may require attention. But when the
+admin restarts the master, that kills all buildslaves, and this shows up as
+"slave lost". You don't need to do anything, just wait for the next successful
+build.
+
+Each buildbot has four columns:
+
+ * Name & link: The bot name with a link to its page on its master. Good for
+   easy access to buildbots and masters.
+ * Status: Can only be "PASS" or "FAIL", but contains additional information
+   if it fails, e.g. "slave lost" or "build stage 1" or "test-suite". These are
+   the names of the stages that failed.
+ * Build number: The build number, to help identify if there is a change from
+   a specific number. Not very useful, but there just for reference.
+ * Commit range: The range of commits that were tested on that build. This is
+   very helpful to identify if a slow bot is failing because it hasn't yet
+   reached the commit range on a fast bot that is passing, or not.
+
+
+LLVM Masters
+------------
+
+There are a number of masters in the LLVM upstream infrastructure, and we may
+need to monitor bots in all of those, or switch between them, depending on the
+need.
+
+* LLVM Upstream main master: http://lab.llvm.org:8011/
+
+This is the main master that spams everyone every time one of the bots breaks.
+Unless there is any specific concern, bots should be in this master.
+
+* LLVM Upstream silent master: http://lab.llvm.org:8014/
+
+Exactly the same as above, but no emails are sent. This master is usually empty
+except for bots that are temporarily noisy, in active development, or that
+track performance regressions rather than compiler regressions, which are
+monitored on another page (http://llvm.org/perf/).
+
+* LLVM Japan master: http://bb.pgr.jp/
+
+A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
+rarely need to monitor anything there, but it's good to know it's there.
+
+* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/
+
+Our local master, which we use for development. Individual developers can have
+their own containers, in which case the masters will be on different ports.
+
+These bots should always be ignored for their global status, or we'll generate
+a lot of noise for ourselves. Unless, of course, they're on their way upstream
+and going through staging deployment.
+
+* Green Dragon bots: http://lab.llvm.org:8080/green/
+
+This is not a buildbot master, but Jenkins. We don't monitor those on our page,
+but they do have IRC bots in the #llvm channel and are already quite good at
+displaying successes and failures.
-- 
cgit v1.2.3
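
Reviewer's illustration (not part of the patch): the two "ignore" behaviours described under "JSON Documentation" can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the bot-monitor's actual code: the function name summarise, the status dictionary, the master, group and bot names, and the "builders" / "builds" URL parts are all hypothetical, and "ignore" is assumed to be stored as a JSON boolean.

```python
import json

# Hypothetical config following the documented structure. All names and URL
# parts are placeholders, and "ignore" is assumed to be a JSON boolean.
CONFIG = json.loads("""
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "bot-a", "ignore": false},
          {"name": "bot-b", "ignore": true}
        ]
      },
      {
        "name": "hidden group",
        "ignore": true,
        "bots": [{"name": "bot-c", "ignore": false}]
      }
    ]
  }
]
""")


def summarise(config, status):
    """Return (rows, broken) for the given config.

    status maps (master name, bot name) to "PASS" or "FAIL". Keying on the
    master as well reflects that equal bot names on different masters are
    different bots.
    """
    rows = []
    broken = False
    for master in config:
        if master.get("ignore", False):
            continue  # ignored masters are omitted entirely
        for builder in master["builders"]:
            if builder.get("ignore", False):
                continue  # ignored builder groups are omitted entirely
            for bot in builder["bots"]:
                result = status[(master["name"], bot["name"])]
                rows.append((master["name"], builder["name"], bot["name"], result))
                # Ignored bots are still listed, but do not affect the page status.
                if result == "FAIL" and not bot.get("ignore", False):
                    broken = True
    return rows, broken


# Hypothetical statuses; a real monitor would fetch these from the masters.
STATUS = {
    ("Example Master", "bot-a"): "PASS",
    ("Example Master", "bot-b"): "FAIL",
    ("Example Master", "bot-c"): "FAIL",
}
rows, broken = summarise(CONFIG, STATUS)
```

In this sketch the failing bot-b is still listed because only its status is ignored, bot-c never appears because its whole group is ignored, and the page would stay green since the only listed failure belongs to an ignored bot.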