-rw-r--r-- monitor/README.txt | 115
1 file changed, 115 insertions, 0 deletions
diff --git a/monitor/README.txt b/monitor/README.txt
index c5cc6e8..652c4eb 100644
--- a/monitor/README.txt
+++ b/monitor/README.txt
@@ -9,3 +9,118 @@ and other hardware-level monitoring tools.
Currently we only have one: bot-monitor, which I keep running on Linaro's
public server (people.linaro.org) and keep it as a bookmark to quickly check
the bot status. It's also a helpful bookmark for all bots we care.
+
+JSON Documentation
+------------------
+
+The JSON file should be self-explanatory, but just in case, here are a few
+of the behaviours it exhibits when rendered by the current version of the
+bot-monitor.
+
+The base structure is a list of masters, each of which has a few properties and
+a list of builder groups, which in turn have some properties and a list of bots
+(buildslaves).
+
+Master properties:
+
+ "name": "Name of the master, which will appear in bold big letters",
+ "base_url": "http://SERVER:PORT/BASE",
+ "builder_url": "part of the URL that refers to the list of builders",
+ "build_url": "part of the URL that refers to the list of builds",
+ "ignore": "true | false, hides (true) or shows (false) the entire master on the page",
+ "builders": [ ... ]
+
+Builder properties:
+
+ "name": "Name of this group (fast bots, self-hosting, etc)",
+ "ignore": "true | false, hides (true) or shows (false) the entire builder on the page",
+ "bots": [ ... ]
+
+Bot properties:
+
+ "name": "Exact name of the buildbot (becomes part of the URL)",
+ "ignore": "true | false, whether to ignore failures in this bot"
+
+Note that "ignore" has two different behaviours:
+
+ * On masters and builders, it omits the entire class from the output
+ * On bots, it still shows them, but ignores their status
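
A minimal sketch of the structure and of the two "ignore" behaviours described
above. The master, group and bot names are hypothetical, and the sketch assumes
"ignore" is stored as a JSON boolean; the real bot-monitor may parse it
differently.

```python
import json

# Hypothetical configuration using the properties documented above.
# All names are made up; SERVER:PORT/BASE is the README's own placeholder.
CONFIG = json.loads("""
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-fast-bot", "ignore": false},
          {"name": "example-flaky-bot", "ignore": true}
        ]
      },
      {
        "name": "hidden group",
        "ignore": true,
        "bots": [{"name": "example-hidden-bot", "ignore": false}]
      }
    ]
  }
]
""")


def visible_bots(masters):
    """Yield (master, group, bot, counts) for every bot that is displayed.

    'counts' is False for bots whose own "ignore" flag is set: they are
    still listed, but their status should not affect the global result.
    """
    for master in masters:
        if master.get("ignore", False):
            continue  # ignored masters are omitted from the output
        for group in master["builders"]:
            if group.get("ignore", False):
                continue  # ignored builder groups are omitted as well
            for bot in group["bots"]:
                counts = not bot.get("ignore", False)
                yield master["name"], group["name"], bot["name"], counts


rows = list(visible_bots(CONFIG))
```

Here the "hidden group" never appears in `rows`, while `example-flaky-bot` is
listed with `counts` set to False.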
+
+Notes on bots:
+
+ * You can repeat bots across builders, if they belong to multiple classes, for
+ example "self-hosting" and "test-suite". The script will cache the results
+ and simply re-print them, so this is *only* for visualisation / organisation
+ purposes.
+ * Using the same bot name on different masters means *different* bots. It may
+ be the same configuration on two different masters, or it may be completely
+ different bots. Beware.
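
The caching behaviour in the notes above can be sketched as follows. This is an
illustration only: `make_cached_fetcher`, `fake_fetch` and the master URLs are
hypothetical names, and keying the cache on the (master, bot name) pair is an
assumption that follows from the same name meaning different bots on different
masters.

```python
def make_cached_fetcher(fetch):
    """Wrap fetch(master_url, bot_name) so each pair is fetched only once."""
    cache = {}

    def cached(master_url, bot_name):
        # The master is part of the key: the same bot name on another
        # master is treated as a different bot.
        key = (master_url, bot_name)
        if key not in cache:
            cache[key] = fetch(master_url, bot_name)
        return cache[key]

    return cached


calls = []


def fake_fetch(master_url, bot_name):
    """Stand-in for a real status query; records each call that is made."""
    calls.append((master_url, bot_name))
    return {"status": "PASS"}


get_status = make_cached_fetcher(fake_fetch)
get_status("http://master-a", "example-bot")
get_status("http://master-a", "example-bot")  # repeated bot: served from cache
get_status("http://master-b", "example-bot")  # other master: fetched again
```

After these three lookups, `calls` holds two entries, one per master.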
+
+
+HTML Page
+---------
+
+For now, there's only HTML output, but there's nothing stopping us from
+developing more forms of communication (email, IRC bots, etc).
+
+The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
+has a date at the top, to make sure you're looking at an up-to-date page, and
+it changes the page icon from green to red if at least one (non-ignored) bot
+is broken.
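
The icon rule above can be sketched as a small function. `page_icon` and the
(status, ignored) tuples are hypothetical names and shapes chosen for
illustration, not the bot-monitor's actual code.

```python
def page_icon(results):
    """Return "red" if any non-ignored bot is not passing, else "green".

    'results' is an iterable of (status, ignored) pairs, where status is
    "PASS" or "FAIL" and ignored mirrors the bot's "ignore" property.
    """
    for status, ignored in results:
        if status != "PASS" and not ignored:
            return "red"
    return "green"


# A failing bot that is ignored does not turn the page red.
icon_ok = page_icon([("PASS", False), ("FAIL", True)])

# A single failing, non-ignored bot is enough to turn it red.
icon_broken = page_icon([("PASS", False), ("FAIL", True), ("FAIL", False)])
```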
+
+Bots offline are considered broken, as they may require attention. But when the
+admin restarts the master, that kills all buildslaves, and this shows up as
+"slave lost". In that case you don't need to do anything; just wait for the
+next successful build.
+
+Each buildbot has four columns:
+
+ * Name & link: The bot name with a link to its page on its master. Good for
+ easy access to buildbots and masters.
+ * Status: Can only be "PASS" or "FAIL", but contains additional information
+ if it fails, e.g. "slave lost" or "build stage 1" or "test-suite". These are
+ the names of the stages that failed.
+ * Build number: The build number, to help identify if there is a change from
+ a specific number. Not very useful, but it is there for reference.
+ * Commit range: The range of commits that were tested on that build. This
+ helps identify whether a slow bot is failing only because it hasn't yet
+ reached the commit range at which a fast bot is already passing.
+
+
+LLVM Masters
+------------
+
+There are a number of masters in the LLVM upstream infrastructure, and we may
+need to monitor bots in all of those, or switch between them, depending on the
+need.
+
+* LLVM Upstream main master: http://lab.llvm.org:8011/
+
+This is the main master that spams everyone every time one of the bots breaks.
+Unless there is any specific concern, bots should be on this master.
+
+* LLVM Upstream silent master: http://lab.llvm.org:8014/
+
+Exactly the same as above, but no emails are sent. This master is usually empty
+except for bots that are temporarily noisy, are in active development, or track
+performance regressions rather than compiler regressions. Performance
+regressions are monitored on another page (http://llvm.org/perf/).
+
+* LLVM Japan master: http://bb.pgr.jp/
+
+A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
+rarely need to monitor anything there, but it's good to know it's there.
+
+* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/
+
+Our local master, which we use for development. Individual developers can have
+their own containers, in which case the masters will be on different ports.
+
+These bots should always be ignored for their global status, or we'll generate
+a lot of noise for ourselves. Unless, of course, they're on their way upstream
+and going through staging deployment.
+
+* Green Dragon bots: http://lab.llvm.org:8080/green/
+
+This is not a buildbot master, but Jenkins. We don't monitor those on our page,
+but they do have IRC bots in the #llvm channel and are already quite good at
+displaying successes and failures.