Monitoring Tools for LLVM Development
=====================================

These tools are not meant to be used for development or testing, but to be
left running on a server or desktop to monitor your buildbots. They are
also meant to complement, not replace, Nagios and other hardware-level
monitoring tools.

Currently we only have one: bot-monitor, which I keep running on Linaro's
public server (people.linaro.org) and keep as a bookmark to quickly check
the bot status. It's also a helpful bookmark for all the bots we care about.

JSON Documentation
------------------

The JSON file should be self-explanatory, but just in case, here are a few
of the behaviours it exhibits when rendered by the current version of
bot-monitor.

The base structure is a list of masters. Each master has a few properties and
a list of builder groups, which in turn have some properties and a list of
bots (slaves).

Master properties:

  "name": "Name of the master, which will appear in bold big letters",
  "base_url": "http://SERVER:PORT/BASE",
  "builder_url": "part of the URL that refers to the list of builders",
  "build_url": "part of the URL that refers to the list of builds",
  "ignore": "true | false, hides or shows the entire master on the page",
  "builders": [ ... ]

Builder properties:

  "name": "Name of this group (fast bots, self-hosting, etc)",
  "ignore": "true | false, hides or shows the entire builder on the page",
  "bots": [ ... ]

Bot properties:

  "name": "Exact name of the buildbot (becomes part of the URL)",
  "ignore": "true | false, to ignore or not failures in this bot"

Note that "ignore" has two different behaviours:

 * On masters and builders, it omits the entire class from the output
 * On bots, it still shows them, but ignores their status

Notes on bots:

 * You can repeat bots across builders if they belong to multiple classes, for
   example "self-hosting" and "test-suite". The script will cache the results
   and simply re-print them, so this is *only* for visualisation / organisation
   purposes.
 * Using the same bot name on different masters means *different* bots. It may
   be the same configuration on two different masters, or it may be completely
   different bots. Beware.
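
The structure and the "ignore" rules above can be sketched in Python. This is
not the bot-monitor source, only an illustration under stated assumptions:
"ignore" is taken to be a JSON boolean, the master, group and bot names and
the "builders" / "builds" URL parts are placeholders, and visible_bots() and
cached_status() are hypothetical helpers. The cache is keyed on the master as
well as the bot name, since the same name on two masters means different bots.

```python
import json

# Hypothetical configuration: every name and URL part is a placeholder,
# and "ignore" is assumed to be a JSON boolean.
CONFIG = json.loads("""
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "Fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-fast-bot", "ignore": false},
          {"name": "example-flaky-bot", "ignore": true}
        ]
      },
      {
        "name": "Hidden group",
        "ignore": true,
        "bots": [
          {"name": "example-hidden-bot", "ignore": false}
        ]
      }
    ]
  }
]
""")


def visible_bots(masters):
    """Yield (master, group, bot, counts) for every bot that is displayed.

    "counts" is False for bots whose status is shown but ignored.
    """
    for master in masters:
        if master.get("ignore", False):
            continue  # ignored masters are omitted from the output
        for group in master["builders"]:
            if group.get("ignore", False):
                continue  # ignored builder groups are omitted as well
            for bot in group["bots"]:
                counts = not bot.get("ignore", False)
                yield master["name"], group["name"], bot["name"], counts


def cached_status(cache, master_name, bot_name, fetch):
    """Fetch a bot's status once per (master, bot) pair and reuse it."""
    key = (master_name, bot_name)
    if key not in cache:
        cache[key] = fetch(master_name, bot_name)
    return cache[key]


if __name__ == "__main__":
    for row in visible_bots(CONFIG):
        print(row)
```

With this sample, the "Hidden group" never appears, while example-flaky-bot
is listed but flagged so that its failures do not count.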


HTML Page
---------

For now, there's only HTML output, but there's nothing stopping us from
developing more forms of communication (email, IRC bots, etc).

The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
has a date at the top, so you can make sure you're looking at an up-to-date
page, and it changes the page icon from green to red if at least one
(non-ignored) bot is broken.

Offline bots are considered broken, as they may require attention. But when the
admin restarts the master, that kills all buildslaves, and this shows up as
"slave lost". You don't need to do anything, just wait for the next successful
build.
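
The icon rule can be written down in a few lines. This is a minimal sketch,
not the script's actual code: page_icon() is a hypothetical helper, and each
bot is assumed to be reduced to a (passed, ignored) pair of booleans, with
offline bots counted as not passed.

```python
def page_icon(results):
    """Return "red" if any non-ignored bot is broken, "green" otherwise.

    results is an iterable of (passed, ignored) pairs, one per displayed bot.
    """
    for passed, ignored in results:
        if not passed and not ignored:
            return "red"
    return "green"


if __name__ == "__main__":
    # One failing bot, but it is ignored, so the page stays green.
    print(page_icon([(True, False), (False, True)]))
```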

Each buildbot has four columns:

 * Name & link: The bot name with a link to its page on its master. Good for
   easy access to buildbots and masters.
 * Status: Can only be "PASS" or "FAIL", but contains additional information
   if it fails, e.g. "slave lost", "build stage 1" or "test-suite". These are
   the names of the stages that failed.
 * Build number: The build number, to help identify whether anything has
   changed since a specific build. Not very useful, but there for reference.
 * Commit range: The range of commits that were tested in that build. This is
   very helpful to identify whether a slow bot is failing because it hasn't yet
   reached the commit range of a fast bot that is passing.
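
A rough Python sketch of how these columns could be put together follows. All
of it is assumption rather than the script's real behaviour: the link is
guessed to be base_url, builder_url and the bot name joined with "/", the
"FAIL: ..." text format is invented for the example, and commits are assumed
to be increasing integer revision numbers so that ranges can be compared.

```python
from urllib.parse import quote


def bot_link(base_url, builder_url, bot_name):
    """Build a link to the bot's page (assumed URL layout)."""
    parts = [base_url.rstrip("/"), builder_url.strip("/"), quote(bot_name)]
    return "/".join(parts)


def status_text(failed_stages):
    """Return "PASS", or "FAIL" followed by the names of the failed stages."""
    if not failed_stages:
        return "PASS"
    return "FAIL: " + ", ".join(failed_stages)


def has_caught_up(slow_range, fast_range):
    """True if the slow bot has tested commits the fast bot started from.

    Each range is a (first, last) pair of integer revision numbers.
    """
    return slow_range[1] >= fast_range[0]


if __name__ == "__main__":
    print(bot_link("http://SERVER:PORT/BASE", "builders", "example-fast-bot"))
    print(status_text(["build stage 1", "test-suite"]))
    print(has_caught_up((100, 110), (120, 125)))
```

If has_caught_up() is false for a failing slow bot, the fix that made the
fast bot pass may simply not have been built there yet.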


LLVM Masters
------------

There are a number of masters in the LLVM upstream infrastructure, and we may
need to monitor bots on all of them, or switch between them, depending on the
need.

* LLVM Upstream main master: http://lab.llvm.org:8011/

This is the main master that spams everyone every time one of the bots breaks.
Unless there is a specific concern, bots should be on this master.

* LLVM Upstream silent master: http://lab.llvm.org:8014/

Exactly the same as above, but no emails are sent. This master is usually
empty, except for bots that are temporarily noisy, in active development, or
that track performance regressions rather than compiler regressions. The
latter are monitored on another page (http://llvm.org/perf/).

* LLVM Japan master: http://bb.pgr.jp/

A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
rarely need to monitor anything there, but it's good to know it's there.

* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/

Our local master, which we use for development. Individual developers can have
their own containers, in which case the masters will be on different ports.

These bots should always be ignored for their global status, or we'll generate
a lot of noise for ourselves. Unless, of course, they're on their way upstream
and going through staging deployment.

* Green Dragon bots: http://lab.llvm.org:8080/green/

This is not a buildbot master, but Jenkins. We don't monitor those on our page,
but they do have IRC bots in the #llvm channel and are already quite good at
displaying successes and failures.