| Monitoring Tools for LLVM Development |
| ===================================== |
| |
| These tools are not meant to be used for development or testing, but to be |
| left running on a server or desktop as monitoring for your buildbots. They |
| are also meant to be used in conjunction with, not as a replacement for, |
| Nagios and other hardware-level monitoring tools. |
| |
| Currently we only have one: bot-monitor, which I keep running on Linaro's |
| public server (people.linaro.org) and keep as a bookmark to quickly check |
| the bot status. It's also a helpful bookmark for all the bots we care about. |
| |
| JSON Documentation |
| ------------------ |
| |
| The JSON file should be self-explanatory, but just in case, here are a few |
| of the behaviours it exhibits when rendered by the current version of the |
| bot-monitor. |
| |
| The base structure is a list of masters, each of which has a few properties and |
| a list of builder groups, which in turn have some properties and a list of bots. |
| |
| Master properties: |
| |
| "name": "Name of the master, which will appear in bold big letters", |
| "base_url": "http://SERVER:PORT/BASE", |
| "builder_url": "part of the URL that refers to the list of builders", |
| "build_url": "part of the URL that refers to the list of builds", |
| "ignore": "true | false, hides (true) or shows (false) the entire master on the page", |
| "builders": [ ... ] |
| |
| Builder properties: |
| |
| "name": "Name of this group (fast bots, self-hosting, etc)", |
| "ignore": "true | false, hides (true) or shows (false) the entire builder group on the page", |
| "bots": [ ... ] |
| |
| Bot properties: |
| |
| "name": "Exact name of the buildbot (becomes part of the URL)", |
| "ignore": "true | false, whether to ignore failures in this bot" |
| |
| Note that "ignore" has two different behaviours: |
| |
| * On masters and builders, it omits the entire class from the output |
| * On bots, it still shows them, but ignores their status |
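
The two behaviours of "ignore" can be sketched in Python as follows. This is
only an illustration, not the bot-monitor's actual code: the master, group and
bot names are made up, the "ignore" values are assumed to be JSON booleans,
and the helper name visible_bots is hypothetical.

```python
import json

# Hypothetical configuration following the property lists above.
# All names are placeholders; "ignore" is assumed to be a JSON boolean.
CONFIG = json.loads("""
[
  {
    "name": "Example master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-bot-a", "ignore": false},
          {"name": "example-bot-b", "ignore": true}
        ]
      },
      {
        "name": "hidden group",
        "ignore": true,
        "bots": [{"name": "example-bot-c", "ignore": false}]
      }
    ]
  }
]
""")


def visible_bots(masters):
    """Yield (master, group, bot, counts) for every bot that would be shown.

    Ignored masters and groups are skipped entirely. Ignored bots are still
    yielded, but with counts=False so their status does not affect the page.
    """
    for master in masters:
        if master.get("ignore", False):
            continue  # the whole master is omitted
        for group in master["builders"]:
            if group.get("ignore", False):
                continue  # the whole builder group is omitted
            for bot in group["bots"]:
                counts = not bot.get("ignore", False)
                yield master["name"], group["name"], bot["name"], counts


rows = list(visible_bots(CONFIG))
for row in rows:
    print(row)
```

Here "example-bot-b" is still listed but marked as not counting, while
"example-bot-c" never appears because its group is ignored.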
| |
| Notes on bots: |
| |
| * You can repeat bots across builders, if they belong to multiple classes, for |
| example "self-hosting" and "test-suite". The script will cache the results |
| and simply re-print them, so this is *only* for visualisation / organisation |
| purposes. |
| * Using the same bot name on different masters means *different* bots. It may |
| be the same configuration on two different masters, or it may be completely |
| different bots. Beware. |
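
A minimal sketch of the caching behaviour described in the notes above,
assuming results are keyed on the master and the bot name together. The names
make_cached_fetch and fake_fetch, and the example URLs, are hypothetical and
only stand in for whatever the script really uses to query a master.

```python
def make_cached_fetch(fetch):
    """Wrap fetch(master_url, bot_name) so each pair is queried only once."""
    cache = {}

    def cached(master_url, bot_name):
        # The key includes the master, so the same bot name on two masters
        # is treated as two different bots.
        key = (master_url, bot_name)
        if key not in cache:
            cache[key] = fetch(master_url, bot_name)
        return cache[key]

    return cached


calls = []


def fake_fetch(master_url, bot_name):
    """Stand-in for a real query to a master; records each call."""
    calls.append((master_url, bot_name))
    return {"status": "PASS"}


get_status = make_cached_fetch(fake_fetch)
get_status("http://master-one.example", "example-bot")
get_status("http://master-one.example", "example-bot")  # served from the cache
get_status("http://master-two.example", "example-bot")  # different master
print(len(calls))
```

Repeating a bot in several groups of the same master therefore costs one
query, while the same name on another master triggers a separate one.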
| |
| |
| HTML Page |
| --------- |
| |
| For now, there's only HTML output, but there's nothing stopping us from developing |
| more forms of communication (email, IRC bots, etc). |
| |
| The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also |
| has a date at the top, to make sure you're looking at an up-to-date page, and |
| it changes the page icon from green to red if at least one (non-ignored) bot |
| is broken. |
| |
| Offline bots are considered broken, as they may require attention. However, when |
| the admin restarts the master, that kills all buildslaves, and this shows up as |
| "slave lost". You don't need to do anything; just wait for the next successful |
| build. |
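
The rule for the page icon can be sketched as below. The record layout
(name, status, detail, ignore) and the function name page_is_red are
assumptions made for this example, not the script's real data model.

```python
def page_is_red(bots):
    """Return True if at least one non-ignored bot is not passing."""
    return any(bot["status"] != "PASS" and not bot["ignore"] for bot in bots)


# Hypothetical bot records. An offline bot is reported as a failure with a
# "slave lost" detail, so it counts as broken like any other failure.
bots = [
    {"name": "example-bot-a", "status": "PASS", "detail": "", "ignore": False},
    {"name": "example-bot-b", "status": "FAIL", "detail": "test-suite", "ignore": True},
]
green_case = page_is_red(bots)

bots.append(
    {"name": "example-bot-c", "status": "FAIL", "detail": "slave lost", "ignore": False}
)
red_case = page_is_red(bots)
print(green_case, red_case)
```

The ignored failure on "example-bot-b" leaves the icon green; only the
non-ignored offline bot turns it red.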
| |
| Each buildbot has four columns: |
| |
| * Name & link: The bot name with a link to its page on its master. Good for |
| easy access to buildbots and masters. |
| * Status: Can only be "PASS" or "FAIL", but contains additional information |
| if it fails, e.g. "slave lost", "build stage 1" or "test-suite". These are |
| the names of the stages that failed. |
| * Build number: The build number, to help identify if there is a change from |
| a specific number. Not very useful, but there just for reference. |
| * Commit range: The range of commits that were tested on that build. This is |
| very helpful to identify if a slow bot is failing because it hasn't yet |
| reached the commit range on a fast bot that is passing, or not. |
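
As an illustration of the "Name & link" column, the link could be assembled
from the master's "base_url" and "builder_url" plus the bot name. The exact
way the bot-monitor joins these parts is not documented here, so the
slash-separated layout and the helper name bot_link are assumptions.

```python
from urllib.parse import quote


def bot_link(base_url, builder_url, bot_name):
    """Join the master URL parts and the bot name into one link.

    Assumes a plain slash-separated layout; the real script may differ.
    """
    return "/".join(
        [
            base_url.rstrip("/"),
            builder_url.strip("/"),
            quote(bot_name, safe=""),  # escape characters unsafe in a path
        ]
    )


link = bot_link("http://SERVER:PORT/BASE", "builders", "example-bot")
print(link)
```

This is why the bot "name" has to match the buildbot's name exactly: it ends
up as the last path component of the link.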
| |
| |
| LLVM Masters |
| ------------ |
| |
| There are a number of masters in the LLVM upstream infrastructure, and we may |
| need to monitor bots in all of those, or switch between them, depending on the |
| need. |
| |
| * LLVM Upstream main master: http://lab.llvm.org:8011/ |
| |
| This is the main master that spams everyone every time one of the bots breaks. |
| Unless there is any specific concern, bots should be in this master. |
| |
| * LLVM Upstream silent master: http://lab.llvm.org:8014/ |
| |
| Exactly the same as above, but no emails are sent. This master is usually empty |
| except for bots that are temporarily noisy, are in active development, or track |
| performance regressions rather than compiler regressions; the latter are |
| monitored on another page (http://llvm.org/perf/). |
| |
| * LLVM Japan master: http://bb.pgr.jp/ |
| |
| A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We |
| rarely need to monitor anything there, but it's good to know it's there. |
| |
| * Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/ |
| |
| Our local master, which we use for development. Individual developers can have |
| their own containers, in which case the masters will be on different ports. |
| |
| These bots should always be ignored for their global status, or we'll generate |
| a lot of noise for ourselves. Unless, of course, they're on their way upstream |
| and going through staging deployment. |
| |
| * Green Dragon bots: http://lab.llvm.org:8080/green/ |
| |
| This is not a buildbot master, but Jenkins. We don't monitor those on our page, |
| but they do have IRC bots in the #llvm channel and are already quite good at |
| displaying successes and failures. |