Monitoring Tools for LLVM Development
=====================================

These tools are not meant to be used for development or testing, but to be
left running on a server or desktop to monitor your buildbots. They are
meant to complement, not replace, Nagios and other hardware-level
monitoring tools.

Currently there is only one: bot-monitor, which I keep running on Linaro's
public server (people.linaro.org) and have bookmarked to quickly check the
bot status. The page also serves as a handy index of all the bots we care
about.

JSON Documentation
------------------

The JSON file should be self-explanatory, but just in case, here are a few
of the behaviours it exhibits when rendered by the current version of the
bot-monitor.

The base structure is a list of masters, each of which has a few properties
and a list of builder groups, which in turn have some properties and a list
of bots.

Master properties:

  "name": "Name of the master, which will appear in big bold letters",
  "base_url": "http://SERVER:PORT/BASE",
  "builder_url": "part of the URL that refers to the list of builders",
  "build_url": "part of the URL that refers to the list of builds",
  "ignore": "true | false, hides or shows the entire master on the page",
  "builders": [ ... ]

Builder properties:

  "name": "Name of this group (fast bots, self-hosting, etc)",
  "ignore": "true | false, hides or shows the entire builder on the page",
  "bots": [ ... ]

Bot properties:

  "name": "Exact name of the buildbot (becomes part of the URL)",
  "ignore": "true | false, to ignore or not failures in this bot"
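
As a rough illustration of the structure above, the sketch below parses a
small configuration with Python's json module and lists a page URL for each
bot. The master, group and bot names are made up, "ignore" is assumed to be a
JSON boolean, and the way "base_url", "builder_url" and the bot name are joined
is only a guess at how links are built; check bot-monitor itself for the real
rules.

```python
import json

# Hypothetical configuration: every name and URL fragment here is illustrative.
CONFIG = """
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "Fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-fast-bot", "ignore": false},
          {"name": "example-noisy-bot", "ignore": true}
        ]
      }
    ]
  }
]
"""


def bot_page_url(master, bot):
    # Assumed joining rule: BASE/<builder_url>/<bot name>.
    base = master["base_url"].rstrip("/")
    return "/".join([base, master["builder_url"], bot["name"]])


def list_bot_urls(masters):
    # Walk masters -> builder groups -> bots, yielding (group name, URL) pairs.
    for master in masters:
        for group in master["builders"]:
            for bot in group["bots"]:
                yield group["name"], bot_page_url(master, bot)


masters = json.loads(CONFIG)
for group_name, url in list_bot_urls(masters):
    print(group_name, url)
```

The three nested loops mirror the three levels of the file: masters, builder
groups, bots.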

Note that "ignore" has two different behaviours:

* On masters and builders, it omits the entire master or builder group from
  the output
* On bots, it still shows them, but ignores their status
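
The two behaviours can be sketched as a single traversal. This is not the
script's actual code: the function name visible_bots is hypothetical, "ignore"
is assumed to be a boolean that defaults to false when missing, and the sample
data is invented.

```python
def visible_bots(masters):
    """Yield (master, group, bot, counts) for every bot that is shown.

    "counts" is False for bots whose status is displayed but ignored.
    """
    for master in masters:
        if master.get("ignore", False):
            continue  # ignored master: nothing below it is shown
        for group in master.get("builders", []):
            if group.get("ignore", False):
                continue  # ignored builder group: omitted entirely
            for bot in group.get("bots", []):
                counts = not bot.get("ignore", False)
                yield master["name"], group["name"], bot["name"], counts


# Invented sample data covering all three levels of "ignore".
sample = [
    {"name": "M1", "ignore": False, "builders": [
        {"name": "G1", "ignore": False, "bots": [
            {"name": "a", "ignore": False},
            {"name": "b", "ignore": True},
        ]},
        {"name": "G2", "ignore": True, "bots": [
            {"name": "c", "ignore": False},
        ]},
    ]},
    {"name": "M2", "ignore": True, "builders": [
        {"name": "G3", "ignore": False, "bots": [
            {"name": "d", "ignore": False},
        ]},
    ]},
]

shown = list(visible_bots(sample))
for row in shown:
    print(row)
```

Bot "b" is still listed, with counts set to False, while bots "c" and "d"
never appear because their group or master is ignored.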

Notes on bots:

* You can repeat bots across builders if they belong to multiple classes, for
  example "self-hosting" and "test-suite". The script caches the results and
  simply re-prints them, so this is *only* for visualisation / organisation
  purposes.
* Using the same bot name on different masters means *different* bots. It may
  be the same configuration on two different masters, or they may be completely
  different bots. Beware.
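
Both notes follow naturally if results are cached per (master, bot name) pair.
The sketch below shows that idea under assumptions: the StatusCache class and
the fake_fetch helper are hypothetical stand-ins, not names taken from
bot-monitor.

```python
class StatusCache:
    """Cache bot results keyed by (master name, bot name)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable(master_name, bot_name) -> status
        self._results = {}

    def status(self, master_name, bot_name):
        key = (master_name, bot_name)
        if key not in self._results:
            self._results[key] = self._fetch(master_name, bot_name)
        return self._results[key]


calls = []


def fake_fetch(master_name, bot_name):
    # Stand-in for the real query to the master; records each call.
    calls.append((master_name, bot_name))
    return "PASS"


cache = StatusCache(fake_fetch)
cache.status("Master A", "example-bot")  # first sighting: fetched
cache.status("Master A", "example-bot")  # repeated in another group: cached
cache.status("Master B", "example-bot")  # same name, other master: fetched
print(len(calls))  # 2
```

Including the master in the key is what keeps identically named bots on
different masters apart.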


HTML Page
---------

For now, there's only HTML output, but there's nothing stopping us from
developing other forms of communication (email, IRC bots, etc).

The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
has a date at the top, so you can make sure you're looking at an up-to-date
page, and it changes the page icon from green to red if at least one
(non-ignored) bot is broken.

Offline bots are considered broken, as they may require attention. However,
when the admin restarts the master, all buildslaves are killed and show up as
"slave lost". In that case you don't need to do anything, just wait for the
next successful build.
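
The icon rule can be written in a few lines. This is a minimal sketch, assuming
each bot has already been reduced to a dictionary with a "status" of "PASS" or
"FAIL" and a boolean "ignore"; the function name page_icon and the dictionary
layout are illustrative only.

```python
def page_icon(bots):
    """Return "red" if any non-ignored bot is failing, otherwise "green"."""
    for bot in bots:
        if bot["ignore"]:
            continue  # ignored bots are displayed but never turn the icon red
        if bot["status"] != "PASS":
            return "red"  # covers real failures and offline ("slave lost") bots
    return "green"


all_good = [
    {"name": "a", "status": "PASS", "ignore": False},
    {"name": "b", "status": "FAIL", "ignore": True},
]
one_broken = all_good + [
    {"name": "c", "status": "FAIL", "ignore": False},
]

print(page_icon(all_good))
print(page_icon(one_broken))
```

An offline bot needs no special case here, since it is reported as a failure
like any other.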

Each buildbot has four columns:

* Name & link: The bot name with a link to its page on its master. Good for
  easy access to buildbots and masters.
* Status: Either "PASS" or "FAIL". On failure it contains additional
  information, e.g. "slave lost", "build stage 1" or "test-suite". These are
  the names of the stages that failed.
* Build number: The build number, to help identify whether anything has changed
  since a specific build. Not very useful, but there for reference.
* Commit range: The range of commits that were tested in that build. This is
  very helpful to identify whether a slow bot is failing only because it hasn't
  yet reached the commit range of a fast bot that is passing.


LLVM Masters
------------

There are a number of masters in the LLVM upstream infrastructure, and we may
need to monitor bots on all of them, or switch between them, depending on the
need.

* LLVM Upstream main master: http://lab.llvm.org:8011/

This is the main master, which spams everyone every time one of the bots
breaks. Unless there is a specific concern, bots should be on this master.

* LLVM Upstream silent master: http://lab.llvm.org:8014/

Exactly the same as above, but no emails are sent. This master is usually empty
except for bots that are temporarily noisy, in active development, or that
track performance regressions rather than compiler regressions. Performance is
monitored on another page (http://llvm.org/perf/).

* LLVM Japan master: http://bb.pgr.jp/

A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
rarely need to monitor anything there, but it's good to know it's there.

* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/

Our local master, which we use for development. Individual developers can have
their own containers, in which case the masters will be on different ports.

These bots should always be ignored for their global status, or we'll generate
a lot of noise for ourselves. Unless, of course, they're on their way upstream
and going through staging deployment.

* Green Dragon bots: http://lab.llvm.org:8080/green/

This is not a buildbot master, but Jenkins. We don't monitor those on our page,
but they do have IRC bots in the #llvm channel and are already quite good at
displaying successes and failures.