
 Monitoring Tools for LLVM Development
 =====================================

 These tools are not meant to be used for development or testing, but to be
 left running on a server or desktop to monitor your buildbots. They are
 also meant to be used in conjunction with, not as a replacement for, Nagios
 and other hardware-level monitoring tools.

 Currently we only have one: bot-monitor, which I keep running on Linaro's
 public server (people.linaro.org) and keep as a bookmark to quickly check
 the bot status. It's also a helpful bookmark for all the bots we care about.

 JSON Documentation
 ------------------

 The JSON file should be self-explanatory, but just in case, here are a few
 of the behaviours it exhibits when rendered by the current version of the
 bot-monitor.

 The base structure is a list of masters, each of which has a few properties
 and a list of builder groups, which in turn have some properties and a list
 of bots.

 Master properties:

     "name": "Name of the master, which will appear in bold big letters",
     "base_url": "http://SERVER:PORT/BASE",
     "builder_url": "part of the URL that refers to the list of builders",
     "build_url": "part of the URL that refers to the list of builds",
     "ignore": "true | false, shows or hides the entire master from the page",
     "builders": [ ... ]

 Builder properties:

     "name": "Name of this group (fast bots, self-hosting, etc)",
     "ignore": "true | false, shows or hides the entire builder from the page",
     "bots": [ ... ]

 Bot properties:

     "name": "Exact name of the buildbot (becomes part of the URL)",
     "ignore": "true | false, to ignore or not failures in this bot"
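
 As an illustration, the three levels above could be combined into one file
 and loaded with Python's json module, roughly as sketched below. The master,
 group and bot names and the URL fragments are placeholders made up for this
 example, and writing "ignore" as a JSON boolean is an assumption; check the
 bot-monitor script for the exact values it expects.

```python
import json

# Hypothetical configuration: every name and URL fragment is a placeholder.
CONFIG_TEXT = """
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-fast-bot", "ignore": false},
          {"name": "example-flaky-bot", "ignore": true}
        ]
      }
    ]
  }
]
"""


def load_config(text):
    """Parse the JSON text and check that the documented keys are present."""
    masters = json.loads(text)
    for master in masters:
        for key in ("name", "base_url", "builder_url", "build_url", "builders"):
            if key not in master:
                raise KeyError(f"master is missing '{key}'")
        for group in master["builders"]:
            for key in ("name", "bots"):
                if key not in group:
                    raise KeyError(f"builder group is missing '{key}'")
            for bot in group["bots"]:
                if "name" not in bot:
                    raise KeyError("bot is missing 'name'")
    return masters


masters = load_config(CONFIG_TEXT)
# Flatten the three levels into a plain list of bot names.
bot_names = [b["name"] for m in masters for g in m["builders"] for b in g["bots"]]
print(bot_names)
```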

 Note that "ignore" has two different behaviours:

  * On masters and builders, it omits the entire class from the output
  * On bots, it still shows them, but ignores their status
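
 The two behaviours can be sketched in Python as a generator that walks the
 configuration. This is only an illustration of the rule, not the script's
 actual code: the function name, the tuple layout and the sample data are
 hypothetical, and "ignore" is assumed to be a boolean that defaults to false.

```python
def visible_bots(masters):
    """Yield (master, group, bot, counts) for every bot that gets displayed.

    Ignored masters and builder groups are skipped entirely. Ignored bots
    are still yielded, but with counts=False so their status is disregarded.
    """
    for master in masters:
        if master.get("ignore", False):
            continue
        for group in master.get("builders", []):
            if group.get("ignore", False):
                continue
            for bot in group.get("bots", []):
                counts = not bot.get("ignore", False)
                yield (master["name"], group["name"], bot["name"], counts)


# Hypothetical sample data exercising all three levels of "ignore".
example = [
    {"name": "M1", "ignore": False, "builders": [
        {"name": "G1", "ignore": False, "bots": [
            {"name": "bot-a", "ignore": False},
            {"name": "bot-b", "ignore": True}]},
        {"name": "G2", "ignore": True, "bots": [
            {"name": "bot-c", "ignore": False}]},
    ]},
    {"name": "M2", "ignore": True, "builders": [
        {"name": "G3", "ignore": False, "bots": [
            {"name": "bot-d", "ignore": False}]},
    ]},
]

rows = list(visible_bots(example))
for row in rows:
    print(row)
```

 Here bot-b is still listed but does not count towards the page status, while
 bot-c and bot-d never appear because their group or master is ignored.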

 Note on bots:

   * You can repeat bots across builders, if they belong to multiple classes, for
     example "self-hosting" and "test-suite". The script will cache the results
     and simply re-print them, so this is *only* for visualisation / organisation
     purposes.
   * Using the same bot name on different masters means *different* bots. It may
     be the same configuration on two different masters, or it may be completely
     different bots. Beware.
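
 A minimal sketch of how these two notes fit together, assuming results are
 cached per (master, bot name) pair. The key choice is an assumption that is
 consistent with the notes above, not a description of the script's code,
 and fake_fetch stands in for whatever actually queries the master.

```python
def make_cached_fetcher(fetch):
    """Wrap fetch(base_url, bot_name) so each pair is only fetched once."""
    cache = {}

    def cached(base_url, bot_name):
        # The master is part of the key, so equal bot names on different
        # masters are treated as different bots.
        key = (base_url, bot_name)
        if key not in cache:
            cache[key] = fetch(base_url, bot_name)
        return cache[key]

    return cached


calls = []


def fake_fetch(base_url, bot_name):
    """Hypothetical stand-in for a network request; records each call."""
    calls.append((base_url, bot_name))
    return {"status": "PASS"}


get_status = make_cached_fetcher(fake_fetch)
get_status("http://master-one", "bot-a")  # first group: fetched
get_status("http://master-one", "bot-a")  # repeated in another group: cached
get_status("http://master-two", "bot-a")  # same name, other master: fetched
print(len(calls))
```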


 HTML Page
 ---------

 For now, there's only HTML output, but there's nothing stopping us from
 developing more forms of communication (email, IRC bots, etc).

 The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
 has a date on the top, to make sure you're looking at an up-to-date page, and
 it changes the page icon from green to red if at least one (non-ignored) bot
 is broken.
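
 The icon rule can be written down in a few lines of Python. This is a sketch
 of the rule as described, with a hypothetical data layout: each bot is a
 dict with a "status" of "PASS" or "FAIL" and a boolean "ignore".

```python
def page_icon(bots):
    """Return "red" if any non-ignored bot is failing, otherwise "green"."""
    for bot in bots:
        if bot["status"] != "PASS" and not bot["ignore"]:
            return "red"
    return "green"


# An ignored failure does not turn the page red...
only_ignored_failure = [
    {"name": "bot-a", "status": "PASS", "ignore": False},
    {"name": "bot-b", "status": "FAIL", "ignore": True},
]
# ...but a single non-ignored failure does.
real_failure = only_ignored_failure + [
    {"name": "bot-c", "status": "FAIL", "ignore": False},
]

print(page_icon(only_ignored_failure))
print(page_icon(real_failure))
```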

 Bots offline are considered broken, as they may require attention. However,
 when the admin restarts the master, all buildslaves are killed and show up
 as "slave lost". In that case you don't need to do anything; just wait for
 the next successful build.

 Each buildbot has four columns:

  * Name & link: The bot name with a link to its page on its master. Good for
    easy access to buildbots and masters.
  * Status: Can only be "PASS" or "FAIL", but contains additional information
    if it fails, e.g. "slave lost" or "build stage 1" or "test-suite". These
    are the names of the stages that failed.
  * Build number: The build number, to help identify if there is a change from
    a specific number. Not very useful, but there just for reference.
  * Commit range: The range of commits that were tested on that build. This is
    very helpful to identify if a slow bot is failing because it hasn't yet
    reached the commit range on a fast bot that is passing, or not.
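
 The commit range comparison can be sketched as follows. The sketch assumes
 that each range is a (first, last) pair of integer revision numbers that
 can be compared directly; with other revision identifiers the ordering
 would have to come from the repository. The function name and the numbers
 are made up for illustration.

```python
def is_behind(slow_range, fast_range):
    """True if the slow bot's build ended before the fast bot's build began.

    Each range is a (first, last) pair of integer revisions. When this
    returns True, the slow bot has not yet tested any commit from the fast
    bot's range, so its failure may already be fixed.
    """
    slow_first, slow_last = slow_range
    fast_first, fast_last = fast_range
    return slow_last < fast_first


# Hypothetical revision numbers.
fast_passing = (110, 115)
slow_failing = (100, 105)
slow_caught_up = (106, 112)

print(is_behind(slow_failing, fast_passing))
print(is_behind(slow_caught_up, fast_passing))
```

 If the slow bot is behind, it is usually worth waiting for its next build
 before investigating; if it has caught up and still fails, the failure is
 more likely specific to that bot.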


 LLVM Masters
 ------------

 There are a number of masters in the LLVM upstream infrastructure, and we may
 need to monitor bots in all of those, or switch between them, depending on the
 need.

 * LLVM Upstream main master: http://lab.llvm.org:8011/

 This is the main master that spams everyone every time one of the bots
 breaks. Unless there is a specific concern, bots should be on this master.

 * LLVM Upstream silent master: http://lab.llvm.org:8014/

 Exactly the same as above, but no emails are sent. This master is usually
 empty, except for bots that are temporarily noisy, bots in active
 development, and bots that track performance regressions rather than
 compiler regressions. Performance is monitored on another page
 (http://llvm.org/perf/).

 * LLVM Japan master: http://bb.pgr.jp/

 A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
 rarely need to monitor anything there, but it's good to know it's there.

 * Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/

 Our local master, which we use for development. Individual developers can
 have their own containers, in which case the masters will be on different
 ports.

 These bots should always be ignored for their global status, or we'll
 generate a lot of noise for ourselves. Unless, of course, they're on their
 way upstream and going through staging deployment.

 * Green Dragon bots: http://lab.llvm.org:8080/green/

 This is not a buildbot master, but Jenkins. We don't monitor those on our
 page, but they do have IRC bots in the #llvm channel and are already quite
 good at displaying successes and failures.