Blame - monitor/README.txt - toolchain/llvm/linaro-scripts.git

Monitoring Tools for LLVM Development
=====================================

These tools are not meant to be used for development or testing, but to be
left running on a server or desktop as monitoring for your buildbots. They
are also meant to be used in conjunction with, not as a replacement for,
Nagios and other hardware-level monitoring tools.

Currently we only have one: bot-monitor, which I keep running on Linaro's
public server (people.linaro.org) and keep as a bookmark to quickly check
the bot status. It's also a helpful bookmark for all the bots we care about.

JSON Documentation
------------------

The JSON file should be self-explanatory, but just in case, here are a few
of the behaviours it exhibits when rendered by the current version of the
bot-monitor.

The base structure is a list of masters, each of which has a few properties
and a list of builder groups, which in turn have some properties and a list
of bots (buildslaves).

Master properties:

"name": "Name of the master, which will appear in bold big letters",
"base_url": "http://SERVER:PORT/BASE",
"builder_url": "part of the URL that refers to the list of builders",
"build_url": "part of the URL that refers to the list of builds",
"ignore": "true | false, hides or shows the entire master on the page",
"builders": [ ... ]

Builder properties:

"name": "Name of this group (fast bots, self-hosting, etc)",
"ignore": "true | false, hides or shows the entire builder on the page",
"bots": [ ... ]

Bot properties:

"name": "Exact name of the buildbot (becomes part of the URL)",
"ignore": "true | false, whether to ignore failures in this bot"
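
As an illustration of the nesting described above, here is a minimal Python
sketch that parses a configuration and walks masters, builder groups and bots.
The master, group and bot names are hypothetical placeholders, and it assumes
that the top level is a JSON list and that "ignore" is stored as a JSON
boolean; the real file may differ.

```python
import json

# Hypothetical configuration: every name and URL part here is a placeholder.
# Assumption: "ignore" is a JSON boolean and the top level is a list.
CONFIG_TEXT = """
[
  {
    "name": "Example Master",
    "base_url": "http://SERVER:PORT/BASE",
    "builder_url": "builders",
    "build_url": "builds",
    "ignore": false,
    "builders": [
      {
        "name": "fast bots",
        "ignore": false,
        "bots": [
          {"name": "example-fast-bot", "ignore": false},
          {"name": "example-flaky-bot", "ignore": true}
        ]
      }
    ]
  }
]
"""


def walk_config(masters):
    """Yield (master name, builder group name, bot name) for every bot."""
    for master in masters:
        for group in master["builders"]:
            for bot in group["bots"]:
                yield master["name"], group["name"], bot["name"]


masters = json.loads(CONFIG_TEXT)
entries = list(walk_config(masters))
for entry in entries:
    print(" / ".join(entry))
```

Walking the structure this way mirrors the three blocks the page is built
from, so a malformed file fails early with a KeyError on the missing property.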

Note that "ignore" has two different behaviours:

* On masters and builders, it omits the entire class from the output
* On bots, it still shows them, but ignores their status
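
The two behaviours can be sketched as follows. This is not the bot-monitor's
actual code, only an illustration; the function name, the sample data and the
use of booleans for "ignore" are assumptions.

```python
def visible_bots(masters):
    """Return (master, group, bot, counts_towards_status) for shown bots."""
    rows = []
    for master in masters:
        if master.get("ignore", False):
            continue  # an ignored master is omitted entirely
        for group in master["builders"]:
            if group.get("ignore", False):
                continue  # an ignored builder group is omitted entirely
            for bot in group["bots"]:
                # An ignored bot is still listed; only its status is ignored.
                counts = not bot.get("ignore", False)
                rows.append((master["name"], group["name"], bot["name"], counts))
    return rows


# Hypothetical sample data.
sample = [
    {
        "name": "master-a",
        "ignore": False,
        "builders": [
            {
                "name": "fast bots",
                "ignore": False,
                "bots": [
                    {"name": "bot-1", "ignore": False},
                    {"name": "bot-2", "ignore": True},
                ],
            },
            {
                "name": "experimental",
                "ignore": True,
                "bots": [{"name": "bot-3", "ignore": False}],
            },
        ],
    },
    {"name": "master-b", "ignore": True, "builders": []},
]

rows = visible_bots(sample)
for row in rows:
    print(row)
```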

Note on bots:

* You can repeat bots across builders, if they belong to multiple classes, for
  example "self-hosting" and "test-suite". The script will cache the results
  and simply re-print them, so this is only for visualisation / organisation
  purposes.
* Using the same bot name on different masters means different bots. It may
  be the same configuration on two different masters, or it may be completely
  different bots. Beware.
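
One way to get both properties is to cache results under a key made of the
master and the bot name. The sketch below assumes this keying; the fetch
function is a hypothetical stand-in for whatever queries the master, and the
real script may cache differently.

```python
def make_cached_fetch(fetch):
    """Wrap fetch(master_name, bot_name) so each pair is fetched only once."""
    cache = {}

    def cached(master_name, bot_name):
        key = (master_name, bot_name)  # same name on another master: new key
        if key not in cache:
            cache[key] = fetch(master_name, bot_name)
        return cache[key]

    return cached


calls = []


def fake_fetch(master_name, bot_name):
    """Hypothetical fetcher that records each real lookup."""
    calls.append((master_name, bot_name))
    return "PASS"


fetch = make_cached_fetch(fake_fetch)
fetch("master-a", "example-bot")  # first group: fetched from the master
fetch("master-a", "example-bot")  # second group: re-printed from the cache
fetch("master-b", "example-bot")  # other master: treated as a different bot
print(calls)
```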


HTML Page
---------

For now, there's only HTML output, but there's nothing stopping us from
developing more forms of communication (email, IRC bots, etc).

The HTML page is separated into blocks: Masters, Builder Groups, Bots. It also
has a date at the top, so you can make sure you're looking at an up-to-date
page, and it changes the page icon from green to red if at least one
(non-ignored) bot is broken.

Bots offline are considered broken, as they may require attention. However,
when the admin restarts the master, that kills all buildslaves, and this shows
up as "slave lost". You don't need to do anything in that case; just wait for
the next successful build.
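
The icon rule above amounts to a simple reduction over the bots. The sketch
below is an illustration only: the dictionary layout and the function name are
assumptions, and an offline bot is modelled as a "FAIL" with the detail
"slave lost".

```python
def page_icon(bots):
    """Return "red" if any non-ignored bot is broken, otherwise "green"."""
    for bot in bots:
        broken = bot["status"] != "PASS"
        if broken and not bot["ignore"]:
            return "red"
    return "green"


# Hypothetical bot results.
bots = [
    {"name": "example-fast-bot", "status": "PASS", "detail": "", "ignore": False},
    {"name": "example-flaky-bot", "status": "FAIL", "detail": "test-suite", "ignore": True},
    {"name": "example-offline-bot", "status": "FAIL", "detail": "slave lost", "ignore": False},
]

icon_all = page_icon(bots)                   # the offline bot counts as broken
icon_without_offline = page_icon(bots[:2])   # the only failure is ignored
print(icon_all, icon_without_offline)
```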

Each buildbot has four columns:

* Name & link: The bot name with a link to its page on its master. Good for
  easy access to buildbots and masters.
* Status: Can only be "PASS" or "FAIL", but contains additional information
  if it fails, e.g. "slave lost", "build stage 1" or "test-suite". These are
  the names of the stages that failed.
* Build number: The build number, to help identify if there is a change from
  a specific number. Not very useful, but there just for reference.
* Commit range: The range of commits that were tested on that build. This is
  very helpful to identify whether a slow bot is failing because it hasn't yet
  reached the commit range of a fast bot that is passing.
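
The commit range comparison can be sketched as below. It assumes that commit
ranges are (first, last) pairs of monotonically increasing integer revision
numbers; the function name and the numbers are hypothetical.

```python
def slow_bot_is_behind(slow_range, fast_range):
    """True if the slow bot has not yet tested the fast bot's last commit.

    Assumption: each range is a (first, last) pair of integer revisions.
    """
    slow_last = slow_range[1]
    fast_last = fast_range[1]
    return slow_last < fast_last


fast = (1010, 1012)  # passing fast bot
slow = (1000, 1005)  # failing slow bot
behind = slow_bot_is_behind(slow, fast)
print(behind)
```

If the slow bot is behind, its failure may already be fixed by a commit the
fast bot has tested, so it is usually worth waiting for its next build before
investigating.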


LLVM Masters
------------

There are a number of masters in the LLVM upstream infrastructure, and we may
need to monitor bots in all of those, or switch between them, depending on the
need.

* LLVM Upstream main master: http://lab.llvm.org:8011/

This is the main master that spams everyone every time one of the bots breaks.
Unless there is any specific concern, bots should be in this master.

* LLVM Upstream silent master: http://lab.llvm.org:8014/

Exactly the same as above, but no emails are sent. This master is usually empty
except for bots that are temporarily noisy, are in active development, or
track performance regressions rather than compiler regressions. Performance
regressions are monitored on another page (http://llvm.org/perf/).

* LLVM Japan master: http://bb.pgr.jp/

A side master built by Nakamura Takumi with some x86 and x86_64 buildbots. We
rarely need to monitor anything there, but it's good to know it's there.

* Linaro Downstream master: http://buildmaster.tcwglab.linaro.org/

Our local master, which we use for development. Individual developers can have
their own containers, in which case the masters will be on different ports.

These bots should always be ignored for their global status, or we'll generate
a lot of noise for ourselves. Unless, of course, they're on their way upstream
and going through staging deployment.

* Green Dragon bots: http://lab.llvm.org:8080/green/

This is not a buildbot master, but Jenkins. We don't monitor those on our page,
but they do have IRC bots in the #llvm channel and are already quite good at
displaying successes and failures.
				123
				124	This is not a buildbot master, but Jenkins. We don't monitor those in our page
				125	but they do have IRC bots in the #llvm channel and are already quite good at
				126	displaying success and failures.