.. index:: state machine, scheduling, hacking

.. _state_machine:

State machine
#############

The state machine describes and controls the state and health of Workers,
Devices and Test Jobs.

Workers
*******

For each worker, two variables describe the current status:

* ``state``:

  * *Online*
  * *Offline*

* ``health``:

  * *Active*
  * *Maintenance*
  * *Retired*

``state`` is an internal variable, set by ``lava-master`` depending on whether
it receives pings from each worker.

.. caution:: When a worker is *Offline*, none of the attached devices
   will be used to schedule new jobs.

``health`` can be used by admins to control the ``health`` of all attached
devices. For instance, when set to *Maintenance*, all attached devices will be
automatically put into maintenance mode so that no jobs will be scheduled on
those devices.
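
The propagation of a worker's *Maintenance* health to its devices can be
sketched as follows. This is a minimal, hypothetical model using plain strings
and dataclasses, not the ``lava-server`` models; the ``Worker`` and ``Device``
classes and the ``set_health`` helper are assumptions made for illustration.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List


   @dataclass
   class Device:
       hostname: str
       health: str = "Good"


   @dataclass
   class Worker:
       hostname: str
       # Internal, driven by lava-master pings: "Online" or "Offline".
       state: str = "Online"
       # Admin controlled: "Active", "Maintenance" or "Retired".
       health: str = "Active"
       devices: List[Device] = field(default_factory=list)

       def set_health(self, health: str) -> None:
           """Hypothetical helper: update the worker health and propagate it."""
           self.health = health
           if health == "Maintenance":
               # Put every attached device into maintenance so that no
               # new test job is scheduled on them.
               for device in self.devices:
                   device.health = "Maintenance"

For example, ``Worker("worker01", devices=[Device("qemu01")])`` followed by
``set_health("Maintenance")`` leaves ``qemu01`` in *Maintenance*. What happens
to the devices when the worker goes back to *Active* is not described here, so
the sketch leaves it out.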

Devices
*******

For each device, two variables describe the current status:

* ``state``:

  * *Idle*: not in use by any test job

  * *Reserved*: has been reserved for a test job but the test job is not
    running yet

  * *Running*: currently running a test job

* ``health``:

  * *Good*
  * *Unknown*
  * *Looping*: should run health-checks in a loop
  * *Bad*
  * *Maintenance*
  * *Retired*

``state`` is an internal variable, set by ``lava-master`` and ``lava-logs``
when scheduling, starting, canceling and ending test jobs.

``health`` can be used by admins to indicate whether a device should be used by
the scheduler. Moreover, when a health-check ends, the device health is set
according to the health of that test job.
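
How the end of a health-check could update the device health is sketched
below. The exact mapping is an assumption: a *Complete* health-check marks the
device *Good*, any other result marks it *Bad*, and a *Looping* device stays
*Looping* so that health-checks keep running on it.
``device_health_after_health_check`` is a hypothetical helper, not part of LAVA.

.. code-block:: python

   def device_health_after_health_check(device_health: str, job_health: str) -> str:
       """Return the device health once a health-check has ended.

       Hypothetical helper; the mapping below is an assumption.
       """
       if device_health == "Looping":
           # Assumption: keep looping so the scheduler runs another health-check.
           return "Looping"
       # Assumption: only a Complete health-check marks the device as Good.
       return "Good" if job_health == "Complete" else "Bad"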

TestJobs
********

For each test job, two variables describe the current status:

* ``state``:

  * *Submitted*: waiting in the queue

  * *Scheduling*: part of a multinode test job where some sub-jobs are
    still *Submitted*

  * *Scheduled*: has been scheduled. For multinode, it means that all
    sub-jobs are also scheduled

  * *Running*: currently running on a device

  * *Canceling*: has been canceled but not ended yet

  * *Finished*

.. note:: Only multinode test jobs use *Scheduling*. When all
   sub-jobs are in *Scheduling*, ``lava-master`` will transition all of
   them to *Scheduled*.

* ``health``:

  * *Unknown*: default value that will be overridden when the job is finished.

  * *Complete*

  * *Incomplete*

  * *Canceled*: the test job was canceled.
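
The multinode rule from the note above can be sketched as a function over the
states of the sub-jobs of one multinode test job. ``promote_multinode`` is a
hypothetical name and states are modelled as plain strings; this is an
illustration, not the ``lava-master`` code.

.. code-block:: python

   from typing import List


   def promote_multinode(sub_job_states: List[str]) -> List[str]:
       """Return the new states of the sub-jobs of one multinode test job."""
       if sub_job_states and all(state == "Scheduling" for state in sub_job_states):
           # All sub-jobs have reached Scheduling: move the whole group at once.
           return ["Scheduled"] * len(sub_job_states)
       # Otherwise (e.g. a sub-job is still Submitted) leave the group unchanged.
       return list(sub_job_states)

Moving the whole group in one step means a multinode test job only reaches
*Scheduled* when all of its sub-jobs do, as stated for the *Scheduled* state.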

.. _scheduler:

Scheduler
#########

The scheduler is called by ``lava-master`` approximately every 20 seconds.
The scheduler starts by scheduling health-checks. The remaining devices are
then considered for test jobs.
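
The two-step order of a scheduler pass can be sketched as below. This is a
simplified, hypothetical model: ``schedule_once`` is an illustrative name,
devices and jobs are plain strings, and any queued job is assumed to fit any
device (the real scheduler also matches device types, among other criteria,
which this sketch ignores).

.. code-block:: python

   from typing import Dict, List, Set


   def schedule_once(idle_devices: List[str], needs_health_check: Set[str],
                     queued_jobs: List[str]) -> Dict[str, str]:
       """Run one hypothetical scheduler pass and return {device: job}."""
       assignments: Dict[str, str] = {}
       # Step 1: health-checks are considered first.
       for device in idle_devices:
           if device in needs_health_check:
               assignments[device] = f"health-check:{device}"
       # Step 2: only the devices still free are offered to regular test jobs,
       # in queue order.
       remaining = [d for d in idle_devices if d not in assignments]
       for device, job in zip(remaining, queued_jobs):
           assignments[device] = job
       return assignments

For example, with devices ``a``, ``b`` and ``c`` where only ``b`` needs a
health-check, ``b`` gets its health-check and the first two queued jobs go to
``a`` and ``c``. ``lava-master`` would repeat such a pass roughly every 20
seconds.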

Health-checks
*************

To ensure that health-checks are always scheduled when needed, the scheduler
considers them before regular test jobs.

The scheduler will only consider devices where:

* ``state`` is *Idle*
* ``health`` is *Good*, *Unknown* or *Looping*
* worker's ``state`` is *Online*

.. note:: A device whose ``health`` is *Bad*, *Maintenance* or *Retired* is
   never considered by the scheduler, neither for health-checks nor for
   regular test jobs.
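
The device filter used for health-checks can be written as a single predicate.
``DeviceView`` and ``eligible_for_health_check`` are hypothetical names; the
three fields mirror the conditions listed above.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class DeviceView:
       state: str         # "Idle", "Reserved" or "Running"
       health: str        # "Good", "Unknown", "Looping", "Bad", "Maintenance" or "Retired"
       worker_state: str  # "Online" or "Offline"


   def eligible_for_health_check(device: DeviceView) -> bool:
       """True when the scheduler would consider this device for a health-check."""
       return (
           device.state == "Idle"
           and device.health in {"Good", "Unknown", "Looping"}
           and device.worker_state == "Online"
       )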

Test jobs
*********

The scheduler will only consider devices where:

* ``state`` is *Idle*
* ``health`` is *Good* or *Unknown*
* worker's ``state`` is *Online*
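
Applied to a list of devices, the test-job filter can be sketched as below.
``DeviceView`` and ``devices_for_test_jobs`` are hypothetical names. Compared
with the health-check filter, *Looping* is not accepted, so a looping device is
not handed to regular test jobs.

.. code-block:: python

   from typing import List, NamedTuple


   class DeviceView(NamedTuple):
       hostname: str
       state: str         # "Idle", "Reserved" or "Running"
       health: str        # "Good", "Unknown", "Looping", "Bad", "Maintenance" or "Retired"
       worker_state: str  # "Online" or "Offline"


   def devices_for_test_jobs(devices: List[DeviceView]) -> List[str]:
       """Return the hostnames the scheduler would consider for regular test jobs."""
       return [
           d.hostname
           for d in devices
           if d.state == "Idle"
           and d.health in ("Good", "Unknown")
           and d.worker_state == "Online"
       ]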