From 7ed735c33104f3c6194fbc67e3a8b6e64ae84ad1 Mon Sep 17 00:00:00 2001
From: Vincent Guittot
Date: Wed, 4 Dec 2019 19:21:40 +0100
Subject: sched/fair: Fix find_idlest_group() to handle CPU affinity

Because of CPU affinity, the local group can be skipped, which breaks
the assumption that statistics are always collected for the local
group. With uninitialized local_sgs, the comparison is meaningless and
the behavior unpredictable. This can even end up using the local
pointer, which is NULL in this case.

If the local group has been skipped because of CPU affinity, we return
the idlest group instead.

Fixes: 57abff067a08 ("sched/fair: Rework find_idlest_group()")
Reported-by: John Stultz
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Valentin Schneider
Tested-by: John Stultz
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: mingo@redhat.com
Cc: mgorman@suse.de
Cc: juri.lelli@redhat.com
Cc: dietmar.eggemann@arm.com
Cc: bsegall@google.com
Cc: qais.yousef@arm.com
Link: https://lkml.kernel.org/r/1575483700-22153-1-git-send-email-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..146b6c83633f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8417,6 +8417,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	if (!idlest)
 		return NULL;
 
+	/* The local group has been skipped because of CPU affinity */
+	if (!local)
+		return idlest;
+
 	/*
	 * If the local group is idler than the selected idlest group
	 * don't try and push the task.
--
cgit v1.2.3
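[Editor's note] The guard is easier to see outside the kernel. Below is a
minimal user-space C sketch of the decision the patched code makes. The
type and function names (struct sched_group_sketch, choose_group) and the
avg_load comparison are illustrative assumptions, not the kernel's
sg_lb_stats machinery; only the "if (!local) return idlest;" early return
mirrors the four added lines.

#include <stdio.h>

struct group_stats {
	unsigned long avg_load;	/* stand-in for the collected statistics */
};

struct sched_group_sketch {
	const char *name;
	struct group_stats stats;
};

/* Pick where the task should go, or NULL to keep it where it is. */
static struct sched_group_sketch *
choose_group(struct sched_group_sketch *idlest,
	     struct sched_group_sketch *local)
{
	if (!idlest)
		return NULL;

	/* The local group has been skipped because of CPU affinity */
	if (!local)
		return idlest;

	/* If the local group is idler, don't push the task away. */
	if (local->stats.avg_load <= idlest->stats.avg_load)
		return NULL;

	return idlest;
}

int main(void)
{
	struct sched_group_sketch idlest = { "idlest", { 128 } };

	/* CPU affinity excluded every CPU of the local group: local == NULL */
	struct sched_group_sketch *g = choose_group(&idlest, NULL);

	printf("picked: %s\n", g ? g->name : "(none)");
	return 0;
}

Before the fix, the equivalent of choose_group() would read local->stats
with local == NULL, crashing or comparing uninitialized values; the early
return makes the affinity-skipped case explicit.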
From 6cf82d559e1a1d89f06ff4d428aca479c1dd0be6 Mon Sep 17 00:00:00 2001
From: Vincent Guittot
Date: Fri, 29 Nov 2019 15:04:47 +0100
Subject: sched/cfs: fix spurious active migration

The load balance can fail to find a suitable task during the periodic
check because the imbalance is smaller than half of the load of the
waiting tasks. This increases the number of failed load balance
attempts, which can end up starting an active migration. This active
migration is useless because the currently running task is not a better
choice than the waiting ones. In fact, the current task was probably
not running but waiting for the CPU during one of the previous
attempts, and it had already not been selected then.

When load balance fails too many times to migrate a task, we should
relax the constraint on the maximum load of the tasks that can be
migrated, similarly to what is done with cache hotness.

Before the rework, load balance used to set the imbalance to the
average load_per_task in order to mitigate such a situation. This
increased the likelihood of migrating a task, but also of selecting a
larger task than needed while more appropriate ones were in the list.

Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/1575036287-6052-1-git-send-email-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 146b6c83633f..ba749f579714 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7328,7 +7328,14 @@ static int detach_tasks(struct lb_env *env)
 		    load < 16 && !env->sd->nr_balance_failed)
 			goto next;
 
-		if (load/2 > env->imbalance)
+		/*
+		 * Make sure that we don't migrate too much load.
+		 * Nevertheless, let's relax the constraint if the
+		 * scheduler fails to find a good waiting task to
+		 * migrate.
+		 */
+		if (load/2 > env->imbalance &&
+		    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
 			goto next;
 
 		env->imbalance -= load;
--
cgit v1.2.3
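[Editor's note] To illustrate the relaxed cap, here is a standalone
user-space C sketch. The struct names (sd_sketch, lb_env_sketch) and the
may_detach() helper are assumptions made for the example; the field names
and the condition itself mirror the hunk above.

#include <stdbool.h>
#include <stdio.h>

struct sd_sketch {
	unsigned int nr_balance_failed;	/* consecutive failed balances */
	unsigned int cache_nice_tries;	/* failure budget, as for hotness */
};

struct lb_env_sketch {
	struct sd_sketch *sd;
	unsigned long imbalance;	/* load we still want to move */
};

/*
 * May a task of the given load be detached? While balancing keeps
 * succeeding, refuse tasks whose load exceeds twice the remaining
 * imbalance; once balancing has failed more than cache_nice_tries
 * times, drop that cap so a waiting task can finally migrate.
 */
static bool may_detach(struct lb_env_sketch *env, unsigned long load)
{
	if (load / 2 > env->imbalance &&
	    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
		return false;

	return true;
}

int main(void)
{
	struct sd_sketch sd = { .nr_balance_failed = 0, .cache_nice_tries = 1 };
	struct lb_env_sketch env = { .sd = &sd, .imbalance = 100 };

	/* A 300-load task is refused while balancing still succeeds... */
	printf("fresh:  %d\n", may_detach(&env, 300));	/* prints 0 */

	/* ...but accepted once the balance has failed repeatedly. */
	sd.nr_balance_failed = 2;
	printf("failed: %d\n", may_detach(&env, 300));	/* prints 1 */

	return 0;
}

The design mirrors the existing cache-hotness handling: the cap on
migratable load only holds while the balancer has not exhausted its
cache_nice_tries budget, so repeated failures eventually let a large
waiting task move instead of triggering a useless active migration.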