2019-07-27 05:57:46

by Crystal Wood

Subject: [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()

migrate_enable() currently open-codes a variant of select_fallback_rq().
However, it does not have the "No more Mr. Nice Guy" fallback and thus
it will pass an invalid CPU to the migration thread if cpus_mask only
contains a CPU that is !active.

Signed-off-by: Scott Wood <[email protected]>
---
This scenario will be more likely after the next patch, since
the migrate_disable_update check goes away. However, it could happen
anyway if cpus_mask was updated to a CPU other than the one we were
pinned to, and that CPU subsequently became inactive.
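
(Illustration only, not part of the patch: a standalone userspace model of
the cpumask_any_and() behaviour described above.  NR_CPU_IDS, any_and() and
the mask values below are made-up stand-ins for nr_cpu_ids,
cpumask_any_and() and the real masks.)

#include <stdio.h>

#define NR_CPU_IDS 8U

/* Model of cpumask_any_and(): first bit set in (a & b), or NR_CPU_IDS
 * when the two masks do not intersect.
 */
static unsigned int any_and(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1UL << cpu))
			return cpu;
	return NR_CPU_IDS;	/* out of range: no usable destination */
}

int main(void)
{
	unsigned long cpus_mask   = 1UL << 3;	/* task pinned to CPU 3   */
	unsigned long active_mask = 0xf7;	/* CPU 3 has gone !active */
	unsigned int dest_cpu = any_and(active_mask, cpus_mask);

	if (dest_cpu >= NR_CPU_IDS)
		printf("no valid dest_cpu (%u >= %u)\n", dest_cpu, NR_CPU_IDS);
	return 0;
}

That ">= nr_cpu_ids" value is what, without the "No more Mr. Nice Guy"
fallback of select_fallback_rq(), would be handed to the migration thread
as arg.dest_cpu.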
---
kernel/sched/core.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index eb27a9bf70d7..3a2d8251a30c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7368,6 +7368,7 @@ void migrate_enable(void)
 	if (p->migrate_disable_update) {
 		struct rq *rq;
 		struct rq_flags rf;
+		int cpu = task_cpu(p);
 
 		rq = task_rq_lock(p, &rf);
 		update_rq_clock(rq);
@@ -7377,21 +7378,15 @@ void migrate_enable(void)
 
 		p->migrate_disable_update = 0;
 
-		WARN_ON(smp_processor_id() != task_cpu(p));
-		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
-			const struct cpumask *cpu_valid_mask = cpu_active_mask;
-			struct migration_arg arg;
-			unsigned int dest_cpu;
-
-			if (p->flags & PF_KTHREAD) {
-				/*
-				 * Kernel threads are allowed on online && !active CPUs
-				 */
-				cpu_valid_mask = cpu_online_mask;
-			}
-			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
-			arg.task = p;
-			arg.dest_cpu = dest_cpu;
+		WARN_ON(smp_processor_id() != cpu);
+		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
+			struct migration_arg arg = { p };
+			struct rq_flags rf;
+
+			rq = task_rq_lock(p, &rf);
+			update_rq_clock(rq);
+			arg.dest_cpu = select_fallback_rq(cpu, p);
+			task_rq_unlock(rq, p, &rf);
 
 			unpin_current_cpu();
 			preempt_lazy_enable();
--
1.8.3.1



Subject: Re: [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()

On 2019-07-27 00:56:37 [-0500], Scott Wood wrote:
> migrate_enable() currently open-codes a variant of select_fallback_rq().
> However, it does not have the "No more Mr. Nice Guy" fallback and thus
> it will pass an invalid CPU to the migration thread if cpus_mask only
> contains a CPU that is !active.
>
> Signed-off-by: Scott Wood <[email protected]>
> ---
> This scenario will be more likely after the next patch, since
> the migrate_disable_update check goes away. However, it could happen
> anyway if cpus_mask was updated to a CPU other than the one we were
> pinned to, and that CPU subsequently became inactive.

I'm unclear about the problem / side effect this has (before and after
the change). It is possible (both before and after this change) that the
selected CPU is invalid / goes offline after the preempt_enable()
statement and before stop_one_cpu() does its job, correct?

> ---
> kernel/sched/core.c | 25 ++++++++++---------------
> 1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index eb27a9bf70d7..3a2d8251a30c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7368,6 +7368,7 @@ void migrate_enable(void)
>  	if (p->migrate_disable_update) {
>  		struct rq *rq;
>  		struct rq_flags rf;
> +		int cpu = task_cpu(p);
>
>  		rq = task_rq_lock(p, &rf);
>  		update_rq_clock(rq);
> @@ -7377,21 +7378,15 @@ void migrate_enable(void)
>
>  		p->migrate_disable_update = 0;
>
> -		WARN_ON(smp_processor_id() != task_cpu(p));
> -		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
> -			const struct cpumask *cpu_valid_mask = cpu_active_mask;
> -			struct migration_arg arg;
> -			unsigned int dest_cpu;
> -
> -			if (p->flags & PF_KTHREAD) {
> -				/*
> -				 * Kernel threads are allowed on online && !active CPUs
> -				 */
> -				cpu_valid_mask = cpu_online_mask;
> -			}
> -			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
> -			arg.task = p;
> -			arg.dest_cpu = dest_cpu;
> +		WARN_ON(smp_processor_id() != cpu);
> +		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
> +			struct migration_arg arg = { p };
> +			struct rq_flags rf;
> +
> +			rq = task_rq_lock(p, &rf);
> +			update_rq_clock(rq);
> +			arg.dest_cpu = select_fallback_rq(cpu, p);
> +			task_rq_unlock(rq, p, &rf);
>
>  			unpin_current_cpu();
>  			preempt_lazy_enable();

Sebastian

2019-09-26 12:41:18

by Crystal Wood

Subject: Re: [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()

On Tue, 2019-09-17 at 18:00 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:37 [-0500], Scott Wood wrote:
> > migrate_enable() currently open-codes a variant of select_fallback_rq().
> > However, it does not have the "No more Mr. Nice Guy" fallback and thus
> > it will pass an invalid CPU to the migration thread if cpus_mask only
> > contains a CPU that is !active.
> >
> > Signed-off-by: Scott Wood <[email protected]>
> > ---
> > This scenario will be more likely after the next patch, since
> > the migrate_disable_update check goes away. However, it could happen
> > anyway if cpus_mask was updated to a CPU other than the one we were
> > pinned to, and that CPU subsequently became inactive.
>
> I'm unclear about the problem / side effect this has (before and after
> the change). It is possible (both before and after this change) that the
> selected CPU is invalid / goes offline after the preempt_enable()
> statement and before stop_one_cpu() does its job, correct?

By "pass an invalid CPU" I don't mean offline; I mean >= nr_cpu_ids which is
what cpumask_any_and() returns if the sets don't intersect (a CPU going
offline is merely a way to end up in that situation). At one point I
observed that causing a crash. I guess is_cpu_allowed() returned true by
chance based on the contents of data beyond the end of the cpumask? That
doesn't seem likely based on what comes after p->cpus_mask (at that point
migrate_disable should be zero), but maybe I had something else at that
point in the struct while developing. In any case, not something to be
relied on. :-)
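
(Again illustration only, not the actual is_cpu_allowed() path: a userspace
sketch of why a bit test at an index >= nr_cpu_ids gives an answer that
depends on whatever happens to sit behind the mask.  fake_task, next_field
and test_bit_unchecked() are invented for the sketch, and the out-of-bounds
access is deliberate to mirror the hypothetical read past p->cpus_mask.)

#include <stdio.h>

#define BITS_PER_LONG (8U * (unsigned int)sizeof(unsigned long))

/* Rough stand-in for the relevant piece of task_struct: a one-word
 * "cpumask" with some other field directly behind it.
 */
struct fake_task {
	unsigned long cpus_mask;
	unsigned long next_field;
};

/* test_bit()-style lookup with no bounds check. */
static int test_bit_unchecked(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

int main(void)
{
	struct fake_task t = { .cpus_mask = 0, .next_field = 1 };

	/*
	 * Index one word past the mask, i.e. the ">= nr_cpu_ids" case:
	 * the result comes from next_field, not from the mask at all.
	 */
	printf("out-of-range bit: %d\n",
	       test_bit_unchecked(BITS_PER_LONG, &t.cpus_mask));
	return 0;
}

Whether such an out-of-range lookup "works" depends entirely on the
neighbouring struct contents, which is the accident described above.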

Going offline after selection shouldn't be a problem. migration_cpu_stop()
won't do anything if is_cpu_allowed() returns false, and we'll get migrated
off the CPU by migrate_tasks(). Even if we get migrated after reading
task_cpu(p) but before queuing the stop machine work, it'll either get
flushed when the stopper thread parks, or rejected due to stopper->enabled
being false.

-Scott