2020-02-21 21:27:32

by Tom Zanussi

Subject: [PATCH RT 15/25] sched: migrate_enable: Use select_fallback_rq()

From: Scott Wood <[email protected]>

v4.14.170-rt75-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------


[ Upstream commit adfa969d4cfcc995a9d866020124e50f1827d2d1 ]

migrate_enable() currently open-codes a variant of select_fallback_rq().
However, it does not have the "No more Mr. Nice Guy" fallback and thus
it will pass an invalid CPU to the migration thread if cpus_mask only
contains a CPU that is !active.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/sched/core.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 189e6f08575e..46324d2099e3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7008,6 +7008,7 @@ void migrate_enable(void)
if (p->migrate_disable_update) {
struct rq *rq;
struct rq_flags rf;
+ int cpu = task_cpu(p);

rq = task_rq_lock(p, &rf);
update_rq_clock(rq);
@@ -7017,21 +7018,15 @@ void migrate_enable(void)

p->migrate_disable_update = 0;

- WARN_ON(smp_processor_id() != task_cpu(p));
- if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
- const struct cpumask *cpu_valid_mask = cpu_active_mask;
- struct migration_arg arg;
- unsigned int dest_cpu;
-
- if (p->flags & PF_KTHREAD) {
- /*
- * Kernel threads are allowed on online && !active CPUs
- */
- cpu_valid_mask = cpu_online_mask;
- }
- dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
- arg.task = p;
- arg.dest_cpu = dest_cpu;
+ WARN_ON(smp_processor_id() != cpu);
+ if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
+ struct migration_arg arg = { p };
+ struct rq_flags rf;
+
+ rq = task_rq_lock(p, &rf);
+ update_rq_clock(rq);
+ arg.dest_cpu = select_fallback_rq(cpu, p);
+ task_rq_unlock(rq, p, &rf);

unpin_current_cpu();
preempt_lazy_enable();
--
2.14.1
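
The failure mode described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: CPU masks are plain 32-bit integers, NR_CPUS, first_cpu(), open_coded_pick() and fallback_pick() are hypothetical names invented for the illustration, and the real select_fallback_rq() has more stages (NUMA node preference, cpuset fallback) than the single widening step shown here.

```c
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/*
 * Return the lowest CPU set in mask, or NR_CPUS if the mask is empty.
 * This mirrors the kernel convention that searching an empty cpumask
 * yields a value >= nr_cpu_ids, i.e. not a valid CPU number.
 */
int first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Model of the open-coded selection removed by the patch: intersect the
 * valid (active or online) CPUs with the task's allowed CPUs and take
 * whatever comes back, even if the intersection is empty.
 */
int open_coded_pick(uint32_t valid, uint32_t allowed)
{
	return first_cpu(valid & allowed);
}

/*
 * Simplified model of the fallback behaviour: if no allowed CPU is
 * valid, widen the task's affinity to every CPU in the model and retry
 * (the "No more Mr. Nice Guy" step), so the result is a usable CPU as
 * long as at least one CPU is valid.
 */
int fallback_pick(uint32_t valid, uint32_t *allowed)
{
	int cpu = first_cpu(valid & *allowed);

	if (cpu < NR_CPUS)
		return cpu;

	*allowed = (1u << NR_CPUS) - 1;	/* affinity is overridden */
	return first_cpu(valid & *allowed);
}
```

With CPUs 1-3 valid (mask 0x0e) and a task allowed only on CPU 0 (mask 0x01), open_coded_pick() returns NR_CPUS, which is not a valid CPU, while fallback_pick() returns CPU 1 and rewrites the allowed mask to cover all CPUs.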


Subject: Re: [PATCH RT 15/25] sched: migrate_enable: Use select_fallback_rq()

On 2020-02-21 15:24:43 [-0600], [email protected] wrote:
> From: Scott Wood <[email protected]>
>
> v4.14.170-rt75-rc1 stable review patch.
> If anyone has any objections, please let me know.

This creates a bug which is fixed later via
sched: migrate_enable: Busy loop until the migration request is completed

So if you apply this, please take the bug fix, too. This is Steven's queue
for reference:
|[PATCH RT 22/30] sched: migrate_enable: Use select_fallback_rq()
^^ bug introduced

|[PATCH RT 23/30] sched: Lazy migrate_disable processing
|[PATCH RT 24/30] sched: migrate_enable: Use stop_one_cpu_nowait()
|[PATCH RT 25/30] Revert "ARM: Initialize split page table locks for vector page"
|[PATCH RT 26/30] locking: Make spinlock_t and rwlock_t a RCU section on RT
|[PATCH RT 27/30] sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
|[PATCH RT 28/30] lib/smp_processor_id: Adjust check_preemption_disabled()
|[PATCH RT 29/30] sched: migrate_enable: Busy loop until the migration request is completed
^^ bug fixed

Sebastian

2020-02-24 15:31:29

by Tom Zanussi

Subject: Re: [PATCH RT 15/25] sched: migrate_enable: Use select_fallback_rq()

On Mon, 2020-02-24 at 10:43 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-02-21 15:24:43 [-0600], [email protected] wrote:
> > From: Scott Wood <[email protected]>
> >
> > v4.14.170-rt75-rc1 stable review patch.
> > If anyone has any objections, please let me know.
>
> This creates a bug which is fixed later via
> sched: migrate_enable: Busy loop until the migration request is
> completed
>
> So if you apply this, please take the bug fix, too. This is Steven's queue
> for reference:
> > [PATCH RT 22/30] sched: migrate_enable: Use select_fallback_rq()
>
> ^^ bug introduced
>

Hmm, it seemed from the comment on the 4.19 series that it was '24/32
sched: migrate_enable: Use stop_one_cpu_nowait()' that required 'sched:
migrate_enable: Busy loop until the migration request is
completed' as a bug fix.

https://lore.kernel.org/linux-rt-users/[email protected]/#t

I didn't take the stop_one_cpu_nowait() one, so didn't take the busy
loop one either.

Thanks,

Tom

> > [PATCH RT 23/30] sched: Lazy migrate_disable
> > processing
> >
> > [PATCH RT 24/30] sched: migrate_enable: Use stop_one_cpu_nowait()
> > [PATCH RT 25/30] Revert "ARM: Initialize split page table locks for
> > vector page"
> > [PATCH RT 26/30] locking: Make spinlock_t and rwlock_t a RCU
> > section on RT
> > [PATCH RT 27/30] sched/core: migrate_enable() must access
> > takedown_cpu_task on !HOTPLUG_CPU
> > [PATCH RT 28/30] lib/smp_processor_id: Adjust
> > check_preemption_disabled()
> > [PATCH RT 29/30] sched: migrate_enable: Busy loop until the
> > migration request is completed
>
> ^^ bug fixed
>
> Sebastian

Subject: Re: [PATCH RT 15/25] sched: migrate_enable: Use select_fallback_rq()

On 2020-02-24 09:31:06 [-0600], Tom Zanussi wrote:
> On Mon, 2020-02-24 at 10:43 +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-02-21 15:24:43 [-0600], [email protected] wrote:
> > > From: Scott Wood <[email protected]>
> > >
> > > v4.14.170-rt75-rc1 stable review patch.
> > > If anyone has any objections, please let me know.
> >
> > This creates a bug which is fixed later via
> > sched: migrate_enable: Busy loop until the migration request is
> > completed
> >
> > So if you apply this, please take the bug fix, too. This is Steven's queue
> > for reference:
> > > [PATCH RT 22/30] sched: migrate_enable: Use select_fallback_rq()
> >
> > ^^ bug introduced
> >
>
> Hmm, it seemed from the comment on the 4.19 series that it was '24/32
> sched: migrate_enable: Use stop_one_cpu_nowait()' that required 'sched:
> migrate_enable: Busy loop until the migration request is
> completed' as a bug fix.
>
> https://lore.kernel.org/linux-rt-users/[email protected]/#t
>
> I didn't take the stop_one_cpu_nowait() one, so didn't take the busy
> loop one either.

Ach, it was the different WARN_ON() then. So this might not introduce
any bug then. *Might*.
Steven backported the whole pile and you took just this one patch. The
whole set was tested in devel and uncovered a problem which was fixed
later. Taking only a part *may* expose other problems, or it *may* be fine.

Steven, any opinion on your side?

> Thanks,
>
> Tom

Sebastian

2020-02-24 22:16:25

by Crystal Wood

Subject: Re: [PATCH RT 15/25] sched: migrate_enable: Use select_fallback_rq()

On Mon, 2020-02-24 at 17:05 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-02-24 09:31:06 [-0600], Tom Zanussi wrote:
> > On Mon, 2020-02-24 at 10:43 +0100, Sebastian Andrzej Siewior wrote:
> > > On 2020-02-21 15:24:43 [-0600], [email protected] wrote:
> > > > From: Scott Wood <[email protected]>
> > > >
> > > > v4.14.170-rt75-rc1 stable review patch.
> > > > If anyone has any objections, please let me know.
> > >
> > > This creates a bug which is fixed later via
> > > sched: migrate_enable: Busy loop until the migration request is
> > > completed
> > >
> > > So if you apply this, please take the bug fix, too. This is Steven's queue
> > > for reference:
> > > > [PATCH RT 22/30] sched: migrate_enable: Use select_fallback_rq()
> > >
> > > ^^ bug introduced
> > >
> >
> > Hmm, it seemed from the comment on the 4.19 series that it was '24/32
> > sched: migrate_enable: Use stop_one_cpu_nowait()' that required 'sched:
> > migrate_enable: Busy loop until the migration request is
> > completed' as a bug fix.
> >
> >
> > https://lore.kernel.org/linux-rt-users/[email protected]/#t
> >
> > I didn't take the stop_one_cpu_nowait() one, so didn't take the busy
> > loop one either.
>
> Ach, it was the different WARN_ON() then. So this might not introduce
> any bug then. *Might*.
> Steven backported the whole pile and you took just this one patch. The
> whole set was tested in devel and uncovered a problem which was fixed
> later. Taking only a part *may* expose other problems, or it *may* be fine.

Taking up to this patch should be OK (well, you still have the
current->state clobbering, but it shouldn't introduce any new known bugs).
The busy loop patch itself has a followup fix though (in theory the busy
loop could deadlock): 2dcd94b443c5dcbc ("sched: migrate_enable: Use per-cpu
cpu_stop_work") which should be considered for v4.19 rt stable which has the
busy loop patch.

-Scott