2022-11-14 12:22:35

by Xuewen Yan

Subject: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

While running a stress test for cpu hotplug with only two cpus left in
the system (for example, cpu0 and cpu1), we found that if cpu1 goes
offline at that point, the following infinite loop can occur:

When cpu0 has more than one rt task, it pushes the surplus rt tasks to
cpu1 for execution. If cpu1 is going through the hotplug process at the
same time, it starts a stop_machine that is responsible for migrating
the tasks on cpu1 to the other online cpus, and that prevents any task
from migrating onto cpu1. As a result, when cpu0 migrates an rt task to
cpu1, the stop_machine migrates it straight back to cpu0, so rq0's
rt.pushable_tasks never becomes empty. The result is rt tasks being
migrated back and forth between cpu0 and cpu1 indefinitely.

To prevent this situation, use cpu_active_mask to keep find_lowest_rq()
from selecting an inactive cpu. In addition, when there is only one
active cpu left in the system, terminate rto_push_irq_work early.

Co-developed-by: Ke Wang <[email protected]>
Signed-off-by: Ke Wang <[email protected]>
Signed-off-by: Xuewen Yan <[email protected]>
---
 kernel/sched/cpupri.c | 1 +
 kernel/sched/rt.c     | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..9433696dae50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
 {
 	int next;
 	int cpu;
+	struct cpumask tmp_cpumask;
 
 	/*
 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
@@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
 
+		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
+		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
+		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
+			break;
+
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids)
--
2.25.1
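
For reference, the termination condition that the rt.c hunk above adds
boils down to the following small standalone sketch (hypothetical
userspace C; the function name is made up, and plain bitmasks stand in
for struct cpumask and the cpumask helpers):

#include <stdbool.h>
#include <stdio.h>

/* Stop the push IPI loop when a new pass is starting (rto_cpu == -1)
 * and the only overloaded CPU that is still active is the CPU already
 * running the irq_work: there is nowhere left to push to. */
static bool should_stop_push_ipi(int rto_cpu, unsigned long rto_mask,
				 unsigned long active_mask, int this_cpu)
{
	unsigned long tmp = rto_mask & active_mask;

	return rto_cpu == -1 &&
	       __builtin_popcountl(tmp) == 1 &&
	       (tmp & (1UL << this_cpu));
}

int main(void)
{
	/* The changelog scenario: cpu1 has gone inactive, cpu0 is
	 * overloaded and is the CPU running the irq_work. */
	printf("%d\n", should_stop_push_ipi(-1, 0x1, 0x1, 0)); /* 1: stop */
	/* Both cpus still active and overloaded: keep pushing. */
	printf("%d\n", should_stop_push_ipi(-1, 0x3, 0x3, 0)); /* 0 */
	return 0;
}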



2022-11-17 22:36:37

by Steven Rostedt

Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

On Mon, 14 Nov 2022 20:04:53 +0800
Xuewen Yan <[email protected]> wrote:

> +++ b/kernel/sched/rt.c
> @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
>  {
>  	int next;
>  	int cpu;
> +	struct cpumask tmp_cpumask;

If you have a machine with thousands of CPUs, this will likely kill the
stack.

>  
>  	/*
>  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
>  		/* When rto_cpu is -1 this acts like cpumask_first() */
>  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
>  
> +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> +			break;
> +

Kill the above.

>  	rd->rto_cpu = cpu;
>  
>  	if (cpu < nr_cpu_ids) {

Why not just add here:

		if (!cpumask_test_cpu(cpu, cpu_active_mask))
			continue;
		return cpu;
	}

?

-- Steve
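
Applied in context, that just makes the loop skip CPUs that are no
longer active when picking the next IPI target. A rough standalone
sketch of that selection logic (hypothetical userspace C with a
made-up helper name, not the kernel code itself):

#include <stdio.h>

#define NR_CPUS 4

/* Mimics cpumask_next() over rto_mask, with the suggested extra
 * check: inactive CPUs are stepped over instead of being returned. */
static int next_push_cpu(int rto_cpu, unsigned long rto_mask,
			 unsigned long active_mask)
{
	int cpu;

	for (cpu = rto_cpu + 1; cpu < NR_CPUS; cpu++) {
		if (!(rto_mask & (1UL << cpu)))
			continue;
		if (!(active_mask & (1UL << cpu)))
			continue;	/* the suggested extra test */
		return cpu;
	}
	return -1;	/* no eligible CPU left in this pass */
}

int main(void)
{
	/* cpu0 and cpu2 overloaded (0x5), cpu2 inactive (active 0x3):
	 * cpu0 is returned once, then the pass ends. */
	printf("%d\n", next_push_cpu(-1, 0x5, 0x3));	/* 0 */
	printf("%d\n", next_push_cpu(0, 0x5, 0x3));	/* -1 */
	return 0;
}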

2022-11-18 12:27:24

by Xuewen Yan

Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <[email protected]> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <[email protected]> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
Ha, I did not take that into account. Thanks!

>
> >  
> >  	/*
> >  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >  		/* When rto_cpu is -1 this acts like cpumask_first() */
> >  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >  
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
>
> Kill the above.
>
> >  	rd->rto_cpu = cpu;
> >  
> >  	if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 		if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 			continue;
> 		return cpu;
> 	}
>
> ?
Let's consider this scenario:
the online cpu mask is 0x03 (cpu0/cpu1), the active cpu mask is
0x01 (cpu0), the rto cpu is cpu0, the rto_mask is 0x01, and the
irq_work cpu is cpu0. On the first pass of the loop, rto_cpu is reset
to -1; but because rto_loop is still behind rto_loop_next, the loop
runs again, and since rto_cpu is -1, the next rto cpu is cpu0 again.
As a result, cpu0 would keep pushing rt tasks to cpu1 (an inactive
cpu) while running in the irq_work.

So we should check whether the current cpu (the only active cpu) is
the next loop's cpu.

Thanks!

>
> -- Steve
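
The re-selection described above can be reproduced with a small
standalone model of the loop (hypothetical userspace C; rto_loop and
rto_loop_next are plain ints here, and the scan already skips inactive
CPUs, as in Steve's suggestion):

#include <stdio.h>

#define NR_CPUS 2

struct root_domain {
	int rto_cpu;
	int rto_loop;
	int rto_loop_next;
	unsigned long rto_mask;
};

static int rto_next_cpu(struct root_domain *rd, unsigned long active_mask)
{
	int cpu;

	for (;;) {
		/* cpumask_next()-style scan from rto_cpu + 1, limited
		 * to overloaded CPUs that are still active */
		for (cpu = rd->rto_cpu + 1; cpu < NR_CPUS; cpu++)
			if (rd->rto_mask & active_mask & (1UL << cpu))
				break;

		rd->rto_cpu = cpu;
		if (cpu < NR_CPUS)
			return cpu;

		rd->rto_cpu = -1;
		if (rd->rto_loop == rd->rto_loop_next)
			break;
		rd->rto_loop = rd->rto_loop_next;
	}
	return -1;
}

int main(void)
{
	struct root_domain rd = {
		.rto_cpu = -1, .rto_loop = 0, .rto_loop_next = 1,
		.rto_mask = 0x1,	/* only cpu0 is overloaded */
	};
	unsigned long active = 0x1;	/* only cpu0 is still active */

	/* First call picks cpu0; the IPI lands on cpu0 itself. */
	printf("%d\n", rto_next_cpu(&rd, active));	/* 0 */
	/* cpu0's irq_work asks for the next target: the scan wraps,
	 * rto_loop catches up with rto_loop_next, and the extra pass
	 * picks cpu0 again. */
	printf("%d\n", rto_next_cpu(&rd, active));	/* 0 */
	return 0;
}

As long as cpu0 stays overloaded, tell_cpu_to_push() keeps bumping
rto_loop_next, so each pass hands the IPI back to cpu0; the proposed
rd->rto_cpu == -1 check is meant to break exactly this cycle.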

2022-11-20 19:50:13

by Steven Rostedt

Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

On Fri, 18 Nov 2022 20:08:54 +0800
Xuewen Yan <[email protected]> wrote:

> Let's consider this scenario:
> the online cpu mask is 0x03 (cpu0/cpu1), the active cpu mask is
> 0x01 (cpu0), the rto cpu is cpu0, the rto_mask is 0x01, and the
> irq_work cpu is cpu0. On the first pass of the loop, rto_cpu is reset
> to -1; but because rto_loop is still behind rto_loop_next, the loop
> runs again, and since rto_cpu is -1, the next rto cpu is cpu0 again.
> As a result, cpu0 would keep pushing rt tasks to cpu1 (an inactive
> cpu) while running in the irq_work.
>
> So we should check whether the current cpu (the only active cpu) is
> the next loop's cpu.

Wait, I'm confused by what you are trying to do here.

The rto_next_cpu() only sends an IPI to the next CPU that has more than
one RT task queued on it, where the RT tasks can migrate.

If we send CPU0 an IPI, let CPU0 figure out to only push to the active
mask. Why are you trying to prevent sending the IPI to an active CPU?

Will the first part of your patch, the one that modifies cpupri, not
keep it from pushing to the inactive CPU?

-- Steve
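
The cpupri hunk is indeed what addresses that last point: once
__cpupri_find() masks the candidates with cpu_active_mask, an inactive
CPU can no longer come out of find_lowest_rq() as a push target, even
if the IPI still fires. A minimal standalone illustration (hypothetical
userspace C with a made-up helper name, plain bitmasks):

#include <stdio.h>

/* Models the cpupri.c hunk: candidate CPUs are the task's affinity
 * mask ANDed with the priority vector's mask, now additionally ANDed
 * with the active mask. */
static unsigned long lowest_mask(unsigned long cpus_mask,
				 unsigned long vec_mask,
				 unsigned long active_mask)
{
	return cpus_mask & vec_mask & active_mask;
}

int main(void)
{
	/* Task may run on cpu0/cpu1 (0x3); only cpu1 (0x2) looks like
	 * a lower-priority target, but cpu1 is inactive (active 0x1):
	 * the candidate mask comes out empty, so the push is abandoned
	 * rather than aimed at the inactive cpu. */
	printf("%#lx\n", lowest_mask(0x3, 0x2, 0x1));	/* 0 */
	return 0;
}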

2022-11-20 23:41:51

by Qais Yousef

Subject: Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

On 11/17/22 17:00, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <[email protected]> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
>
> >
> >  	/*
> >  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >  		/* When rto_cpu is -1 this acts like cpumask_first() */
> >  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >  
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
>
> Kill the above.
>
> >  	rd->rto_cpu = cpu;
> >  
> >  	if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 		if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 			continue;
> 		return cpu;
> 	}
>
> ?

Or this?


diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..5073e06e34c3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2236,7 +2236,7 @@ static int rto_next_cpu(struct root_domain *rd)
 	for (;;) {
 
 		/* When rto_cpu is -1 this acts like cpumask_first() */
-		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
+		cpu = cpumask_next_and(rd->rto_cpu, rd->rto_mask, cpu_active_mask);
 
 		rd->rto_cpu = cpu;


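(cpumask_next_and() walks the conjunction of the two masks directly, so
this folds the active-CPU filter into the scan itself, avoiding both
the on-stack cpumask of the original patch and an extra test inside the
loop body.)
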
I think we might be circumventing the root cause though. It looks like
there are only 2 cpus online in the system, and one of them (cpu1) goes
offline while the other (cpu0) is overloaded, ending in the tasks
ping-ponging between them?

If that's the case, it seems to me the rto state needs to be cleared for
cpu0 so that it stops attempting to push tasks. I'd assume that usually
happens, but there's a race that prevents it from realizing this.

Maybe something like this would be better? Assuming I understood the
problem correctly, of course.


diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..d80072cc2596 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -334,6 +334,10 @@ static inline void rt_set_overload(struct rq *rq)
 	if (!rq->online)
 		return;
 
+	/* Overload is meaningless if we shrank to become a UP system */
+	if (cpumask_weight(cpu_online_mask) == 1)
+		return;
+
 	cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
 	/*
 	 * Make sure the mask is visible before we set

This could be working around the problem too, not sure. But the bug IMHO
is that if we shrink to a UP system, the whole push-pull mechanism should
retire temporarily. I assume that usually happens, but there's a race
somewhere.


Thanks

--
Qais Yousef