2022-03-30 12:10:27

by Zqiang

Subject: [PATCH] rcu: Put the irq work into hard interrupt context for execution

In PREEMPT_RT kernels, irq_work items that do not have the IRQ_WORK_HARD_IRQ
flag set are executed in per-CPU irq_work kthreads. Set the IRQ_WORK_HARD_IRQ
flag on these irq_work items so that they are executed in hard interrupt
context, prompting the scheduler to re-evaluate sooner.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree.c | 2 +-
kernel/rcu/tree_plugin.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e2ffbeceba69..a69587773a85 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work *work)
}

static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
- IRQ_WORK_INIT(late_wakeup_func);
+ IRQ_WORK_INIT_HARD(late_wakeup_func);

/*
* If either:
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3037c2536e1f..cf7bd28af8ef 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
// Get scheduler to re-evaluate and call hooks.
// If !IRQ_WORK, FQS scan will eventually IPI.
- init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
+ rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
rdp->defer_qs_iw_pending = true;
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
}
--
2.25.1


2022-03-31 03:14:42

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Put the irq work into hard interrupt context for execution

On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> In PREEMPT_RT kernel, if irq work flags is not set, it will be
> executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags
> to irq work, put it in the context of hard interrupt execution,
> accelerate scheduler to re-evaluate.
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/rcu/tree_plugin.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e2ffbeceba69..a69587773a85 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work *work)
> }
>
> static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> - IRQ_WORK_INIT(late_wakeup_func);
> + IRQ_WORK_INIT_HARD(late_wakeup_func);

This is used only by rcu_irq_work_resched(), which is invoked only by
rcu_user_enter(), which is never invoked until userspace is enabled,
by which time all of the various kthreads will have been spawned, correct?

Either way, please show me the exact sequence of events that lead to a
problem with the current IRQ_WORK_INIT().

> /*
> * If either:
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3037c2536e1f..cf7bd28af8ef 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> // Get scheduler to re-evaluate and call hooks.
> // If !IRQ_WORK, FQS scan will eventually IPI.
> - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> + rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> rdp->defer_qs_iw_pending = true;
> irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> }

OK, in theory, rcu_read_unlock() could get to this point before all of
the various kthreads were spawned. In practice, the next time that the
boot CPU went idle, the end of the quiescent state would be noticed.

Or has this been failing in some other manner? If so, please let me
know the exact sequence of events.

Thanx, Paul

2022-03-31 04:11:42

by Zqiang

Subject: RE: [PATCH] rcu: Put the irq work into hard interrupt context for execution

On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> In PREEMPT_RT kernel, if irq work flags is not set, it will be
> executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags to
> irq work, put it in the context of hard interrupt execution,
> accelerate scheduler to re-evaluate.
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/rcu/tree_plugin.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> e2ffbeceba69..a69587773a85 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> *work) }
>
> static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> - IRQ_WORK_INIT(late_wakeup_func);
> + IRQ_WORK_INIT_HARD(late_wakeup_func);

>This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
>
>Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
>
> /*
> * If either:
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index
> 3037c2536e1f..cf7bd28af8ef 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> // Get scheduler to re-evaluate and call hooks.
> // If !IRQ_WORK, FQS scan will eventually IPI.
> - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> + rdp->defer_qs_iw =
> +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> rdp->defer_qs_iw_pending = true;
> irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> }
>
>OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.

As I understand it, irq_work is used here so that the quiescent state is noticed earlier.
Because irq_work normally executes in interrupt context, it runs promptly; but in the RT kernel
the irq_work is handed off to a kthread for execution, so by the time it runs it has been
subject to scheduling delay.
Is there anything I missed?

Thanks
Zqiang

>
>Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
>
> Thanx, Paul

2022-04-01 01:57:30

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Put the irq work into hard interrupt context for execution

On Wed, Mar 30, 2022 at 10:47:05PM +0000, Zhang, Qiang1 wrote:
> On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> > In PREEMPT_RT kernel, if irq work flags is not set, it will be
> > executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags to
> > irq work, put it in the context of hard interrupt execution,
> > accelerate scheduler to re-evaluate.
> >
> > Signed-off-by: Zqiang <[email protected]>
> > ---
> > kernel/rcu/tree.c | 2 +-
> > kernel/rcu/tree_plugin.h | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> > e2ffbeceba69..a69587773a85 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> > *work) }
> >
> > static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> > - IRQ_WORK_INIT(late_wakeup_func);
> > + IRQ_WORK_INIT_HARD(late_wakeup_func);
>
> >This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
> >
> >Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
> >
> > /*
> > * If either:
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index
> > 3037c2536e1f..cf7bd28af8ef 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> > // Get scheduler to re-evaluate and call hooks.
> > // If !IRQ_WORK, FQS scan will eventually IPI.
> > - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> > + rdp->defer_qs_iw =
> > +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> > rdp->defer_qs_iw_pending = true;
> > irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > }
> >
> >OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.
>
> Through my understanding, use irq_work in order to make the quiescent state be noticed earlier,
> Because the irq_work execute in interrupt, this irq_work can be executed in time, but In RT kernel
> The irq_work is put into the kthread for execution, when it is executed, it is affected by the scheduling delay.
> Is there anything I missed?

Yes, in that I am not seeing any actual data showing that this fix really
makes things better. Please keep in mind that IRQ_WORK_INIT_HARD does have
performance disadvantages of its own. So although I agree with your
words saying that IRQ_WORK_INIT_HARD -might- be helpful, those words
are not sufficient.

So, can you show a statistically significant benefit on a real system?
For example, by measuring the time required for an expedited grace period
to complete? That would argue for this change, though it would need to be
conditional, so that systems that don't care that much about the latency
of expedited RCU grace periods don't need to pay the IRQ_WORK_INIT_HARD
performance penalties. Or you would need to demonstrate that these
performance penalties don't cause problems. (But such a demonstration
is not easy given the wide variety of systems that Linux supports.)

Now, I could imagine that the current code could cause problems during
boot on CONFIG_PREEMPT_RT kernels. But, believe me, I can imagine all
sorts of horrible problems. But we should fix those that happen not
just in my imagination, but also in the real world. ;-)

So if you can make such a problem happen in real life, then I would be
happy to take a patch that fixed this on CONFIG_PREEMPT_RT but kept the
current code otherwise.

Thanx, Paul

> Thanks
> Zqiang
>
> >
> >Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
> >
> > Thanx, Paul

2022-04-01 12:58:29

by Zqiang

Subject: RE: [PATCH] rcu: Put the irq work into hard interrupt context for execution


On Wed, Mar 30, 2022 at 10:47:05PM +0000, Zhang, Qiang1 wrote:
> On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> > In PREEMPT_RT kernel, if irq work flags is not set, it will be
> > executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags
> > to irq work, put it in the context of hard interrupt execution,
> > accelerate scheduler to re-evaluate.
> >
> > Signed-off-by: Zqiang <[email protected]>
> > ---
> > kernel/rcu/tree.c | 2 +-
> > kernel/rcu/tree_plugin.h | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> > e2ffbeceba69..a69587773a85 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> > *work) }
> >
> > static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> > - IRQ_WORK_INIT(late_wakeup_func);
> > + IRQ_WORK_INIT_HARD(late_wakeup_func);
>
> >This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
> >
> >Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
> >
> > /*
> > * If either:
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 3037c2536e1f..cf7bd28af8ef 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> > // Get scheduler to re-evaluate and call hooks.
> > // If !IRQ_WORK, FQS scan will eventually IPI.
> > - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> > + rdp->defer_qs_iw =
> > +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> > rdp->defer_qs_iw_pending = true;
> > irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > }
> >
> >OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.
>
> Through my understanding, use irq_work in order to make the quiescent
> state be noticed earlier, Because the irq_work execute in interrupt,
> this irq_work can be executed in time, but In RT kernel The irq_work is put into the kthread for execution, when it is executed, it is affected by the scheduling delay.
> Is there anything I missed?

>Yes, in that I am not seeing any actual data showing that this fix really makes things better. Please in mind that IRQ_WORK_INIT_HARD does have performance disadvantages of its own. So although I agree with your words saying that IRQ_WORK_INIT_HARD -might- be helpful, those words are not sufficient.
>
>So, can you show a statistically significant benefit on a real system?
>For example, by measuring the time required for a expedited grace period to complete? That would argue for this change, though it would need to be conditional, so that systems that don't care that much about the latency of expedited RCU grace periods don't need to pay the IRQ_WORK_INIT_HARD performance penalties. Or you would need to demonstrate that these performance penalties don't cause problems. (But such a demonstration is not easy given the wide variety of systems that Linux supports.)
>
>Now, I could imagine that the current code could cause problems during boot on CONFIG_PREEMPT_RT kernels. But, believe me, I can imagine all sorts of horrible problems. But we should fix those that happen not just in my imagination, but also in the real world. ;-)

Thanks, agreed. I'll test it according to your suggestion.

>
>So if you can make such a problem happen in real life, then I would be happy to take a patch that fixed this on CONFIG_PREEMPT_RT but kept the current code otherwise.
>
> Thanx, Paul

> Thanks
> Zqiang
>
> >
> >Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
> >
> > Thanx, Paul

2022-04-02 15:37:07

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Put the irq work into hard interrupt context for execution

On Fri, Apr 01, 2022 at 01:55:51AM +0000, Zhang, Qiang1 wrote:
>
> On Wed, Mar 30, 2022 at 10:47:05PM +0000, Zhang, Qiang1 wrote:
> > On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> > > In PREEMPT_RT kernel, if irq work flags is not set, it will be
> > > executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags
> > > to irq work, put it in the context of hard interrupt execution,
> > > accelerate scheduler to re-evaluate.
> > >
> > > Signed-off-by: Zqiang <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 2 +-
> > > kernel/rcu/tree_plugin.h | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> > > e2ffbeceba69..a69587773a85 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> > > *work) }
> > >
> > > static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> > > - IRQ_WORK_INIT(late_wakeup_func);
> > > + IRQ_WORK_INIT_HARD(late_wakeup_func);
> >
> > >This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
> > >
> > >Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
> > >
> > > /*
> > > * If either:
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 3037c2536e1f..cf7bd28af8ef 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> > > // Get scheduler to re-evaluate and call hooks.
> > > // If !IRQ_WORK, FQS scan will eventually IPI.
> > > - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> > > + rdp->defer_qs_iw =
> > > +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> > > rdp->defer_qs_iw_pending = true;
> > > irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > > }
> > >
> > >OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.
> >
> > Through my understanding, use irq_work in order to make the quiescent
> > state be noticed earlier, Because the irq_work execute in interrupt,
> > this irq_work can be executed in time, but In RT kernel The irq_work is put into the kthread for execution, when it is executed, it is affected by the scheduling delay.
> > Is there anything I missed?
>
> >Yes, in that I am not seeing any actual data showing that this fix really makes things better. Please in mind that IRQ_WORK_INIT_HARD does have performance disadvantages of its own. So although I agree with your words saying that IRQ_WORK_INIT_HARD -might- be helpful, those words are not sufficient.
> >
> >So, can you show a statistically significant benefit on a real system?
> >For example, by measuring the time required for a expedited grace period to complete? That would argue for this change, though it would need to be conditional, so that systems that don't care that much about the latency of expedited RCU grace periods don't need to pay the IRQ_WORK_INIT_HARD performance penalties. Or you would need to demonstrate that these performance penalties don't cause problems. (But such a demonstration is not easy given the wide variety of systems that Linux supports.)
> >
> >Now, I could imagine that the current code could cause problems during boot on CONFIG_PREEMPT_RT kernels. But, believe me, I can imagine all sorts of horrible problems. But we should fix those that happen not just in my imagination, but also in the real world. ;-)
>
> Thanks, agree. I'll test it according to your suggestion.

Very good! I am looking forward to seeing what you come up with.

Thanx, Paul

> >So if you can make such a problem happen in real life, then I would be happy to take a patch that fixed this on CONFIG_PREEMPT_RT but kept the current code otherwise.
> >
> > Thanx, Paul
>
> > Thanks
> > Zqiang
> >
> > >
> > >Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
> > >
> > > Thanx, Paul

2022-04-04 23:51:48

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Put the irq work into hard interrupt context for execution

On Sat, Apr 02, 2022 at 06:29:04AM +0000, Zhang, Qiang1 wrote:
>
> On Fri, Apr 01, 2022 at 01:55:51AM +0000, Zhang, Qiang1 wrote:
> >
> > On Wed, Mar 30, 2022 at 10:47:05PM +0000, Zhang, Qiang1 wrote:
> > > On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> > > > In PREEMPT_RT kernel, if irq work flags is not set, it will be
> > > > executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags
> > > > to irq work, put it in the context of hard interrupt execution,
> > > > accelerate scheduler to re-evaluate.
> > > >
> > > > Signed-off-by: Zqiang <[email protected]>
> > > > ---
> > > > kernel/rcu/tree.c | 2 +-
> > > > kernel/rcu/tree_plugin.h | 2 +-
> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> > > > e2ffbeceba69..a69587773a85 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> > > > *work) }
> > > >
> > > > static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> > > > - IRQ_WORK_INIT(late_wakeup_func);
> > > > + IRQ_WORK_INIT_HARD(late_wakeup_func);
> > >
> > > >This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
> > > >
> > > >Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
> > > >
> > > > /*
> > > > * If either:
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 3037c2536e1f..cf7bd28af8ef 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > > expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> > > > // Get scheduler to re-evaluate and call hooks.
> > > > // If !IRQ_WORK, FQS scan will eventually IPI.
> > > > - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> > > > + rdp->defer_qs_iw =
> > > > +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> > > > rdp->defer_qs_iw_pending = true;
> > > > irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > > > }
> > > >
> > > >OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.
> > >
> > > Through my understanding, use irq_work in order to make the
> > > quiescent state be noticed earlier, Because the irq_work execute in
> > > interrupt, this irq_work can be executed in time, but In RT kernel The irq_work is put into the kthread for execution, when it is executed, it is affected by the scheduling delay.
> > > Is there anything I missed?
> >
> > >Yes, in that I am not seeing any actual data showing that this fix really makes things better. Please in mind that IRQ_WORK_INIT_HARD does have performance disadvantages of its own. So although I agree with your words saying that IRQ_WORK_INIT_HARD -might- be helpful, those words are not sufficient.
> > >
> > >So, can you show a statistically significant benefit on a real system?
> > >For example, by measuring the time required for a expedited grace
> > >period to complete? That would argue for this change, though it
> > >would need to be conditional, so that systems that don't care that
> > >much about the latency of expedited RCU grace periods don't need to
> > >pay the IRQ_WORK_INIT_HARD performance penalties. Or you would need
> > >to demonstrate that these performance penalties don't cause problems.
> > >(But such a demonstration is not easy given the wide variety of
> > >systems that Linux supports.)
> > >
> > >Now, I could imagine that the current code could cause problems
> > >during boot on CONFIG_PREEMPT_RT kernels. But, believe me, I can
> > >imagine all sorts of horrible problems. But we should fix those that
> > >happen not just in my imagination, but also in the real world. ;-)
> >
> > Thanks, agree. I'll test it according to your suggestion.
> >
> >Very good! I am looking forward to seeing what you come up with.
>
> Hello, Paul
>
> I used a v5.17.1-rt16 PREEMPT_RT kernel with CONFIG_RCU_STRICT_GRACE_PERIOD enabled.
> During boot, a lot of 'defer_qs_iw' irq_work is handled by 'irq_work/0', which is an RT FIFO
> kthread. It occupies the boot CPU for a long time, so the kernel_init kthread cannot get the
> boot CPU to bring up the other CPUs, and the boot process hangs.
> Replacing init_irq_work(&rdp->defer_qs_iw, ...) with IRQ_WORK_INIT_HARD() fixes this.

Very good on making it happen!

But please make the fix use IRQ_WORK_INIT_HARD only for PREEMPT_RT
kernels. That way the performance of other systems won't be degraded
by this fix for a PREEMPT_RT bug.

Thanx, Paul

> Thanks
> Zqiang
>
> >
> > Thanx, Paul
>
> > >So if you can make such a problem happen in real life, then I would be happy to take a patch that fixed this on CONFIG_PREEMPT_RT but kept the current code otherwise.
> > >
> > > Thanx, Paul
> >
> > > Thanks
> > > Zqiang
> > >
> > > >
> > > >Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
> > > >
> > > > Thanx, Paul

2022-04-05 00:40:43

by Zqiang

Subject: RE: [PATCH] rcu: Put the irq work into hard interrupt context for execution


On Fri, Apr 01, 2022 at 01:55:51AM +0000, Zhang, Qiang1 wrote:
>
> On Wed, Mar 30, 2022 at 10:47:05PM +0000, Zhang, Qiang1 wrote:
> > On Wed, Mar 30, 2022 at 02:00:12PM +0800, Zqiang wrote:
> > > In PREEMPT_RT kernel, if irq work flags is not set, it will be
> > > executed in per-CPU irq_work kthreads. set IRQ_WORK_HARD_IRQ flags
> > > to irq work, put it in the context of hard interrupt execution,
> > > accelerate scheduler to re-evaluate.
> > >
> > > Signed-off-by: Zqiang <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 2 +-
> > > kernel/rcu/tree_plugin.h | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index
> > > e2ffbeceba69..a69587773a85 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -678,7 +678,7 @@ static void late_wakeup_func(struct irq_work
> > > *work) }
> > >
> > > static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> > > - IRQ_WORK_INIT(late_wakeup_func);
> > > + IRQ_WORK_INIT_HARD(late_wakeup_func);
> >
> > >This is used only by rcu_irq_work_resched(), which is invoked only by rcu_user_enter(), which is never invoked until userspace is enabled, by which time all of the various kthreads will have been spawned, correct?
> > >
> > >Either way, please show me the exact sequence of events that lead to a problem with the current IRQ_WORK_INIT().
> > >
> > > /*
> > > * If either:
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 3037c2536e1f..cf7bd28af8ef 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -661,7 +661,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > > expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> > > // Get scheduler to re-evaluate and call hooks.
> > > // If !IRQ_WORK, FQS scan will eventually IPI.
> > > - init_irq_work(&rdp->defer_qs_iw, rcu_preempt_deferred_qs_handler);
> > > + rdp->defer_qs_iw =
> > > +IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> > > rdp->defer_qs_iw_pending = true;
> > > irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > > }
> > >
> > >OK, in theory, rcu_read_unlock() could get to this point before all of the various kthreads were spawned. In practice, the next time that the boot CPU went idle, the end of the quiescent state would be noticed.
> >
> > Through my understanding, use irq_work in order to make the
> > quiescent state be noticed earlier, Because the irq_work execute in
> > interrupt, this irq_work can be executed in time, but In RT kernel The irq_work is put into the kthread for execution, when it is executed, it is affected by the scheduling delay.
> > Is there anything I missed?
>
> >Yes, in that I am not seeing any actual data showing that this fix really makes things better. Please in mind that IRQ_WORK_INIT_HARD does have performance disadvantages of its own. So although I agree with your words saying that IRQ_WORK_INIT_HARD -might- be helpful, those words are not sufficient.
> >
> >So, can you show a statistically significant benefit on a real system?
> >For example, by measuring the time required for a expedited grace
> >period to complete? That would argue for this change, though it
> >would need to be conditional, so that systems that don't care that
> >much about the latency of expedited RCU grace periods don't need to
> >pay the IRQ_WORK_INIT_HARD performance penalties. Or you would need
> >to demonstrate that these performance penalties don't cause problems.
> >(But such a demonstration is not easy given the wide variety of
> >systems that Linux supports.)
> >
> >Now, I could imagine that the current code could cause problems
> >during boot on CONFIG_PREEMPT_RT kernels. But, believe me, I can
> >imagine all sorts of horrible problems. But we should fix those that
> >happen not just in my imagination, but also in the real world. ;-)
>
> Thanks, agree. I'll test it according to your suggestion.
>
>Very good! I am looking forward to seeing what you come up with.

Hello, Paul

I used a v5.17.1-rt16 PREEMPT_RT kernel with CONFIG_RCU_STRICT_GRACE_PERIOD enabled.
During boot, a lot of 'defer_qs_iw' irq_work is handled by 'irq_work/0', which is an RT FIFO
kthread. It occupies the boot CPU for a long time, so the kernel_init kthread cannot get the
boot CPU to bring up the other CPUs, and the boot process hangs.
Replacing init_irq_work(&rdp->defer_qs_iw, ...) with IRQ_WORK_INIT_HARD() fixes this.

Thanks
Zqiang

>
> Thanx, Paul

> >So if you can make such a problem happen in real life, then I would be happy to take a patch that fixed this on CONFIG_PREEMPT_RT but kept the current code otherwise.
> >
> > Thanx, Paul
>
> > Thanks
> > Zqiang
> >
> > >
> > >Or has this been failing in some other manner? If so, please let me know the exact sequence of events.
> > >
> > > Thanx, Paul