2020-08-06 04:21:25

by Jiafei Pan

Subject: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

__raise_softirq_irqoff updates the per-CPU mask of pending softirqs, so
it needs to be called with interrupts disabled to keep the update
atomic. Otherwise a hardware interrupt can interrupt the update and
corrupt the per-CPU pending mask. As a result softirqs can be lost: for
example, the hrtimer softirq is lost, so soft hrtimers never expire and
are never handled.

Add an irqs-disabled check here to warn when the function is called
with interrupts enabled.

Signed-off-by: Jiafei Pan <[email protected]>
---
kernel/softirq.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index bf88d7f62433..11f61e54a3ae 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)

 void __raise_softirq_irqoff(unsigned int nr)
 {
+	/* This function can only be called in irq disabled context,
+	 * otherwise or_softirq_pending will be interrupted by hardware
+	 * interrupt, so that there will be unexpected issue.
+	 */
+	WARN_ON_ONCE(!irqs_disabled());
 	trace_softirq_raise(nr);
 	or_softirq_pending(1UL << nr);
 }
--
2.17.1
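
The lost update described in the changelog can be reproduced deterministically in user space. The following is only a sketch, not kernel code: `demo_pending`, `demo_or_pending_interrupted()` and the two bit numbers are hypothetical names, and it assumes an architecture where the OR on the pending mask is carried out as a separate load, modify and store rather than as a single instruction.

```c
/* Hypothetical stand-in for one CPU's pending-softirq mask. */
unsigned long demo_pending;

/* Hypothetical softirq numbers, chosen only for this illustration. */
enum { DEMO_NET_RX = 3, DEMO_HRTIMER = 8 };

/*
 * Model "pending |= 1UL << nr" being interrupted between its load and
 * its store by a handler that raises irq_nr.
 */
unsigned long demo_or_pending_interrupted(unsigned int nr, unsigned int irq_nr)
{
	unsigned long tmp = demo_pending;	/* load the old mask */

	demo_pending |= 1UL << irq_nr;		/* interrupt fires and raises its own softirq */

	tmp |= 1UL << nr;			/* modify the stale copy */
	demo_pending = tmp;			/* store: the interrupt's bit is overwritten */

	return demo_pending;
}
```

Starting from an empty mask, `demo_or_pending_interrupted(DEMO_NET_RX, DEMO_HRTIMER)` returns a mask with only the `DEMO_NET_RX` bit set. The `DEMO_HRTIMER` bit raised by the simulated interrupt is gone, which matches the symptom of a soft hrtimer that never expires.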


2020-08-13 03:07:55

by Jiafei Pan

Subject: RE: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

Any comments? Thanks.

@Steven Rostedt, I think the irq-off check is necessary, especially for Preempt-RT kernels, because some contexts change from irq off to irq on when Preempt-RT is enabled. I once hit an issue where the hrtimer softirq was lost with Preempt-RT enabled, and eventually found that napi_schedule_irqoff is called in a hardware interrupt handler. That may cause no issue on a non-RT kernel, but on Preempt-RT the interrupt is threaded, so irqs are on in the interrupt handler. As a result __raise_softirq_irqoff is called with irqs on, and the per-CPU softirq pending mask is corrupted because its update can be interrupted and is no longer atomic, which caused the hrtimer softirq to be lost. So I think adding an irq state check in __raise_softirq_irqoff can report such issues directly and help us find their root cause.
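
The context change described above can be modelled in user space. This is a sketch under assumptions: every `demo_*` name is hypothetical, `demo_irqs_off` merely stands in for the CPU's interrupt state, and the handler only mimics a driver that calls an `_irqoff` helper such as napi_schedule_irqoff.

```c
#include <stdbool.h>

bool demo_irqs_off;		/* models whether interrupts are disabled */
unsigned int demo_violations;	/* calls made while interrupts were enabled */
unsigned long demo_pending;	/* models the per-CPU pending mask */

/* Models an "_irqoff" helper: the caller promises interrupts are off. */
void demo_raise_irqoff(unsigned int nr)
{
	if (!demo_irqs_off)
		demo_violations++;	/* what the proposed check would report */
	demo_pending |= 1UL << nr;
}

/* Models a device handler written on the assumption of hard-irq context. */
void demo_device_handler(void)
{
	demo_raise_irqoff(3);
}

/* Run the handler in hard-irq context, or threaded with interrupts on. */
void demo_dispatch(bool threaded)
{
	bool saved = demo_irqs_off;

	demo_irqs_off = !threaded;
	demo_device_handler();
	demo_irqs_off = saved;
}
```

Calling `demo_dispatch(false)` leaves `demo_violations` at 0, while `demo_dispatch(true)` increments it: the same handler becomes a caller in the wrong context once it runs threaded.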

I know that the extra check may have a performance impact. If that is a concern, how about putting it under a debug configuration option, such as CONFIG_DEBUG_PREEMPT or another debug option?

Best Regards,
Jiafei.

2020-08-13 05:59:41

by Peter Zijlstra

Subject: Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

On Thu, Aug 06, 2020 at 12:07:29PM +0800, Jiafei Pan wrote:
> __raise_softirq_irqoff will update per-CPU mask of pending softirqs,
> it need to be called in irq disabled context in order to keep it atomic
> operation, otherwise it will be interrupted by hardware interrupt,
> and per-CPU softirqs pending mask will be corrupted, the result is
> there will be unexpected issue, for example hrtimer soft irq will
> be losed and soft hrtimer will never be expire and handled.
>
> Adding irqs disabled checking here to provide warning in irqs enabled
> context.
>
> Signed-off-by: Jiafei Pan <[email protected]>
> ---
> kernel/softirq.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index bf88d7f62433..11f61e54a3ae 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)
>
> void __raise_softirq_irqoff(unsigned int nr)
> {
> + /* This function can only be called in irq disabled context,
> + * otherwise or_softirq_pending will be interrupted by hardware
> + * interrupt, so that there will be unexpected issue.
> + */

Comment style is wrong, also I'm not sure the comment is really
helpful.

> + WARN_ON_ONCE(!irqs_disabled());

lockdep_assert_irqs_disabled();

> trace_softirq_raise(nr);
> or_softirq_pending(1UL << nr);
> }

2020-08-13 07:34:32

by Thomas Gleixner

Subject: Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

Jiafei Pan <[email protected]> writes:
> __raise_softirq_irqoff will update per-CPU mask of pending softirqs,

Please write __raise_softirq_irqoff() so it's clear that this is about a
function.

> void __raise_softirq_irqoff(unsigned int nr)
> {
> + /* This function can only be called in irq disabled context,
> + * otherwise or_softirq_pending will be interrupted by hardware
> + * interrupt, so that there will be unexpected issue.
> + */
> + WARN_ON_ONCE(!irqs_disabled());

Please use lockdep_assert_irqs_disabled() instead.

Thanks,

tglx

2020-08-13 14:59:16

by Steven Rostedt

Subject: Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

On Thu, 13 Aug 2020 03:03:46 +0000
Jiafei Pan <[email protected]> wrote:

> Any comments? Thanks.
>
> @Steven Rostedt, I thinks irq off checking is necessary especially

This is probably more for Thomas Gleixner.

> for Preempt-RT kernel, because some context may be changed from irq
> off to irq on when enable Preempt RT, I once met a issue that hrtimer
> soft irq is lost when enabled Preempt RT, finally I found
> napi_schedule_irqoff is called in hardware interrupt handler, there
> maybe no issue for non RT kernel, but for Preempt RT, interrupt is
> threaded, so irq is on in interrupt handler, the result is
> __raise_softirq_irqoff is called in irq on context, so that per-CPU
> softirq masking is corrupted because of the process of updating of
> soft irq masking is interrupted and not a atomic operation , and then
> caused hrtimer soft irq is lost. So I think adding irq status
> checking in __raise_softirq_irqoff can report such issue directly and
> help us to find the root cause of such issue.
>
> I know that there may be performance impaction to add extra checking
> here, if it is the case, how about to include it in some debug
> configuration items? Such as CONFIG_DEBUG_PREEMPT or other debug
> items?
>


> Best Regards,
> Jiafei.
>
> -----Original Message-----
> From: Jiafei Pan <[email protected]>
> Sent: Thursday, August 6, 2020 12:07 PM
> To: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; Jiafei Pan <[email protected]>; Leo Li <[email protected]>; Vladimir Oltean <[email protected]>; Jiafei Pan <[email protected]>
> Subject: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff
>
> __raise_softirq_irqoff will update per-CPU mask of pending softirqs, it need to be called in irq disabled context in order to keep it atomic operation, otherwise it will be interrupted by hardware interrupt, and per-CPU softirqs pending mask will be corrupted, the result is there will be unexpected issue, for example hrtimer soft irq will be losed and soft hrtimer will never be expire and handled.

Please wrap your change logs.

>
> Adding irqs disabled checking here to provide warning in irqs enabled context.
>
> Signed-off-by: Jiafei Pan <[email protected]>
> ---
> kernel/softirq.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c index bf88d7f62433..11f61e54a3ae 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)
>
> void __raise_softirq_irqoff(unsigned int nr) {
> + /* This function can only be called in irq disabled context,
> + * otherwise or_softirq_pending will be interrupted by hardware
> + * interrupt, so that there will be unexpected issue.
> + */
> + WARN_ON_ONCE(!irqs_disabled());

Perhaps: lockdep_assert_irqs_disabled() is more appropriate, and
doesn't add extra overhead on production systems.

-- Steve


> trace_softirq_raise(nr);
> or_softirq_pending(1UL << nr);
> }
> --
> 2.17.1
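
Why an assertion that depends on a debug option costs nothing on production builds can be sketched as follows. `DEMO_DEBUG_CHECKS` and all `demo_*` names are hypothetical; the real lockdep_assert_irqs_disabled() lives in the kernel's lockdep headers and is not reproduced here. The sketch only assumes that the assertion macro expands to an empty statement when the debug option is off.

```c
int demo_irqs_off;		/* 1 while "interrupts" are disabled */
int demo_warnings;		/* number of failed assertions */
int demo_checks_evaluated;	/* how often the state was actually tested */
unsigned long demo_pending;	/* models the per-CPU pending mask */

int demo_irqs_disabled(void)
{
	demo_checks_evaluated++;
	return demo_irqs_off;
}

#ifdef DEMO_DEBUG_CHECKS
/* Debug build: test the state and count a warning on failure. */
#define demo_assert_irqs_disabled()		\
	do {					\
		if (!demo_irqs_disabled())	\
			demo_warnings++;	\
	} while (0)
#else
/* Production build: the assertion compiles away entirely. */
#define demo_assert_irqs_disabled() do { } while (0)
#endif

void demo_raise_softirq_irqoff(unsigned int nr)
{
	demo_assert_irqs_disabled();
	demo_pending |= 1UL << nr;
}
```

Built without `-DDEMO_DEBUG_CHECKS`, a call to `demo_raise_softirq_irqoff()` never evaluates `demo_irqs_disabled()`. Built with it, a call made while `demo_irqs_off` is 0 increments `demo_warnings`.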

2020-08-14 02:22:52

by Jiafei Pan

Subject: RE: [EXT] Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff


> On Thu, 13 Aug 2020 03:03:46 +0000
> Jiafei Pan <[email protected]> wrote:
>
> > Any comments? Thanks.
> >
> > @Steven Rostedt, I thinks irq off checking is necessary especially
>
> This is probably more for Thomas Gleixner.

Thanks, Steven.

@Thomas Gleixner, would you please review the patch? Thanks.

Jiafei.

2020-08-14 02:25:38

by Steven Rostedt

Subject: Re: [EXT] Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

On Fri, 14 Aug 2020 02:21:25 +0000
Jiafei Pan <[email protected]> wrote:

> > This is probably more for Thomas Gleixner.
> Thanks Steven.
> @Thomas Gleixner, would you please review the patch? thanks.
> Jiafei.

I believe he already did.

-- Steve

2020-08-14 03:31:28

by Jiafei Pan

Subject: RE: [EXT] Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

> From: Peter Zijlstra <[email protected]>
> Sent: Thursday, August 13, 2020 1:58 PM
>
> On Thu, Aug 06, 2020 at 12:07:29PM +0800, Jiafei Pan wrote:
> > __raise_softirq_irqoff will update per-CPU mask of pending softirqs,
> > it need to be called in irq disabled context in order to keep it
> > atomic operation, otherwise it will be interrupted by hardware
> > interrupt, and per-CPU softirqs pending mask will be corrupted, the
> > result is there will be unexpected issue, for example hrtimer soft irq
> > will be losed and soft hrtimer will never be expire and handled.
> >
> > Adding irqs disabled checking here to provide warning in irqs enabled
> > context.
> >
> > Signed-off-by: Jiafei Pan <[email protected]>
> > ---
> > kernel/softirq.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/softirq.c b/kernel/softirq.c index
> > bf88d7f62433..11f61e54a3ae 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)
> >
> > void __raise_softirq_irqoff(unsigned int nr) {
> > + /* This function can only be called in irq disabled context,
> > + * otherwise or_softirq_pending will be interrupted by hardware
> > + * interrupt, so that there will be unexpected issue.
> > + */
>
> Comment style is wrong, also I'm not sure the comment is really helpfull.
[Jiafei Pan] Thanks for your comments. Yes, the function name already indicates that the function
should be called in irq-off context; I will remove the comment in the next version.
>
> > + WARN_ON_ONCE(!irqs_disabled());
>
> lockdep_assert_irqs_disabled();
>
> > trace_softirq_raise(nr);
> > or_softirq_pending(1UL << nr);
> > }

2020-08-14 04:18:22

by Jiafei Pan

Subject: RE: [EXT] Re: [PATCH] softirq: add irq off checking for __raise_softirq_irqoff

> From: Steven Rostedt <[email protected]>
> Sent: Thursday, August 13, 2020 10:57 PM
>
> On Thu, 13 Aug 2020 03:03:46 +0000
> Jiafei Pan <[email protected]> wrote:
>
> > Any comments? Thanks.
> >
> > @Steven Rostedt, I thinks irq off checking is necessary especially
>
> This is probably more for Thomas Gleixner.
>
> > for Preempt-RT kernel, because some context may be changed from irq
> > off to irq on when enable Preempt RT, I once met a issue that hrtimer
> > soft irq is lost when enabled Preempt RT, finally I found
> > napi_schedule_irqoff is called in hardware interrupt handler, there
> > maybe no issue for non RT kernel, but for Preempt RT, interrupt is
> > threaded, so irq is on in interrupt handler, the result is
> > __raise_softirq_irqoff is called in irq on context, so that per-CPU
> > softirq masking is corrupted because of the process of updating of
> > soft irq masking is interrupted and not a atomic operation , and then
> > caused hrtimer soft irq is lost. So I think adding irq status checking
> > in __raise_softirq_irqoff can report such issue directly and help us
> > to find the root cause of such issue.
> >
> > I know that there may be performance impaction to add extra checking
> > here, if it is the case, how about to include it in some debug
> > configuration items? Such as CONFIG_DEBUG_PREEMPT or other debug
> > items?
> >
>
>
> > Best Regards,
> > Jiafei.
> >
> > -----Original Message-----
> > From: Jiafei Pan <[email protected]>
> > Sent: Thursday, August 6, 2020 12:07 PM
> > To: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Cc: [email protected]; [email protected];
> > Jiafei Pan <[email protected]>; Leo Li <[email protected]>; Vladimir
> > Oltean <[email protected]>; Jiafei Pan <[email protected]>
> > Subject: [PATCH] softirq: add irq off checking for
> > __raise_softirq_irqoff
> >
> > __raise_softirq_irqoff will update per-CPU mask of pending softirqs, it need
> > to be called in irq disabled context in order to keep it atomic operation,
> > otherwise it will be interrupted by hardware interrupt, and per-CPU softirqs
> > pending mask will be corrupted, the result is there will be unexpected issue,
> > for example hrtimer soft irq will be losed and soft hrtimer will never be expire
> > and handled.
>
> Please wrap your change logs.
[Jiafei Pan] Thanks, will update it.
>
> >
> > Adding irqs disabled checking here to provide warning in irqs enabled
> > context.
> >
> > Signed-off-by: Jiafei Pan <[email protected]>
> > ---
> > kernel/softirq.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/softirq.c b/kernel/softirq.c index
> > bf88d7f62433..11f61e54a3ae 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -481,6 +481,11 @@ void raise_softirq(unsigned int nr)
> >
> > void __raise_softirq_irqoff(unsigned int nr) {
> > + /* This function can only be called in irq disabled context,
> > + * otherwise or_softirq_pending will be interrupted by hardware
> > + * interrupt, so that there will be unexpected issue.
> > + */
> > + WARN_ON_ONCE(!irqs_disabled());
>
> Perhaps: lockdep_assert_irqs_disabled() is more appropriate, and doesn't add
> extra overhead on production systems.
>
> -- Steve
[Jiafei Pan] Thanks, will update it.
>
>
> > trace_softirq_raise(nr);
> > or_softirq_pending(1UL << nr);
> > }
> > --
> > 2.17.1