2015-05-01 20:00:04

by Josh Hunt

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> 3.19-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Calvin Owens <[email protected]>
>
> commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
>
> While debugging an issue with excessive softirq usage, I encountered the
> following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> infrastructure"):
>
> [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
>
> ...but despite this note, the patch still calls RCU with IRQs disabled.
>
> This seemingly innocuous change caused a significant regression in softirq
> CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> spiking as high as 50%. Before the change, the usage would never exceed 5%.
>
> Moving the call to rcu_note_context_switch() after the cond_resched() call,
> as it was originally before the hotplug patch, completely eliminated this
> problem.
>
> Signed-off-by: Calvin Owens <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> kernel/softirq.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> * in the task stack here.
> */
> __do_softirq();
> - rcu_note_context_switch();
> local_irq_enable();
> cond_resched();
> +
> + preempt_disable();
> + rcu_note_context_switch();
> + preempt_enable();
> +
> return;
> }
> local_irq_enable();
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Sorry for the delay in noticing this, but should this be applied to
3.14-stable as well?

Thanks
Josh


2015-05-01 20:52:11

by Greg Kroah-Hartman

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
> On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > 3.19-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Calvin Owens <[email protected]>
> >
> > commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
> >
> > While debugging an issue with excessive softirq usage, I encountered the
> > following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> > infrastructure"):
> >
> > [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
> >
> > ...but despite this note, the patch still calls RCU with IRQs disabled.
> >
> > This seemingly innocuous change caused a significant regression in softirq
> > CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> > introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> > spiking as high as 50%. Before the change, the usage would never exceed 5%.
> >
> > Moving the call to rcu_note_context_switch() after the cond_resched() call,
> > as it was originally before the hotplug patch, completely eliminated this
> > problem.
> >
> > Signed-off-by: Calvin Owens <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >
> > ---
> > kernel/softirq.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> > * in the task stack here.
> > */
> > __do_softirq();
> > - rcu_note_context_switch();
> > local_irq_enable();
> > cond_resched();
> > +
> > + preempt_disable();
> > + rcu_note_context_switch();
> > + preempt_enable();
> > +
> > return;
> > }
> > local_irq_enable();
> >
> >
>
> Sorry for the delay in noticing this, but should this be applied to
> 3.14-stable as well?

Why should it? And odds are, if I didn't apply it there, it was either
because it didn't apply, or it broke the build. Have you tried this out
in 3.14 to see if it even works?

thanks,

greg k-h

2015-05-02 03:14:53

by Mike Galbraith

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On Fri, 2015-05-01 at 22:52 +0200, Greg Kroah-Hartman wrote:
> On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
> > On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > > 3.19-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: Calvin Owens <[email protected]>
> > >
> > > commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
> > >
> > > While debugging an issue with excessive softirq usage, I encountered the
> > > following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> > > infrastructure"):
> > >
> > > [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
> > >
> > > ...but despite this note, the patch still calls RCU with IRQs disabled.
> > >
> > > This seemingly innocuous change caused a significant regression in softirq
> > > CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> > > introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> > > spiking as high as 50%. Before the change, the usage would never exceed 5%.
> > >
> > > Moving the call to rcu_note_context_switch() after the cond_resched() call,
> > > as it was originally before the hotplug patch, completely eliminated this
> > > problem.
> > >
> > > Signed-off-by: Calvin Owens <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > >
> > > ---
> > > kernel/softirq.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > --- a/kernel/softirq.c
> > > +++ b/kernel/softirq.c
> > > @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> > > * in the task stack here.
> > > */
> > > __do_softirq();
> > > - rcu_note_context_switch();
> > > local_irq_enable();
> > > cond_resched();
> > > +
> > > + preempt_disable();
> > > + rcu_note_context_switch();
> > > + preempt_enable();
> > > +
> > > return;
> > > }
> > > local_irq_enable();
> > >
> > >
> >
> > Sorry for the delay in noticing this, but should this be applied to
> > 3.14-stable as well?
>
> Why should it?

The regression inducing change arrived in 3.7-rc1.

> And odds are, if I didn't apply it there, it was either
> because it didn't apply, or it broke the build.

a. [x] rcu_note_context_switch(cpu) -> rcu_note_context_switch()

From 28423ad283d5348793b0c45cc9b1af058e776fd6 Mon Sep 17 00:00:00 2001
From: Calvin Owens <[email protected]>
Date: Tue, 13 Jan 2015 13:16:18 -0800
Subject: ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
infrastructure"):

[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

...but despite this note, the patch still calls RCU with IRQs disabled.

This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.

Moving the call to rcu_note_context_switch() after the cond_resched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.

Signed-off-by: Calvin Owens <[email protected]>
Cc: [email protected]
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
---
kernel/softirq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -657,9 +657,13 @@ static void run_ksoftirqd(unsigned int c
* in the task stack here.
*/
__do_softirq();
- rcu_note_context_switch(cpu);
local_irq_enable();
cond_resched();
+
+ preempt_disable();
+ rcu_note_context_switch(cpu);
+ preempt_enable();
+
return;
}
local_irq_enable();
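
For readers following the thread, the reordering in the hunk above can be
modelled in userspace. This is only a sketch: every function below is a
hypothetical stand-in with a _model suffix, not a kernel API, and the leading
local_irq_disable step is an assumption about the part of run_ksoftirqd()
that the hunk does not show. It records whether the RCU hook is reached with
interrupts disabled under the old and the new ordering.

```c
/* Userspace model only: none of these are the real kernel functions. */
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;        /* models the CPU interrupt flag */
static int preempt_count = 0;           /* models the preempt-disable depth */
static int rcu_calls_total = 0;         /* how often the RCU hook ran */
static int rcu_calls_with_irqs_off = 0; /* how often it ran with IRQs off */

static void local_irq_disable_model(void) { irqs_enabled = false; }
static void local_irq_enable_model(void)  { irqs_enabled = true; }
static void preempt_disable_model(void)   { preempt_count++; }

static void preempt_enable_model(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* Stand-in for __do_softirq(): pending handlers would run here. */
static void do_softirq_model(void) { }

/* Stand-in for cond_resched(): only legal when rescheduling is possible. */
static void cond_resched_model(void)
{
    assert(irqs_enabled);
    assert(preempt_count == 0);
}

/* Stand-in for rcu_note_context_switch(): records the IRQ state it sees. */
static void rcu_note_context_switch_model(void)
{
    rcu_calls_total++;
    if (!irqs_enabled)
        rcu_calls_with_irqs_off++;
}

/* Ordering before the patch: RCU is poked while IRQs are still off. */
static void run_ksoftirqd_old_model(void)
{
    local_irq_disable_model();          /* assumed entry state */
    do_softirq_model();
    rcu_note_context_switch_model();
    local_irq_enable_model();
    cond_resched_model();
}

/* Ordering after the patch: IRQs on, reschedule, then poke RCU. */
static void run_ksoftirqd_new_model(void)
{
    local_irq_disable_model();          /* assumed entry state */
    do_softirq_model();
    local_irq_enable_model();
    cond_resched_model();

    preempt_disable_model();
    rcu_note_context_switch_model();
    preempt_enable_model();
}
```

Calling run_ksoftirqd_old_model() once increments rcu_calls_with_irqs_off,
while run_ksoftirqd_new_model() leaves it unchanged. The model says nothing
about why the old ordering cost CPU time; it only makes the ordering visible.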

2015-05-02 17:45:35

by Greg Kroah-Hartman

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On Sat, May 02, 2015 at 05:14:50AM +0200, Mike Galbraith wrote:
> On Fri, 2015-05-01 at 22:52 +0200, Greg Kroah-Hartman wrote:
> > On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
> > > On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > > 3.19-stable review patch. If anyone has any objections, please let me know.
> > > >
> > > > ------------------
> > > >
> > > > From: Calvin Owens <[email protected]>
> > > >
> > > > commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
> > > >
> > > > While debugging an issue with excessive softirq usage, I encountered the
> > > > following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> > > > infrastructure"):
> > > >
> > > > [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
> > > >
> > > > ...but despite this note, the patch still calls RCU with IRQs disabled.
> > > >
> > > > This seemingly innocuous change caused a significant regression in softirq
> > > > CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> > > > introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> > > > spiking as high as 50%. Before the change, the usage would never exceed 5%.
> > > >
> > > > Moving the call to rcu_note_context_switch() after the cond_resched() call,
> > > > as it was originally before the hotplug patch, completely eliminated this
> > > > problem.
> > > >
> > > > Signed-off-by: Calvin Owens <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > >
> > > > ---
> > > > kernel/softirq.c | 6 +++++-
> > > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > --- a/kernel/softirq.c
> > > > +++ b/kernel/softirq.c
> > > > @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> > > > * in the task stack here.
> > > > */
> > > > __do_softirq();
> > > > - rcu_note_context_switch();
> > > > local_irq_enable();
> > > > cond_resched();
> > > > +
> > > > + preempt_disable();
> > > > + rcu_note_context_switch();
> > > > + preempt_enable();
> > > > +
> > > > return;
> > > > }
> > > > local_irq_enable();
> > > >
> > > >
> > >
> > > Sorry for the delay in noticing this, but should this be applied to
> > > 3.14-stable as well?
> >
> > Why should it?
>
> The regression inducing change arrived in 3.7-rc1.
>
> > And odds are, if I didn't apply it there, it was either
> > because it didn't apply, or it broke the build.
>
> a. [x] rcu_note_context_switch(cpu) -> rcu_note_context_switch()

Thanks for this, now applied.

greg k-h

2015-05-11 07:37:44

by Michal Kubecek

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On Sat, May 02, 2015 at 05:14:50AM +0200, Mike Galbraith wrote:
> On Fri, 2015-05-01 at 22:52 +0200, Greg Kroah-Hartman wrote:
> > On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
> > > On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > > 3.19-stable review patch. If anyone has any objections, please let me know.
> > > >
> > > > ------------------
> > > >
> > > > From: Calvin Owens <[email protected]>
> > > >
> > > > commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
> > > >
> > > > While debugging an issue with excessive softirq usage, I encountered the
> > > > following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> > > > infrastructure"):
> > > >
> > > > [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
> > > >
> > > > ...but despite this note, the patch still calls RCU with IRQs disabled.
> > > >
> > > > This seemingly innocuous change caused a significant regression in softirq
> > > > CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> > > > introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> > > > spiking as high as 50%. Before the change, the usage would never exceed 5%.
> > > >
> > > > Moving the call to rcu_note_context_switch() after the cond_resched() call,
> > > > as it was originally before the hotplug patch, completely eliminated this
> > > > problem.
> > > >
> > > > Signed-off-by: Calvin Owens <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > >
> > > > ---
> > > > kernel/softirq.c | 6 +++++-
> > > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > --- a/kernel/softirq.c
> > > > +++ b/kernel/softirq.c
> > > > @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> > > > * in the task stack here.
> > > > */
> > > > __do_softirq();
> > > > - rcu_note_context_switch();
> > > > local_irq_enable();
> > > > cond_resched();
> > > > +
> > > > + preempt_disable();
> > > > + rcu_note_context_switch();
> > > > + preempt_enable();
> > > > +
> > > > return;
> > > > }
> > > > local_irq_enable();
> > > >
> > > >
> > >
> > > Sorry for the delay in noticing this, but should this be applied to
> > > 3.14-stable as well?
> >
> > Why should it?
>
> The regression inducing change arrived in 3.7-rc1.

I guess stable-3.12.y should have it too, then (with a trivial refresh
for the comment added in 3.13).

Michal Kubecek

> > And odds are, if I didn't apply it there, it was either
> > because it didn't apply, or it broke the build.
>
> a. [x] rcu_note_context_switch(cpu) -> rcu_note_context_switch()
>
> From 28423ad283d5348793b0c45cc9b1af058e776fd6 Mon Sep 17 00:00:00 2001
> From: Calvin Owens <[email protected]>
> Date: Tue, 13 Jan 2015 13:16:18 -0800
> Subject: ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
>
> While debugging an issue with excessive softirq usage, I encountered the
> following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> infrastructure"):
>
> [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
>
> ...but despite this note, the patch still calls RCU with IRQs disabled.
>
> This seemingly innocuous change caused a significant regression in softirq
> CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> spiking as high as 50%. Before the change, the usage would never exceed 5%.
>
> Moving the call to rcu_note_context_switch() after the cond_resched() call,
> as it was originally before the hotplug patch, completely eliminated this
> problem.
>
> Signed-off-by: Calvin Owens <[email protected]>
> Cc: [email protected]
> Signed-off-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Mike Galbraith <[email protected]>
> ---
> kernel/softirq.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -657,9 +657,13 @@ static void run_ksoftirqd(unsigned int c
> * in the task stack here.
> */
> __do_softirq();
> - rcu_note_context_switch(cpu);
> local_irq_enable();
> cond_resched();
> +
> + preempt_disable();
> + rcu_note_context_switch(cpu);
> + preempt_enable();
> +
> return;
> }
> local_irq_enable();
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-05-11 07:53:18

by Jiri Slaby

Subject: Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

On 05/11/2015, 09:37 AM, Michal Kubecek wrote:
> On Sat, May 02, 2015 at 05:14:50AM +0200, Mike Galbraith wrote:
>> On Fri, 2015-05-01 at 22:52 +0200, Greg Kroah-Hartman wrote:
>>> On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
>>>> On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
>>>> <[email protected]> wrote:
>>>>> 3.19-stable review patch. If anyone has any objections, please let me know.
>>>>>
>>>>> ------------------
>>>>>
>>>>> From: Calvin Owens <[email protected]>
>>>>>
>>>>> commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
>>>>>
>>>>> While debugging an issue with excessive softirq usage, I encountered the
>>>>> following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
>>>>> infrastructure"):
>>>>>
>>>>> [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

...

>>>> Sorry for the delay in noticing this, but should this be applied to
>>>> 3.14-stable as well?
>>>
>>> Why should it?
>>
>> The regression inducing change arrived in 3.7-rc1.
>
> I guess stable-3.12.y should have it too, then (with a trivial refresh
> for the comment added in 3.13).

Yes, I already put it into 3.12. Thanks.

--
js
suse labs