2012-02-27 20:23:39

by Dan Williams

Subject: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

An experimental hack to tease out whether we are continuing to
run the softirq handler past the point of needing scheduling.

It allows only one trip through __do_softirq() as long as need_resched()
is set which hopefully creates the back pressure needed to get ksoftirqd
scheduled.
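
For context, the tail of __do_softirq() that this change touches looks
roughly like the following (simplified sketch, not the exact source):

restart:
	/* ... softirq handler loop runs here with interrupts enabled ... */

	local_irq_disable();

	pending = local_softirq_pending();
	if (pending && --max_restart)	/* with this patch: && !need_resched() */
		goto restart;

	if (pending)
		wakeup_softirqd();	/* remaining work is deferred to ksoftirqd */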

Targeted to address reports like the following that are produced
with i/o tests to a sas domain with a large number of disks (48+), and
lots of debugging enabled (slub_debug, lockdep) that makes the
block+scsi softirq path more cpu-expensive than normal.

With this patch applied the softlockup detector seems appeased, but it
seems odd to need changes to kernel/softirq.c so maybe I have overlooked
something that needs changing at the block/scsi level?

BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
Modules linked in: nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 i2c_core wmi sd_mod ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
irq event stamp: 26260303
hardirqs last enabled at (26260302): [<ffffffff814becf4>] restore_args+0x0/0x30
hardirqs last disabled at (26260303): [<ffffffff814c60ee>] apic_timer_interrupt+0x6e/0x80
softirqs last enabled at (26220386): [<ffffffff81033edd>] __do_softirq+0x1ae/0x1bd
softirqs last disabled at (26220665): [<ffffffff814c696c>] call_softirq+0x1c/0x26
CPU 3
Modules linked in: nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 i2c_core wmi sd_mod ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]

Pid: 78, comm: kworker/3:1 Not tainted 3.3.0-rc3-7ada1dd-isci-3.0.183+ #1 Intel Corporation ROSECITY/ROSECITY
RIP: 0010:[<ffffffff814be8b6>] [<ffffffff814be8b6>] _raw_spin_unlock_irq+0x34/0x4b
RSP: 0000:ffff8800bb8c3c50 EFLAGS: 00000202
RAX: ffff8800375f3ec0 RBX: ffffffff814becf4 RCX: ffff8800bb8c3c00
RDX: 0000000000000001 RSI: ffff880035bbc348 RDI: ffff8800375f4588
RBP: ffff8800bb8c3c60 R08: 0000000000000000 R09: ffff880035aed150
R10: 0000000000018f3b R11: ffff8800bb8c39e0 R12: ffff8800bb8c3bc8
R13: ffffffff814c60f3 R14: ffff8800bb8c3c60 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000f2e028 CR3: 00000000b11b3000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/3:1 (pid: 78, threadinfo ffff8800377d2000, task ffff8800375f3ec0)
Stack:
ffff88003555f800 ffff88003555f800 ffff8800bb8c3cc0 ffffffffa00512c4
ffffffff814be8b2 ffff880035dfc000 ffff880035dfe000 000000000553a265
ffff8800bb8c3cb0 ffff88003555f800 ffff8800b20af200 ffff880035dfe000
Call Trace:
<IRQ>
[<ffffffffa00512c4>] sas_queuecommand+0xa7/0x204 [libsas]
[<ffffffff814be8b2>] ? _raw_spin_unlock_irq+0x30/0x4b
[<ffffffff8132b6a9>] scsi_dispatch_cmd+0x1a2/0x24c
[<ffffffff813317a8>] ? spin_lock+0x9/0xb
[<ffffffff813333b0>] scsi_request_fn+0x3b1/0x3d9
[<ffffffff8124a19f>] __blk_run_queue+0x1d/0x1f
[<ffffffff8124b869>] blk_run_queue+0x26/0x3a
[<ffffffff813319a5>] scsi_run_queue+0x1fb/0x20a
[<ffffffff81332136>] scsi_next_command+0x3b/0x4c
[<ffffffff81332b66>] scsi_io_completion+0x205/0x44f
[<ffffffff813316b8>] ? spin_unlock_irqrestore+0x9/0xb
[<ffffffff8132b3ab>] scsi_finish_command+0xeb/0xf4
[<ffffffff81333a04>] scsi_softirq_done+0x112/0x11b
[<ffffffff812540ac>] blk_done_softirq+0x7e/0x96
[<ffffffff81033e0c>] __do_softirq+0xdd/0x1bd
[<ffffffff814c696c>] call_softirq+0x1c/0x26
[<ffffffff81003ce6>] do_softirq+0x4b/0xa5
[<ffffffff81034916>] irq_exit+0x55/0xc2
[<ffffffff814c6a9c>] smp_apic_timer_interrupt+0x7c/0x8a
[<ffffffff814c60f3>] apic_timer_interrupt+0x73/0x80
<EOI>
[<ffffffff814be8b6>] ? _raw_spin_unlock_irq+0x34/0x4b
[<ffffffffa00512c4>] sas_queuecommand+0xa7/0x204 [libsas]
[<ffffffff814be8b2>] ? _raw_spin_unlock_irq+0x30/0x4b
[<ffffffff8132b6a9>] scsi_dispatch_cmd+0x1a2/0x24c
[<ffffffff813317a8>] ? spin_lock+0x9/0xb
[<ffffffff813333b0>] scsi_request_fn+0x3b1/0x3d9
[<ffffffff8124a19f>] __blk_run_queue+0x1d/0x1f
[<ffffffff8125f3bb>] cfq_kick_queue+0x2f/0x41
[<ffffffff8104462e>] process_one_work+0x1c8/0x336
[<ffffffff81044599>] ? process_one_work+0x133/0x336
[<ffffffff81044306>] ? spin_lock_irq+0x9/0xb
[<ffffffff8125f38c>] ? cfq_init_queue+0x2a3/0x2a3
[<ffffffff81045fd9>] ? workqueue_congested+0x1e/0x1e
[<ffffffff81046085>] worker_thread+0xac/0x151
[<ffffffff81045fd9>] ? workqueue_congested+0x1e/0x1e
[<ffffffff8104a618>] kthread+0x8a/0x92
[<ffffffff8107654e>] ? trace_hardirqs_on_caller+0x16/0x16d
[<ffffffff814c6874>] kernel_thread_helper+0x4/0x10
[<ffffffff814becf4>] ? retint_restore_args+0x13/0x13
[<ffffffff8104a58e>] ? kthread_create_on_node+0x14d/0x14d
[<ffffffff814c6870>] ? gs_change+0x13/0x13

Cc: Peter Zijlstra <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: James Bottomley <[email protected]>
Reported-by: Lukasz Dorau <[email protected]>
Reported-by: Andrzej Jakowski <[email protected]>
Not-yet-signed-off-by: Dan Williams <[email protected]>
---
kernel/softirq.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4eb3a0f..82a3f43 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -255,7 +255,7 @@ restart:
local_irq_disable();

pending = local_softirq_pending();
- if (pending && --max_restart)
+ if (pending && --max_restart && !need_resched())
goto restart;

if (pending)


2012-02-28 08:35:39

by Yong Zhang

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Mon, Feb 27, 2012 at 12:38:47PM -0800, Dan Williams wrote:
> An experimental hack to tease out whether we are continuing to
> run the softirq handler past the point of needing scheduling.
>
> It allows only one trip through __do_softirq() as long as need_resched()
> is set which hopefully creates the back pressure needed to get ksoftirqd
> scheduled.
>
> Targeted to address reports like the following that are produced
> with i/o tests to a sas domain with a large number of disks (48+), and
> lots of debugging enabled (slub_debug, lockdep) that makes the
> block+scsi softirq path more cpu-expensive than normal.
>
> With this patch applied the softlockup detector seems appeased, but it
> seems odd to need changes to kernel/softirq.c so maybe I have overlooked
> something that needs changing at the block/scsi level?

But being stuck in softirq for 22s still seems odd.

I guess the reason your patch works is that softirq returns before
handling BLOCK_SOFTIRQ, but who knows, that's just a guess.

Does booting with the 'threadirqs' kernel command line option solve your issue?
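
(That option is just a bare token appended to the boot line, e.g. with a
made-up bootloader entry:

    linux /boot/vmlinuz root=/dev/sda1 ro threadirqs

With forced irq threading, irq_exit() defers softirq handling to ksoftirqd
instead of running __do_softirq() inline, so the softirq work ends up in a
schedulable context.)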

Thanks,
Yong

>
> BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
> Modules linked in: nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 i2c_core wmi sd_mod ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
> irq event stamp: 26260303
> hardirqs last enabled at (26260302): [<ffffffff814becf4>] restore_args+0x0/0x30
> hardirqs last disabled at (26260303): [<ffffffff814c60ee>] apic_timer_interrupt+0x6e/0x80
> softirqs last enabled at (26220386): [<ffffffff81033edd>] __do_softirq+0x1ae/0x1bd
> softirqs last disabled at (26220665): [<ffffffff814c696c>] call_softirq+0x1c/0x26
> CPU 3
> Modules linked in: nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 i2c_core wmi sd_mod ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
>
> Pid: 78, comm: kworker/3:1 Not tainted 3.3.0-rc3-7ada1dd-isci-3.0.183+ #1 Intel Corporation ROSECITY/ROSECITY
> RIP: 0010:[<ffffffff814be8b6>] [<ffffffff814be8b6>] _raw_spin_unlock_irq+0x34/0x4b
> RSP: 0000:ffff8800bb8c3c50 EFLAGS: 00000202
> RAX: ffff8800375f3ec0 RBX: ffffffff814becf4 RCX: ffff8800bb8c3c00
> RDX: 0000000000000001 RSI: ffff880035bbc348 RDI: ffff8800375f4588
> RBP: ffff8800bb8c3c60 R08: 0000000000000000 R09: ffff880035aed150
> R10: 0000000000018f3b R11: ffff8800bb8c39e0 R12: ffff8800bb8c3bc8
> R13: ffffffff814c60f3 R14: ffff8800bb8c3c60 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000f2e028 CR3: 00000000b11b3000 CR4: 00000000000406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kworker/3:1 (pid: 78, threadinfo ffff8800377d2000, task ffff8800375f3ec0)
> Stack:
> ffff88003555f800 ffff88003555f800 ffff8800bb8c3cc0 ffffffffa00512c4
> ffffffff814be8b2 ffff880035dfc000 ffff880035dfe000 000000000553a265
> ffff8800bb8c3cb0 ffff88003555f800 ffff8800b20af200 ffff880035dfe000
> Call Trace:
> <IRQ>
> [<ffffffffa00512c4>] sas_queuecommand+0xa7/0x204 [libsas]
> [<ffffffff814be8b2>] ? _raw_spin_unlock_irq+0x30/0x4b
> [<ffffffff8132b6a9>] scsi_dispatch_cmd+0x1a2/0x24c
> [<ffffffff813317a8>] ? spin_lock+0x9/0xb
> [<ffffffff813333b0>] scsi_request_fn+0x3b1/0x3d9
> [<ffffffff8124a19f>] __blk_run_queue+0x1d/0x1f
> [<ffffffff8124b869>] blk_run_queue+0x26/0x3a
> [<ffffffff813319a5>] scsi_run_queue+0x1fb/0x20a
> [<ffffffff81332136>] scsi_next_command+0x3b/0x4c
> [<ffffffff81332b66>] scsi_io_completion+0x205/0x44f
> [<ffffffff813316b8>] ? spin_unlock_irqrestore+0x9/0xb
> [<ffffffff8132b3ab>] scsi_finish_command+0xeb/0xf4
> [<ffffffff81333a04>] scsi_softirq_done+0x112/0x11b
> [<ffffffff812540ac>] blk_done_softirq+0x7e/0x96
> [<ffffffff81033e0c>] __do_softirq+0xdd/0x1bd
> [<ffffffff814c696c>] call_softirq+0x1c/0x26
> [<ffffffff81003ce6>] do_softirq+0x4b/0xa5
> [<ffffffff81034916>] irq_exit+0x55/0xc2
> [<ffffffff814c6a9c>] smp_apic_timer_interrupt+0x7c/0x8a
> [<ffffffff814c60f3>] apic_timer_interrupt+0x73/0x80
> <EOI>
> [<ffffffff814be8b6>] ? _raw_spin_unlock_irq+0x34/0x4b
> [<ffffffffa00512c4>] sas_queuecommand+0xa7/0x204 [libsas]
> [<ffffffff814be8b2>] ? _raw_spin_unlock_irq+0x30/0x4b
> [<ffffffff8132b6a9>] scsi_dispatch_cmd+0x1a2/0x24c
> [<ffffffff813317a8>] ? spin_lock+0x9/0xb
> [<ffffffff813333b0>] scsi_request_fn+0x3b1/0x3d9
> [<ffffffff8124a19f>] __blk_run_queue+0x1d/0x1f
> [<ffffffff8125f3bb>] cfq_kick_queue+0x2f/0x41
> [<ffffffff8104462e>] process_one_work+0x1c8/0x336
> [<ffffffff81044599>] ? process_one_work+0x133/0x336
> [<ffffffff81044306>] ? spin_lock_irq+0x9/0xb
> [<ffffffff8125f38c>] ? cfq_init_queue+0x2a3/0x2a3
> [<ffffffff81045fd9>] ? workqueue_congested+0x1e/0x1e
> [<ffffffff81046085>] worker_thread+0xac/0x151
> [<ffffffff81045fd9>] ? workqueue_congested+0x1e/0x1e
> [<ffffffff8104a618>] kthread+0x8a/0x92
> [<ffffffff8107654e>] ? trace_hardirqs_on_caller+0x16/0x16d
> [<ffffffff814c6874>] kernel_thread_helper+0x4/0x10
> [<ffffffff814becf4>] ? retint_restore_args+0x13/0x13
> [<ffffffff8104a58e>] ? kthread_create_on_node+0x14d/0x14d
> [<ffffffff814c6870>] ? gs_change+0x13/0x13
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: James Bottomley <[email protected]>
> Reported-by: Lukasz Dorau <[email protected]>
> Reported-by: Andrzej Jakowski <[email protected]>
> Not-yet-signed-off-by: Dan Williams <[email protected]>
> ---
> kernel/softirq.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 4eb3a0f..82a3f43 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -255,7 +255,7 @@ restart:
> local_irq_disable();
>
> pending = local_softirq_pending();
> - if (pending && --max_restart)
> + if (pending && --max_restart && !need_resched())
> goto restart;
>
> if (pending)
>

--
Only stand for myself

2012-02-28 09:49:13

by Peter Zijlstra

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Mon, 2012-02-27 at 12:38 -0800, Dan Williams wrote:
> An experimental hack to tease out whether we are continuing to
> run the softirq handler past the point of needing scheduling.
>
> It allows only one trip through __do_softirq() as long as need_resched()
> is set which hopefully creates the back pressure needed to get ksoftirqd
> scheduled.
>
> Targeted to address reports like the following that are produced
> with i/o tests to a sas domain with a large number of disks (48+), and
> lots of debugging enabled (slub_debug, lockdep) that makes the
> block+scsi softirq path more cpu-expensive than normal.
>
> With this patch applied the softlockup detector seems appeased, but it
> seems odd to need changes to kernel/softirq.c so maybe I have overlooked
> something that needs changing at the block/scsi level?
>
> BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]

So you're stuck in softirq for 22s+ and max_restart is 10, which means
that on average you spend 2.2s+ per softirq invocation; this is completely,
absolutely bonkers. Softirq handlers should never consume a significant
amount of cpu-time.

Thomas, think it's about time we put something like the below in?


---
kernel/softirq.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ff066a4..6137ee1 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -210,6 +210,7 @@ asmlinkage void __do_softirq(void)
__u32 pending;
int max_restart = MAX_SOFTIRQ_RESTART;
int cpu;
+ u64 start, callback, now;

pending = local_softirq_pending();
account_system_vtime(current);
@@ -223,6 +224,8 @@ asmlinkage void __do_softirq(void)
/* Reset the pending bitmask before enabling irqs */
set_softirq_pending(0);

+ start = callback = cpu_clock(cpu);
+
local_irq_enable();

h = softirq_vec;
@@ -246,6 +249,15 @@ asmlinkage void __do_softirq(void)
preempt_count() = prev_count;
}

+ now = cpu_clock(cpu);
+ if (now - callback > TICK_NSEC / 4) {
+ printk(KERN_ERR "softirq took longer than 1/4 tick: "
+ "%u %s %p\n", vec_nr,
+ softirq_to_name[vec_nr],
+ h->action);
+ }
+ callback = now;
+
rcu_bh_qs(cpu);
}
h++;
@@ -254,6 +266,10 @@ asmlinkage void __do_softirq(void)

local_irq_disable();

+ now = cpu_clock(cpu);
+ if (now - start > TICK_NSEC / 2)
+ printk(KERN_ERR "softirq loop took longer than 1/2 tick\n");
+
pending = local_softirq_pending();
if (pending && --max_restart)
goto restart;
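
For scale: TICK_NSEC is roughly 10^9/HZ ns, so with HZ=1000 the 1/4-tick
and 1/2-tick thresholds above come out to about 250us and 500us, and with
HZ=250 to about 1ms and 2ms.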

2012-02-28 16:48:41

by Dan Williams

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Tue, Feb 28, 2012 at 1:48 AM, Peter Zijlstra <[email protected]> wrote:
>> BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
>
> So you're stuck in softirq for 22s+ and max_restart is 10, which means
> that on average you spend 2.2s+ per softirq invocation; this is completely,
> absolutely bonkers. Softirq handlers should never consume a significant
> amount of cpu-time.

I'm not sure whether this was just one __do_softirq() invocation, or
multiple. I'll see if I can instrument this, but the suspicion is
that max_restart can't help when softirq is constantly re-triggered.
I'll also add the below patch to see if any single invocation is
violating these expectations.
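
The 1/2-tick warning can also be made to report whether a reschedule was
pending at that point; something along these lines (illustrative only, not
necessarily the exact change used):

	if (now - start > TICK_NSEC / 2)
		printk(KERN_ERR "softirq loop took longer than 1/2 tick "
		       "need_resched: %s max_restart: %d\n",
		       need_resched() ? "yes" : "no", max_restart);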

--
Dan

2012-02-28 21:41:43

by Thomas Gleixner

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Tue, 28 Feb 2012, Peter Zijlstra wrote:

> On Mon, 2012-02-27 at 12:38 -0800, Dan Williams wrote:
> > An experimental hack to tease out whether we are continuing to
> > run the softirq handler past the point of needing scheduling.
> >
> > It allows only one trip through __do_softirq() as long as need_resched()
> > is set which hopefully creates the back pressure needed to get ksoftirqd
> > scheduled.
> >
> > Targeted to address reports like the following that are produced
> > with i/o tests to a sas domain with a large number of disks (48+), and
> > lots of debugging enabled (slub_debug, lockdep) that makes the
> > block+scsi softirq path more cpu-expensive than normal.
> >
> > With this patch applied the softlockup detector seems appeased, but it
> > seems odd to need changes to kernel/softirq.c so maybe I have overlooked
> > something that needs changing at the block/scsi level?
> >
> > BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
>
> So you're stuck in softirq for 22s+ and max_restart is 10, which means
> that on average you spend 2.2s+ per softirq invocation; this is completely,
> absolutely bonkers. Softirq handlers should never consume a significant
> amount of cpu-time.
>
> Thomas, think it's about time we put something like the below in?

Absolutely. Anything which consumes more than a few microseconds in
the softirq handler needs to be sorted out, no matter what.

2012-02-28 22:17:01

by Dan Williams

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Tue, Feb 28, 2012 at 1:41 PM, Thomas Gleixner <[email protected]> wrote:
> On Tue, 28 Feb 2012, Peter Zijlstra wrote:
>
>> On Mon, 2012-02-27 at 12:38 -0800, Dan Williams wrote:
>> > An experimental hack to tease out whether we are continuing to
>> > run the softirq handler past the point of needing scheduling.
>> >
>> > It allows only one trip through __do_softirq() as long as need_resched()
>> > is set which hopefully creates the back pressure needed to get ksoftirqd
>> > scheduled.
>> >
>> > Targeted to address reports like the following that are produced
>> > with i/o tests to a sas domain with a large number of disks (48+), and
>> > lots of debugging enabled (slub_debug, lockdep) that makes the
>> > block+scsi softirq path more cpu-expensive than normal.
>> >
>> > With this patch applied the softlockup detector seems appeased, but it
>> > seems odd to need changes to kernel/softirq.c so maybe I have overlooked
>> > something that needs changing at the block/scsi level?
>> >
>> > BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
>>
>> So you're stuck in softirq for 22s+ and max_restart is 10, which means
>> that on average you spend 2.2s+ per softirq invocation; this is completely,
>> absolutely bonkers. Softirq handlers should never consume a significant
>> amount of cpu-time.
>>
>> Thomas, think it's about time we put something like the below in?
>
> Absolutely. Anything which consumes more than a few microseconds in
> the softirq handler needs to be sorted out, no matter what.

Looks like everyone is guilty:

[ 422.765336] softirq took longer than 1/4 tick: 3 NET_RX ffffffff813f0aa0
...
[ 423.971878] softirq took longer than 1/4 tick: 4 BLOCK ffffffff812519c8
[ 423.985093] softirq took longer than 1/4 tick: 6 TASKLET ffffffff8103422e
[ 423.993157] softirq took longer than 1/4 tick: 7 SCHED ffffffff8105e2e1
[ 424.001018] softirq took longer than 1/4 tick: 9 RCU ffffffff810a0fed
[ 424.008691] softirq loop took longer than 1/2 tick need_resched: yes max_restart: 10

As expected, whenever that 1/2 tick message gets emitted the softirq
handler is almost always running in a need_resched() context.

$ grep need_resched.*no log | wc -l
295
$ grep need_resched.*yes log | wc -l
3201
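
That is, roughly 3201 of 3496 (~92%) of the loop warnings fired with
need_resched() already set.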

One of these warning messages gets printed out at a rate of 1 every
40ms. (468 second log w/ 11,725 of these messages).

So is it a good idea to get more aggressive about scheduling ksoftirqd?

--
Dan

2012-02-28 22:25:48

by Dan Williams

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Tue, Feb 28, 2012 at 2:16 PM, Dan Williams <[email protected]> wrote:
> On Tue, Feb 28, 2012 at 1:41 PM, Thomas Gleixner <[email protected]> wrote:
>> On Tue, 28 Feb 2012, Peter Zijlstra wrote:
>>
>>> On Mon, 2012-02-27 at 12:38 -0800, Dan Williams wrote:
>>> > An experimental hack to tease out whether we are continuing to
>>> > run the softirq handler past the point of needing scheduling.
>>> >
>>> > It allows only one trip through __do_softirq() as long as need_resched()
>>> > is set which hopefully creates the back pressure needed to get ksoftirqd
>>> > scheduled.
>>> >
>>> > Targeted to address reports like the following that are produced
>>> > with i/o tests to a sas domain with a large number of disks (48+), and
>>> > lots of debugging enabled (slub_debug, lockdep) that makes the
>>> > block+scsi softirq path more cpu-expensive than normal.
>>> >
>>> > With this patch applied the softlockup detector seems appeased, but it
>>> > seems odd to need changes to kernel/softirq.c so maybe I have overlooked
>>> > something that needs changing at the block/scsi level?
>>> >
>>> > BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:78]
>>>
>>> So you're stuck in softirq for 22s+ and max_restart is 10, which means
>>> that on average you spend 2.2s+ per softirq invocation; this is completely,
>>> absolutely bonkers. Softirq handlers should never consume a significant
>>> amount of cpu-time.
>>>
>>> Thomas, think it's about time we put something like the below in?
>>
>> Absolutely. Anything which consumes more than a few microseconds in
>> the softirq handler needs to be sorted out, no matter what.
>
> Looks like everyone is guilty:
>
> [ 422.765336] softirq took longer than 1/4 tick: 3 NET_RX ffffffff813f0aa0
> ...
> [ 423.971878] softirq took longer than 1/4 tick: 4 BLOCK ffffffff812519c8
> [ 423.985093] softirq took longer than 1/4 tick: 6 TASKLET ffffffff8103422e
> [ 423.993157] softirq took longer than 1/4 tick: 7 SCHED ffffffff8105e2e1
> [ 424.001018] softirq took longer than 1/4 tick: 9 RCU ffffffff810a0fed
> [ 424.008691] softirq loop took longer than 1/2 tick need_resched: yes max_restart: 10
>
> As expected, whenever that 1/2 tick message gets emitted the softirq
> handler is almost always running in a need_resched() context.
>
> $ grep need_resched.*no log | wc -l
> 295
> $ grep need_resched.*yes log | wc -l
> 3201
>
> One of these warning messages gets printed out at a rate of 1 every
> 40ms. (468 second log w/ 11,725 of these messages).
>
> So is it a good idea to get more aggressive about scheduling ksoftirqd?
>

Keep in mind this is with slub_debug and lockdep turned on, but we are
basically looking to do extended i/o runs on large topologies and want
the soft lockup detector to not generate false positives.

2012-02-29 09:17:18

by Peter Zijlstra

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Tue, 2012-02-28 at 14:16 -0800, Dan Williams wrote:
> Looks like everyone is guilty:
>
> [ 422.765336] softirq took longer than 1/4 tick: 3 NET_RX ffffffff813f0aa0
> ...
> [ 423.971878] softirq took longer than 1/4 tick: 4 BLOCK ffffffff812519c8
> [ 423.985093] softirq took longer than 1/4 tick: 6 TASKLET ffffffff8103422e
> [ 423.993157] softirq took longer than 1/4 tick: 7 SCHED ffffffff8105e2e1
> [ 424.001018] softirq took longer than 1/4 tick: 9 RCU ffffffff810a0fed
> [ 424.008691] softirq loop took longer than 1/2 tick need_resched:

/me kicks himself for not printing the actual duration.. :-)

> As expected, whenever that 1/2 tick message gets emitted the softirq
> handler is almost always running in a need_resched() context.

Yeah.. that's quite expected.

> So is it a good idea to get more aggressive about scheduling ksoftirqd?

Nah, moving away from softirq more like. I'll put moving the
load-balancer into a kthread on the todo list. And it looks like
everybody else should move to kthreads too.

2012-02-29 19:49:57

by Dan Williams

Subject: Re: [RFC PATCH] kick ksoftirqd more often to please soft lockup detector

On Wed, Feb 29, 2012 at 1:17 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2012-02-28 at 14:16 -0800, Dan Williams wrote:
>> Looks like everyone is guilty:
>>
>> [ 422.765336] softirq took longer than 1/4 tick: 3 NET_RX ffffffff813f0aa0
>> ...
>> [ 423.971878] softirq took longer than 1/4 tick: 4 BLOCK ffffffff812519c8
>> [ 423.985093] softirq took longer than 1/4 tick: 6 TASKLET ffffffff8103422e
>> [ 423.993157] softirq took longer than 1/4 tick: 7 SCHED ffffffff8105e2e1
>> [ 424.001018] softirq took longer than 1/4 tick: 9 RCU ffffffff810a0fed
>> [ 424.008691] softirq loop took longer than 1/2 tick need_resched:
>
> /me kicks himself for not printing the actual duration.. :-)
>
>> As expected, whenever that 1/2 tick message gets emitted the softirq
>> handler is almost always running in a need_resched() context.
>
> Yeah.. that's quite expected.
>
>> So is it a good idea to get more aggressive about scheduling ksoftirqd?
>
> Nah, moving away from softirq more like. I'll put moving the
> load-balancer into a kthread on the todo list. And it looks like
> everybody else should move to kthreads too.

I seem to recall this running into resistance (but maybe things have
changed in the last few years)?

https://lkml.org/lkml/2007/6/29/155