2013-04-18 21:06:46

by Sasha Levin

[permalink] [raw]
Subject: WARNING: at kernel/smp.c:385 smp_call_function_many

Hi guys,

It seems that with commit "smp: Give WARN()ing when calling
smp_call_function_many()/single() in serving irq" I'm getting a bunch of these:

[ 467.250851] WARNING: at kernel/smp.c:385 smp_call_function_many+0xda/0x400()
[ 467.251990] Modules linked in:
[ 467.252496] CPU: 1 PID: 12638 Comm: trinity-child1 Tainted: G W 3.9.0-rc7-next-20130418-sasha-00013-g333a5d2 #340
[ 467.259691] 0000000000000009 ffff8800bb603d98 ffffffff83deed7a ffff8800bb603dd8
[ 467.261157] ffffffff81112ecb ffff88009d229fd8 ffffffff81148430 0000000000000001
[ 467.262478] ffffffff861922e0 0000000000000008 ffffffff861922e0 ffff8800bb603de8
[ 467.263758] Call Trace:
[ 467.266447] <IRQ> [<ffffffff83deed7a>] dump_stack+0x19/0x1b
[ 467.267437] [<ffffffff81112ecb>] warn_slowpath_common+0x6b/0xa0
[ 467.268418] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
[ 467.269838] [<ffffffff81112f15>] warn_slowpath_null+0x15/0x20
[ 467.270844] [<ffffffff81197b8a>] smp_call_function_many+0xda/0x400
[ 467.272287] [<ffffffff8115c055>] ? sched_clock_local+0x25/0xa0
[ 467.273259] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
[ 467.274590] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
[ 467.275519] [<ffffffff81198001>] smp_call_function+0x41/0x80
[ 467.276801] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
[ 467.277756] [<ffffffff81198099>] on_each_cpu+0x39/0x110
[ 467.278604] [<ffffffff81149197>] clock_was_set+0x17/0x20
[ 467.280198] [<ffffffff81149bb2>] run_hrtimer_softirq+0x32/0x40
[ 467.281530] [<ffffffff8111cb07>] __do_softirq+0x1d7/0x460
[ 467.282434] [<ffffffff8111cf26>] irq_exit+0x76/0xd0
[ 467.283243] [<ffffffff8109287a>] smp_apic_timer_interrupt+0x8a/0xa0
[ 467.284707] [<ffffffff83e369f2>] apic_timer_interrupt+0x72/0x80
[ 467.285674] <EOI> [<ffffffff81071ef5>] ? sched_clock+0x15/0x20
[ 467.287132] [<ffffffff83e31dd6>] ? add_preempt_count+0xa6/0xc0
[ 467.288089] [<ffffffff81a1db2c>] delay_tsc+0xac/0x120
[ 467.289342] [<ffffffff8127cbd8>] ? __mem_cgroup_count_vm_event+0x178/0x1a0
[ 467.290522] [<ffffffff81a1da19>] __const_udelay+0x29/0x30
[ 467.291425] [<ffffffff8113ee14>] __rcu_read_unlock+0x44/0xb0
[ 467.292829] [<ffffffff8127cbf5>] __mem_cgroup_count_vm_event+0x195/0x1a0
[ 467.293994] [<ffffffff8127ca60>] ? mem_cgroup_from_task+0xd0/0xd0
[ 467.295456] [<ffffffff81240e82>] handle_mm_fault+0x632/0x660
[ 467.296692] [<ffffffff8118b40a>] ? __lock_is_held+0x5a/0x80
[ 467.297647] [<ffffffff83e318f0>] __do_page_fault+0x570/0x5e0
[ 467.298622] [<ffffffff811d3e2d>] ? rcu_try_advance_all_cbs+0x3d/0xb0
[ 467.300213] [<ffffffff811d3f1d>] ? rcu_eqs_exit_common+0x7d/0x260
[ 467.301241] [<ffffffff811d4e0e>] ? rcu_eqs_enter_common+0x33e/0x3b0
[ 467.302760] [<ffffffff811d419c>] ? rcu_eqs_exit+0x9c/0xb0
[ 467.303649] [<ffffffff83e31992>] do_page_fault+0x32/0x50
[ 467.304526] [<ffffffff83e30f60>] do_async_page_fault+0x30/0xc0
[ 467.306556] [<ffffffff83e2e228>] async_page_fault+0x28/0x30

So either the new commit is broken, or hrtimer is horribly broken?


Thanks,
Sasha


2013-04-19 00:10:37

by Liu, Chuansheng

[permalink] [raw]
Subject: RE: WARNING: at kernel/smp.c:385 smp_call_function_many

Hi Sasha,

> -----Original Message-----
> From: Sasha Levin [mailto:[email protected]]
> Sent: Friday, April 19, 2013 5:06 AM
> To: Liu, Chuansheng; Andrew Morton
> Cc: Ingo Molnar; Thomas Gleixner; Peter Zijlstra; Wu, Fengguang;
> [email protected]; Dave Jones; [email protected]
> Subject: WARNING: at kernel/smp.c:385 smp_call_function_many
>
> Hi guys,
>
> It seems that with commit "smp: Give WARN()ing when calling
> smp_call_function_many()/single() in serving irq" I'm getting a bunch of these:
This patch gives the warning even when the smp_call_function_many() is called in softirq context,
because it will bring the possible deadlock cases, the example is below:
CPUA CPUB
spin_lock(&spinlock)
Any irq coming, call the irq handler
irq_exit()
spin_lock_irq(&spinlock)
<== Blocking here due to
CPUB hold it
__do_softirq()
run_timer_softirq()
timer_cb()
call smp_call_function_many()
send IPI interrupt to CPUA
wait_csd()

Then both CPUA and CPUB will be deadlocked here.
>
> [ 467.250851] WARNING: at kernel/smp.c:385
> smp_call_function_many+0xda/0x400()
> [ 467.251990] Modules linked in:
> [ 467.252496] CPU: 1 PID: 12638 Comm: trinity-child1 Tainted: G W
> 3.9.0-rc7-next-20130418-sasha-00013-g333a5d2 #340
> [ 467.259691] 0000000000000009 ffff8800bb603d98 ffffffff83deed7a
> ffff8800bb603dd8
> [ 467.261157] ffffffff81112ecb ffff88009d229fd8 ffffffff81148430
> 0000000000000001
> [ 467.262478] ffffffff861922e0 0000000000000008 ffffffff861922e0
> ffff8800bb603de8
> [ 467.263758] Call Trace:
> [ 467.266447] <IRQ> [<ffffffff83deed7a>] dump_stack+0x19/0x1b
> [ 467.267437] [<ffffffff81112ecb>] warn_slowpath_common+0x6b/0xa0
> [ 467.268418] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
> [ 467.269838] [<ffffffff81112f15>] warn_slowpath_null+0x15/0x20
> [ 467.270844] [<ffffffff81197b8a>] smp_call_function_many+0xda/0x400
> [ 467.272287] [<ffffffff8115c055>] ? sched_clock_local+0x25/0xa0
> [ 467.273259] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
> [ 467.274590] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
> [ 467.275519] [<ffffffff81198001>] smp_call_function+0x41/0x80
> [ 467.276801] [<ffffffff81148430>] ? hrtimer_wakeup+0x30/0x30
> [ 467.277756] [<ffffffff81198099>] on_each_cpu+0x39/0x110
> [ 467.278604] [<ffffffff81149197>] clock_was_set+0x17/0x20
> [ 467.280198] [<ffffffff81149bb2>] run_hrtimer_softirq+0x32/0x40
> [ 467.281530] [<ffffffff8111cb07>] __do_softirq+0x1d7/0x460
> [ 467.282434] [<ffffffff8111cf26>] irq_exit+0x76/0xd0
> [ 467.283243] [<ffffffff8109287a>] smp_apic_timer_interrupt+0x8a/0xa0
> [ 467.284707] [<ffffffff83e369f2>] apic_timer_interrupt+0x72/0x80
> [ 467.285674] <EOI> [<ffffffff81071ef5>] ? sched_clock+0x15/0x20
> [ 467.287132] [<ffffffff83e31dd6>] ? add_preempt_count+0xa6/0xc0
> [ 467.288089] [<ffffffff81a1db2c>] delay_tsc+0xac/0x120
> [ 467.289342] [<ffffffff8127cbd8>] ?
> __mem_cgroup_count_vm_event+0x178/0x1a0
> [ 467.290522] [<ffffffff81a1da19>] __const_udelay+0x29/0x30
> [ 467.291425] [<ffffffff8113ee14>] __rcu_read_unlock+0x44/0xb0
> [ 467.292829] [<ffffffff8127cbf5>]
> __mem_cgroup_count_vm_event+0x195/0x1a0
> [ 467.293994] [<ffffffff8127ca60>] ? mem_cgroup_from_task+0xd0/0xd0
> [ 467.295456] [<ffffffff81240e82>] handle_mm_fault+0x632/0x660
> [ 467.296692] [<ffffffff8118b40a>] ? __lock_is_held+0x5a/0x80
> [ 467.297647] [<ffffffff83e318f0>] __do_page_fault+0x570/0x5e0
> [ 467.298622] [<ffffffff811d3e2d>] ? rcu_try_advance_all_cbs+0x3d/0xb0
> [ 467.300213] [<ffffffff811d3f1d>] ? rcu_eqs_exit_common+0x7d/0x260
> [ 467.301241] [<ffffffff811d4e0e>] ? rcu_eqs_enter_common+0x33e/0x3b0
> [ 467.302760] [<ffffffff811d419c>] ? rcu_eqs_exit+0x9c/0xb0
> [ 467.303649] [<ffffffff83e31992>] do_page_fault+0x32/0x50
> [ 467.304526] [<ffffffff83e30f60>] do_async_page_fault+0x30/0xc0
> [ 467.306556] [<ffffffff83e2e228>] async_page_fault+0x28/0x30
>
> So either the new commit is broken, or hrtimer is horribly broken?
>
>
> Thanks,
> Sasha

2013-04-19 00:34:06

by Sasha Levin

[permalink] [raw]
Subject: Re: WARNING: at kernel/smp.c:385 smp_call_function_many

Hi Liu,

On 04/18/2013 08:10 PM, Liu, Chuansheng wrote:
>> It seems that with commit "smp: Give WARN()ing when calling
>> > smp_call_function_many()/single() in serving irq" I'm getting a bunch of these:
> This patch gives the warning even when the smp_call_function_many() is called in softirq context,
> because it will bring the possible deadlock cases, the example is below:
> CPUA CPUB
> spin_lock(&spinlock)
> Any irq coming, call the irq handler
> irq_exit()
> spin_lock_irq(&spinlock)
> <== Blocking here due to
> CPUB hold it
> __do_softirq()
> run_timer_softirq()
> timer_cb()
> call smp_call_function_many()
> send IPI interrupt to CPUA
> wait_csd()
>
> Then both CPUA and CPUB will be deadlocked here.

Right.

So commit f55a6faa3 ("hrtimer: Provide clock_was_set_delayed()") followed by one
of the commits that used clock_was_set_delayed() may trigger the deadlock you've
described. (cc John Stultz...)


Thanks,
Sasha

2013-05-16 18:25:00

by Sasha Levin

[permalink] [raw]
Subject: Re: WARNING: at kernel/smp.c:385 smp_call_function_many

On 04/18/2013 08:29 PM, Sasha Levin wrote:
> Hi Liu,
>
> On 04/18/2013 08:10 PM, Liu, Chuansheng wrote:
>>> It seems that with commit "smp: Give WARN()ing when calling
>>>> smp_call_function_many()/single() in serving irq" I'm getting a bunch of these:
>> This patch gives the warning even when the smp_call_function_many() is called in softirq context,
>> because it will bring the possible deadlock cases, the example is below:
>> CPUA CPUB
>> spin_lock(&spinlock)
>> Any irq coming, call the irq handler
>> irq_exit()
>> spin_lock_irq(&spinlock)
>> <== Blocking here due to
>> CPUB hold it
>> __do_softirq()
>> run_timer_softirq()
>> timer_cb()
>> call smp_call_function_many()
>> send IPI interrupt to CPUA
>> wait_csd()
>>
>> Then both CPUA and CPUB will be deadlocked here.
>
> Right.
>
> So commit f55a6faa3 ("hrtimer: Provide clock_was_set_delayed()") followed by one
> of the commits that used clock_was_set_delayed() may trigger the deadlock you've
> described. (cc John Stultz...)

Ping?


Thanks,
Sasha

2013-05-28 10:26:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: WARNING: at kernel/smp.c:385 smp_call_function_many

On Thu, 16 May 2013, Sasha Levin wrote:
> On 04/18/2013 08:29 PM, Sasha Levin wrote:
> > Hi Liu,
> >
> > On 04/18/2013 08:10 PM, Liu, Chuansheng wrote:
> >>> It seems that with commit "smp: Give WARN()ing when calling
> >>>> smp_call_function_many()/single() in serving irq" I'm getting a bunch of these:
> >> This patch gives the warning even when the smp_call_function_many() is called in softirq context,
> >> because it will bring the possible deadlock cases, the example is below:
> >> CPUA CPUB
> >> spin_lock(&spinlock)
> >> Any irq coming, call the irq handler
> >> irq_exit()
> >> spin_lock_irq(&spinlock)
> >> <== Blocking here due to
> >> CPUB hold it
> >> __do_softirq()
> >> run_timer_softirq()
> >> timer_cb()
> >> call smp_call_function_many()
> >> send IPI interrupt to CPUA
> >> wait_csd()
> >>
> >> Then both CPUA and CPUB will be deadlocked here.
> >
> > Right.

Well, that's only true if LOCKDEP is enabled. The !LOCKDEP case
reenables interrupts if the lock is contended.

> > So commit f55a6faa3 ("hrtimer: Provide clock_was_set_delayed()") followed by one
> > of the commits that used clock_was_set_delayed() may trigger the deadlock you've
> > described. (cc John Stultz...)

I'll have a look. It's non trivial to fix :(

Thanks,

tglx