2012-11-09 04:23:41

by Fengguang Wu

Subject: WARNING: at kernel/rcutree.c:1562 rcu_do_batch()

Paul,

I got the below warning in stable kernel 3.6.3. linux-next does
not have this issue. Bisect shows that the first bad commit is

commit b1420f1c8bfc30ecf6380a31d0f686884834b599
Author: Paul E. McKenney <[email protected]>
Date: Thu Mar 1 13:18:08 2012 -0800

rcu: Make rcu_barrier() less disruptive


[ 92.252733] do_IRQ: 1.59 No irq handler for vector (irq -1)
[ 92.253257] ------------[ cut here ]------------
[ 92.253675] WARNING: at /c/kernel-tests/src/stable/kernel/rcutree.c:1562 rcu_do_batch+0x17e/0x63b()
[ 92.254474] Hardware name: Bochs
[ 92.254766] Modules linked in:
[ 92.256689] Pid: 9, comm: migration/1 Not tainted 3.6.3 #1306
[ 92.256689] Call Trace:
[ 92.256689] <IRQ> [<ffffffff81033dbd>] warn_slowpath_common+0x83/0x9c
[ 92.256689] [<ffffffff81033df0>] warn_slowpath_null+0x1a/0x1c
[ 92.256689] [<ffffffff810a70fd>] rcu_do_batch+0x17e/0x63b
[ 92.256689] [<ffffffff810a6705>] ? rcu_report_qs_rnp+0x28b/0x2d5
[ 92.256689] [<ffffffff810a769d>] ? rcu_process_callbacks+0xe3/0x236
[ 92.256689] [<ffffffff810a772c>] rcu_process_callbacks+0x172/0x236
[ 92.256689] [<ffffffff8103b6cc>] __do_softirq+0xf6/0x231
[ 92.256689] [<ffffffff8107fe2c>] ? tick_program_event+0x24/0x26
[ 92.256689] [<ffffffff817d23bc>] call_softirq+0x1c/0x30
[ 92.256689] [<ffffffff81003f7c>] do_softirq+0x4a/0xa6
[ 92.256689] [<ffffffff8103ba98>] irq_exit+0x51/0xbc
[ 92.256689] [<ffffffff817d2a2f>] smp_apic_timer_interrupt+0x8b/0x99
[ 92.256689] [<ffffffff817d1c2f>] apic_timer_interrupt+0x6f/0x80
[ 92.256689] <EOI> [<ffffffff81067440>] ? local_clock+0x1d/0x5a
[ 92.256689] [<ffffffff8109ea37>] ? stop_machine_cpu_stop+0x104/0x119
[ 92.256689] [<ffffffff8109e6d5>] cpu_stopper_thread+0xdd/0x17d
[ 92.256689] [<ffffffff8109e933>] ? queue_stop_cpus_work+0x130/0x130
[ 92.256689] [<ffffffff817ca044>] ? _raw_spin_unlock_irqrestore+0x47/0x65
[ 92.256689] [<ffffffff81086613>] ? trace_hardirqs_on_caller+0x125/0x181
[ 92.256689] [<ffffffff8108667c>] ? trace_hardirqs_on+0xd/0xf
[ 92.256689] [<ffffffff8109e5f8>] ? cpu_stop_signal_done+0x2c/0x2c
[ 92.256689] [<ffffffff81055e74>] kthread+0x9a/0xa2
[ 92.256689] [<ffffffff817d22c4>] kernel_thread_helper+0x4/0x10
[ 92.256689] [<ffffffff817ca4b0>] ? retint_restore_args+0x13/0x13
[ 92.256689] [<ffffffff81055dda>] ? __init_kthread_worker+0x5a/0x5a
[ 92.317029] [<ffffffff817d22c0>] ? gs_change+0x13/0x13

Thanks,
Fengguang


Attachments:
(No filename) (2.41 kB)
dmesg-kvm-fat-3599-2012-10-22-01-01-57-3.6.3-1306 (52.82 kB)
config-3.6.3 (75.15 kB)
dmesg-kvm_bisect3-inn-12919-2012-11-03-16-40-41-3.4.0-rc4-bisect3-00033-gdc36be4-13 (54.76 kB)
dmesg-kvm_bisect3-inn-8580-2012-11-08-10-12-10-3.4.0-rc4-bisect3-00008-gb1420f1-17 (54.74 kB)
dmesg-kvm_bisect3-inn-12761-2012-11-08-10-10-28-3.4.0-rc4-bisect3-00008-gb1420f1-17 (54.94 kB)

2012-11-09 05:28:46

by Michael Wang

Subject: Re: WARNING: at kernel/rcutree.c:1562 rcu_do_batch()

Hi, Fengguang

On 11/09/2012 12:23 PM, Fengguang Wu wrote:
> Paul,
>
> I got the below warning in stable kernel 3.6.3. linux-next does
> not have this issue. Bisect shows that the first bad commit is

Please allow me to ask a few questions:
1. Is it 100% certain that linux-next doesn't show this issue on the same hardware?
2. Is it 100% certain that with commit b1420f1 reverted, both the WARN in
rcu_do_batch() and the one in __call_rcu() disappear?

The reason I ask is that this issue looks very similar to one we faced
previously: an interrupt comes in after the CPU has gone offline.

I had assumed this was caused by an APIC issue and had nothing to do with
RCU, so I really want to figure out whether it is actually related to
commit b1420f1.

Regards,
Michael Wang

>
> commit b1420f1c8bfc30ecf6380a31d0f686884834b599
> Author: Paul E. McKenney <[email protected]>
> Date: Thu Mar 1 13:18:08 2012 -0800
>
> rcu: Make rcu_barrier() less disruptive
>
>
> [ 92.252733] do_IRQ: 1.59 No irq handler for vector (irq -1)
> [ 92.253257] ------------[ cut here ]------------
> [ 92.253675] WARNING: at /c/kernel-tests/src/stable/kernel/rcutree.c:1562 rcu_do_batch+0x17e/0x63b()
> [ 92.254474] Hardware name: Bochs
> [ 92.254766] Modules linked in:
> [ 92.256689] Pid: 9, comm: migration/1 Not tainted 3.6.3 #1306
> [ 92.256689] Call Trace:
> [ 92.256689] <IRQ> [<ffffffff81033dbd>] warn_slowpath_common+0x83/0x9c
> [ 92.256689] [<ffffffff81033df0>] warn_slowpath_null+0x1a/0x1c
> [ 92.256689] [<ffffffff810a70fd>] rcu_do_batch+0x17e/0x63b
> [ 92.256689] [<ffffffff810a6705>] ? rcu_report_qs_rnp+0x28b/0x2d5
> [ 92.256689] [<ffffffff810a769d>] ? rcu_process_callbacks+0xe3/0x236
> [ 92.256689] [<ffffffff810a772c>] rcu_process_callbacks+0x172/0x236
> [ 92.256689] [<ffffffff8103b6cc>] __do_softirq+0xf6/0x231
> [ 92.256689] [<ffffffff8107fe2c>] ? tick_program_event+0x24/0x26
> [ 92.256689] [<ffffffff817d23bc>] call_softirq+0x1c/0x30
> [ 92.256689] [<ffffffff81003f7c>] do_softirq+0x4a/0xa6
> [ 92.256689] [<ffffffff8103ba98>] irq_exit+0x51/0xbc
> [ 92.256689] [<ffffffff817d2a2f>] smp_apic_timer_interrupt+0x8b/0x99
> [ 92.256689] [<ffffffff817d1c2f>] apic_timer_interrupt+0x6f/0x80
> [ 92.256689] <EOI> [<ffffffff81067440>] ? local_clock+0x1d/0x5a
> [ 92.256689] [<ffffffff8109ea37>] ? stop_machine_cpu_stop+0x104/0x119
> [ 92.256689] [<ffffffff8109e6d5>] cpu_stopper_thread+0xdd/0x17d
> [ 92.256689] [<ffffffff8109e933>] ? queue_stop_cpus_work+0x130/0x130
> [ 92.256689] [<ffffffff817ca044>] ? _raw_spin_unlock_irqrestore+0x47/0x65
> [ 92.256689] [<ffffffff81086613>] ? trace_hardirqs_on_caller+0x125/0x181
> [ 92.256689] [<ffffffff8108667c>] ? trace_hardirqs_on+0xd/0xf
> [ 92.256689] [<ffffffff8109e5f8>] ? cpu_stop_signal_done+0x2c/0x2c
> [ 92.256689] [<ffffffff81055e74>] kthread+0x9a/0xa2
> [ 92.256689] [<ffffffff817d22c4>] kernel_thread_helper+0x4/0x10
> [ 92.256689] [<ffffffff817ca4b0>] ? retint_restore_args+0x13/0x13
> [ 92.256689] [<ffffffff81055dda>] ? __init_kthread_worker+0x5a/0x5a
> [ 92.317029] [<ffffffff817d22c0>] ? gs_change+0x13/0x13
>
> Thanks,
> Fengguang
>

2012-11-10 00:32:12

by Paul E. McKenney

Subject: Re: WARNING: at kernel/rcutree.c:1562 rcu_do_batch()

On Fri, Nov 09, 2012 at 12:23:30PM +0800, Fengguang Wu wrote:
> Paul,
>
> I got the below warning in stable kernel 3.6.3. linux-next does
> not have this issue. Bisect shows that the first bad commit is
>
> commit b1420f1c8bfc30ecf6380a31d0f686884834b599
> Author: Paul E. McKenney <[email protected]>
> Date: Thu Mar 1 13:18:08 2012 -0800
>
> rcu: Make rcu_barrier() less disruptive
>
>
> [ 92.252733] do_IRQ: 1.59 No irq handler for vector (irq -1)
> [ 92.253257] ------------[ cut here ]------------
> [ 92.253675] WARNING: at /c/kernel-tests/src/stable/kernel/rcutree.c:1562 rcu_do_batch+0x17e/0x63b()
> [ 92.254474] Hardware name: Bochs
> [ 92.254766] Modules linked in:
> [ 92.256689] Pid: 9, comm: migration/1 Not tainted 3.6.3 #1306
> [ 92.256689] Call Trace:
> [ 92.256689] <IRQ> [<ffffffff81033dbd>] warn_slowpath_common+0x83/0x9c
> [ 92.256689] [<ffffffff81033df0>] warn_slowpath_null+0x1a/0x1c
> [ 92.256689] [<ffffffff810a70fd>] rcu_do_batch+0x17e/0x63b
> [ 92.256689] [<ffffffff810a6705>] ? rcu_report_qs_rnp+0x28b/0x2d5
> [ 92.256689] [<ffffffff810a769d>] ? rcu_process_callbacks+0xe3/0x236
> [ 92.256689] [<ffffffff810a772c>] rcu_process_callbacks+0x172/0x236
> [ 92.256689] [<ffffffff8103b6cc>] __do_softirq+0xf6/0x231
> [ 92.256689] [<ffffffff8107fe2c>] ? tick_program_event+0x24/0x26
> [ 92.256689] [<ffffffff817d23bc>] call_softirq+0x1c/0x30
> [ 92.256689] [<ffffffff81003f7c>] do_softirq+0x4a/0xa6
> [ 92.256689] [<ffffffff8103ba98>] irq_exit+0x51/0xbc
> [ 92.256689] [<ffffffff817d2a2f>] smp_apic_timer_interrupt+0x8b/0x99
> [ 92.256689] [<ffffffff817d1c2f>] apic_timer_interrupt+0x6f/0x80
> [ 92.256689] <EOI> [<ffffffff81067440>] ? local_clock+0x1d/0x5a
> [ 92.256689] [<ffffffff8109ea37>] ? stop_machine_cpu_stop+0x104/0x119
> [ 92.256689] [<ffffffff8109e6d5>] cpu_stopper_thread+0xdd/0x17d
> [ 92.256689] [<ffffffff8109e933>] ? queue_stop_cpus_work+0x130/0x130
> [ 92.256689] [<ffffffff817ca044>] ? _raw_spin_unlock_irqrestore+0x47/0x65
> [ 92.256689] [<ffffffff81086613>] ? trace_hardirqs_on_caller+0x125/0x181
> [ 92.256689] [<ffffffff8108667c>] ? trace_hardirqs_on+0xd/0xf
> [ 92.256689] [<ffffffff8109e5f8>] ? cpu_stop_signal_done+0x2c/0x2c
> [ 92.256689] [<ffffffff81055e74>] kthread+0x9a/0xa2
> [ 92.256689] [<ffffffff817d22c4>] kernel_thread_helper+0x4/0x10
> [ 92.256689] [<ffffffff817ca4b0>] ? retint_restore_args+0x13/0x13
> [ 92.256689] [<ffffffff81055dda>] ? __init_kthread_worker+0x5a/0x5a
> [ 92.317029] [<ffffffff817d22c0>] ? gs_change+0x13/0x13
>
> Thanks,
> Fengguang

Hello, Fengguang,

You need commit #bfa00b4c, which prevents offline CPUs from getting
into rcu_do_batch.

Thanx, Paul