2009-01-02 11:18:33

by Eric Sesterhenn

[permalink] [raw]
Subject: [BUG] NULL pointer deref with rcutorture

Hi,

modprobing rcutorture and removing it again gives me the following oops:

[ 122.873859] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
[ 162.987060] BUG: unable to handle kernel NULL pointer dereference at 00000000
[ 162.987194] IP: [<00000000>] 0x0
[ 162.987268] *pde = 00000000
[ 162.987339] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 162.987499] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
[ 162.987534] Modules linked in: [last unloaded: rcutorture]
[ 162.987667]
[ 162.987699] Pid: 5091, comm: udevd Tainted: G W (2.6.28-04980-gb58602a-dirty #154) System Name
[ 162.987732] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[ 162.987761] EIP is at 0x0
[ 162.987849] EAX: d0afcbf0 EBX: 00000000 ECX: c0160f5a EDX: 00000006
[ 162.987880] ESI: d0afcbf0 EDI: 0000001c EBP: c0b01fe0 ESP: c0b01fd4
[ 162.987910] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 162.987941] Process udevd (pid: 5091, ti=c0b01000 task=c8cc4100 task.ti=c8d20000)
[ 162.988027] Stack:
[ 162.988027] c0160f61 00000200 00000001 c0b01ff8 c012a723 0000000a c8d20e4c 00000046
[ 162.988027] c012a6b4 c8d20e5c c01042ce
[ 162.988027] Call Trace:
[ 162.988027] [<c0160f61>] ? rcu_process_callbacks+0x65/0x79
[ 162.988027] [<c012a723>] ? __do_softirq+0x6f/0xf6
[ 162.988027] [<c012a6b4>] ? __do_softirq+0x0/0xf6
[ 162.988027] <IRQ> <0> [<c012a631>] ? irq_exit+0x40/0x7c
[ 162.988027] [<c0110b91>] ? smp_apic_timer_interrupt+0x68/0x73
[ 162.988027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
[ 162.988027] [<c017af71>] ? unmap_vmas+0x308/0x47f
[ 162.988027] [<c017afa2>] ? unmap_vmas+0x339/0x47f
[ 162.988027] [<c0143db5>] ? trace_hardirqs_on_caller+0x17/0x15a
[ 162.988027] [<c017e326>] ? exit_mmap+0x83/0x104
[ 162.988027] [<c01244b8>] ? mmput+0x39/0x89
[ 162.988027] [<c01278de>] ? exit_mm+0xc3/0xcb
[ 162.988027] [<c0128b99>] ? do_exit+0x199/0x6d5
[ 162.988027] [<c04e56b0>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 162.988027] [<c0129137>] ? do_group_exit+0x62/0x88
[ 162.988027] [<c0129171>] ? sys_exit_group+0x14/0x16
[ 162.988027] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
[ 162.988027] Code: Bad EIP value.
[ 162.988027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b01fd4
[ 162.990524] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 162.990555] Kernel panic - not syncing: Fatal exception in interrupt


Kernel is current -git with a unrelated
networking patch (http://marc.info/?l=linux-netdev&m=123088641023335&w=2)

root@whiterabbit:/usr/src/linux# grep RCU .config
# CONFIG_CLASSIC_RCU is not set
# CONFIG_TREE_RCU is not set
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_PREEMPT_RCU_TRACE=y
CONFIG_RCU_TORTURE_TEST=m

Greetings, Eric


2009-01-02 17:59:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Fri, Jan 02, 2009 at 12:18:19PM +0100, Eric Sesterhenn wrote:
> Hi,
>
> modprobing rcutorture and removing it again gives me the following oops:

This looks similar to something that Kamalesh ran into a few weeks ago.

Kamalesh, did the fixes for that make it to 2.6.28?

Thanx, Paul

> [ 122.873859] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> [ 162.987060] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [ 162.987194] IP: [<00000000>] 0x0
> [ 162.987268] *pde = 00000000
> [ 162.987339] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 162.987499] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
> [ 162.987534] Modules linked in: [last unloaded: rcutorture]
> [ 162.987667]
> [ 162.987699] Pid: 5091, comm: udevd Tainted: G W (2.6.28-04980-gb58602a-dirty #154) System Name
> [ 162.987732] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> [ 162.987761] EIP is at 0x0
> [ 162.987849] EAX: d0afcbf0 EBX: 00000000 ECX: c0160f5a EDX: 00000006
> [ 162.987880] ESI: d0afcbf0 EDI: 0000001c EBP: c0b01fe0 ESP: c0b01fd4
> [ 162.987910] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 162.987941] Process udevd (pid: 5091, ti=c0b01000 task=c8cc4100 task.ti=c8d20000)
> [ 162.988027] Stack:
> [ 162.988027] c0160f61 00000200 00000001 c0b01ff8 c012a723 0000000a c8d20e4c 00000046
> [ 162.988027] c012a6b4 c8d20e5c c01042ce
> [ 162.988027] Call Trace:
> [ 162.988027] [<c0160f61>] ? rcu_process_callbacks+0x65/0x79
> [ 162.988027] [<c012a723>] ? __do_softirq+0x6f/0xf6
> [ 162.988027] [<c012a6b4>] ? __do_softirq+0x0/0xf6
> [ 162.988027] <IRQ> <0> [<c012a631>] ? irq_exit+0x40/0x7c
> [ 162.988027] [<c0110b91>] ? smp_apic_timer_interrupt+0x68/0x73
> [ 162.988027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> [ 162.988027] [<c017af71>] ? unmap_vmas+0x308/0x47f
> [ 162.988027] [<c017afa2>] ? unmap_vmas+0x339/0x47f
> [ 162.988027] [<c0143db5>] ? trace_hardirqs_on_caller+0x17/0x15a
> [ 162.988027] [<c017e326>] ? exit_mmap+0x83/0x104
> [ 162.988027] [<c01244b8>] ? mmput+0x39/0x89
> [ 162.988027] [<c01278de>] ? exit_mm+0xc3/0xcb
> [ 162.988027] [<c0128b99>] ? do_exit+0x199/0x6d5
> [ 162.988027] [<c04e56b0>] ? trace_hardirqs_on_thunk+0xc/0x10
> [ 162.988027] [<c0129137>] ? do_group_exit+0x62/0x88
> [ 162.988027] [<c0129171>] ? sys_exit_group+0x14/0x16
> [ 162.988027] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
> [ 162.988027] Code: Bad EIP value.
> [ 162.988027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b01fd4
> [ 162.990524] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 162.990555] Kernel panic - not syncing: Fatal exception in interrupt
>
>
> Kernel is current -git with a unrelated
> networking patch (http://marc.info/?l=linux-netdev&m=123088641023335&w=2)
>
> root@whiterabbit:/usr/src/linux# grep RCU .config
> # CONFIG_CLASSIC_RCU is not set
> # CONFIG_TREE_RCU is not set
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_TRACE=y
> # CONFIG_TREE_RCU_TRACE is not set
> CONFIG_PREEMPT_RCU_TRACE=y
> CONFIG_RCU_TORTURE_TEST=m
>
> Greetings, Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-01-02 18:53:21

by Kamalesh Babulal

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney <[email protected]> [2009-01-02 09:58:53]:

> On Fri, Jan 02, 2009 at 12:18:19PM +0100, Eric Sesterhenn wrote:
> > Hi,
> >
> > modprobing rcutorture and removing it again gives me the following oops:
>
> This looks similar to something that Kamalesh ran into a few weeks ago.
>
> Kamalesh, did the fixes for that make it to 2.6.28?
>
> Thanx, Paul
>

Hi Paul,

The fix was available in linux-next from Dec 23 2008 (commit 392ddc32982a5c661dd90dd49a3cb37f1c68b782
'bio: add support for inlining a number of bio_vecs inside the bio') and it has be pushed for merger in 2.6.29.

> > [ 122.873859] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > [ 162.987060] BUG: unable to handle kernel NULL pointer dereference at 00000000
> > [ 162.987194] IP: [<00000000>] 0x0
> > [ 162.987268] *pde = 00000000
> > [ 162.987339] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> > [ 162.987499] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
> > [ 162.987534] Modules linked in: [last unloaded: rcutorture]
> > [ 162.987667]
> > [ 162.987699] Pid: 5091, comm: udevd Tainted: G W (2.6.28-04980-gb58602a-dirty #154) System Name
> > [ 162.987732] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> > [ 162.987761] EIP is at 0x0
> > [ 162.987849] EAX: d0afcbf0 EBX: 00000000 ECX: c0160f5a EDX: 00000006
> > [ 162.987880] ESI: d0afcbf0 EDI: 0000001c EBP: c0b01fe0 ESP: c0b01fd4
> > [ 162.987910] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [ 162.987941] Process udevd (pid: 5091, ti=c0b01000 task=c8cc4100 task.ti=c8d20000)
> > [ 162.988027] Stack:
> > [ 162.988027] c0160f61 00000200 00000001 c0b01ff8 c012a723 0000000a c8d20e4c 00000046
> > [ 162.988027] c012a6b4 c8d20e5c c01042ce
> > [ 162.988027] Call Trace:
> > [ 162.988027] [<c0160f61>] ? rcu_process_callbacks+0x65/0x79
> > [ 162.988027] [<c012a723>] ? __do_softirq+0x6f/0xf6
> > [ 162.988027] [<c012a6b4>] ? __do_softirq+0x0/0xf6
> > [ 162.988027] <IRQ> <0> [<c012a631>] ? irq_exit+0x40/0x7c
> > [ 162.988027] [<c0110b91>] ? smp_apic_timer_interrupt+0x68/0x73
> > [ 162.988027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> > [ 162.988027] [<c017af71>] ? unmap_vmas+0x308/0x47f
> > [ 162.988027] [<c017afa2>] ? unmap_vmas+0x339/0x47f
> > [ 162.988027] [<c0143db5>] ? trace_hardirqs_on_caller+0x17/0x15a
> > [ 162.988027] [<c017e326>] ? exit_mmap+0x83/0x104
> > [ 162.988027] [<c01244b8>] ? mmput+0x39/0x89
> > [ 162.988027] [<c01278de>] ? exit_mm+0xc3/0xcb
> > [ 162.988027] [<c0128b99>] ? do_exit+0x199/0x6d5
> > [ 162.988027] [<c04e56b0>] ? trace_hardirqs_on_thunk+0xc/0x10
> > [ 162.988027] [<c0129137>] ? do_group_exit+0x62/0x88
> > [ 162.988027] [<c0129171>] ? sys_exit_group+0x14/0x16
> > [ 162.988027] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
> > [ 162.988027] Code: Bad EIP value.
> > [ 162.988027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b01fd4
> > [ 162.990524] ---[ end trace 4eaa2a86a8e2da22 ]---
> > [ 162.990555] Kernel panic - not syncing: Fatal exception in interrupt
> >
> >
> > Kernel is current -git with a unrelated
> > networking patch (http://marc.info/?l=linux-netdev&m=123088641023335&w=2)
> >
> > root@whiterabbit:/usr/src/linux# grep RCU .config
> > # CONFIG_CLASSIC_RCU is not set
> > # CONFIG_TREE_RCU is not set
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_RCU_TRACE=y
> > # CONFIG_TREE_RCU_TRACE is not set
> > CONFIG_PREEMPT_RCU_TRACE=y
> > CONFIG_RCU_TORTURE_TEST=m
> >
> > Greetings, Eric
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

2009-01-02 19:53:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Sat, Jan 03, 2009 at 12:23:03AM +0530, Kamalesh Babulal wrote:
> * Paul E. McKenney <[email protected]> [2009-01-02 09:58:53]:
>
> > On Fri, Jan 02, 2009 at 12:18:19PM +0100, Eric Sesterhenn wrote:
> > > Hi,
> > >
> > > modprobing rcutorture and removing it again gives me the following oops:
> >
> > This looks similar to something that Kamalesh ran into a few weeks ago.
> >
> > Kamalesh, did the fixes for that make it to 2.6.28?
> >
> > Thanx, Paul
> >
>
> Hi Paul,
>
> The fix was available in linux-next from Dec 23 2008 (commit 392ddc32982a5c661dd90dd49a3cb37f1c68b782
> 'bio: add support for inlining a number of bio_vecs inside the bio') and it has be pushed for merger in 2.6.29.

Thank you, Kamalesh!

Eric, could you please try applying the above commit and see if the
problem reproduces?

Thanx, Paul

> > > [ 122.873859] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > > [ 162.987060] BUG: unable to handle kernel NULL pointer dereference at 00000000
> > > [ 162.987194] IP: [<00000000>] 0x0
> > > [ 162.987268] *pde = 00000000
> > > [ 162.987339] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> > > [ 162.987499] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
> > > [ 162.987534] Modules linked in: [last unloaded: rcutorture]
> > > [ 162.987667]
> > > [ 162.987699] Pid: 5091, comm: udevd Tainted: G W (2.6.28-04980-gb58602a-dirty #154) System Name
> > > [ 162.987732] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> > > [ 162.987761] EIP is at 0x0
> > > [ 162.987849] EAX: d0afcbf0 EBX: 00000000 ECX: c0160f5a EDX: 00000006
> > > [ 162.987880] ESI: d0afcbf0 EDI: 0000001c EBP: c0b01fe0 ESP: c0b01fd4
> > > [ 162.987910] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > > [ 162.987941] Process udevd (pid: 5091, ti=c0b01000 task=c8cc4100 task.ti=c8d20000)
> > > [ 162.988027] Stack:
> > > [ 162.988027] c0160f61 00000200 00000001 c0b01ff8 c012a723 0000000a c8d20e4c 00000046
> > > [ 162.988027] c012a6b4 c8d20e5c c01042ce
> > > [ 162.988027] Call Trace:
> > > [ 162.988027] [<c0160f61>] ? rcu_process_callbacks+0x65/0x79
> > > [ 162.988027] [<c012a723>] ? __do_softirq+0x6f/0xf6
> > > [ 162.988027] [<c012a6b4>] ? __do_softirq+0x0/0xf6
> > > [ 162.988027] <IRQ> <0> [<c012a631>] ? irq_exit+0x40/0x7c
> > > [ 162.988027] [<c0110b91>] ? smp_apic_timer_interrupt+0x68/0x73
> > > [ 162.988027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> > > [ 162.988027] [<c017af71>] ? unmap_vmas+0x308/0x47f
> > > [ 162.988027] [<c017afa2>] ? unmap_vmas+0x339/0x47f
> > > [ 162.988027] [<c0143db5>] ? trace_hardirqs_on_caller+0x17/0x15a
> > > [ 162.988027] [<c017e326>] ? exit_mmap+0x83/0x104
> > > [ 162.988027] [<c01244b8>] ? mmput+0x39/0x89
> > > [ 162.988027] [<c01278de>] ? exit_mm+0xc3/0xcb
> > > [ 162.988027] [<c0128b99>] ? do_exit+0x199/0x6d5
> > > [ 162.988027] [<c04e56b0>] ? trace_hardirqs_on_thunk+0xc/0x10
> > > [ 162.988027] [<c0129137>] ? do_group_exit+0x62/0x88
> > > [ 162.988027] [<c0129171>] ? sys_exit_group+0x14/0x16
> > > [ 162.988027] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
> > > [ 162.988027] Code: Bad EIP value.
> > > [ 162.988027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b01fd4
> > > [ 162.990524] ---[ end trace 4eaa2a86a8e2da22 ]---
> > > [ 162.990555] Kernel panic - not syncing: Fatal exception in interrupt
> > >
> > >
> > > Kernel is current -git with a unrelated
> > > networking patch (http://marc.info/?l=linux-netdev&m=123088641023335&w=2)
> > >
> > > root@whiterabbit:/usr/src/linux# grep RCU .config
> > > # CONFIG_CLASSIC_RCU is not set
> > > # CONFIG_TREE_RCU is not set
> > > CONFIG_PREEMPT_RCU=y
> > > CONFIG_RCU_TRACE=y
> > > # CONFIG_TREE_RCU_TRACE is not set
> > > CONFIG_PREEMPT_RCU_TRACE=y
> > > CONFIG_RCU_TORTURE_TEST=m
> > >
> > > Greetings, Eric
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at http://www.tux.org/lkml/
>
> --
> Thanks & Regards,
> Kamalesh Babulal,
> Linux Technology Center,
> IBM, ISTL.

2009-01-02 23:12:54

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Sat, Jan 03, 2009 at 12:23:03AM +0530, Kamalesh Babulal wrote:
> > * Paul E. McKenney <[email protected]> [2009-01-02 09:58:53]:
> >
> > > On Fri, Jan 02, 2009 at 12:18:19PM +0100, Eric Sesterhenn wrote:
> > > > Hi,
> > > >
> > > > modprobing rcutorture and removing it again gives me the following oops:
> > >
> > > This looks similar to something that Kamalesh ran into a few weeks ago.
> > >
> > > Kamalesh, did the fixes for that make it to 2.6.28?
> > >
> > > Thanx, Paul
> > >
> >
> > Hi Paul,
> >
> > The fix was available in linux-next from Dec 23 2008 (commit 392ddc32982a5c661dd90dd49a3cb37f1c68b782
> > 'bio: add support for inlining a number of bio_vecs inside the bio') and it has be pushed for merger in 2.6.29.
>
> Thank you, Kamalesh!
>
> Eric, could you please try applying the above commit and see if the
> problem reproduces?

I tried to apply the patch from
http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff_plain;h=392ddc32982a5c661dd90dd49a3cb37f1c68b782;hp=bb799ca0202a360fa74d5f17039b9100caebdde7
but get a message that it is already applied. Seems i already got that
one.

Greetings, Eric

2009-01-03 01:58:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Sat, Jan 03, 2009 at 12:12:39AM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Sat, Jan 03, 2009 at 12:23:03AM +0530, Kamalesh Babulal wrote:
> > > * Paul E. McKenney <[email protected]> [2009-01-02 09:58:53]:
> > >
> > > > On Fri, Jan 02, 2009 at 12:18:19PM +0100, Eric Sesterhenn wrote:
> > > > > Hi,
> > > > >
> > > > > modprobing rcutorture and removing it again gives me the following oops:
> > > >
> > > > This looks similar to something that Kamalesh ran into a few weeks ago.
> > > >
> > > > Kamalesh, did the fixes for that make it to 2.6.28?
> > > >
> > > > Thanx, Paul
> > > >
> > >
> > > Hi Paul,
> > >
> > > The fix was available in linux-next from Dec 23 2008 (commit 392ddc32982a5c661dd90dd49a3cb37f1c68b782
> > > 'bio: add support for inlining a number of bio_vecs inside the bio') and it has be pushed for merger in 2.6.29.
> >
> > Thank you, Kamalesh!
> >
> > Eric, could you please try applying the above commit and see if the
> > problem reproduces?
>
> I tried to apply the patch from
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff_plain;h=392ddc32982a5c661dd90dd49a3cb37f1c68b782;hp=bb799ca0202a360fa74d5f17039b9100caebdde7
> but get a message that it is already applied. Seems i already got that
> one.

Is your configuration able to support ftrace?

Looks like someone is (wrongly) freeing some memory that was sent to
call_rcu(), and ftrace might be one way to locate the problem.

Thanx, Paul

2009-01-04 14:57:42

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Sat, Jan 03, 2009 at 10:40:03AM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Sat, Jan 03, 2009 at 12:12:39AM +0100, Eric Sesterhenn wrote:
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > >
> > > > I tried to apply the patch from
> > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff_plain;h=392ddc32982a5c661dd90dd49a3cb37f1c68b782;hp=bb799ca0202a360fa74d5f17039b9100caebdde7
> > > > but get a message that it is already applied. Seems i already got that
> > > > one.
> > >
> > > Is your configuration able to support ftrace?
> >
> > yes
> >
> > > Looks like someone is (wrongly) freeing some memory that was sent to
> > > call_rcu(), and ftrace might be one way to locate the problem.
> >
> > I set the filter to call_rcu() and call_rcu_sched(), reproduced the
> > oops, and got this in the trace file.
>
> Hmmm... The only unique ones are _rcu_barrier() used during module
> removal by rcutorture and rt_worker_func(), which occurs near the
> beginning of the trace. My next step would be to trace the addresses of
> rcu_head structures passed to call_rcu() and friends and to also trace
> the addresses of the structures as they are invoked (where the original
> NULL-pointer exception occurred).
>
> Is this tracing something you would be interested in taking on? My
> test setup is having some trouble, so it would probably be a couple of
> days before I got it working. :-/

Just tell me what i need to do, I am not really familiar with ftrace.
I am only able to test 2.6.28-04980-gb58602a, since current -git is not
able to boot on this box :|

Greetings, Eric

2009-01-04 21:14:03

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Sun, Jan 04, 2009 at 03:57:26PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Sat, Jan 03, 2009 at 10:40:03AM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Sat, Jan 03, 2009 at 12:12:39AM +0100, Eric Sesterhenn wrote:
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > >
> > > > > I tried to apply the patch from
> > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff_plain;h=392ddc32982a5c661dd90dd49a3cb37f1c68b782;hp=bb799ca0202a360fa74d5f17039b9100caebdde7
> > > > > but get a message that it is already applied. Seems i already got that
> > > > > one.
> > > >
> > > > Is your configuration able to support ftrace?
> > >
> > > yes
> > >
> > > > Looks like someone is (wrongly) freeing some memory that was sent to
> > > > call_rcu(), and ftrace might be one way to locate the problem.
> > >
> > > I set the filter to call_rcu() and call_rcu_sched(), reproduced the
> > > oops, and got this in the trace file.
> >
> > Hmmm... The only unique ones are _rcu_barrier() used during module
> > removal by rcutorture and rt_worker_func(), which occurs near the
> > beginning of the trace. My next step would be to trace the addresses of
> > rcu_head structures passed to call_rcu() and friends and to also trace
> > the addresses of the structures as they are invoked (where the original
> > NULL-pointer exception occurred).
> >
> > Is this tracing something you would be interested in taking on? My
> > test setup is having some trouble, so it would probably be a couple of
> > days before I got it working. :-/
>
> Just tell me what i need to do, I am not really familiar with ftrace.
> I am only able to test 2.6.28-04980-gb58602a, since current -git is not
> able to boot on this box :|

Very cool!

The idea is to have __call_rcu() in kernel/rcutree.c record the
address of the callback (argument "head") and the function (argument
"func"). In rcu_do_batch(), just before invoking list->func(list),
also record the address of the callback ("list") and the function
(again, "func").

The new ftrace package has some mechanisms for doing this, but there is
always the old-fashioned way of using printk(), for example in
rcu_do_batch():

prefetch(next);
if (rcu_dump_callbacks)
printk("rcu_head=%p, func=%p\n", list, func);
list->func(list);

Initialize rcu_dump_callbacks to zero, then use a small kernel module
(or some such) to set it to one just before running your test.

If you would like, I could rough out the code, though as noted earlier,
I currently have no way of testing it. (Hopefully will be fixed in a
few days.)

I was wondering what should be done for ftrace plugins for RCU, I guess
I now have at least part of the answer. ;-)

Thanx, Paul

2009-01-04 23:39:17

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

hi,

* Paul E. McKenney ([email protected]) wrote:
> On Sun, Jan 04, 2009 at 03:57:26PM +0100, Eric Sesterhenn wrote:
> >
> > Just tell me what i need to do, I am not really familiar with ftrace.
> > I am only able to test 2.6.28-04980-gb58602a, since current -git is not
> > able to boot on this box :|
>
> Very cool!
>
> The idea is to have __call_rcu() in kernel/rcutree.c record the
> address of the callback (argument "head") and the function (argument
> "func"). In rcu_do_batch(), just before invoking list->func(list),
> also record the address of the callback ("list") and the function
> (again, "func").
>
> The new ftrace package has some mechanisms for doing this, but there is
> always the old-fashioned way of using printk(), for example in
> rcu_do_batch():
>
> prefetch(next);
> if (rcu_dump_callbacks)
> printk("rcu_head=%p, func=%p\n", list, func);
> list->func(list);
>
> Initialize rcu_dump_callbacks to zero, then use a small kernel module
> (or some such) to set it to one just before running your test.

i did it via the ugly printk and captured it via netconsole

for the box with rcutree you can find the log here:
http://www.cccmz.de/~snakebyte/200.log
The interesting part seems start at 155.858923, thats when I load
the rcutorture module

with http://www.cccmz.de/~snakebyte/200_1.log i had another try,
actually saw a rcu_do_batch: rcu_head=d1907720, func=(null) in the traces,
box went into unresponsive mode afterwards

And just in case it helps somehow: http://www.cccmz.de/~snakebyte/200_3.log

for the one with rcupreemt you can find the log here:
http://www.cccmz.de/~snakebyte/201.log
rcutorture is loaded at 87.312399

Greetings, Eric

2009-01-05 02:28:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 12:38:55AM +0100, Eric Sesterhenn wrote:
> hi,
>
> * Paul E. McKenney ([email protected]) wrote:
> > On Sun, Jan 04, 2009 at 03:57:26PM +0100, Eric Sesterhenn wrote:
> > >
> > > Just tell me what i need to do, I am not really familiar with ftrace.
> > > I am only able to test 2.6.28-04980-gb58602a, since current -git is not
> > > able to boot on this box :|
> >
> > Very cool!
> >
> > The idea is to have __call_rcu() in kernel/rcutree.c record the
> > address of the callback (argument "head") and the function (argument
> > "func"). In rcu_do_batch(), just before invoking list->func(list),
> > also record the address of the callback ("list") and the function
> > (again, "func").
> >
> > The new ftrace package has some mechanisms for doing this, but there is
> > always the old-fashioned way of using printk(), for example in
> > rcu_do_batch():
> >
> > prefetch(next);
> > if (rcu_dump_callbacks)
> > printk("rcu_head=%p, func=%p\n", list, func);
> > list->func(list);
> >
> > Initialize rcu_dump_callbacks to zero, then use a small kernel module
> > (or some such) to set it to one just before running your test.
>
> i did it via the ugly printk and captured it via netconsole

Good stuff!!!

> for the box with rcutree you can find the log here:
> http://www.cccmz.de/~snakebyte/200.log
> The interesting part seems start at 155.858923, thats when I load
> the rcutorture module

When I download this, I see an 82.100963 immediately followed by an
167.894017, no 155.858923. Do I have the right log?

> with http://www.cccmz.de/~snakebyte/200_1.log i had another try,
> actually saw a rcu_do_batch: rcu_head=d1907720, func=(null) in the traces,
> box went into unresponsive mode afterwards

Interesting. The original function is 0xd1902ad0, passed to __call_rcu()
at 157.364214 -- would you be able to tell me what function that
corresponds to? (Looks to me like a module, perhaps rcutorture?)

Whatever function it is, the rcu_head definitely got corrupted some
time during the 8 milliseconds or so that the callback was waiting for
a grace period. However, it did just fine being invoked several times
beforehand -- this is a very popular RCU callback function, it appears.

My guess is that the ensuing CPU 0 stall messages are due to CPU 0
having a heart attack trying to execute at address 0.

Are you swapping over NFS, or is the netpoll_send_udp() just garbage on
the stack?

> And just in case it helps somehow: http://www.cccmz.de/~snakebyte/200_3.log

Hmmm... The main recent change to kernel/rcutorture.c is the addition
of a reboot notifier, so that rcutorture can shut itself down gracefully
should the system go down while rcutorture is still running. This panic
occurs in rcu_torture_stutter(), which is shown below:

static int
rcu_torture_stutter(void *arg)
{
VERBOSE_PRINTK_STRING("rcu_torture_stutter task started");
do {
schedule_timeout_interruptible(stutter * HZ);
stutter_pause_test = 1;
if (!kthread_should_stop() && !fullstop)
schedule_timeout_interruptible(stutter * HZ);
stutter_pause_test = 0;
} while (!kthread_should_stop() && !fullstop);
VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
return 0;
}

I don't see any reason why this function would transfer control to
location 0x60, though I do see a bug in the new shutdown code. I
don't expect this to make any difference, but a patch is included
nevertheless.

> for the one with rcupreemt you can find the log here:
> http://www.cccmz.de/~snakebyte/201.log
> rcutorture is loaded at 87.312399

Hmmm... Offset 0x60 seems to be a common thread. This lets
rcu_torture_ops off the hook, since it isn't that large.

Anyway, could you give the attached patch a go, even though I cannot see
how it would help? ;-)

Do you get the same error running Classic RCU?

Thanx, Paul

> Greetings, Eric

Fix an rcutorture bug that prevents the shutdown notifier from ever
actually having any effect, due to the fact that kthreads ignore all
signals.

Signed-off-by: Paul E. McKenney <[email protected]>
---

rcutorture.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index b310655..d54111c 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -136,7 +136,7 @@ static int stutter_pause_test = 0;
#endif
int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;

-#define FULLSTOP_SIGNALED 1 /* Bail due to signal. */
+#define FULLSTOP_SHUTDOWN 1 /* Bail due to system shutdown/panic. */
#define FULLSTOP_CLEANUP 2 /* Orderly shutdown. */
static int fullstop; /* stop generating callbacks at test end. */
DEFINE_MUTEX(fullstop_mutex); /* protect fullstop transitions and */
@@ -151,12 +151,10 @@ rcutorture_shutdown_notify(struct notifier_block *unused1,
{
if (fullstop)
return NOTIFY_DONE;
- if (signal_pending(current)) {
- mutex_lock(&fullstop_mutex);
- if (!ACCESS_ONCE(fullstop))
- fullstop = FULLSTOP_SIGNALED;
- mutex_unlock(&fullstop_mutex);
- }
+ mutex_lock(&fullstop_mutex);
+ if (!fullstop)
+ fullstop = FULLSTOP_SHUTDOWN;
+ mutex_unlock(&fullstop_mutex);
return NOTIFY_DONE;
}

@@ -624,7 +622,7 @@ rcu_torture_writer(void *arg)
rcu_stutter_wait();
} while (!kthread_should_stop() && !fullstop);
VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
schedule_timeout_uninterruptible(1);
return 0;
}
@@ -649,7 +647,7 @@ rcu_torture_fakewriter(void *arg)
} while (!kthread_should_stop() && !fullstop);

VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task stopping");
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
schedule_timeout_uninterruptible(1);
return 0;
}
@@ -759,7 +757,7 @@ rcu_torture_reader(void *arg)
VERBOSE_PRINTK_STRING("rcu_torture_reader task stopping");
if (irqreader && cur_ops->irqcapable)
del_timer_sync(&t);
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ while (!kthread_should_stop() && fullstop != FULLSTOP_SHUTDOWN)
schedule_timeout_uninterruptible(1);
return 0;
}

2009-01-05 12:14:31

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 12:38:55AM +0100, Eric Sesterhenn wrote:
> > hi,
> >
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Sun, Jan 04, 2009 at 03:57:26PM +0100, Eric Sesterhenn wrote:
> > > >
> > > > Just tell me what i need to do, I am not really familiar with ftrace.
> > > > I am only able to test 2.6.28-04980-gb58602a, since current -git is not
> > > > able to boot on this box :|
> > >
> > > Very cool!
> > >
> > > The idea is to have __call_rcu() in kernel/rcutree.c record the
> > > address of the callback (argument "head") and the function (argument
> > > "func"). In rcu_do_batch(), just before invoking list->func(list),
> > > also record the address of the callback ("list") and the function
> > > (again, "func").
> > >
> > > The new ftrace package has some mechanisms for doing this, but there is
> > > always the old-fashioned way of using printk(), for example in
> > > rcu_do_batch():
> > >
> > > prefetch(next);
> > > if (rcu_dump_callbacks)
> > > printk("rcu_head=%p, func=%p\n", list, func);
> > > list->func(list);
> > >
> > > Initialize rcu_dump_callbacks to zero, then use a small kernel module
> > > (or some such) to set it to one just before running your test.
> >
> > i did it via the ugly printk and captured it via netconsole
>
> Good stuff!!!
>
> > for the box with rcutree you can find the log here:
> > http://www.cccmz.de/~snakebyte/200.log
> > The interesting part seems start at 155.858923, thats when I load
> > the rcutorture module
>
> When I download this, I see an 82.100963 immediately followed by an
> 167.894017, no 155.858923. Do I have the right log?
>
> > with http://www.cccmz.de/~snakebyte/200_1.log i had another try,
> > actually saw a rcu_do_batch: rcu_head=d1907720, func=(null) in the traces,
> > box went into unresponsive mode afterwards
>
> Interesting. The original function is 0xd1902ad0, passed to __call_rcu()
> at 157.364214 -- would you be able to tell me what function that
> corresponds to? (Looks to me like a module, perhaps rcutorture?)

I did the log more than once, guess this was the time from the one
i uploaded earlier with the same name

> Whatever function it is, the rcu_head definitely got corrupted some
> time during the 8 milliseconds or so that the callback was waiting for
> a grace period. However, it did just fine being invoked several times
> beforehand -- this is a very popular RCU callback function, it appears.
>
> My guess is that the ensuing CPU 0 stall messages are due to CPU 0
> having a heart attack trying to execute at address 0.

Could the popular rcu function be registered by rcutorture, but when
we remove the module the callback is no longer valid? I can compile
a kernel just fine and with other stress tests i did not see any oops so
far.

> Are you swapping over NFS, or is the netpoll_send_udp() just garbage on
> the stack?

Swap is locally on a disc, nfs is configured but not used for swap.

> > And just in case it helps somehow: http://www.cccmz.de/~snakebyte/200_3.log
>
> Hmmm... The main recent change to kernel/rcutorture.c is the addition
> of a reboot notifier, so that rcutorture can shut itself down gracefully
> should the system go down while rcutorture is still running. This panic
> occurs in rcu_torture_stutter(), which is shown below:
>
> static int
> rcu_torture_stutter(void *arg)
> {
> VERBOSE_PRINTK_STRING("rcu_torture_stutter task started");
> do {
> schedule_timeout_interruptible(stutter * HZ);
> stutter_pause_test = 1;
> if (!kthread_should_stop() && !fullstop)
> schedule_timeout_interruptible(stutter * HZ);
> stutter_pause_test = 0;
> } while (!kthread_should_stop() && !fullstop);
> VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
> return 0;
> }
>
> I don't see any reason why this function would transfer control to
> location 0x60, though I do see a bug in the new shutdown code. I
> don't expect this to make any difference, but a patch is included
> nevertheless.

Sadly it doesnt.

> > for the one with rcupreemt you can find the log here:
> > http://www.cccmz.de/~snakebyte/201.log
> > rcutorture is loaded at 87.312399
>
> Hmmm... Offset 0x60 seems to be a common thread. This lets
> rcu_torture_ops off the hook, since it isn't that large.
>
> Anyway, could you give the attached patch a go, even though I cannot see
> how it would help? ;-)

This is with the patch on the preemt rcu box (the 201.logs)

[ 593.404728] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
irqreader=1
[ 601.000124] BUG: unable to handle kernel paging request at d0af8a2d
[ 601.000431] IP: [<d0af8a2d>] 0xd0af8a2d
[ 601.000696] *pde = 0fbb4067 *pte = 00000000
[ 601.000946] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 601.001027] last sysfs file: /sys/block/ram9/range
[ 601.001027] Modules linked in: [last unloaded: rcutorture]
[ 601.001027]
[ 601.001027] Pid: 5021, comm: rcu_torture_fak Tainted: G W
(2.6.28-05692-g7d3b56b-dirty #169) System Name
[ 601.001027] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010246 CPU: 0
[ 601.001027] EIP is at 0xd0af8a2d
[ 601.001027] EAX: 00000000 EBX: c98acfc4 ECX: c04f2c53 EDX: c98acf74
[ 601.001027] ESI: d0af8c0a EDI: 00000000 EBP: c98acfbc ESP: c98acfbc
[ 601.001027] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 601.001027] Process rcu_torture_fak (pid: 5021, ti=c98ac000
task=cbb64100 task.ti=c98ac000)
[ 601.001027] Stack:
[ 601.001027] c98acfd0 d0af8c89 74816929 000024a7 00000000 c98acfe0
c0136bcc c0136b8e
[ 601.001027] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
00000000 00001301
[ 601.001027] 00008169
[ 601.001027] Call Trace:
[ 601.001027] [<c0136bcc>] ? kthread+0x3e/0x66
[ 601.001027] [<c0136b8e>] ? kthread+0x0/0x66
[ 601.001027] [<c0103643>] ? kernel_thread_helper+0x7/0x10
[ 601.001027] Code: Bad EIP value.
[ 601.001027] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98acfbc
[ 601.001027] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 601.006764] BUG: unable to handle kernel paging request at d0af8a2d
[ 601.007023] IP: [<d0af8a2d>] 0xd0af8a2d
[ 601.007265] *pde = 0fbb4067 *pte = 00000000
[ 601.007511] Oops: 0000 [#2] PREEMPT DEBUG_PAGEALLOC
[ 601.007716] last sysfs file: /sys/block/ram9/range
[ 601.007716] Modules linked in: [last unloaded: rcutorture]
[ 601.007716]
[ 601.007716] Pid: 5020, comm: rcu_torture_fak Tainted: G D W
(2.6.28-05692-g7d3b56b-dirty #169) System Name
[ 601.007716] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010296 CPU: 0
[ 601.007716] EIP is at 0xd0af8a2d
[ 601.007716] EAX: 00000000 EBX: c98adfc4 ECX: c04f2c53 EDX: c98adf74
[ 601.007716] ESI: d0af8c0a EDI: 00000000 EBP: c98adfbc ESP: c98adfbc
[ 601.007716] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 601.007716] Process rcu_torture_fak (pid: 5020, ti=c98ad000
task=cbb62700 task.ti=c98ad000)
[ 601.007716] Stack:
[ 601.007716] c98adfd0 d0af8c89 e5bc239c 000024ab 00000000 c98adfe0
c0136bcc c0136b8e
[ 601.007716] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
00000000 00000201
[ 601.007716] 00000674
[ 601.007716] Call Trace:
[ 601.007716] [<c0136bcc>] ? kthread+0x3e/0x66
[ 601.007716] [<c0136b8e>] ? kthread+0x0/0x66
[ 601.007716] [<c0103643>] ? kernel_thread_helper+0x7/0x10
[ 601.007716] Code: Bad EIP value.
[ 601.007716] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98adfbc
[ 601.007716] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 601.013281] BUG: unable to handle kernel paging request at d0af8a2d
[ 601.013542] IP: [<d0af8a2d>] 0xd0af8a2d
[ 601.013785] *pde = 0fbb4067 *pte = 00000000
[ 601.014032] Oops: 0000 [#3] PREEMPT DEBUG_PAGEALLOC
[ 601.014239] last sysfs file: /sys/block/ram9/range
[ 601.014239] Modules linked in: [last unloaded: rcutorture]
[ 601.014239]
[ 601.014239] Pid: 5019, comm: rcu_torture_fak Tainted: G D W
(2.6.28-05692-g7d3b56b-dirty #169) System Name
[ 601.014239] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010286 CPU: 0
[ 601.014239] EIP is at 0xd0af8a2d
[ 601.014239] EAX: 00000000 EBX: c98b2fc4 ECX: c04f2c53 EDX: c98b2f74
[ 601.014239] ESI: d0af8c0a EDI: 00000000 EBP: c98b2fbc ESP: c98b2fbc
[ 601.014239] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 601.014239] Process rcu_torture_fak (pid: 5019, ti=c98b2000
task=c9804e00 task.ti=c98b2000)
[ 601.014239] Stack:
[ 601.014239] c98b2fd0 d0af8c89 61a1df8f 000024a5 00000000 c98b2fe0
c0136bcc c0136b8e
[ 601.014239] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
00000000 000007bf
[ 601.014239] 00000012
[ 601.014239] Call Trace:
[ 601.014239] [<c0136bcc>] ? kthread+0x3e/0x66
[ 601.014239] [<c0136b8e>] ? kthread+0x0/0x66
[ 601.014239] [<c0103643>] ? kernel_thread_helper+0x7/0x10
[ 601.014239] Code: Bad EIP value.
[ 601.014239] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98b2fbc
[ 601.014239] ---[ end trace 4eaa2a86a8e2da22 ]---

then i added the rcutree patch (Fix rcutree grace-period-latency bug on
small systems) and got

[ 65.968568] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
irqreader=1
[ 68.872082] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 68.872455] IP: [<(null)>] (null)
[ 68.872653] *pde = 00000000
[ 68.872849] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 68.873027] last sysfs file: /sys/block/ram9/range
[ 68.873027] Modules linked in: [last unloaded: rcutorture]
[ 68.873027]
[ 68.873027] Pid: 5023, comm: bash Tainted: G W
(2.6.28-05692-g7d3b56b-dirty #169) System Name
[ 68.873027] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[ 68.873027] EIP is at 0x0
[ 68.873027] EAX: d0afd0b8 EBX: 00000000 ECX: c01612a6 EDX: 00000006
[ 68.873027] ESI: d0afd0b8 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
[ 68.873027] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 68.873027] Process bash (pid: 5023, ti=c0b03000 task=c98d1a00
task.ti=ce824000)
[ 68.873027] Stack:
[ 68.873027] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a
ce824e98 00000046
[ 68.873027] c012aa28 ce824ea8 c01042c2
[ 68.873027] Call Trace:
[ 68.873027] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
[ 68.873027] [<c012aa97>] ? __do_softirq+0x6f/0xf6
[ 68.873027] [<c012aa28>] ? __do_softirq+0x0/0xf6
[ 68.873027] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
[ 68.873027] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
[ 68.873027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
[ 68.873027] [<c07a024c>] ? _spin_lock+0x56/0x5d
[ 68.873027] [<c07a0000>] ? _write_unlock_bh+0x28/0x37
[ 68.873027] [<c017ad50>] ? do_wp_page+0x38b/0x453
[ 68.873027] [<c017be9e>] ? handle_mm_fault+0x4e8/0x54d
[ 68.873027] [<c0139da3>] ? down_read_trylock+0x3d/0x47
[ 68.873027] [<c07a25d3>] ? do_page_fault+0x2dc/0x601
[ 68.873027] [<c07a22f7>] ? do_page_fault+0x0/0x601
[ 68.873027] [<c07a0bd7>] ? error_code+0x6f/0x74
[ 68.873027] Code: Bad EIP value.
[ 68.873027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
[ 68.880266] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 68.880394] Kernel panic - not syncing: Fatal exception in interrupt


Tried again without the rcutree patch and got this one,
guess its just a timing thing wheter the NULL pointer deref or the other
appears

[ 99.643517] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
irqreader=1
[ 104.149086] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 104.149418] IP: [<(null)>] (null)
[ 104.149615] *pde = 00000000
[ 104.149813] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 104.150029] last sysfs file: /sys/block/ram9/range
[ 104.150029] Modules linked in: [last unloaded: rcutorture]
[ 104.150029]
[ 104.150029] Pid: 5030, comm: udevd Tainted: G W
(2.6.28-05692-g7d3b56b-dirty #169) System Name
[ 104.150029] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[ 104.150029] EIP is at 0x0
[ 104.150029] EAX: d0afcc38 EBX: 00000000 ECX: c01612a6 EDX: 00000006
[ 104.150029] ESI: d0afcc38 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
[ 104.150029] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 104.150029] Process udevd (pid: 5030, ti=c0b03000 task=c98cb400
task.ti=c98b1000)
[ 104.150029] Stack:
[ 104.150029] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a
c98b1e40 00000046
[ 104.150029] c012aa28 c98b1e50 c01042c2
[ 104.150029] Call Trace:
[ 104.150029] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
[ 104.150029] [<c012aa97>] ? __do_softirq+0x6f/0xf6
[ 104.150029] [<c012aa28>] ? __do_softirq+0x0/0xf6
[ 104.150029] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
[ 104.150029] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
[ 104.150029] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
[ 104.150029] [<c016e63f>] ? find_lock_page+0x5f/0x63
[ 104.150029] [<c0170122>] ? filemap_fault+0x9f/0x391
[ 104.150029] [<c017a6e8>] ? __do_fault+0x40/0x31d
[ 104.150029] [<c017bc30>] ? handle_mm_fault+0x27a/0x54d
[ 104.150029] [<c0139da3>] ? down_read_trylock+0x3d/0x47
[ 104.150029] [<c07a25d3>] ? do_page_fault+0x2dc/0x601
[ 104.150029] [<c07a22f7>] ? do_page_fault+0x0/0x601
[ 104.150029] [<c07a0bd7>] ? error_code+0x6f/0x74
[ 104.150029] Code: Bad EIP value.
[ 104.150029] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
[ 104.157313] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 104.157441] Kernel panic - not syncing: Fatal exception in interrupt

I enabled classic rcu and pulled current -git (without both patches) and got these:

[ 67.424425] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
irqreader=1
[ 70.205083] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 70.205416] IP: [<(null)>] (null)
[ 70.205605] *pde = 00000000
[ 70.205798] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 70.206026] last sysfs file: /sys/block/ram9/range
[ 70.206026] Modules linked in: [last unloaded: rcutorture]
[ 70.206026]
[ 70.206026] Pid: 1726, comm: udevd Tainted: G W
(2.6.28-05716-gfe0bdec #170) System Name
[ 70.206026] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
[ 70.206026] EIP is at 0x0
[ 70.206026] EAX: d0afc018 EBX: c10697c0 ECX: c0161905 EDX: d0afc018
[ 70.206026] ESI: 00000000 EDI: 00000001 EBP: c0b0bfd8 ESP: c0b0bfc4
[ 70.206026] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 70.206026] Process udevd (pid: 1726, ti=c0b0b000 task=cee10000
task.ti=ce80b000)
[ 70.206026] Stack:
[ 70.206026] c016197c 00000202 00000200 00000001 0000001c c0b0bfe0
c0161a00 c0b0bff8
[ 70.206026] c012acca 0000000a ce80bec4 00000046 c012ac5b ce80bed4
c01042c2
[ 70.206026] Call Trace:
[ 70.206026] [<c016197c>] ? __rcu_process_callbacks+0x141/0x1ae
[ 70.206026] [<c0161a00>] ? rcu_process_callbacks+0x17/0x28
[ 70.206026] [<c012acca>] ? __do_softirq+0x6f/0x100
[ 70.206026] [<c012ac5b>] ? __do_softirq+0x0/0x100
[ 70.206026] <IRQ> <0> [<c012abdd>] ? irq_exit+0x40/0x77
[ 70.206026] [<c0110ce9>] ? smp_apic_timer_interrupt+0x68/0x73
[ 70.206026] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
[ 70.206026] [<c014007b>] ? tick_do_periodic_broadcast+0xf/0x3a
[ 70.206026] [<c018f325>] ? kmem_cache_free+0xbd/0xc5
[ 70.206026] [<c019a45d>] ? putname+0x28/0x33
[ 70.206026] [<c019a45d>] ? putname+0x28/0x33
[ 70.206026] [<c019b7a0>] ? do_rmdir+0xba/0xc3
[ 70.206026] [<c04e8b50>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 70.206026] [<c0102ecf>] ? sysenter_exit+0xf/0x16
[ 70.206026] [<c01448f1>] ? trace_hardirqs_on_caller+0x17/0x15a
[ 70.206026] [<c019b7f2>] ? sys_rmdir+0x15/0x17
[ 70.206026] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
[ 70.206026] Code: Bad EIP value.
[ 70.206026] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b0bfc4
[ 70.213814] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 70.213934] Kernel panic - not syncing: Fatal exception in interrupt




On the other box with tree rcu and the rcutorture patch (200.logs):


[ 168.408126] BUG: unable to handle kernel NULL pointer dereference at
00000029
[ 168.408433] IP: [<d1902ebf>] 0xd1902ebf
[ 168.408614] *pde = 00000000
[ 168.408778] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 168.408978] last sysfs file: /sys/block/sda/size
[ 168.409164] Modules linked in: nfsd exportfs nfs lockd nfs_acl
auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
[ 168.410099]
[ 168.410197] Pid: 4154, comm: rcu_torture_rea Not tainted (2.6.28 #83)
[ 168.410332] EIP: 0060:[<d1902ebf>] EFLAGS: 00010202 CPU: 0
[ 168.410516] EIP is at 0xd1902ebf
[ 168.410628] EAX: 00000001 EBX: 00000001 ECX: 00000000 EDX: 00000000
[ 168.410760] ESI: 00000000 EDI: 00000000 EBP: c7d52fd0 ESP: c7d52f84
[ 168.410992] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 168.411120] Process rcu_torture_rea (pid: 4154, ti=c7d52000
task=c7b73710 task.ti=c7d52000)
[ 168.411302] Stack:
[ 168.411398] 00000002 00000000 c0936a24 c7b48f8c 00000001 d1903040
00000000 c0936000
[ 168.411981] d1902ff7 5f756372 74726f74 5f657275 00616572 0000103a
2e11c1af 00000dd1
[ 168.412021] 00000000 d1902e50 00000000 c7d52fe0 c013d81a c013d7e0
00000000 00000000
[ 168.412021] Call Trace:
[ 168.412021] [<c013d81a>] ? kthread+0x3a/0x70
[ 168.412021] [<c013d7e0>] ? kthread+0x0/0x70
[ 168.412021] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
[ 168.412021] Code: 28 85 f6 74 1c 8d 45 bc c7 45 c8 40 30 90 d1 c7 45
cc 00 00 00 00 e8 91 03 83 ee 90 a1 00 5e 90 d1 8b 1d a4 59 90 d1 85 db
74 12 <8b> 48 28 85 c9 74 0b 8b 55 bc 85 d2 0f 84 19 01 00 00 8d b4 26
[ 168.412021] EIP: [<d1902ebf>] 0xd1902ebf SS:ESP 0068:c7d52f84
[ 168.418864] ---[ end trace 4bbd74c2c853213b ]---
[ 168.429775] __call_rcu: rcu_head=c3f47f24, func=c01b5a70
[ 168.429958] __call_rcu: rcu_head=c7bbecac, func=c0143580
[ 168.430129] __call_rcu: rcu_head=c7dbeea0, func=c01a6ae0
[ 168.431826] __call_rcu: rcu_head=c7de9328, func=c01a6ae0
[ 168.432089] rcu_do_batch: rcu_head=c3f47f24, func=c01b5a70
[ 168.432869] __call_rcu: rcu_head=c994fe18, func=c01b9c20
[ 168.436064] rcu_do_batch: rcu_head=c7bbecac, func=c0143580
[ 168.436226] rcu_do_batch: rcu_head=c7dbeea0, func=c01a6ae0
[ 168.444046] rcu_do_batch: rcu_head=c7de9328, func=c01a6ae0
[ 168.444193] rcu_do_batch: rcu_head=c994fe18, func=c01b9c20
[ 168.445866] BUG: unable to handle kernel NULL pointer dereference at
00000029
[ 168.446126] IP: [<d1902ebf>] 0xd1902ebf
[ 168.446315] *pde = 00000000
[ 168.446488] Oops: 0000 [#2] DEBUG_PAGEALLOC
[ 168.446692] last sysfs file: /sys/block/sda/size
[ 168.446883] Modules linked in: nfsd exportfs nfs lockd nfs_acl
auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
[ 168.447818]
[ 168.447916] Pid: 4153, comm: rcu_torture_rea Tainted: G D
(2.6.28 #83)
[ 168.448022] EIP: 0060:[<d1902ebf>] EFLAGS: 00010202 CPU: 0
[ 168.448022] EIP is at 0xd1902ebf
[ 168.448022] EAX: 00000001 EBX: 00000001 ECX: 00000000 EDX: 00000000
[ 168.448022] ESI: 00000000 EDI: 00000000 EBP: c7b48fd0 ESP: c7b48f84
[ 168.448022] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 168.448022] Process rcu_torture_rea (pid: 4153, ti=c7b48000
task=c7b5c710 task.ti=c7b48000)
[ 168.448022] Stack:
[ 168.448022] 00000002 00000000 c7d52f8c c7de5f74 00000001 d1903040
00000000 c0936000
[ 168.448022] d1902ff7 5f756372 74726f74 5f657275 00616572 00001039
b87c4524 00000ca3
[ 168.448022] 00000000 d1902e50 00000000 c7b48fe0 c013d81a c013d7e0
00000000 00000000
[ 168.448022] Call Trace:
[ 168.448022] [<c013d81a>] ? kthread+0x3a/0x70
[ 168.448022] [<c013d7e0>] ? kthread+0x0/0x70
[ 168.448022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
[ 168.448022] Code: 28 85 f6 74 1c 8d 45 bc c7 45 c8 40 30 90 d1 c7 45
cc 00 00 00 00 e8 91 03 83 ee 90 a1 00 5e 90 d1 8b 1d a4 5[ 168.457352]
BUG: unable to handle kernel NULL pointer dereference at 0000001d
[ 168.457603] IP: [<d1903163>] 0xd1903163
[ 168.457776] *pde = 00000000
[ 168.457942] Oops: 0000 [#3] DEBUG_PAGEALLOC
[ 168.458141] last sysfs file: /sys/block/sda/size
[ 168.458323] Modules linked in: nfsd exportfs nfs lockd nfs_acl
auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
[ 168.459254]
[ 168.459350] Pid: 4151, comm: rcu_torture_fak Tainted: G D
(2.6.28 #83)
[ 168.459523] EIP: 0060:[<d1903163>] EFLAGS: 00010287 CPU: 0
[ 168.459706] EIP is at 0xd1903163
[ 168.459818] EAX: 00000001 EBX: c7dcafc0 ECX: 35502ab8 EDX: 00000021
[ 168.459949] ESI: cccccccd EDI: 00000000 EBP: c7dcafd0 ESP: c7dcafb8
[ 168.460080] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 168.460262] Process rcu_torture_fak (pid: 4151, ti=c7dca000
task=c7dcb710 task.ti=c7dca000)
[ 168.460442] Stack:
[ 168.460485] c7dbadd8 00000000 01f30e05 00002699 00000000 d19030f0
c7dcafe0 c013d81a
[ 168.460485] c013d7e0 00000000 00000000 c0103cf3 c7ca9e70 00000000
00000000 00000000
[ 168.460485] 20df0493 0a860000
[ 168.460485] Call Trace:
[ 168.460485] [<c013d81a>] ? kthread+0x3a/0x70
[ 168.460485] [<c013d7e0>] ? kthread+0x0/0x70
[ 168.460485] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
[ 168.460485] Code: c1 f7 e6 c1 ea 03 8d 14 92 01 d2 29 d1 8d 41 01 e8
23 83 ca ee 89 d8 e8 1c f7 ff ff 25 ff 03 00 00 e8 f2 c7 99 ee a1 00 5e
90 d1 <ff> 50 1c e8 75 fc ff ff e8 d0 a4 83 ee 85 c0 8d b6 00 00 00 00
[ 168.460485] EIP: [<d1903163>] 0xd1903163 SS:ESP 0068:c7dcafb8
[ 168.467531] ---[ end trace 4bbd74c2c853213b ]---
[ 168.476855] BUG: unable to handle kernel NULL pointer dereference at
00000004
[ 168.477134] IP: [<d1903233>] 0xd1903233
[ 168.477322] *pde = 00000000
[ 168.477485] Oops: 0002 [#4] DEBUG_PAGEALLOC
[ 168.477680] last sysfs file: /sys/block/sda/size
[ 168.477857] Modules linked in: nfsd exportfs nfs lockd nfs_acl
auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
[ 168.478774]
[ 168.478871] Pid: 4147, comm: rcu_torture_wri Tainted: G D
(2.6.28 #83)
[ 168.479045] EIP: 0060:[<d1903233>] EFLAGS: 00010202 CPU: 0
[ 168.479228] EIP is at 0xd1903233
[ 168.479340] EAX: 00000000 EBX: d19073fc ECX: 00000001 EDX: 00000000
[ 168.479472] ESI: 0000000a EDI: 00000000 EBP: c7e05fd0 ESP: c7e05fc0
[ 168.479604] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 168.479787] Process rcu_torture_wri (pid: 4147, ti=c7e05000
task=c994e710 task.ti=c7e05000)
[ 168.479968] Stack:
[ 168.480015] 00000000 00000000 00000000 d19031f0 c7e05fe0 c013d81a
c013d7e0 00000000
[ 168.480015] 00000000 c0103cf3 c7ca9e70 00000000 00000000 00000000
7574726f 725f6572
[ 168.480015] Call Trace:
[ 168.480015] [<c013d81a>] ? kthread+0x3a/0x70
[ 168.480015] [<c013d7e0>] ? kthread+0x0/0x70
[ 168.480015] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
[ 168.480015] Code: 01 00 00 ba 13 00 00 00 be 0a 00 00 00 a1 00 20 83
c0 e8 a1 21 82 ee e9 a0 00 00 00 8d 74 26 00 ff 05 6c 78 90 d1 8b 43 04
8b 13 <89> 42 04 89 10 b8 cc 59 90 d1 89 1b 89 5b 04 e8 19 a3 ca ee 83
[ 168.480015] EIP: [<d1903233>] 0xd1903233 SS:ESP 0068:c7e05fc0
[ 168.486680] ---[ end trace 4bbd74c2c853213b ]---
[ 168.486798] Kernel panic - not syncing: Fatal exception in interrupt
9 90 d1 85 db
74 12 <8b> 48 28 85 c9 74 0b 8b 55 bc 85 d2 0f 84 19 01 00 00 8d b4 26
[ 168.448022] EIP: [<d1902ebf>] 0xd1902ebf SS:ESP 0068:c7b48f84
[ 168.456684] ---[ end trace 4bbd74c2c853213b ]---


Greetings, Eric

2009-01-05 18:01:16

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 01:14:09PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Jan 05, 2009 at 12:38:55AM +0100, Eric Sesterhenn wrote:
> > > hi,
> > >
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Sun, Jan 04, 2009 at 03:57:26PM +0100, Eric Sesterhenn wrote:
> > > > >
> > > > > Just tell me what i need to do, I am not really familiar with ftrace.
> > > > > I am only able to test 2.6.28-04980-gb58602a, since current -git is not
> > > > > able to boot on this box :|
> > > >
> > > > Very cool!
> > > >
> > > > The idea is to have __call_rcu() in kernel/rcutree.c record the
> > > > address of the callback (argument "head") and the function (argument
> > > > "func"). In rcu_do_batch(), just before invoking list->func(list),
> > > > also record the address of the callback ("list") and the function
> > > > (again, "func").
> > > >
> > > > The new ftrace package has some mechanisms for doing this, but there is
> > > > always the old-fashioned way of using printk(), for example in
> > > > rcu_do_batch():
> > > >
> > > > prefetch(next);
> > > > if (rcu_dump_callbacks)
> > > > printk("rcu_head=%p, func=%p\n", list, func);
> > > > list->func(list);
> > > >
> > > > Initialize rcu_dump_callbacks to zero, then use a small kernel module
> > > > (or some such) to set it to one just before running your test.
> > >
> > > i did it via the ugly printk and captured it via netconsole
> >
> > Good stuff!!!
> >
> > > for the box with rcutree you can find the log here:
> > > http://www.cccmz.de/~snakebyte/200.log
> > > The interesting part seems start at 155.858923, thats when I load
> > > the rcutorture module
> >
> > When I download this, I see an 82.100963 immediately followed by an
> > 167.894017, no 155.858923. Do I have the right log?
> >
> > > with http://www.cccmz.de/~snakebyte/200_1.log i had another try,
> > > actually saw a rcu_do_batch: rcu_head=d1907720, func=(null) in the traces,
> > > box went into unresponsive mode afterwards
> >
> > Interesting. The original function is 0xd1902ad0, passed to __call_rcu()
> > at 157.364214 -- would you be able to tell me what function that
> > corresponds to? (Looks to me like a module, perhaps rcutorture?)
>
> I did the log more than once, guess this was the time from the one
> i uploaded earlier with the same name
>
> > Whatever function it is, the rcu_head definitely got corrupted some
> > time during the 8 milliseconds or so that the callback was waiting for
> > a grace period. However, it did just fine being invoked several times
> > beforehand -- this is a very popular RCU callback function, it appears.
> >
> > My guess is that the ensuing CPU 0 stall messages are due to CPU 0
> > having a heart attack trying to execute at address 0.
>
> Could the popular rcu function be registered by rcutorture, but when
> we remove the module the callback is no longer valid? I can compile
> a kernel just fine and with other stress tests i did not see any oops so
> far.

One approach would be to print out the address of rcutorture's RCU
callbacks at rcutorture module initialization time (in rcu_torture_init()
in kernel/rcutorture.c). The two callbacks are rcu_torture_cb() and
rcu_bh_torture_wakeme_after_cb(). Unless you are specifying the
"torture_type" parameter to rcutorture, only the first one should be in
use.

> > Are you swapping over NFS, or is the netpoll_send_udp() just garbage on
> > the stack?
>
> Swap is locally on a disc, nfs is configured but not used for swap.

OK, just garbage on the stack, then.

> > > And just in case it helps somehow: http://www.cccmz.de/~snakebyte/200_3.log
> >
> > Hmmm... The main recent change to kernel/rcutorture.c is the addition
> > of a reboot notifier, so that rcutorture can shut itself down gracefully
> > should the system go down while rcutorture is still running. This panic
> > occurs in rcu_torture_stutter(), which is shown below:
> >
> > static int
> > rcu_torture_stutter(void *arg)
> > {
> > VERBOSE_PRINTK_STRING("rcu_torture_stutter task started");
> > do {
> > schedule_timeout_interruptible(stutter * HZ);
> > stutter_pause_test = 1;
> > if (!kthread_should_stop() && !fullstop)
> > schedule_timeout_interruptible(stutter * HZ);
> > stutter_pause_test = 0;
> > } while (!kthread_should_stop() && !fullstop);
> > VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
> > return 0;
> > }
> >
> > I don't see any reason why this function would transfer control to
> > location 0x60, though I do see a bug in the new shutdown code. I
> > don't expect this to make any difference, but a patch is included
> > nevertheless.
>
> Sadly it doesnt.

:-/

> > > for the one with rcupreemt you can find the log here:
> > > http://www.cccmz.de/~snakebyte/201.log
> > > rcutorture is loaded at 87.312399
> >
> > Hmmm... Offset 0x60 seems to be a common thread. This lets
> > rcu_torture_ops off the hook, since it isn't that large.
> >
> > Anyway, could you give the attached patch a go, even though I cannot see
> > how it would help? ;-)
>
> This is with the patch on the preemt rcu box (the 201.logs)

Looking these over, I would be surprised if this is a problem in the RCU
infrastructure itself, given that it shows up with all three versions,
but it could easily be a problem in rcutorture. I should be able to get
some tests running locally later today.

Thanx, Paul

> [ 593.404728] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> irqreader=1
> [ 601.000124] BUG: unable to handle kernel paging request at d0af8a2d
> [ 601.000431] IP: [<d0af8a2d>] 0xd0af8a2d
> [ 601.000696] *pde = 0fbb4067 *pte = 00000000
> [ 601.000946] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 601.001027] last sysfs file: /sys/block/ram9/range
> [ 601.001027] Modules linked in: [last unloaded: rcutorture]
> [ 601.001027]
> [ 601.001027] Pid: 5021, comm: rcu_torture_fak Tainted: G W
> (2.6.28-05692-g7d3b56b-dirty #169) System Name
> [ 601.001027] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010246 CPU: 0
> [ 601.001027] EIP is at 0xd0af8a2d
> [ 601.001027] EAX: 00000000 EBX: c98acfc4 ECX: c04f2c53 EDX: c98acf74
> [ 601.001027] ESI: d0af8c0a EDI: 00000000 EBP: c98acfbc ESP: c98acfbc
> [ 601.001027] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 601.001027] Process rcu_torture_fak (pid: 5021, ti=c98ac000
> task=cbb64100 task.ti=c98ac000)
> [ 601.001027] Stack:
> [ 601.001027] c98acfd0 d0af8c89 74816929 000024a7 00000000 c98acfe0
> c0136bcc c0136b8e
> [ 601.001027] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
> 00000000 00001301
> [ 601.001027] 00008169
> [ 601.001027] Call Trace:
> [ 601.001027] [<c0136bcc>] ? kthread+0x3e/0x66
> [ 601.001027] [<c0136b8e>] ? kthread+0x0/0x66
> [ 601.001027] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> [ 601.001027] Code: Bad EIP value.
> [ 601.001027] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98acfbc
> [ 601.001027] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 601.006764] BUG: unable to handle kernel paging request at d0af8a2d
> [ 601.007023] IP: [<d0af8a2d>] 0xd0af8a2d
> [ 601.007265] *pde = 0fbb4067 *pte = 00000000
> [ 601.007511] Oops: 0000 [#2] PREEMPT DEBUG_PAGEALLOC
> [ 601.007716] last sysfs file: /sys/block/ram9/range
> [ 601.007716] Modules linked in: [last unloaded: rcutorture]
> [ 601.007716]
> [ 601.007716] Pid: 5020, comm: rcu_torture_fak Tainted: G D W
> (2.6.28-05692-g7d3b56b-dirty #169) System Name
> [ 601.007716] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010296 CPU: 0
> [ 601.007716] EIP is at 0xd0af8a2d
> [ 601.007716] EAX: 00000000 EBX: c98adfc4 ECX: c04f2c53 EDX: c98adf74
> [ 601.007716] ESI: d0af8c0a EDI: 00000000 EBP: c98adfbc ESP: c98adfbc
> [ 601.007716] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 601.007716] Process rcu_torture_fak (pid: 5020, ti=c98ad000
> task=cbb62700 task.ti=c98ad000)
> [ 601.007716] Stack:
> [ 601.007716] c98adfd0 d0af8c89 e5bc239c 000024ab 00000000 c98adfe0
> c0136bcc c0136b8e
> [ 601.007716] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
> 00000000 00000201
> [ 601.007716] 00000674
> [ 601.007716] Call Trace:
> [ 601.007716] [<c0136bcc>] ? kthread+0x3e/0x66
> [ 601.007716] [<c0136b8e>] ? kthread+0x0/0x66
> [ 601.007716] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> [ 601.007716] Code: Bad EIP value.
> [ 601.007716] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98adfbc
> [ 601.007716] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 601.013281] BUG: unable to handle kernel paging request at d0af8a2d
> [ 601.013542] IP: [<d0af8a2d>] 0xd0af8a2d
> [ 601.013785] *pde = 0fbb4067 *pte = 00000000
> [ 601.014032] Oops: 0000 [#3] PREEMPT DEBUG_PAGEALLOC
> [ 601.014239] last sysfs file: /sys/block/ram9/range
> [ 601.014239] Modules linked in: [last unloaded: rcutorture]
> [ 601.014239]
> [ 601.014239] Pid: 5019, comm: rcu_torture_fak Tainted: G D W
> (2.6.28-05692-g7d3b56b-dirty #169) System Name
> [ 601.014239] EIP: 0060:[<d0af8a2d>] EFLAGS: 00010286 CPU: 0
> [ 601.014239] EIP is at 0xd0af8a2d
> [ 601.014239] EAX: 00000000 EBX: c98b2fc4 ECX: c04f2c53 EDX: c98b2f74
> [ 601.014239] ESI: d0af8c0a EDI: 00000000 EBP: c98b2fbc ESP: c98b2fbc
> [ 601.014239] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 601.014239] Process rcu_torture_fak (pid: 5019, ti=c98b2000
> task=c9804e00 task.ti=c98b2000)
> [ 601.014239] Stack:
> [ 601.014239] c98b2fd0 d0af8c89 61a1df8f 000024a5 00000000 c98b2fe0
> c0136bcc c0136b8e
> [ 601.014239] 00000000 00000000 c0103643 c9897ea4 00000000 00000000
> 00000000 000007bf
> [ 601.014239] 00000012
> [ 601.014239] Call Trace:
> [ 601.014239] [<c0136bcc>] ? kthread+0x3e/0x66
> [ 601.014239] [<c0136b8e>] ? kthread+0x0/0x66
> [ 601.014239] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> [ 601.014239] Code: Bad EIP value.
> [ 601.014239] EIP: [<d0af8a2d>] 0xd0af8a2d SS:ESP 0068:c98b2fbc
> [ 601.014239] ---[ end trace 4eaa2a86a8e2da22 ]---
>
> then i added the rcutree patch (Fix rcutree grace-period-latency bug on
> small systems) and got
>
> [ 65.968568] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> irqreader=1
> [ 68.872082] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 68.872455] IP: [<(null)>] (null)
> [ 68.872653] *pde = 00000000
> [ 68.872849] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 68.873027] last sysfs file: /sys/block/ram9/range
> [ 68.873027] Modules linked in: [last unloaded: rcutorture]
> [ 68.873027]
> [ 68.873027] Pid: 5023, comm: bash Tainted: G W
> (2.6.28-05692-g7d3b56b-dirty #169) System Name
> [ 68.873027] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> [ 68.873027] EIP is at 0x0
> [ 68.873027] EAX: d0afd0b8 EBX: 00000000 ECX: c01612a6 EDX: 00000006
> [ 68.873027] ESI: d0afd0b8 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
> [ 68.873027] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [ 68.873027] Process bash (pid: 5023, ti=c0b03000 task=c98d1a00
> task.ti=ce824000)
> [ 68.873027] Stack:
> [ 68.873027] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a
> ce824e98 00000046
> [ 68.873027] c012aa28 ce824ea8 c01042c2
> [ 68.873027] Call Trace:
> [ 68.873027] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
> [ 68.873027] [<c012aa97>] ? __do_softirq+0x6f/0xf6
> [ 68.873027] [<c012aa28>] ? __do_softirq+0x0/0xf6
> [ 68.873027] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
> [ 68.873027] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
> [ 68.873027] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> [ 68.873027] [<c07a024c>] ? _spin_lock+0x56/0x5d
> [ 68.873027] [<c07a0000>] ? _write_unlock_bh+0x28/0x37
> [ 68.873027] [<c017ad50>] ? do_wp_page+0x38b/0x453
> [ 68.873027] [<c017be9e>] ? handle_mm_fault+0x4e8/0x54d
> [ 68.873027] [<c0139da3>] ? down_read_trylock+0x3d/0x47
> [ 68.873027] [<c07a25d3>] ? do_page_fault+0x2dc/0x601
> [ 68.873027] [<c07a22f7>] ? do_page_fault+0x0/0x601
> [ 68.873027] [<c07a0bd7>] ? error_code+0x6f/0x74
> [ 68.873027] Code: Bad EIP value.
> [ 68.873027] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
> [ 68.880266] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 68.880394] Kernel panic - not syncing: Fatal exception in interrupt
>
>
> Tried again without the rcutree patch and got this one,
> guess its just a timing thing wheter the NULL pointer deref or the other
> appears
>
> [ 99.643517] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> irqreader=1
> [ 104.149086] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 104.149418] IP: [<(null)>] (null)
> [ 104.149615] *pde = 00000000
> [ 104.149813] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 104.150029] last sysfs file: /sys/block/ram9/range
> [ 104.150029] Modules linked in: [last unloaded: rcutorture]
> [ 104.150029]
> [ 104.150029] Pid: 5030, comm: udevd Tainted: G W
> (2.6.28-05692-g7d3b56b-dirty #169) System Name
> [ 104.150029] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
> [ 104.150029] EIP is at 0x0
> [ 104.150029] EAX: d0afcc38 EBX: 00000000 ECX: c01612a6 EDX: 00000006
> [ 104.150029] ESI: d0afcc38 EDI: 0000001c EBP: c0b03fe0 ESP: c0b03fd4
> [ 104.150029] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [ 104.150029] Process udevd (pid: 5030, ti=c0b03000 task=c98cb400
> task.ti=c98b1000)
> [ 104.150029] Stack:
> [ 104.150029] c01612ad 00000200 00000001 c0b03ff8 c012aa97 0000000a
> c98b1e40 00000046
> [ 104.150029] c012aa28 c98b1e50 c01042c2
> [ 104.150029] Call Trace:
> [ 104.150029] [<c01612ad>] ? rcu_process_callbacks+0x65/0x79
> [ 104.150029] [<c012aa97>] ? __do_softirq+0x6f/0xf6
> [ 104.150029] [<c012aa28>] ? __do_softirq+0x0/0xf6
> [ 104.150029] <IRQ> <0> [<c012a9a5>] ? irq_exit+0x40/0x7c
> [ 104.150029] [<c0110ce1>] ? smp_apic_timer_interrupt+0x68/0x73
> [ 104.150029] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> [ 104.150029] [<c016e63f>] ? find_lock_page+0x5f/0x63
> [ 104.150029] [<c0170122>] ? filemap_fault+0x9f/0x391
> [ 104.150029] [<c017a6e8>] ? __do_fault+0x40/0x31d
> [ 104.150029] [<c017bc30>] ? handle_mm_fault+0x27a/0x54d
> [ 104.150029] [<c0139da3>] ? down_read_trylock+0x3d/0x47
> [ 104.150029] [<c07a25d3>] ? do_page_fault+0x2dc/0x601
> [ 104.150029] [<c07a22f7>] ? do_page_fault+0x0/0x601
> [ 104.150029] [<c07a0bd7>] ? error_code+0x6f/0x74
> [ 104.150029] Code: Bad EIP value.
> [ 104.150029] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b03fd4
> [ 104.157313] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 104.157441] Kernel panic - not syncing: Fatal exception in interrupt
>
> I enabled classic rcu and pulled current -git (without both patches) and got these:
>
> [ 67.424425] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> irqreader=1
> [ 70.205083] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 70.205416] IP: [<(null)>] (null)
> [ 70.205605] *pde = 00000000
> [ 70.205798] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 70.206026] last sysfs file: /sys/block/ram9/range
> [ 70.206026] Modules linked in: [last unloaded: rcutorture]
> [ 70.206026]
> [ 70.206026] Pid: 1726, comm: udevd Tainted: G W
> (2.6.28-05716-gfe0bdec #170) System Name
> [ 70.206026] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 70.206026] EIP is at 0x0
> [ 70.206026] EAX: d0afc018 EBX: c10697c0 ECX: c0161905 EDX: d0afc018
> [ 70.206026] ESI: 00000000 EDI: 00000001 EBP: c0b0bfd8 ESP: c0b0bfc4
> [ 70.206026] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [ 70.206026] Process udevd (pid: 1726, ti=c0b0b000 task=cee10000
> task.ti=ce80b000)
> [ 70.206026] Stack:
> [ 70.206026] c016197c 00000202 00000200 00000001 0000001c c0b0bfe0
> c0161a00 c0b0bff8
> [ 70.206026] c012acca 0000000a ce80bec4 00000046 c012ac5b ce80bed4
> c01042c2
> [ 70.206026] Call Trace:
> [ 70.206026] [<c016197c>] ? __rcu_process_callbacks+0x141/0x1ae
> [ 70.206026] [<c0161a00>] ? rcu_process_callbacks+0x17/0x28
> [ 70.206026] [<c012acca>] ? __do_softirq+0x6f/0x100
> [ 70.206026] [<c012ac5b>] ? __do_softirq+0x0/0x100
> [ 70.206026] <IRQ> <0> [<c012abdd>] ? irq_exit+0x40/0x77
> [ 70.206026] [<c0110ce9>] ? smp_apic_timer_interrupt+0x68/0x73
> [ 70.206026] [<c0103521>] ? apic_timer_interrupt+0x2d/0x34
> [ 70.206026] [<c014007b>] ? tick_do_periodic_broadcast+0xf/0x3a
> [ 70.206026] [<c018f325>] ? kmem_cache_free+0xbd/0xc5
> [ 70.206026] [<c019a45d>] ? putname+0x28/0x33
> [ 70.206026] [<c019a45d>] ? putname+0x28/0x33
> [ 70.206026] [<c019b7a0>] ? do_rmdir+0xba/0xc3
> [ 70.206026] [<c04e8b50>] ? trace_hardirqs_on_thunk+0xc/0x10
> [ 70.206026] [<c0102ecf>] ? sysenter_exit+0xf/0x16
> [ 70.206026] [<c01448f1>] ? trace_hardirqs_on_caller+0x17/0x15a
> [ 70.206026] [<c019b7f2>] ? sys_rmdir+0x15/0x17
> [ 70.206026] [<c0102ea1>] ? sysenter_do_call+0x12/0x31
> [ 70.206026] Code: Bad EIP value.
> [ 70.206026] EIP: [<00000000>] 0x0 SS:ESP 0068:c0b0bfc4
> [ 70.213814] ---[ end trace 4eaa2a86a8e2da22 ]---
> [ 70.213934] Kernel panic - not syncing: Fatal exception in interrupt
>
>
>
>
> On the other box with tree rcu and the rcutorture patch (200.logs):
>
>
> [ 168.408126] BUG: unable to handle kernel NULL pointer dereference at
> 00000029
> [ 168.408433] IP: [<d1902ebf>] 0xd1902ebf
> [ 168.408614] *pde = 00000000
> [ 168.408778] Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 168.408978] last sysfs file: /sys/block/sda/size
> [ 168.409164] Modules linked in: nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> [ 168.410099]
> [ 168.410197] Pid: 4154, comm: rcu_torture_rea Not tainted (2.6.28 #83)
> [ 168.410332] EIP: 0060:[<d1902ebf>] EFLAGS: 00010202 CPU: 0
> [ 168.410516] EIP is at 0xd1902ebf
> [ 168.410628] EAX: 00000001 EBX: 00000001 ECX: 00000000 EDX: 00000000
> [ 168.410760] ESI: 00000000 EDI: 00000000 EBP: c7d52fd0 ESP: c7d52f84
> [ 168.410992] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 168.411120] Process rcu_torture_rea (pid: 4154, ti=c7d52000
> task=c7b73710 task.ti=c7d52000)
> [ 168.411302] Stack:
> [ 168.411398] 00000002 00000000 c0936a24 c7b48f8c 00000001 d1903040
> 00000000 c0936000
> [ 168.411981] d1902ff7 5f756372 74726f74 5f657275 00616572 0000103a
> 2e11c1af 00000dd1
> [ 168.412021] 00000000 d1902e50 00000000 c7d52fe0 c013d81a c013d7e0
> 00000000 00000000
> [ 168.412021] Call Trace:
> [ 168.412021] [<c013d81a>] ? kthread+0x3a/0x70
> [ 168.412021] [<c013d7e0>] ? kthread+0x0/0x70
> [ 168.412021] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> [ 168.412021] Code: 28 85 f6 74 1c 8d 45 bc c7 45 c8 40 30 90 d1 c7 45
> cc 00 00 00 00 e8 91 03 83 ee 90 a1 00 5e 90 d1 8b 1d a4 59 90 d1 85 db
> 74 12 <8b> 48 28 85 c9 74 0b 8b 55 bc 85 d2 0f 84 19 01 00 00 8d b4 26
> [ 168.412021] EIP: [<d1902ebf>] 0xd1902ebf SS:ESP 0068:c7d52f84
> [ 168.418864] ---[ end trace 4bbd74c2c853213b ]---
> [ 168.429775] __call_rcu: rcu_head=c3f47f24, func=c01b5a70
> [ 168.429958] __call_rcu: rcu_head=c7bbecac, func=c0143580
> [ 168.430129] __call_rcu: rcu_head=c7dbeea0, func=c01a6ae0
> [ 168.431826] __call_rcu: rcu_head=c7de9328, func=c01a6ae0
> [ 168.432089] rcu_do_batch: rcu_head=c3f47f24, func=c01b5a70
> [ 168.432869] __call_rcu: rcu_head=c994fe18, func=c01b9c20
> [ 168.436064] rcu_do_batch: rcu_head=c7bbecac, func=c0143580
> [ 168.436226] rcu_do_batch: rcu_head=c7dbeea0, func=c01a6ae0
> [ 168.444046] rcu_do_batch: rcu_head=c7de9328, func=c01a6ae0
> [ 168.444193] rcu_do_batch: rcu_head=c994fe18, func=c01b9c20
> [ 168.445866] BUG: unable to handle kernel NULL pointer dereference at
> 00000029
> [ 168.446126] IP: [<d1902ebf>] 0xd1902ebf
> [ 168.446315] *pde = 00000000
> [ 168.446488] Oops: 0000 [#2] DEBUG_PAGEALLOC
> [ 168.446692] last sysfs file: /sys/block/sda/size
> [ 168.446883] Modules linked in: nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> [ 168.447818]
> [ 168.447916] Pid: 4153, comm: rcu_torture_rea Tainted: G D
> (2.6.28 #83)
> [ 168.448022] EIP: 0060:[<d1902ebf>] EFLAGS: 00010202 CPU: 0
> [ 168.448022] EIP is at 0xd1902ebf
> [ 168.448022] EAX: 00000001 EBX: 00000001 ECX: 00000000 EDX: 00000000
> [ 168.448022] ESI: 00000000 EDI: 00000000 EBP: c7b48fd0 ESP: c7b48f84
> [ 168.448022] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 168.448022] Process rcu_torture_rea (pid: 4153, ti=c7b48000
> task=c7b5c710 task.ti=c7b48000)
> [ 168.448022] Stack:
> [ 168.448022] 00000002 00000000 c7d52f8c c7de5f74 00000001 d1903040
> 00000000 c0936000
> [ 168.448022] d1902ff7 5f756372 74726f74 5f657275 00616572 00001039
> b87c4524 00000ca3
> [ 168.448022] 00000000 d1902e50 00000000 c7b48fe0 c013d81a c013d7e0
> 00000000 00000000
> [ 168.448022] Call Trace:
> [ 168.448022] [<c013d81a>] ? kthread+0x3a/0x70
> [ 168.448022] [<c013d7e0>] ? kthread+0x0/0x70
> [ 168.448022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> [ 168.448022] Code: 28 85 f6 74 1c 8d 45 bc c7 45 c8 40 30 90 d1 c7 45
> cc 00 00 00 00 e8 91 03 83 ee 90 a1 00 5e 90 d1 8b 1d a4 5[ 168.457352]
> BUG: unable to handle kernel NULL pointer dereference at 0000001d
> [ 168.457603] IP: [<d1903163>] 0xd1903163
> [ 168.457776] *pde = 00000000
> [ 168.457942] Oops: 0000 [#3] DEBUG_PAGEALLOC
> [ 168.458141] last sysfs file: /sys/block/sda/size
> [ 168.458323] Modules linked in: nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> [ 168.459254]
> [ 168.459350] Pid: 4151, comm: rcu_torture_fak Tainted: G D
> (2.6.28 #83)
> [ 168.459523] EIP: 0060:[<d1903163>] EFLAGS: 00010287 CPU: 0
> [ 168.459706] EIP is at 0xd1903163
> [ 168.459818] EAX: 00000001 EBX: c7dcafc0 ECX: 35502ab8 EDX: 00000021
> [ 168.459949] ESI: cccccccd EDI: 00000000 EBP: c7dcafd0 ESP: c7dcafb8
> [ 168.460080] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 168.460262] Process rcu_torture_fak (pid: 4151, ti=c7dca000
> task=c7dcb710 task.ti=c7dca000)
> [ 168.460442] Stack:
> [ 168.460485] c7dbadd8 00000000 01f30e05 00002699 00000000 d19030f0
> c7dcafe0 c013d81a
> [ 168.460485] c013d7e0 00000000 00000000 c0103cf3 c7ca9e70 00000000
> 00000000 00000000
> [ 168.460485] 20df0493 0a860000
> [ 168.460485] Call Trace:
> [ 168.460485] [<c013d81a>] ? kthread+0x3a/0x70
> [ 168.460485] [<c013d7e0>] ? kthread+0x0/0x70
> [ 168.460485] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> [ 168.460485] Code: c1 f7 e6 c1 ea 03 8d 14 92 01 d2 29 d1 8d 41 01 e8
> 23 83 ca ee 89 d8 e8 1c f7 ff ff 25 ff 03 00 00 e8 f2 c7 99 ee a1 00 5e
> 90 d1 <ff> 50 1c e8 75 fc ff ff e8 d0 a4 83 ee 85 c0 8d b6 00 00 00 00
> [ 168.460485] EIP: [<d1903163>] 0xd1903163 SS:ESP 0068:c7dcafb8
> [ 168.467531] ---[ end trace 4bbd74c2c853213b ]---
> [ 168.476855] BUG: unable to handle kernel NULL pointer dereference at
> 00000004
> [ 168.477134] IP: [<d1903233>] 0xd1903233
> [ 168.477322] *pde = 00000000
> [ 168.477485] Oops: 0002 [#4] DEBUG_PAGEALLOC
> [ 168.477680] last sysfs file: /sys/block/sda/size
> [ 168.477857] Modules linked in: nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> [ 168.478774]
> [ 168.478871] Pid: 4147, comm: rcu_torture_wri Tainted: G D
> (2.6.28 #83)
> [ 168.479045] EIP: 0060:[<d1903233>] EFLAGS: 00010202 CPU: 0
> [ 168.479228] EIP is at 0xd1903233
> [ 168.479340] EAX: 00000000 EBX: d19073fc ECX: 00000001 EDX: 00000000
> [ 168.479472] ESI: 0000000a EDI: 00000000 EBP: c7e05fd0 ESP: c7e05fc0
> [ 168.479604] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 168.479787] Process rcu_torture_wri (pid: 4147, ti=c7e05000
> task=c994e710 task.ti=c7e05000)
> [ 168.479968] Stack:
> [ 168.480015] 00000000 00000000 00000000 d19031f0 c7e05fe0 c013d81a
> c013d7e0 00000000
> [ 168.480015] 00000000 c0103cf3 c7ca9e70 00000000 00000000 00000000
> 7574726f 725f6572
> [ 168.480015] Call Trace:
> [ 168.480015] [<c013d81a>] ? kthread+0x3a/0x70
> [ 168.480015] [<c013d7e0>] ? kthread+0x0/0x70
> [ 168.480015] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> [ 168.480015] Code: 01 00 00 ba 13 00 00 00 be 0a 00 00 00 a1 00 20 83
> c0 e8 a1 21 82 ee e9 a0 00 00 00 8d 74 26 00 ff 05 6c 78 90 d1 8b 43 04
> 8b 13 <89> 42 04 89 10 b8 cc 59 90 d1 89 1b 89 5b 04 e8 19 a3 ca ee 83
> [ 168.480015] EIP: [<d1903233>] 0xd1903233 SS:ESP 0068:c7e05fc0
> [ 168.486680] ---[ end trace 4bbd74c2c853213b ]---
> [ 168.486798] Kernel panic - not syncing: Fatal exception in interrupt
> 9 90 d1 85 db
> 74 12 <8b> 48 28 85 c9 74 0b 8b 55 bc 85 d2 0f 84 19 01 00 00 8d b4 26
> [ 168.448022] EIP: [<d1902ebf>] 0xd1902ebf SS:ESP 0068:c7b48f84
> [ 168.456684] ---[ end trace 4bbd74c2c853213b ]---
>
>
> Greetings, Eric

2009-01-05 18:57:20

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 01:14:09PM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> >
> > Could the popular rcu function be registered by rcutorture, but when
> > we remove the module the callback is no longer valid? I can compile
> > a kernel just fine and with other stress tests i did not see any oops so
> > far.
>
> One approach would be to print out the address of rcutorture's RCU
> callbacks at rcutorture module initialization time (in rcu_torture_init()
> in kernel/rcutorture.c). The two callbacks are rcu_torture_cb() and
> rcu_bh_torture_wakeme_after_cb(). Unless you are specifying the
> "torture_type" parameter to rcutorture, only the first one should be in
> use.

with a printk(KERN_ERR "rcu_torture_cb: %p rcu_bh_torture_wakeme_after_cb:
%p\n", rcu_torture_cb, rcu_bh_torture_wakeme_after_cb);



[ 65.135468] rcu_torture_cb: d0af7d1b rcu_bh_torture_wakeme_after_cb:
d0af7bec
[ 65.135672] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
irqreader=1
[ 71.171603] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 71.171954] IP: [<d0af7a0f>] 0xd0af7a0f
[ 71.192822] *pde = 00000000
[ 71.196513] Oops: 0002 [#1] PREEMPT DEBUG_PAGEALLOC
[ 71.196826] last sysfs file: /sys/block/ram9/range
[ 71.197010] Modules linked in: [last unloaded: rcutorture]
[ 71.197010]
[ 71.197010] Pid: 4861, comm: rcu_torture_wri Tainted: G W
(2.6.28-05716-gfe0bdec-dirty #171) System Name
[ 71.197010] EIP: 0060:[<d0af7a0f>] EFLAGS: 00010282 CPU: 0
[ 71.197010] EIP is at 0xd0af7a0f
[ 71.197010] EAX: 00000000 EBX: d0afbc20 ECX: c04f5cef EDX: c98abf7c
[ 71.197010] ESI: d0af7df0 EDI: 00000000 EBP: c98abfc4 ESP: c98abfc4
[ 71.197010] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 71.197010] Process rcu_torture_wri (pid: 4861, ti=c98ab000
task=c9890d00 task.ti=c98ab000)
[ 71.197010] Stack:
[ 71.197010] c98abfd0 d0af7eeb 00000000 c98abfe0 c0137364 c0137326
00000000 00000000
[ 71.197010] c0103643 c981fea4 00000000 00000000 00000000 00000000
00000000
[ 71.197010] Call Trace:
[ 71.197010] [<c0137364>] ? kthread+0x3e/0x66
[ 71.197010] [<c0137326>] ? kthread+0x0/0x66
[ 71.197010] [<c0103643>] ? kernel_thread_helper+0x7/0x10
[ 71.197010] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 71.197010] EIP: [<d0af7a0f>] 0xd0af7a0f SS:ESP 0068:c98abfc4
[ 71.301103] ---[ end trace 4eaa2a86a8e2da22 ]---

If i interpret this correctly, this corresponds to

000009e8 <rcu_stutter_wait>:
9e8: 55 push %ebp
9e9: 89 e5 mov %esp,%ebp
9eb: e8 fc ff ff ff call 9ec <rcu_stutter_wait+0x4>
9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
9f9: b8 01 00 00 00 mov $0x1,%eax
9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
a00: b8 e8 03 00 00 mov $0x3e8,%eax
a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
^---------- this line
a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
a2a: 5d pop %ebp
a2b: c3 ret

Greetings, Eric

2009-01-05 19:36:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Jan 05, 2009 at 01:14:09PM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > >
> > > Could the popular rcu function be registered by rcutorture, but when
> > > we remove the module the callback is no longer valid? I can compile
> > > a kernel just fine and with other stress tests i did not see any oops so
> > > far.
> >
> > One approach would be to print out the address of rcutorture's RCU
> > callbacks at rcutorture module initialization time (in rcu_torture_init()
> > in kernel/rcutorture.c). The two callbacks are rcu_torture_cb() and
> > rcu_bh_torture_wakeme_after_cb(). Unless you are specifying the
> > "torture_type" parameter to rcutorture, only the first one should be in
> > use.
>
> with a printk(KERN_ERR "rcu_torture_cb: %p rcu_bh_torture_wakeme_after_cb:
> %p\n", rcu_torture_cb, rcu_bh_torture_wakeme_after_cb);

Cool!

> [ 65.135468] rcu_torture_cb: d0af7d1b rcu_bh_torture_wakeme_after_cb:
> d0af7bec
> [ 65.135672] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> irqreader=1
> [ 71.171603] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 71.171954] IP: [<d0af7a0f>] 0xd0af7a0f
> [ 71.192822] *pde = 00000000
> [ 71.196513] Oops: 0002 [#1] PREEMPT DEBUG_PAGEALLOC
> [ 71.196826] last sysfs file: /sys/block/ram9/range
> [ 71.197010] Modules linked in: [last unloaded: rcutorture]
> [ 71.197010]
> [ 71.197010] Pid: 4861, comm: rcu_torture_wri Tainted: G W
> (2.6.28-05716-gfe0bdec-dirty #171) System Name
> [ 71.197010] EIP: 0060:[<d0af7a0f>] EFLAGS: 00010282 CPU: 0
> [ 71.197010] EIP is at 0xd0af7a0f
> [ 71.197010] EAX: 00000000 EBX: d0afbc20 ECX: c04f5cef EDX: c98abf7c
> [ 71.197010] ESI: d0af7df0 EDI: 00000000 EBP: c98abfc4 ESP: c98abfc4
> [ 71.197010] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 71.197010] Process rcu_torture_wri (pid: 4861, ti=c98ab000
> task=c9890d00 task.ti=c98ab000)
> [ 71.197010] Stack:
> [ 71.197010] c98abfd0 d0af7eeb 00000000 c98abfe0 c0137364 c0137326
> 00000000 00000000
> [ 71.197010] c0103643 c981fea4 00000000 00000000 00000000 00000000
> 00000000
> [ 71.197010] Call Trace:
> [ 71.197010] [<c0137364>] ? kthread+0x3e/0x66
> [ 71.197010] [<c0137326>] ? kthread+0x0/0x66
> [ 71.197010] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> [ 71.197010] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 71.197010] EIP: [<d0af7a0f>] 0xd0af7a0f SS:ESP 0068:c98abfc4
> [ 71.301103] ---[ end trace 4eaa2a86a8e2da22 ]---
>
> If i interpret this correctly, this corresponds to
>
> 000009e8 <rcu_stutter_wait>:
> 9e8: 55 push %ebp
> 9e9: 89 e5 mov %esp,%ebp
> 9eb: e8 fc ff ff ff call 9ec <rcu_stutter_wait+0x4>

Wow!!! Am I reading this correctly? Does the above "call" instruction
-really- call one byte into itself? That is what the hex for the x86
instruction -looks- like it is doing, but I cannot see what would have
possessed the compiler to generate this code.

When I compile on a 32-bit x86 machine, I don't see the above "call"
instruction. Other than that, the code I see looks consistent.

> 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> 9f9: b8 01 00 00 00 mov $0x1,%eax
> 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> a00: b8 e8 03 00 00 mov $0x3e8,%eax
> a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> ^---------- this line

This looks like the first test in the "while" loop.

> a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> a2a: 5d pop %ebp
> a2b: c3 ret

The corresponding C code is as follows:

static void
rcu_stutter_wait(void)
{
while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
if (rcutorture_runnable)
schedule_timeout_interruptible(1);
else
schedule_timeout_interruptible(round_jiffies_relative(HZ));
}
}

I don't see much opportunity for a page fault here... This is the
binary I get when I compile it, though not as a module:

0000085a <rcu_stutter_wait>:
85a: 55 push %ebp
85b: 89 e5 mov %esp,%ebp
85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
866: b8 01 00 00 00 mov $0x1,%eax
86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
86d: b8 e8 03 00 00 mov $0x3e8,%eax
872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
883: 75 09 jne 88e <rcu_stutter_wait+0x34>
885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
895: 74 c8 je 85f <rcu_stutter_wait+0x5>
897: 5d pop %ebp
898: c3 ret

I confess, I am confused!!!

Thanx, Paul

2009-01-05 20:02:07

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

hi,

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > [ 65.135468] rcu_torture_cb: d0af7d1b rcu_bh_torture_wakeme_after_cb:
> > d0af7bec
> > [ 65.135672] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> > stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> > irqreader=1
> > [ 71.171603] BUG: unable to handle kernel NULL pointer dereference at
> > (null)
> > [ 71.171954] IP: [<d0af7a0f>] 0xd0af7a0f
> > [ 71.192822] *pde = 00000000
> > [ 71.196513] Oops: 0002 [#1] PREEMPT DEBUG_PAGEALLOC
> > [ 71.196826] last sysfs file: /sys/block/ram9/range
> > [ 71.197010] Modules linked in: [last unloaded: rcutorture]
> > [ 71.197010]
> > [ 71.197010] Pid: 4861, comm: rcu_torture_wri Tainted: G W
> > (2.6.28-05716-gfe0bdec-dirty #171) System Name
> > [ 71.197010] EIP: 0060:[<d0af7a0f>] EFLAGS: 00010282 CPU: 0
> > [ 71.197010] EIP is at 0xd0af7a0f
> > [ 71.197010] EAX: 00000000 EBX: d0afbc20 ECX: c04f5cef EDX: c98abf7c
> > [ 71.197010] ESI: d0af7df0 EDI: 00000000 EBP: c98abfc4 ESP: c98abfc4
> > [ 71.197010] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [ 71.197010] Process rcu_torture_wri (pid: 4861, ti=c98ab000
> > task=c9890d00 task.ti=c98ab000)
> > [ 71.197010] Stack:
> > [ 71.197010] c98abfd0 d0af7eeb 00000000 c98abfe0 c0137364 c0137326
> > 00000000 00000000
> > [ 71.197010] c0103643 c981fea4 00000000 00000000 00000000 00000000
> > 00000000
> > [ 71.197010] Call Trace:
> > [ 71.197010] [<c0137364>] ? kthread+0x3e/0x66
> > [ 71.197010] [<c0137326>] ? kthread+0x0/0x66
> > [ 71.197010] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> > [ 71.197010] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [ 71.197010] EIP: [<d0af7a0f>] 0xd0af7a0f SS:ESP 0068:c98abfc4
> > [ 71.301103] ---[ end trace 4eaa2a86a8e2da22 ]---
> >
> > If i interpret this correctly, this corresponds to
> >
> > 000009e8 <rcu_stutter_wait>:
> > 9e8: 55 push %ebp
> > 9e9: 89 e5 mov %esp,%ebp
> > 9eb: e8 fc ff ff ff call 9ec <rcu_stutter_wait+0x4>
>
> Wow!!! Am I reading this correctly? Does the above "call" instruction
> -really- call one byte into itself? That is what the hex for the x86
> instruction -looks- like it is doing, but I cannot see what would have
> possessed the compiler to generate this code.

Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)


> When I compile on a 32-bit x86 machine, I don't see the above "call"
> instruction. Other than that, the code I see looks consistent.
>
> > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > ^---------- this line
>
> This looks like the first test in the "while" loop.
>
> > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > a2a: 5d pop %ebp
> > a2b: c3 ret
>
> The corresponding C code is as follows:
>
> static void
> rcu_stutter_wait(void)
> {
> while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> if (rcutorture_runnable)
> schedule_timeout_interruptible(1);
> else
> schedule_timeout_interruptible(round_jiffies_relative(HZ));
> }
> }
>
> I don't see much opportunity for a page fault here... This is the
> binary I get when I compile it, though not as a module:
>
> 0000085a <rcu_stutter_wait>:
> 85a: 55 push %ebp
> 85b: 89 e5 mov %esp,%ebp
> 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> 866: b8 01 00 00 00 mov $0x1,%eax
> 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> 897: 5d pop %ebp
> 898: c3 ret
>
> I confess, I am confused!!!

on the other box with a different gcc version

gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)

d1902e90 is the start of rcu_stutter_wait

[ 533.391719] d087e000 d1902e90
[ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
[ 541.000139] BUG: unable to handle kernel paging request at d1902efd
[ 541.000423] IP: [<d1902efd>] 0xd1902efd
[ 541.000660] *pde = 0f08f067 *pte = 00000000
[ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 541.001126] last sysfs file: /sys/block/sda/size
[ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
[ 541.002235]
[ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
[ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
[ 541.002598] EIP is at 0xd1902efd
[ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
[ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
[ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
[ 541.003400] Stack:
[ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
[ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
[ 541.004022] 00000201 000004b4
[ 541.004022] Call Trace:
[ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
[ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
[ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
[ 541.004022] Code: Bad EIP value.
[ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
[ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---


00000e90 <rcu_stutter_wait>:
e90: 55 push %ebp
e91: 89 e5 mov %esp,%ebp
e93: 90 nop
e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
e98: a1 98 00 00 00 mov 0x98,%eax
e9d: 85 c0 test %eax,%eax
e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
ea1: a1 00 00 00 00 mov 0x0,%eax
ea6: 85 c0 test %eax,%eax
ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
eaa: a1 88 1a 00 00 mov 0x1a88,%eax
eaf: 85 c0 test %eax,%eax
eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
eb3: 8b 15 00 00 00 00 mov 0x0,%edx
eb9: 85 d2 test %edx,%edx
ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
ebd: b8 01 00 00 00 mov $0x1,%eax
ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
ec7: a1 98 00 00 00 mov 0x98,%eax
ecc: 85 c0 test %eax,%eax
ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
ed0: a1 88 1a 00 00 mov 0x1a88,%eax
ed5: 85 c0 test %eax,%eax
ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
ee0: 5d pop %ebp
ee1: c3 ret
ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
ee8: b8 fa 00 00 00 mov $0xfa,%eax
eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
efd: 8d 76 00 lea 0x0(%esi),%esi
^------------- here

This one looks more like it can explain a page fault

f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi

Greetings, Eric

2009-01-05 20:16:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote:
> hi,
>
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > [ 65.135468] rcu_torture_cb: d0af7d1b rcu_bh_torture_wakeme_after_cb:
> > > d0af7bec
> > > [ 65.135672] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4
> > > stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
> > > irqreader=1
> > > [ 71.171603] BUG: unable to handle kernel NULL pointer dereference at
> > > (null)
> > > [ 71.171954] IP: [<d0af7a0f>] 0xd0af7a0f
> > > [ 71.192822] *pde = 00000000
> > > [ 71.196513] Oops: 0002 [#1] PREEMPT DEBUG_PAGEALLOC
> > > [ 71.196826] last sysfs file: /sys/block/ram9/range
> > > [ 71.197010] Modules linked in: [last unloaded: rcutorture]
> > > [ 71.197010]
> > > [ 71.197010] Pid: 4861, comm: rcu_torture_wri Tainted: G W
> > > (2.6.28-05716-gfe0bdec-dirty #171) System Name
> > > [ 71.197010] EIP: 0060:[<d0af7a0f>] EFLAGS: 00010282 CPU: 0
> > > [ 71.197010] EIP is at 0xd0af7a0f
> > > [ 71.197010] EAX: 00000000 EBX: d0afbc20 ECX: c04f5cef EDX: c98abf7c
> > > [ 71.197010] ESI: d0af7df0 EDI: 00000000 EBP: c98abfc4 ESP: c98abfc4
> > > [ 71.197010] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > > [ 71.197010] Process rcu_torture_wri (pid: 4861, ti=c98ab000
> > > task=c9890d00 task.ti=c98ab000)
> > > [ 71.197010] Stack:
> > > [ 71.197010] c98abfd0 d0af7eeb 00000000 c98abfe0 c0137364 c0137326
> > > 00000000 00000000
> > > [ 71.197010] c0103643 c981fea4 00000000 00000000 00000000 00000000
> > > 00000000
> > > [ 71.197010] Call Trace:
> > > [ 71.197010] [<c0137364>] ? kthread+0x3e/0x66
> > > [ 71.197010] [<c0137326>] ? kthread+0x0/0x66
> > > [ 71.197010] [<c0103643>] ? kernel_thread_helper+0x7/0x10
> > > [ 71.197010] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > [ 71.197010] EIP: [<d0af7a0f>] 0xd0af7a0f SS:ESP 0068:c98abfc4
> > > [ 71.301103] ---[ end trace 4eaa2a86a8e2da22 ]---
> > >
> > > If i interpret this correctly, this corresponds to
> > >
> > > 000009e8 <rcu_stutter_wait>:
> > > 9e8: 55 push %ebp
> > > 9e9: 89 e5 mov %esp,%ebp
> > > 9eb: e8 fc ff ff ff call 9ec <rcu_stutter_wait+0x4>
> >
> > Wow!!! Am I reading this correctly? Does the above "call" instruction
> > -really- call one byte into itself? That is what the hex for the x86
> > instruction -looks- like it is doing, but I cannot see what would have
> > possessed the compiler to generate this code.
>
> Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)

I am using 4.1.3, for whatever it is worth. (Ancient, I know!)

> > When I compile on a 32-bit x86 machine, I don't see the above "call"
> > instruction. Other than that, the code I see looks consistent.
> >
> > > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > > ^---------- this line
> >
> > This looks like the first test in the "while" loop.
> >
> > > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > > a2a: 5d pop %ebp
> > > a2b: c3 ret
> >
> > The corresponding C code is as follows:
> >
> > static void
> > rcu_stutter_wait(void)
> > {
> > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> > if (rcutorture_runnable)
> > schedule_timeout_interruptible(1);
> > else
> > schedule_timeout_interruptible(round_jiffies_relative(HZ));
> > }
> > }
> >
> > I don't see much opportunity for a page fault here... This is the
> > binary I get when I compile it, though not as a module:
> >
> > 0000085a <rcu_stutter_wait>:
> > 85a: 55 push %ebp
> > 85b: 89 e5 mov %esp,%ebp
> > 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > 866: b8 01 00 00 00 mov $0x1,%eax
> > 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> > 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> > 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> > 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> > 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> > 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> > 897: 5d pop %ebp
> > 898: c3 ret
> >
> > I confess, I am confused!!!
>
> on the other box with a different gcc version
>
> gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
>
> d1902e90 is the start of rcu_stutter_wait
>
> [ 533.391719] d087e000 d1902e90
> [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> [ 541.000139] BUG: unable to handle kernel paging request at d1902efd
> [ 541.000423] IP: [<d1902efd>] 0xd1902efd
> [ 541.000660] *pde = 0f08f067 *pte = 00000000
> [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 541.001126] last sysfs file: /sys/block/sda/size
> [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> [ 541.002235]
> [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
> [ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
> [ 541.002598] EIP is at 0xd1902efd
> [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
> [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
> [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
> [ 541.003400] Stack:
> [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
> [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
> [ 541.004022] 00000201 000004b4
> [ 541.004022] Call Trace:
> [ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
> [ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
> [ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> [ 541.004022] Code: Bad EIP value.
> [ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
> [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---
>
>
> 00000e90 <rcu_stutter_wait>:
> e90: 55 push %ebp
> e91: 89 e5 mov %esp,%ebp
> e93: 90 nop
> e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> e98: a1 98 00 00 00 mov 0x98,%eax
> e9d: 85 c0 test %eax,%eax
> e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> ea1: a1 00 00 00 00 mov 0x0,%eax
> ea6: 85 c0 test %eax,%eax
> ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> eaf: 85 c0 test %eax,%eax
> eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> eb9: 85 d2 test %edx,%edx
> ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> ebd: b8 01 00 00 00 mov $0x1,%eax
> ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> ec7: a1 98 00 00 00 mov 0x98,%eax
> ecc: 85 c0 test %eax,%eax
> ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> ed5: 85 c0 test %eax,%eax
> ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> ee0: 5d pop %ebp
> ee1: c3 ret
> ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> ee8: b8 fa 00 00 00 mov $0xfa,%eax
> eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>

Here we are again calling one byte into the current instruction!!!

Or am I misinterpreting this code?

> ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> efd: 8d 76 00 lea 0x0(%esi),%esi
> ^------------- here
>
> This one looks more like it can explain a page fault

I don't understand why there are indirections in the assembly given the
C code for rcu_stutter_wait().

> f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
>
> Greetings, Eric

Thanx, Paul

2009-01-05 20:32:27

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote:
> > hi,
> >
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > >
> > > Wow!!! Am I reading this correctly? Does the above "call" instruction
> > > -really- call one byte into itself? That is what the hex for the x86
> > > instruction -looks- like it is doing, but I cannot see what would have
> > > possessed the compiler to generate this code.
> >
> > Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
>
> I am using 4.1.3, for whatever it is worth. (Ancient, I know!)
>
> > > When I compile on a 32-bit x86 machine, I don't see the above "call"
> > > instruction. Other than that, the code I see looks consistent.
> > >
> > > > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > > > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > > > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > > > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > > > ^---------- this line
> > >
> > > This looks like the first test in the "while" loop.
> > >
> > > > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > > > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > > > a2a: 5d pop %ebp
> > > > a2b: c3 ret
> > >
> > > The corresponding C code is as follows:
> > >
> > > static void
> > > rcu_stutter_wait(void)
> > > {
> > > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> > > if (rcutorture_runnable)
> > > schedule_timeout_interruptible(1);
> > > else
> > > schedule_timeout_interruptible(round_jiffies_relative(HZ));
> > > }
> > > }
> > >
> > > I don't see much opportunity for a page fault here... This is the
> > > binary I get when I compile it, though not as a module:
> > >
> > > 0000085a <rcu_stutter_wait>:
> > > 85a: 55 push %ebp
> > > 85b: 89 e5 mov %esp,%ebp
> > > 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> > > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > 866: b8 01 00 00 00 mov $0x1,%eax
> > > 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> > > 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> > > 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> > > 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> > > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> > > 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> > > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> > > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> > > 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> > > 897: 5d pop %ebp
> > > 898: c3 ret
> > >
> > > I confess, I am confused!!!
> >
> > on the other box with a different gcc version
> >
> > gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
> >
> > d1902e90 is the start of rcu_stutter_wait
> >
> > [ 533.391719] d087e000 d1902e90
> > [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > [ 541.000139] BUG: unable to handle kernel paging request at d1902efd
> > [ 541.000423] IP: [<d1902efd>] 0xd1902efd
> > [ 541.000660] *pde = 0f08f067 *pte = 00000000
> > [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > [ 541.001126] last sysfs file: /sys/block/sda/size
> > [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> > [ 541.002235]
> > [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
> > [ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
> > [ 541.002598] EIP is at 0xd1902efd
> > [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
> > [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
> > [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
> > [ 541.003400] Stack:
> > [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
> > [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
> > [ 541.004022] 00000201 000004b4
> > [ 541.004022] Call Trace:
> > [ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
> > [ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
> > [ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> > [ 541.004022] Code: Bad EIP value.
> > [ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
> > [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---
> >
> >
> > 00000e90 <rcu_stutter_wait>:
> > e90: 55 push %ebp
> > e91: 89 e5 mov %esp,%ebp
> > e93: 90 nop
> > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > e98: a1 98 00 00 00 mov 0x98,%eax
> > e9d: 85 c0 test %eax,%eax
> > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > ea1: a1 00 00 00 00 mov 0x0,%eax
> > ea6: 85 c0 test %eax,%eax
> > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > eaf: 85 c0 test %eax,%eax
> > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > eb9: 85 d2 test %edx,%edx
> > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > ebd: b8 01 00 00 00 mov $0x1,%eax
> > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > ec7: a1 98 00 00 00 mov 0x98,%eax
> > ecc: 85 c0 test %eax,%eax
> > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > ed5: 85 c0 test %eax,%eax
> > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > ee0: 5d pop %ebp
> > ee1: c3 ret
> > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
>
> Here we are again calling one byte into the current instruction!!!
>
> Or am I misinterpreting this code?
>
> > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > efd: 8d 76 00 lea 0x0(%esi),%esi
> > ^------------- here
> >
> > This one looks more like it can explain a page fault
>
> I don't understand why there are indirections in the assembly given the
> C code for rcu_stutter_wait().
>
> > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi

ok, after trying to find out if the ubuntu gccs are broken, i stumbled
upon this:
http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html

Seems the difference is that you dont compile it as a module and the
jump is perfectly normal, it gets overwritten when the stuff is loaded
objdump -dr gives me

00000e90 <rcu_stutter_wait>:
e90: 55 push %ebp
e91: 89 e5 mov %esp,%ebp
e93: 90 nop
e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
e98: a1 98 00 00 00 mov 0x98,%eax
e99: R_386_32 .bss
e9d: 85 c0 test %eax,%eax
e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
ea1: a1 00 00 00 00 mov 0x0,%eax
ea2: R_386_32 rcutorture_runnable
ea6: 85 c0 test %eax,%eax
ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
eaa: a1 88 1a 00 00 mov 0x1a88,%eax
eab: R_386_32 .bss
eaf: 85 c0 test %eax,%eax
eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
eb3: 8b 15 00 00 00 00 mov 0x0,%edx
eb5: R_386_32 rcutorture_runnable
eb9: 85 d2 test %edx,%edx
ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
ebd: b8 01 00 00 00 mov $0x1,%eax
ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
ec3: R_386_PC32 schedule_timeout_interruptible
ec7: a1 98 00 00 00 mov 0x98,%eax
ec8: R_386_32 .bss
ecc: 85 c0 test %eax,%eax
ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
ed0: a1 88 1a 00 00 mov 0x1a88,%eax
ed1: R_386_32 .bss
ed5: 85 c0 test %eax,%eax
ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
ee0: 5d pop %ebp
ee1: c3 ret
ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
ee8: b8 fa 00 00 00 mov $0xfa,%eax
eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
eee: R_386_PC32 round_jiffies_relative
ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
ef9: R_386_PC32 schedule_timeout_interruptible
efd: 8d 76 00 lea 0x0(%esi),%esi

here is the deref ------------------------^

f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi

2009-01-05 22:19:05

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 09:31:53PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote:
> > > hi,
> > >
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > > >
> > > > Wow!!! Am I reading this correctly? Does the above "call" instruction
> > > > -really- call one byte into itself? That is what the hex for the x86
> > > > instruction -looks- like it is doing, but I cannot see what would have
> > > > possessed the compiler to generate this code.
> > >
> > > Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
> >
> > I am using 4.1.3, for whatever it is worth. (Ancient, I know!)
> >
> > > > When I compile on a 32-bit x86 machine, I don't see the above "call"
> > > > instruction. Other than that, the code I see looks consistent.
> > > >
> > > > > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > > > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > > > > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > > > > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > > > > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > > > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > > > > ^---------- this line
> > > >
> > > > This looks like the first test in the "while" loop.
> > > >
> > > > > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > > > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > > > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > > > > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > > > > a2a: 5d pop %ebp
> > > > > a2b: c3 ret
> > > >
> > > > The corresponding C code is as follows:
> > > >
> > > > static void
> > > > rcu_stutter_wait(void)
> > > > {
> > > > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> > > > if (rcutorture_runnable)
> > > > schedule_timeout_interruptible(1);
> > > > else
> > > > schedule_timeout_interruptible(round_jiffies_relative(HZ));
> > > > }
> > > > }
> > > >
> > > > I don't see much opportunity for a page fault here... This is the
> > > > binary I get when I compile it, though not as a module:
> > > >
> > > > 0000085a <rcu_stutter_wait>:
> > > > 85a: 55 push %ebp
> > > > 85b: 89 e5 mov %esp,%ebp
> > > > 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> > > > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > 866: b8 01 00 00 00 mov $0x1,%eax
> > > > 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> > > > 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> > > > 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> > > > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> > > > 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> > > > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> > > > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> > > > 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> > > > 897: 5d pop %ebp
> > > > 898: c3 ret
> > > >
> > > > I confess, I am confused!!!
> > >
> > > on the other box with a different gcc version
> > >
> > > gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
> > >
> > > d1902e90 is the start of rcu_stutter_wait
> > >
> > > [ 533.391719] d087e000 d1902e90
> > > [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > > [ 541.000139] BUG: unable to handle kernel paging request at d1902efd
> > > [ 541.000423] IP: [<d1902efd>] 0xd1902efd
> > > [ 541.000660] *pde = 0f08f067 *pte = 00000000
> > > [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > > [ 541.001126] last sysfs file: /sys/block/sda/size
> > > [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> > > [ 541.002235]
> > > [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
> > > [ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
> > > [ 541.002598] EIP is at 0xd1902efd
> > > [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
> > > [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
> > > [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > > [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
> > > [ 541.003400] Stack:
> > > [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
> > > [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
> > > [ 541.004022] 00000201 000004b4
> > > [ 541.004022] Call Trace:
> > > [ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
> > > [ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
> > > [ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> > > [ 541.004022] Code: Bad EIP value.
> > > [ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
> > > [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---
> > >
> > >
> > > 00000e90 <rcu_stutter_wait>:
> > > e90: 55 push %ebp
> > > e91: 89 e5 mov %esp,%ebp
> > > e93: 90 nop
> > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > > e98: a1 98 00 00 00 mov 0x98,%eax
> > > e9d: 85 c0 test %eax,%eax
> > > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > > ea1: a1 00 00 00 00 mov 0x0,%eax
> > > ea6: 85 c0 test %eax,%eax
> > > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > > eaf: 85 c0 test %eax,%eax
> > > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > > eb9: 85 d2 test %edx,%edx
> > > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > > ebd: b8 01 00 00 00 mov $0x1,%eax
> > > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > > ec7: a1 98 00 00 00 mov 0x98,%eax
> > > ecc: 85 c0 test %eax,%eax
> > > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > > ed5: 85 c0 test %eax,%eax
> > > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > ee0: 5d pop %ebp
> > > ee1: c3 ret
> > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> >
> > Here we are again calling one byte into the current instruction!!!
> >
> > Or am I misinterpreting this code?
> >
> > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > > efd: 8d 76 00 lea 0x0(%esi),%esi
> > > ^------------- here
> > >
> > > This one looks more like it can explain a page fault
> >
> > I don't understand why there are indirections in the assembly given the
> > C code for rcu_stutter_wait().
> >
> > > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
>
> ok, after trying to find out if the ubuntu gccs are broken, i stumbled
> upon this:
> http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html
>
> Seems the difference is that you dont compile it as a module and the
> jump is perfectly normal, it gets overwritten when the stuff is loaded
> objdump -dr gives me
>
> 00000e90 <rcu_stutter_wait>:
> e90: 55 push %ebp
> e91: 89 e5 mov %esp,%ebp
> e93: 90 nop
> e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> e98: a1 98 00 00 00 mov 0x98,%eax
> e99: R_386_32 .bss
> e9d: 85 c0 test %eax,%eax
> e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> ea1: a1 00 00 00 00 mov 0x0,%eax
> ea2: R_386_32 rcutorture_runnable
> ea6: 85 c0 test %eax,%eax
> ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> eab: R_386_32 .bss
> eaf: 85 c0 test %eax,%eax
> eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> eb5: R_386_32 rcutorture_runnable
> eb9: 85 d2 test %edx,%edx
> ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> ebd: b8 01 00 00 00 mov $0x1,%eax
> ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> ec3: R_386_PC32 schedule_timeout_interruptible
> ec7: a1 98 00 00 00 mov 0x98,%eax
> ec8: R_386_32 .bss
> ecc: 85 c0 test %eax,%eax
> ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> ed1: R_386_32 .bss
> ed5: 85 c0 test %eax,%eax
> ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> ee0: 5d pop %ebp
> ee1: c3 ret
> ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> ee8: b8 fa 00 00 00 mov $0xfa,%eax
> eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> eee: R_386_PC32 round_jiffies_relative
> ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> ef9: R_386_PC32 schedule_timeout_interruptible
> efd: 8d 76 00 lea 0x0(%esi),%esi
>
> here is the deref ------------------------^

Ah!!! We are getting a page fault while cleaning up the stack frame?

Ouch!

> f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi

Thanx, Paul

2009-01-06 00:29:25

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 05, 2009 at 09:31:53PM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote:
> > > > hi,
> > > >
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > > > >
> > > > > Wow!!! Am I reading this correctly? Does the above "call" instruction
> > > > > -really- call one byte into itself? That is what the hex for the x86
> > > > > instruction -looks- like it is doing, but I cannot see what would have
> > > > > possessed the compiler to generate this code.
> > > >
> > > > Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
> > >
> > > I am using 4.1.3, for whatever it is worth. (Ancient, I know!)
> > >
> > > > > When I compile on a 32-bit x86 machine, I don't see the above "call"
> > > > > instruction. Other than that, the code I see looks consistent.
> > > > >
> > > > > > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > > > > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > > > > > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > > > > > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > > > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > > > > > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > > > > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > > > > > ^---------- this line
> > > > >
> > > > > This looks like the first test in the "while" loop.
> > > > >
> > > > > > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > > > > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > > > > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > > > > > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > > > > > a2a: 5d pop %ebp
> > > > > > a2b: c3 ret
> > > > >
> > > > > The corresponding C code is as follows:
> > > > >
> > > > > static void
> > > > > rcu_stutter_wait(void)
> > > > > {
> > > > > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> > > > > if (rcutorture_runnable)
> > > > > schedule_timeout_interruptible(1);
> > > > > else
> > > > > schedule_timeout_interruptible(round_jiffies_relative(HZ));
> > > > > }
> > > > > }
> > > > >
> > > > > I don't see much opportunity for a page fault here... This is the
> > > > > binary I get when I compile it, though not as a module:
> > > > >
> > > > > 0000085a <rcu_stutter_wait>:
> > > > > 85a: 55 push %ebp
> > > > > 85b: 89 e5 mov %esp,%ebp
> > > > > 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> > > > > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > 866: b8 01 00 00 00 mov $0x1,%eax
> > > > > 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> > > > > 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > > 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> > > > > 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> > > > > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> > > > > 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> > > > > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> > > > > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> > > > > 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> > > > > 897: 5d pop %ebp
> > > > > 898: c3 ret
> > > > >
> > > > > I confess, I am confused!!!
> > > >
> > > > on the other box with a different gcc version
> > > >
> > > > gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
> > > >
> > > > d1902e90 is the start of rcu_stutter_wait
> > > >
> > > > [ 533.391719] d087e000 d1902e90
> > > > [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > > > [ 541.000139] BUG: unable to handle kernel paging request at d1902efd
> > > > [ 541.000423] IP: [<d1902efd>] 0xd1902efd
> > > > [ 541.000660] *pde = 0f08f067 *pte = 00000000
> > > > [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > > > [ 541.001126] last sysfs file: /sys/block/sda/size
> > > > [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> > > > [ 541.002235]
> > > > [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
> > > > [ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
> > > > [ 541.002598] EIP is at 0xd1902efd
> > > > [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
> > > > [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
> > > > [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > > > [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
> > > > [ 541.003400] Stack:
> > > > [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
> > > > [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
> > > > [ 541.004022] 00000201 000004b4
> > > > [ 541.004022] Call Trace:
> > > > [ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
> > > > [ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
> > > > [ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> > > > [ 541.004022] Code: Bad EIP value.
> > > > [ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
> > > > [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---
> > > >
> > > >
> > > > 00000e90 <rcu_stutter_wait>:
> > > > e90: 55 push %ebp
> > > > e91: 89 e5 mov %esp,%ebp
> > > > e93: 90 nop
> > > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > > > e98: a1 98 00 00 00 mov 0x98,%eax
> > > > e9d: 85 c0 test %eax,%eax
> > > > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > > > ea1: a1 00 00 00 00 mov 0x0,%eax
> > > > ea6: 85 c0 test %eax,%eax
> > > > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > eaf: 85 c0 test %eax,%eax
> > > > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > > > eb9: 85 d2 test %edx,%edx
> > > > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > > > ebd: b8 01 00 00 00 mov $0x1,%eax
> > > > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > > > ec7: a1 98 00 00 00 mov 0x98,%eax
> > > > ecc: 85 c0 test %eax,%eax
> > > > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > ed5: 85 c0 test %eax,%eax
> > > > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > ee0: 5d pop %ebp
> > > > ee1: c3 ret
> > > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > > > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> > >
> > > Here we are again calling one byte into the current instruction!!!
> > >
> > > Or am I misinterpreting this code?
> > >
> > > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > > > efd: 8d 76 00 lea 0x0(%esi),%esi
> > > > ^------------- here
> > > >
> > > > This one looks more like it can explain a page fault
> > >
> > > I don't understand why there are indirections in the assembly given the
> > > C code for rcu_stutter_wait().
> > >
> > > > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
> >
> > ok, after trying to find out if the ubuntu gccs are broken, i stumbled
> > upon this:
> > http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html
> >
> > Seems the difference is that you dont compile it as a module and the
> > jump is perfectly normal, it gets overwritten when the stuff is loaded
> > objdump -dr gives me
> >
> > 00000e90 <rcu_stutter_wait>:
> > e90: 55 push %ebp
> > e91: 89 e5 mov %esp,%ebp
> > e93: 90 nop
> > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > e98: a1 98 00 00 00 mov 0x98,%eax
> > e99: R_386_32 .bss
> > e9d: 85 c0 test %eax,%eax
> > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > ea1: a1 00 00 00 00 mov 0x0,%eax
> > ea2: R_386_32 rcutorture_runnable
> > ea6: 85 c0 test %eax,%eax
> > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > eab: R_386_32 .bss
> > eaf: 85 c0 test %eax,%eax
> > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > eb5: R_386_32 rcutorture_runnable
> > eb9: 85 d2 test %edx,%edx
> > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > ebd: b8 01 00 00 00 mov $0x1,%eax
> > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > ec3: R_386_PC32 schedule_timeout_interruptible
> > ec7: a1 98 00 00 00 mov 0x98,%eax
> > ec8: R_386_32 .bss
> > ecc: 85 c0 test %eax,%eax
> > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > ed1: R_386_32 .bss
> > ed5: 85 c0 test %eax,%eax
> > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > ee0: 5d pop %ebp
> > ee1: c3 ret
> > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> > eee: R_386_PC32 round_jiffies_relative
> > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > ef9: R_386_PC32 schedule_timeout_interruptible
> > efd: 8d 76 00 lea 0x0(%esi),%esi
> >
> > here is the deref ------------------------^
>
> Ah!!! We are getting a page fault while cleaning up the stack frame?
>
> Ouch!
>
> > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi

And now I can run this. gcc v3.4.4 gives yet a different error:

divide error: 0000 [#1] PREEMPT SMP

last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type

CPU 1

Modules linked in: [last unloaded: rcutorture]

Pid: 3369, comm: rcu_torture_rea Not tainted 2.6.28-autokern1 #1

RIP: 0010:[<ffffffffa0000142>] [<ffffffffa0000142>] 0xffffffffa0000142

RSP: 0000:ffff88007f0afeb0 EFLAGS: 00010246

RAX: 00000000def36d6a RBX: ffffffffa0006850 RCX: 0000000000000000

RDX: 0000000000000000 RSI: ffff88007ecf1600 RDI: ffff88007f0afef0

RBP: 0000000000000dd4 R08: ffff88007f0ae000 R09: ffffffff8037c7d9

R10: 0000000000000000 R11: ffffffff8037c7d9 R12: 0000000000000000

R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000

FS: 0000000000000000(0000) GS:ffff8800e3801c00(0000) knlGS:00000000f7f3ab80

CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b

CR2: 00000000080dc87c CR3: 0000000000201000 CR4: 00000000000006e0

DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Process rcu_torture_rea (pid: 3369, threadinfo ffff88007f0ae000, task ffff88007ecf5d80)

Stack:

ffffffff8037c7d9 ffffffffa000090d ffff88007ecfdec0 ffff88007ec17eb0

0000000000000001 ffffffffa00006e6 0000000000000000 ffff8800e3488000

c744ca136d6adef3 0000000000000256 0000000000000000 ffffffffa0000807

Call Trace:

[<ffffffff8037c7d9>] ? delay_tsc+0x0/0x99

[<ffffffff80245b94>] ? kthread+0x3d/0x63

[<ffffffff8020c3ea>] ? child_rip+0xa/0x20

[<ffffffff80245b57>] ? kthread+0x0/0x63

[<ffffffff8020c3e0>] ? child_rip+0x0/0x20

Code: 58 c3 56 bf 01 00 00 00 e8 88 bd 22 e0 59 31 c0 c3 41 51 e8 69 ff ff ff 48 63 15 0a 58 00 00 48 69 d2 90 01 00 00 48 89 d1 31 d2 <48> f7 f1 48 85 d2 75 0c 41 58 bf 78 1b 0d 00 e9 32 c7 37 e0 5f

RIP [<ffffffffa0000142>] 0xffffffffa0000142

RSP <ffff88007f0afeb0>


I will see what I can find from this.

Thanx, Paul

2009-01-06 02:15:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 05, 2009 at 09:31:53PM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Mon, Jan 05, 2009 at 09:01:45PM +0100, Eric Sesterhenn wrote:
> > > > > hi,
> > > > >
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > On Mon, Jan 05, 2009 at 07:56:55PM +0100, Eric Sesterhenn wrote:
> > > > > >
> > > > > > Wow!!! Am I reading this correctly? Does the above "call" instruction
> > > > > > -really- call one byte into itself? That is what the hex for the x86
> > > > > > instruction -looks- like it is doing, but I cannot see what would have
> > > > > > possessed the compiler to generate this code.
> > > > >
> > > > > Compiler is gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
> > > >
> > > > I am using 4.1.3, for whatever it is worth. (Ancient, I know!)
> > > >
> > > > > > When I compile on a 32-bit x86 machine, I don't see the above "call"
> > > > > > instruction. Other than that, the code I see looks consistent.
> > > > > >
> > > > > > > 9f0: eb 1d jmp a0f <rcu_stutter_wait+0x27>
> > > > > > > 9f2: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > > 9f9: b8 01 00 00 00 mov $0x1,%eax
> > > > > > > 9fe: 75 0a jne a0a <rcu_stutter_wait+0x22>
> > > > > > > a00: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > > > > a05: e8 fc ff ff ff call a06 <rcu_stutter_wait+0x1e>
> > > > > > > a0a: e8 fc ff ff ff call a0b <rcu_stutter_wait+0x23>
> > > > > > > a0f: 83 3d 6c 00 00 00 00 cmpl $0x0,0x6c
> > > > > > > ^---------- this line
> > > > > >
> > > > > > This looks like the first test in the "while" loop.
> > > > > >
> > > > > > > a16: 75 09 jne a21 <rcu_stutter_wait+0x39>
> > > > > > > a18: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > > a1f: 75 09 jne a2a <rcu_stutter_wait+0x42>
> > > > > > > a21: 83 3d 50 1a 00 00 00 cmpl $0x0,0x1a50
> > > > > > > a28: 74 c8 je 9f2 <rcu_stutter_wait+0xa>
> > > > > > > a2a: 5d pop %ebp
> > > > > > > a2b: c3 ret
> > > > > >
> > > > > > The corresponding C code is as follows:
> > > > > >
> > > > > > static void
> > > > > > rcu_stutter_wait(void)
> > > > > > {
> > > > > > while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
> > > > > > if (rcutorture_runnable)
> > > > > > schedule_timeout_interruptible(1);
> > > > > > else
> > > > > > schedule_timeout_interruptible(round_jiffies_relative(HZ));
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > I don't see much opportunity for a page fault here... This is the
> > > > > > binary I get when I compile it, though not as a module:
> > > > > >
> > > > > > 0000085a <rcu_stutter_wait>:
> > > > > > 85a: 55 push %ebp
> > > > > > 85b: 89 e5 mov %esp,%ebp
> > > > > > 85d: eb 1d jmp 87c <rcu_stutter_wait+0x22>
> > > > > > 85f: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > 866: b8 01 00 00 00 mov $0x1,%eax
> > > > > > 86b: 75 0a jne 877 <rcu_stutter_wait+0x1d>
> > > > > > 86d: b8 e8 03 00 00 mov $0x3e8,%eax
> > > > > > 872: e8 fc ff ff ff call 873 <rcu_stutter_wait+0x19>
> > > > > > 877: e8 fc ff ff ff call 878 <rcu_stutter_wait+0x1e>
> > > > > > 87c: 83 3d 14 00 00 00 00 cmpl $0x0,0x14
> > > > > > 883: 75 09 jne 88e <rcu_stutter_wait+0x34>
> > > > > > 885: 83 3d 00 00 00 00 00 cmpl $0x0,0x0
> > > > > > 88c: 75 09 jne 897 <rcu_stutter_wait+0x3d>
> > > > > > 88e: 83 3d 08 1a 00 00 00 cmpl $0x0,0x1a08
> > > > > > 895: 74 c8 je 85f <rcu_stutter_wait+0x5>
> > > > > > 897: 5d pop %ebp
> > > > > > 898: c3 ret
> > > > > >
> > > > > > I confess, I am confused!!!
> > > > >
> > > > > on the other box with a different gcc version
> > > > >
> > > > > gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
> > > > >
> > > > > d1902e90 is the start of rcu_stutter_wait
> > > > >
> > > > > [ 533.391719] d087e000 d1902e90
> > > > > [ 533.392294] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1
> > > > > [ 541.000139] BUG: unable to handle kernel paging request at d1902efd
> > > > > [ 541.000423] IP: [<d1902efd>] 0xd1902efd
> > > > > [ 541.000660] *pde = 0f08f067 *pte = 00000000
> > > > > [ 541.000867] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > > > > [ 541.001126] last sysfs file: /sys/block/sda/size
> > > > > [ 541.001246] Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 fuse unix [last unloaded: rcutorture]
> > > > > [ 541.002235]
> > > > > [ 541.002334] Pid: 5292, comm: rcu_torture_wri Not tainted (2.6.28 #84)
> > > > > [ 541.002470] EIP: 0060:[<d1902efd>] EFLAGS: 00010296 CPU: 0
> > > > > [ 541.002598] EIP is at 0xd1902efd
> > > > > [ 541.002767] EAX: 00000000 EBX: d19073c0 ECX: 00000000 EDX: 00000000
> > > > > [ 541.002900] ESI: 0000000a EDI: 00000000 EBP: c7b63fb8 ESP: c7b63fb8
> > > > > [ 541.003033] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > > > > [ 541.003160] Process rcu_torture_wri (pid: 5292, ti=c7b63000 task=c7b09710 task.ti=c7b63000)
> > > > > [ 541.003400] Stack:
> > > > > [ 541.003497] c7b63fd0 d19032c1 00000000 00000000 00000000 d1903200 c7b63fe0 c013d80a
> > > > > [ 541.004022] c013d7d0 00000000 00000000 c0103cf3 cef6ee70 00000000 00000000 00000000
> > > > > [ 541.004022] 00000201 000004b4
> > > > > [ 541.004022] Call Trace:
> > > > > [ 541.004022] [<c013d80a>] ? kthread+0x3a/0x70
> > > > > [ 541.004022] [<c013d7d0>] ? kthread+0x0/0x70
> > > > > [ 541.004022] [<c0103cf3>] ? kernel_thread_helper+0x7/0x14
> > > > > [ 541.004022] Code: Bad EIP value.
> > > > > [ 541.004022] EIP: [<d1902efd>] 0xd1902efd SS:ESP 0068:c7b63fb8
> > > > > [ 541.004022] ---[ end trace cb3b10c2bb94b4e3 ]---
> > > > >
> > > > >
> > > > > 00000e90 <rcu_stutter_wait>:
> > > > > e90: 55 push %ebp
> > > > > e91: 89 e5 mov %esp,%ebp
> > > > > e93: 90 nop
> > > > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > > > > e98: a1 98 00 00 00 mov 0x98,%eax
> > > > > e9d: 85 c0 test %eax,%eax
> > > > > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > > > > ea1: a1 00 00 00 00 mov 0x0,%eax
> > > > > ea6: 85 c0 test %eax,%eax
> > > > > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > > > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > > eaf: 85 c0 test %eax,%eax
> > > > > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > > > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > > > > eb9: 85 d2 test %edx,%edx
> > > > > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > > > > ebd: b8 01 00 00 00 mov $0x1,%eax
> > > > > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > > > > ec7: a1 98 00 00 00 mov 0x98,%eax
> > > > > ecc: 85 c0 test %eax,%eax
> > > > > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > > > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > > ed5: 85 c0 test %eax,%eax
> > > > > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > > > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > > ee0: 5d pop %ebp
> > > > > ee1: c3 ret
> > > > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > > > > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> > > >
> > > > Here we are again calling one byte into the current instruction!!!
> > > >
> > > > Or am I misinterpreting this code?
> > > >
> > > > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > > > > efd: 8d 76 00 lea 0x0(%esi),%esi
> > > > > ^------------- here
> > > > >
> > > > > This one looks more like it can explain a page fault
> > > >
> > > > I don't understand why there are indirections in the assembly given the
> > > > C code for rcu_stutter_wait().
> > > >
> > > > > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > > > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
> > >
> > > ok, after trying to find out if the ubuntu gccs are broken, i stumbled
> > > upon this:
> > > http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html
> > >
> > > Seems the difference is that you dont compile it as a module and the
> > > jump is perfectly normal, it gets overwritten when the stuff is loaded
> > > objdump -dr gives me
> > >
> > > 00000e90 <rcu_stutter_wait>:
> > > e90: 55 push %ebp
> > > e91: 89 e5 mov %esp,%ebp
> > > e93: 90 nop
> > > e94: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
> > > e98: a1 98 00 00 00 mov 0x98,%eax
> > > e99: R_386_32 .bss
> > > e9d: 85 c0 test %eax,%eax
> > > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > > ea1: a1 00 00 00 00 mov 0x0,%eax
> > > ea2: R_386_32 rcutorture_runnable
> > > ea6: 85 c0 test %eax,%eax
> > > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > > eab: R_386_32 .bss
> > > eaf: 85 c0 test %eax,%eax
> > > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > > eb5: R_386_32 rcutorture_runnable
> > > eb9: 85 d2 test %edx,%edx
> > > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > > ebd: b8 01 00 00 00 mov $0x1,%eax
> > > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > > ec3: R_386_PC32 schedule_timeout_interruptible
> > > ec7: a1 98 00 00 00 mov 0x98,%eax
> > > ec8: R_386_32 .bss
> > > ecc: 85 c0 test %eax,%eax
> > > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > > ed1: R_386_32 .bss
> > > ed5: 85 c0 test %eax,%eax
> > > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > ee0: 5d pop %ebp
> > > ee1: c3 ret
> > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> > > eee: R_386_PC32 round_jiffies_relative
> > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > > ef9: R_386_PC32 schedule_timeout_interruptible
> > > efd: 8d 76 00 lea 0x0(%esi),%esi
> > >
> > > here is the deref ------------------------^
> >
> > Ah!!! We are getting a page fault while cleaning up the stack frame?
> >
> > Ouch!
> >
> > > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
>
> And now I can run this. gcc v3.4.4 gives yet a different error:
>
> divide error: 0000 [#1] PREEMPT SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
> CPU 1
> Modules linked in: [last unloaded: rcutorture]
> Pid: 3369, comm: rcu_torture_rea Not tainted 2.6.28-autokern1 #1
> RIP: 0010:[<ffffffffa0000142>] [<ffffffffa0000142>] 0xffffffffa0000142
> RSP: 0000:ffff88007f0afeb0 EFLAGS: 00010246
> RAX: 00000000def36d6a RBX: ffffffffa0006850 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff88007ecf1600 RDI: ffff88007f0afef0
> RBP: 0000000000000dd4 R08: ffff88007f0ae000 R09: ffffffff8037c7d9
> R10: 0000000000000000 R11: ffffffff8037c7d9 R12: 0000000000000000
> R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff8800e3801c00(0000) knlGS:00000000f7f3ab80
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000080dc87c CR3: 0000000000201000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process rcu_torture_rea (pid: 3369, threadinfo ffff88007f0ae000, task ffff88007ecf5d80)
> Stack:
> ffffffff8037c7d9 ffffffffa000090d ffff88007ecfdec0 ffff88007ec17eb0
> 0000000000000001 ffffffffa00006e6 0000000000000000 ffff8800e3488000
> c744ca136d6adef3 0000000000000256 0000000000000000 ffffffffa0000807
> Call Trace:
> [<ffffffff8037c7d9>] ? delay_tsc+0x0/0x99
> [<ffffffff80245b94>] ? kthread+0x3d/0x63
> [<ffffffff8020c3ea>] ? child_rip+0xa/0x20
> [<ffffffff80245b57>] ? kthread+0x0/0x63
> [<ffffffff8020c3e0>] ? child_rip+0x0/0x20
> Code: 58 c3 56 bf 01 00 00 00 e8 88 bd 22 e0 59 31 c0 c3 41 51 e8 69 ff ff ff 48 63 15 0a 58 00 00 48 69 d2 90 01 00 00 48 89 d1 31 d2 <48> f7 f1 48 85 d2 75 0c 41 58 bf 78 1b 0d 00 e9 32 c7 37 e0 5f
> RIP [<ffffffffa0000142>] 0xffffffffa0000142
> RSP <ffff88007f0afeb0>
>
> I will see what I can find from this.

Hello, Eric,

Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
insmoded and rmmoded without errors.

Could you please try this on your setup? I have attached this file in
case that helps.

Thanx, Paul


Attachments:
(No filename) (14.25 kB)
rcutorture.c (31.11 kB)
kernel/rcutorture.c
Download all attachments

2009-01-06 07:47:23

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > e98: a1 98 00 00 00 mov 0x98,%eax
> > > > e99: R_386_32 .bss
> > > > e9d: 85 c0 test %eax,%eax
> > > > e9f: 75 09 jne eaa <rcu_stutter_wait+0x1a>
> > > > ea1: a1 00 00 00 00 mov 0x0,%eax
> > > > ea2: R_386_32 rcutorture_runnable
> > > > ea6: 85 c0 test %eax,%eax
> > > > ea8: 75 36 jne ee0 <rcu_stutter_wait+0x50>
> > > > eaa: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > eab: R_386_32 .bss
> > > > eaf: 85 c0 test %eax,%eax
> > > > eb1: 75 2d jne ee0 <rcu_stutter_wait+0x50>
> > > > eb3: 8b 15 00 00 00 00 mov 0x0,%edx
> > > > eb5: R_386_32 rcutorture_runnable
> > > > eb9: 85 d2 test %edx,%edx
> > > > ebb: 74 2b je ee8 <rcu_stutter_wait+0x58>
> > > > ebd: b8 01 00 00 00 mov $0x1,%eax
> > > > ec2: e8 fc ff ff ff call ec3 <rcu_stutter_wait+0x33>
> > > > ec3: R_386_PC32 schedule_timeout_interruptible
> > > > ec7: a1 98 00 00 00 mov 0x98,%eax
> > > > ec8: R_386_32 .bss
> > > > ecc: 85 c0 test %eax,%eax
> > > > ece: 74 d1 je ea1 <rcu_stutter_wait+0x11>
> > > > ed0: a1 88 1a 00 00 mov 0x1a88,%eax
> > > > ed1: R_386_32 .bss
> > > > ed5: 85 c0 test %eax,%eax
> > > > ed7: 74 da je eb3 <rcu_stutter_wait+0x23>
> > > > ed9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > ee0: 5d pop %ebp
> > > > ee1: c3 ret
> > > > ee2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > ee8: b8 fa 00 00 00 mov $0xfa,%eax
> > > > eed: e8 fc ff ff ff call eee <rcu_stutter_wait+0x5e>
> > > > eee: R_386_PC32 round_jiffies_relative
> > > > ef2: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
> > > > ef8: e8 fc ff ff ff call ef9 <rcu_stutter_wait+0x69>
> > > > ef9: R_386_PC32 schedule_timeout_interruptible
> > > > efd: 8d 76 00 lea 0x0(%esi),%esi
> > > >
> > > > here is the deref ------------------------^
> > >
> > > Ah!!! We are getting a page fault while cleaning up the stack frame?
> > >
> > > Ouch!
> > >
> > > > f00: eb 96 jmp e98 <rcu_stutter_wait+0x8>
> > > > f02: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> > > > f09: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
> >
> > And now I can run this. gcc v3.4.4 gives yet a different error:
> >
> > divide error: 0000 [#1] PREEMPT SMP
> > last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
> > CPU 1
> > Modules linked in: [last unloaded: rcutorture]
> > Pid: 3369, comm: rcu_torture_rea Not tainted 2.6.28-autokern1 #1
> > RIP: 0010:[<ffffffffa0000142>] [<ffffffffa0000142>] 0xffffffffa0000142
> > RSP: 0000:ffff88007f0afeb0 EFLAGS: 00010246
> > RAX: 00000000def36d6a RBX: ffffffffa0006850 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: ffff88007ecf1600 RDI: ffff88007f0afef0
> > RBP: 0000000000000dd4 R08: ffff88007f0ae000 R09: ffffffff8037c7d9
> > R10: 0000000000000000 R11: ffffffff8037c7d9 R12: 0000000000000000
> > R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000
> > FS: 0000000000000000(0000) GS:ffff8800e3801c00(0000) knlGS:00000000f7f3ab80
> > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 00000000080dc87c CR3: 0000000000201000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process rcu_torture_rea (pid: 3369, threadinfo ffff88007f0ae000, task ffff88007ecf5d80)
> > Stack:
> > ffffffff8037c7d9 ffffffffa000090d ffff88007ecfdec0 ffff88007ec17eb0
> > 0000000000000001 ffffffffa00006e6 0000000000000000 ffff8800e3488000
> > c744ca136d6adef3 0000000000000256 0000000000000000 ffffffffa0000807
> > Call Trace:
> > [<ffffffff8037c7d9>] ? delay_tsc+0x0/0x99
> > [<ffffffff80245b94>] ? kthread+0x3d/0x63
> > [<ffffffff8020c3ea>] ? child_rip+0xa/0x20
> > [<ffffffff80245b57>] ? kthread+0x0/0x63
> > [<ffffffff8020c3e0>] ? child_rip+0x0/0x20
> > Code: 58 c3 56 bf 01 00 00 00 e8 88 bd 22 e0 59 31 c0 c3 41 51 e8 69 ff ff ff 48 63 15 0a 58 00 00 48 69 d2 90 01 00 00 48 89 d1 31 d2 <48> f7 f1 48 85 d2 75 0c 41 58 bf 78 1b 0d 00 e9 32 c7 37 e0 5f
> > RIP [<ffffffffa0000142>] 0xffffffffa0000142
> > RSP <ffff88007f0afeb0>
> >
> > I will see what I can find from this.
>
> Hello, Eric,
>
> Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> insmoded and rmmoded without errors.
>
> Could you please try this on your setup? I have attached this file in
> case that helps.

works like a charm

Greetings, Eric

2009-01-06 12:48:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > I will see what I can find from this.
> >
> > Hello, Eric,
> >
> > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > insmoded and rmmoded without errors.
> >
> > Could you please try this on your setup? I have attached this file in
> > case that helps.
>
> works like a charm

Well, that certainly tells me where the bug lies. ;-)

Thank you for all your hard work on this, I should have a patch soon.

Thanx, Paul

2009-01-07 19:46:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Tue, Jan 06, 2009 at 04:48:09AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > I will see what I can find from this.
> > >
> > > Hello, Eric,
> > >
> > > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > > insmoded and rmmoded without errors.
> > >
> > > Could you please try this on your setup? I have attached this file in
> > > case that helps.
> >
> > works like a charm
>
> Well, that certainly tells me where the bug lies. ;-)
>
> Thank you for all your hard work on this, I should have a patch soon.

And here it is, against the original version of rcutorture.c (revert
all of my previous patches against rcutorture.c). I am also attaching
the resulting rcutorture.c, if that makes things more convenient for you.

This one survives modprobe and rmmod, and also survives when shutdown
while running. Here are the operating modes:

CONFIG_RCU_TORTURE_TEST=m

This is the normal rcutorture build. Use "modprobe rcutorture"
(with optional arguments) to start, and "rmmod rcutorture" to
stop. If you shut the system down without doing the rmmod, you
should see console output like:

rcutorture thread rcu_torture_writer parking due to system shutdown

One for each rcutorture kthread.

CONFIG_RCU_TORTURE_TEST=y
CONFIG_RCU_TORTURE_TEST_RUNNABLE=n

Use this if you want rcutorture built in, but don't want the
test to start running during early boot. To start the
torturing:

echo 1 > /proc/sys/kernel/rcutorture_runnable

To stop the torturing, s/1/0/

You will get "parking" console messages as noted above when
you shut the system down.

CONFIG_RCU_TORTURE_TEST=y
CONFIG_RCU_TORTURE_TEST_RUNNABLE=y

Same as above, except that the torturing starts during early
boot. Only for the stout of heart and strong of stomach.
The same /proc entry noted above may be used to control the
test.

Could you please fire this up and see if it fixes the rmmod-time bug
that you located?

Thanx, Paul

Signed-off-by: Paul E. McKenney <[email protected]>
---

rcutorture.c | 94 +++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 59 insertions(+), 35 deletions(-)

diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index b310655..7c4142a 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -136,31 +136,47 @@ static int stutter_pause_test = 0;
#endif
int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;

-#define FULLSTOP_SIGNALED 1 /* Bail due to signal. */
-#define FULLSTOP_CLEANUP 2 /* Orderly shutdown. */
-static int fullstop; /* stop generating callbacks at test end. */
-DEFINE_MUTEX(fullstop_mutex); /* protect fullstop transitions and */
- /* spawning of kthreads. */
+/* Mediate rmmod and system shutdown. Concurrent rmmod & shutdown illegal! */
+
+#define FULLSTOP_DONTSTOP 0 /* Normal operation. */
+#define FULLSTOP_SHUTDOWN 1 /* System shutdown with rcutorture running. */
+#define FULLSTOP_RMMOD 2 /* Normal rmmod of rcutorture. */
+static int fullstop = FULLSTOP_RMMOD;
+DEFINE_MUTEX(fullstop_mutex); /* Protect fullstop transitions and spawning */
+ /* of kthreads. */

/*
- * Detect and respond to a signal-based shutdown.
+ * Detect and respond to a system shutdown.
*/
static int
rcutorture_shutdown_notify(struct notifier_block *unused1,
unsigned long unused2, void *unused3)
{
- if (fullstop)
- return NOTIFY_DONE;
- if (signal_pending(current)) {
- mutex_lock(&fullstop_mutex);
- if (!ACCESS_ONCE(fullstop))
- fullstop = FULLSTOP_SIGNALED;
- mutex_unlock(&fullstop_mutex);
- }
+ mutex_lock(&fullstop_mutex);
+ if (fullstop == FULLSTOP_DONTSTOP)
+ fullstop = FULLSTOP_SHUTDOWN;
+ else
+ printk(KERN_WARNING /* but going down anyway, so... */
+ "Concurrent 'rmmod rcutorture' and shutdown illegal!\n");
+ mutex_unlock(&fullstop_mutex);
return NOTIFY_DONE;
}

/*
+ * Absorb kthreads into a kernel function that won't return, so that
+ * they won't ever access module text or data again.
+ */
+static void rcutorture_shutdown_absorb(char *title)
+{
+ if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) {
+ printk(KERN_NOTICE
+ "rcutorture thread %s parking due to system shutdown\n",
+ title);
+ schedule_timeout_uninterruptible(MAX_SCHEDULE_TIMEOUT);
+ }
+}
+
+/*
* Allocate an element from the rcu_tortures pool.
*/
static struct rcu_torture *
@@ -221,13 +237,14 @@ rcu_random(struct rcu_random_state *rrsp)
}

static void
-rcu_stutter_wait(void)
+rcu_stutter_wait(char *title)
{
- while ((stutter_pause_test || !rcutorture_runnable) && !fullstop) {
+ while (stutter_pause_test || !rcutorture_runnable) {
if (rcutorture_runnable)
schedule_timeout_interruptible(1);
else
schedule_timeout_interruptible(round_jiffies_relative(HZ));
+ rcutorture_shutdown_absorb(title);
}
}

@@ -289,7 +306,7 @@ rcu_torture_cb(struct rcu_head *p)
int i;
struct rcu_torture *rp = container_of(p, struct rcu_torture, rtort_rcu);

- if (fullstop) {
+ if (fullstop != FULLSTOP_DONTSTOP) {
/* Test is ending, just drop callbacks on the floor. */
/* The next initialization will pick up the pieces. */
return;
@@ -621,10 +638,11 @@ rcu_torture_writer(void *arg)
}
rcu_torture_current_version++;
oldbatch = cur_ops->completed();
- rcu_stutter_wait();
- } while (!kthread_should_stop() && !fullstop);
+ rcu_stutter_wait("rcu_torture_writer");
+ } while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ rcutorture_shutdown_absorb("rcu_torture_writer");
+ while (!kthread_should_stop())
schedule_timeout_uninterruptible(1);
return 0;
}
@@ -645,11 +663,12 @@ rcu_torture_fakewriter(void *arg)
schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
udelay(rcu_random(&rand) & 0x3ff);
cur_ops->sync();
- rcu_stutter_wait();
- } while (!kthread_should_stop() && !fullstop);
+ rcu_stutter_wait("rcu_torture_fakewriter");
+ } while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);

VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task stopping");
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ rcutorture_shutdown_absorb("rcu_torture_fakewriter");
+ while (!kthread_should_stop())
schedule_timeout_uninterruptible(1);
return 0;
}
@@ -754,12 +773,13 @@ rcu_torture_reader(void *arg)
preempt_enable();
cur_ops->readunlock(idx);
schedule();
- rcu_stutter_wait();
- } while (!kthread_should_stop() && !fullstop);
+ rcu_stutter_wait("rcu_torture_reader");
+ } while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
VERBOSE_PRINTK_STRING("rcu_torture_reader task stopping");
+ rcutorture_shutdown_absorb("rcu_torture_reader");
if (irqreader && cur_ops->irqcapable)
del_timer_sync(&t);
- while (!kthread_should_stop() && fullstop != FULLSTOP_SIGNALED)
+ while (!kthread_should_stop())
schedule_timeout_uninterruptible(1);
return 0;
}
@@ -856,7 +876,8 @@ rcu_torture_stats(void *arg)
do {
schedule_timeout_interruptible(stat_interval * HZ);
rcu_torture_stats_print();
- } while (!kthread_should_stop() && !fullstop);
+ rcutorture_shutdown_absorb("rcu_torture_stats");
+ } while (!kthread_should_stop());
VERBOSE_PRINTK_STRING("rcu_torture_stats task stopping");
return 0;
}
@@ -924,7 +945,8 @@ rcu_torture_shuffle(void *arg)
do {
schedule_timeout_interruptible(shuffle_interval * HZ);
rcu_torture_shuffle_tasks();
- } while (!kthread_should_stop() && !fullstop);
+ rcutorture_shutdown_absorb("rcu_torture_shuffle");
+ } while (!kthread_should_stop());
VERBOSE_PRINTK_STRING("rcu_torture_shuffle task stopping");
return 0;
}
@@ -939,10 +961,11 @@ rcu_torture_stutter(void *arg)
do {
schedule_timeout_interruptible(stutter * HZ);
stutter_pause_test = 1;
- if (!kthread_should_stop() && !fullstop)
+ if (!kthread_should_stop())
schedule_timeout_interruptible(stutter * HZ);
stutter_pause_test = 0;
- } while (!kthread_should_stop() && !fullstop);
+ rcutorture_shutdown_absorb("rcu_torture_stutter");
+ } while (!kthread_should_stop());
VERBOSE_PRINTK_STRING("rcu_torture_stutter task stopping");
return 0;
}
@@ -969,15 +992,16 @@ rcu_torture_cleanup(void)
int i;

mutex_lock(&fullstop_mutex);
- if (!fullstop) {
- /* If being signaled, let it happen, then exit. */
+ if (fullstop == FULLSTOP_SHUTDOWN) {
+ printk(KERN_WARNING /* but going down anyway, so... */
+ "Concurrent 'rmmod rcutorture' and shutdown illegal!\n");
mutex_unlock(&fullstop_mutex);
- schedule_timeout_interruptible(10 * HZ);
+ schedule_timeout_uninterruptible(10);
if (cur_ops->cb_barrier != NULL)
cur_ops->cb_barrier();
return;
}
- fullstop = FULLSTOP_CLEANUP;
+ fullstop = FULLSTOP_RMMOD;
mutex_unlock(&fullstop_mutex);
unregister_reboot_notifier(&rcutorture_nb);
if (stutter_task) {
@@ -1077,7 +1101,7 @@ rcu_torture_init(void)
else
nrealreaders = 2 * num_online_cpus();
rcu_torture_print_module_parms("Start of test");
- fullstop = 0;
+ fullstop = FULLSTOP_DONTSTOP;

/* Set up the freelist. */


Attachments:
(No filename) (9.15 kB)
rcutorture.c (33.49 kB)
kernel/rcutorture.c
Download all attachments

2009-01-07 20:19:34

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Tue, Jan 06, 2009 at 04:48:09AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > > I will see what I can find from this.
> > > >
> > > > Hello, Eric,
> > > >
> > > > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > > > insmoded and rmmoded without errors.
> > > >
> > > > Could you please try this on your setup? I have attached this file in
> > > > case that helps.
> > >
> > > works like a charm
> >
> > Well, that certainly tells me where the bug lies. ;-)
> >
> > Thank you for all your hard work on this, I should have a patch soon.
>
> And here it is, against the original version of rcutorture.c (revert
> all of my previous patches against rcutorture.c). I am also attaching
> the resulting rcutorture.c, if that makes things more convenient for you.
>
> Could you please fire this up and see if it fixes the rmmod-time bug
> that you located?

it does fix the bug, thanks for your work

Greetings, Eric

2009-01-07 22:06:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Wed, Jan 07, 2009 at 09:19:04PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Tue, Jan 06, 2009 at 04:48:09AM -0800, Paul E. McKenney wrote:
> > > On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > > > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > > > I will see what I can find from this.
> > > > >
> > > > > Hello, Eric,
> > > > >
> > > > > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > > > > insmoded and rmmoded without errors.
> > > > >
> > > > > Could you please try this on your setup? I have attached this file in
> > > > > case that helps.
> > > >
> > > > works like a charm
> > >
> > > Well, that certainly tells me where the bug lies. ;-)
> > >
> > > Thank you for all your hard work on this, I should have a patch soon.
> >
> > And here it is, against the original version of rcutorture.c (revert
> > all of my previous patches against rcutorture.c). I am also attaching
> > the resulting rcutorture.c, if that makes things more convenient for you.
> >
> > Could you please fire this up and see if it fixes the rmmod-time bug
> > that you located?
>
> it does fix the bug, thanks for your work

Thank you very much for finding this one, bearing with me during the
debug process, and testing it! I will post the patch with credit to you.

Does this fix all of the problems that you observed, or is there still
some breakage?

Thanx, Paul

2009-01-07 22:38:41

by Eric Sesterhenn

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

* Paul E. McKenney ([email protected]) wrote:
> On Wed, Jan 07, 2009 at 09:19:04PM +0100, Eric Sesterhenn wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Tue, Jan 06, 2009 at 04:48:09AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > > > > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > > > > I will see what I can find from this.
> > > > > >
> > > > > > Hello, Eric,
> > > > > >
> > > > > > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > > > > > insmoded and rmmoded without errors.
> > > > > >
> > > > > > Could you please try this on your setup? I have attached this file in
> > > > > > case that helps.
> > > > >
> > > > > works like a charm
> > > >
> > > > Well, that certainly tells me where the bug lies. ;-)
> > > >
> > > > Thank you for all your hard work on this, I should have a patch soon.
> > >
> > > And here it is, against the original version of rcutorture.c (revert
> > > all of my previous patches against rcutorture.c). I am also attaching
> > > the resulting rcutorture.c, if that makes things more convenient for you.
> > >
> > > Could you please fire this up and see if it fixes the rmmod-time bug
> > > that you located?
> >
> > it does fix the bug, thanks for your work
>
> Thank you very much for finding this one, bearing with me during the
> debug process, and testing it! I will post the patch with credit to you.
>
> Does this fix all of the problems that you observed, or is there still
> some breakage?

It fixes everything i saw. Thanks.

Eric

2009-01-07 22:49:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [BUG] NULL pointer deref with rcutorture

On Wed, Jan 07, 2009 at 11:34:12PM +0100, Eric Sesterhenn wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Wed, Jan 07, 2009 at 09:19:04PM +0100, Eric Sesterhenn wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Tue, Jan 06, 2009 at 04:48:09AM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Jan 06, 2009 at 08:47:07AM +0100, Eric Sesterhenn wrote:
> > > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > > On Mon, Jan 05, 2009 at 04:29:12PM -0800, Paul E. McKenney wrote:
> > > > > > > > On Mon, Jan 05, 2009 at 02:18:36PM -0800, Paul E. McKenney wrote:
> > > > > > > > I will see what I can find from this.
> > > > > > >
> > > > > > > Hello, Eric,
> > > > > > >
> > > > > > > Just for grins, I built the 2.6.28 kernel/rcutorture.c on b58602a. It
> > > > > > > insmoded and rmmoded without errors.
> > > > > > >
> > > > > > > Could you please try this on your setup? I have attached this file in
> > > > > > > case that helps.
> > > > > >
> > > > > > works like a charm
> > > > >
> > > > > Well, that certainly tells me where the bug lies. ;-)
> > > > >
> > > > > Thank you for all your hard work on this, I should have a patch soon.
> > > >
> > > > And here it is, against the original version of rcutorture.c (revert
> > > > all of my previous patches against rcutorture.c). I am also attaching
> > > > the resulting rcutorture.c, if that makes things more convenient for you.
> > > >
> > > > Could you please fire this up and see if it fixes the rmmod-time bug
> > > > that you located?
> > >
> > > it does fix the bug, thanks for your work
> >
> > Thank you very much for finding this one, bearing with me during the
> > debug process, and testing it! I will post the patch with credit to you.
> >
> > Does this fix all of the problems that you observed, or is there still
> > some breakage?
>
> It fixes everything i saw. Thanks.

Woo-hoo!!! ;-)

Thanx, Paul