2010-06-30 11:22:44

by divya

Subject: Oops while running fs_racer test on a POWER6 box against latest git

While running the fs_racer test from LTP on a POWER6 box against the latest git (2.6.35-rc3-git4, commit ID 984bc9601f64fd),
I came across the following warning followed by multiple oops.

------------[ cut here ]------------

Badness at kernel/mutex-debug.c:64
NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
Call Trace:
[c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
[c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
Instruction dump:
e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
Unable to handle kernel paging request for unknown fault
Faulting instruction address: 0xc00000000008d0f4
Oops: Kernel access of bad area, sig: 7 [#1]
SMP NR_CPUS=1024 NUMA
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
pSeries
last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
MSR: 8000000000009032
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
EE,ME,IR,DR> CR: 24022442 XER: 00000012
DAR: c000000000648f54, DSISR: 0000000040010000
TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
NIP [c00000000008d0f4] .copy_process+0x310/0xf40
LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
Call Trace:
[c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
[c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
[c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
[c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
Instruction dump:
419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4

Kernel version 2.6.34-rc3-git3 works fine.

Thanks
Divya



Attachments:
2.6.34-rc3-git4.log (42.64 kB)

2010-07-01 05:04:57

by Michael Neuling

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

> While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> came across the following warning followed by multiple oops.
>
> ------------[ cut here ]------------
>
> Badness at kernel/mutex-debug.c:64
> NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
> TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> Call Trace:
> [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> Instruction dump:
> e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
> Unable to handle kernel paging request for unknown fault
> Faulting instruction address: 0xc00000000008d0f4
> Oops: Kernel access of bad area, sig: 7 [#1]
> SMP NR_CPUS=1024 NUMA
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> pSeries
> last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> MSR: 8000000000009032
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> EE,ME,IR,DR> CR: 24022442 XER: 00000012
> DAR: c000000000648f54, DSISR: 0000000040010000
> TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> Call Trace:
> [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> Instruction dump:
> 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> 78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4
>
> Kernel version 2.6.34-rc3-git3 works fine.

Should this read 2.6.35-rc3-git3?

If so, there are only about 20 commits in:
5904b3b81d2516..984bc9601f64fd

The likely fs-related candidates are from Christoph and Nick Piggin
(added to CC).

There are no commits relating to POWER6 or PPC.

Mikey

2010-07-01 10:59:17

by Nick Piggin

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > came across the following warning followed by multiple oops.
> >
> > ------------[ cut here ]------------
> >
> > Badness at kernel/mutex-debug.c:64
> > NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> > REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> > MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
> > TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> > GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> > GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> > GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> > GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> > GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> > GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> > GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> > GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> > NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> > LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> > Call Trace:
> > [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> > [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> > Instruction dump:
> > e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> > e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
> > Unable to handle kernel paging request for unknown fault
> > Faulting instruction address: 0xc00000000008d0f4
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > SMP NR_CPUS=1024 NUMA
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > pSeries
> > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> > REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> > MSR: 8000000000009032
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > EE,ME,IR,DR> CR: 24022442 XER: 00000012
> > DAR: c000000000648f54, DSISR: 0000000040010000
> > TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> > GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> > GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> > GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> > GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> > GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> > GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> > GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> > GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> > NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> > LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> > Call Trace:
> > [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> > [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> > [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> > Instruction dump:
> > 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> > 78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4
> >
> > Kernel version 2.6.34-rc3-git3 works fine.
>
> Should this read 2.6.35-rc3-git3?
>
> If so, there's only about 20 commits in:
> 5904b3b81d2516..984bc9601f64fd
>
> The likely fs related candidates are from Christoph and Nick Piggin
> (added to CC)
>
> No commits relating to POWER6 or PPC.

Not sure what's happening here. The first warning looks like some mutex
corruption, but it doesn't have a stack trace (these are 2 separate
dumps, right? i.e. the copy_process stack doesn't relate to the mutex
warning?) So I don't have much idea.

If it is reproducible, can you try getting a better stack trace, or
better yet, even bisecting if there is just a small window?

Thanks,
Nick
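
For scale, bisecting a window of roughly 20 commits (the figure quoted above) takes only a handful of test builds. A rough estimate, assuming a linear history in which each test halves the remaining candidates; the function name is hypothetical and the count is an approximation, not what git itself reports:

```python
import math


def bisect_steps(num_commits: int) -> int:
    """Approximate number of test builds needed to bisect num_commits candidates.

    Assumption: linear history, each step halves the remaining range.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    if num_commits == 1:
        return 0  # nothing left to test
    return math.ceil(math.log2(num_commits))


if __name__ == "__main__":
    # About 20 commits lie in 5904b3b81d2516..984bc9601f64fd per the thread.
    print(bisect_steps(20))
```

With merges in the range the real number of steps can differ slightly.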

2010-07-01 18:25:43

by Maciej Rutecki

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
> While running fs_racer test from LTP on a POWER6 box against latest
> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the following
> warning followed by multiple oops.
>

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16324
for your bug report. Please add your address to the CC list there, thanks!


--
Maciej Rutecki
http://www.maciek.unixy.pl

2010-07-02 01:36:43

by Michael Neuling

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

In message <20100701105907.GK22976@laptop> you wrote:
> On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > > While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > > came across the following warning followed by multiple oops.
> > >
> > > ------------[ cut here ]------------
> > >
> > > Badness at kernel/mutex-debug.c:64
> > > NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> > > REGS: c00000010be8f6f0 TRAP: 0700 Not tainted (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000029032<EE,ME,CE,IR,DR> CR: 24224422 XER: 00000012
> > > TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> > > GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> > > GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> > > GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> > > GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> > > GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> > > GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> > > GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> > > GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> > > NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> > > LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> > > Call Trace:
> > > [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> > > [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> > > Instruction dump:
> > > e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> > > e93e8008 80090000 2f800000 409e0008<0fe00000> e93e8000 80090000 2f800000
> > > Unable to handle kernel paging request for unknown fault
> > > Faulting instruction address: 0xc00000000008d0f4
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > SMP NR_CPUS=1024 NUMA
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > pSeries
> > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > > NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> > > REGS: c00000010978f900 TRAP: 0600 Tainted: G W (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000009032
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > EE,ME,IR,DR> CR: 24022442 XER: 00000012
> > > DAR: c000000000648f54, DSISR: 0000000040010000
> > > TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> > > GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> > > GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> > > GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> > > GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> > > GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> > > GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> > > GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> > > GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> > > NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> > > LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> > > Call Trace:
> > > [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > > [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> > > [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> > > [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> > > Instruction dump:
> > > 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> > > 78004800 60000042 901f0014 38004000<7d6048a8> 7d6b0078 7d6049ad 40c2fff4
> > >
> > > Kernel version 2.6.34-rc3-git3 works fine.
> >
> > Should this read 2.6.35-rc3-git3?
> >
> > If so, there's only about 20 commits in:
> > 5904b3b81d2516..984bc9601f64fd
> >
> > The likely fs related candidates are from Christoph and Nick Piggin
> > (added to CC)
> >
> > No commits relating to POWER6 or PPC.
>
> Not sure what's happening here. The first warning looks like some mutex
> corruption, but it doesn't have a stack trace (these are 2 separate
> dumps, right? i.e. the copy_process stack doesn't relate to the mutex
> warning?) So I don't have much idea.
>
> If it is reproducible, can you try getting a better stack trace, or
> better yet, even bisecting if there is just a small window?

I can't reproduce the bug here on POWER6 or POWER7.

Divya, can you bisect this?

Mikey

2010-07-02 06:47:09

by divya

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
> On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
>
>> While running fs_racer test from LTP on a POWER6 box against latest
>> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the following
>> warning followed by multiple oops.
>>
>>
> I created a Bugzilla entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=16324
> for your bug report, please add your address to the CC list in there, thanks!
>
>
>
Here is a cleaner backtrace, obtained while running the fs_racer test from LTP on a POWER6
box against the latest git (2.6.35-rc3-git5, commit ID 980019d74e4b242):

Badness at kernel/mutex-debug.c:64
BUG: key (null) not in .data!
NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
REGS: c00000010bb176f0 TRAP: 0700 Not tainted (2.6.35-rc3-git5-autotest)
BUG: key 00000000000001d8 not in .data!
BUG: key 00000000000001e0 not in .data!
BUG: key 00000000000001e8 not in .data!
MSR: 8000000000029032
Unable to handle kernel paging request for data at address 0x00000028
Faulting instruction address: 0xc0000000003ad0ec
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
Modules linked in:
NIP = fff9e98fc40 MSR = 800000004001d032
ipv6 fuse loop
Unable to handle kernel paging request for unknown fault
dm_mod
Faulting instruction address: 0xc00000000008d0f4
sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
REGS: c000000109b4f610 TRAP: 0300 Not tainted (2.6.35-rc3-git5-autotest)
MSR: 8000000000009032<EE,ME,IR,DR> CR: 88004484 XER: 00000001
DAR: 0000000000000028, DSISR: 0000000040010000
TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798 0000000000000028
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR08: 0000000000000000 0000000000000028 c000000000189f2c c000000109a98600
GPR12: 0000000024004424 c00000000f602f80 00000000000041ff 0000000000000001
GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44 0000000000000000
GPR20: c00000010df77908 fffffffffffff000 0000000000010000 00000000000041ff
GPR24: c00000010df77758 c000000109fa1800 c00000010df77908 c0000000ff236600
GPR28: 0000000000000028 0000000000000040 c000000000ca7b38 c000000000189f2c
NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
Call Trace:
[c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4 (unreliable)
[c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
[c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
[c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
[c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
[c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
[c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
[c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
Instruction dump:
eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
38000000 7c691b78 980d0214 800d0008<7d601829> 2c0b0000 40c20010 7c00192d
Oops: Weird page fault, sig: 11 [#2]

Please let me know if this backtrace helps with further analysis.
Meanwhile, I will run a git bisect and send the results.

Thanks
Divya
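
When comparing jumbled dumps like the one above, it can help to pull the symbol, offset and function size out of each frame mechanically. A minimal sketch, assuming the `[address] .symbol+0xoff/0xsize` layout seen in this log; the regular expression and the function name are illustrative and not part of any kernel tooling:

```python
import re

# Matches e.g. "[c000000000189f2c] .new_inode+0x4c/0xe4".
# The leading dot on ppc64 function symbols is optional and dropped.
FRAME_RE = re.compile(
    r"\[(?P<addr>[0-9a-f]{16})\]\s+\.?(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line: str):
    """Return (address, symbol, offset, size) for a trace line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None  # e.g. a frame that only carries a raw address
    return (
        int(m.group("addr"), 16),
        m.group("sym"),
        int(m.group("off"), 16),
        int(m.group("size"), 16),
    )


if __name__ == "__main__":
    print(parse_frame("NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48"))
```

Lines that carry only a raw address, such as the "(unreliable)" frame in the first report, yield None rather than a guess.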


2010-07-09 06:58:09

by divya

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On Friday 02 July 2010 12:16 PM, divya wrote:
> On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
>> On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
>>> While running fs_racer test from LTP on a POWER6 box against latest
>>> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the
>>> following
>>> warning followed by multiple oops.
>>>
>> I created a Bugzilla entry at
>> https://bugzilla.kernel.org/show_bug.cgi?id=16324
>> for your bug report, please add your address to the CC list in there,
>> thanks!
>>
>>
> Here I find a cleaner back trace while running fs_racer test from LTP
> on a POWER6
> box against the latest git(2.6.35-rc3-git5 - commitid 980019d74e4b242)
>
> Badness at kernel/mutex-debug.c:64
> BUG: key (null) not in .data!
> NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> REGS: c00000010bb176f0 TRAP: 0700 Not tainted
> (2.6.35-rc3-git5-autotest)
> BUG: key 00000000000001d8 not in .data!
> BUG: key 00000000000001e0 not in .data!
> BUG: key 00000000000001e8 not in .data!
> MSR: 8000000000029032
> Unable to handle kernel paging request for data at address 0x00000028
> Faulting instruction address: 0xc0000000003ad0ec
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> last sysfs file:
> /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
> Modules linked in:
> NIP = fff9e98fc40 MSR = 800000004001d032
> ipv6 fuse loop
> Unable to handle kernel paging request for unknown fault
> dm_mod
> Faulting instruction address: 0xc00000000008d0f4
> sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic
> scsi_transport_srp scsi_tgt scsi_mod
> NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
> REGS: c000000109b4f610 TRAP: 0300 Not tainted
> (2.6.35-rc3-git5-autotest)
> MSR: 8000000000009032<EE,ME,IR,DR> CR: 88004484 XER: 00000001
> DAR: 0000000000000028, DSISR: 0000000040010000
> TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
> GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798
> 0000000000000028
> GPR04: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000001
> GPR08: 0000000000000000 0000000000000028 c000000000189f2c
> c000000109a98600
> GPR12: 0000000024004424 c00000000f602f80 00000000000041ff
> 0000000000000001
> GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44
> 0000000000000000
> GPR20: c00000010df77908 fffffffffffff000 0000000000010000
> 00000000000041ff
> GPR24: c00000010df77758 c000000109fa1800 c00000010df77908
> c0000000ff236600
> GPR28: 0000000000000028 0000000000000040 c000000000ca7b38
> c000000000189f2c
> NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
> LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
> Call Trace:
> [c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4
> (unreliable)
> [c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
> [c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
> [c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
> [c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
> [c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
> [c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
> [c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> Instruction dump:
> eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
> 38000000 7c691b78 980d0214 800d0008<7d601829> 2c0b0000 40c20010 7c00192d
> Oops: Weird page fault, sig: 11 [#2]
>
> Pls let me know if this back trace would help in analyzing further.
> Meanwhile I shall do a git bisect and send the inputs.
>
> Thanks
> Divya
>
>
>
Hi All,

From the git bisect, it seems that commit 57439f878afafefad8836ebf5c49da2a0a746105 is the culprit for the above issue.

Thanks
Divya

2010-07-09 07:34:19

by Jens Axboe

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On 2010-07-09 08:57, divya wrote:
> On Friday 02 July 2010 12:16 PM, divya wrote:
>> On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
>>> On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
>>>> While running fs_racer test from LTP on a POWER6 box against latest
>>>> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the
>>>> following
>>>> warning followed by multiple oops.
>>>>
>>> I created a Bugzilla entry at
>>> https://bugzilla.kernel.org/show_bug.cgi?id=16324
>>> for your bug report, please add your address to the CC list in there,
>>> thanks!
>>>
>>>
>> Here I find a cleaner back trace while running fs_racer test from LTP
>> on a POWER6
>> box against the latest git(2.6.35-rc3-git5 - commitid 980019d74e4b242)
>>
>> Badness at kernel/mutex-debug.c:64
>> BUG: key (null) not in .data!
>> NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
>> REGS: c00000010bb176f0 TRAP: 0700 Not tainted
>> (2.6.35-rc3-git5-autotest)
>> BUG: key 00000000000001d8 not in .data!
>> BUG: key 00000000000001e0 not in .data!
>> BUG: key 00000000000001e8 not in .data!
>> MSR: 8000000000029032
>> Unable to handle kernel paging request for data at address 0x00000028
>> Faulting instruction address: 0xc0000000003ad0ec
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> SMP NR_CPUS=1024 NUMA pSeries
>> last sysfs file:
>> /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
>> Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
>> Modules linked in:
>> NIP = fff9e98fc40 MSR = 800000004001d032
>> ipv6 fuse loop
>> Unable to handle kernel paging request for unknown fault
>> dm_mod
>> Faulting instruction address: 0xc00000000008d0f4
>> sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic
>> scsi_transport_srp scsi_tgt scsi_mod
>> NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
>> REGS: c000000109b4f610 TRAP: 0300 Not tainted
>> (2.6.35-rc3-git5-autotest)
>> MSR: 8000000000009032<EE,ME,IR,DR> CR: 88004484 XER: 00000001
>> DAR: 0000000000000028, DSISR: 0000000040010000
>> TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
>> GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798
>> 0000000000000028
>> GPR04: 0000000000000000 0000000000000000 0000000000000000
>> 0000000000000001
>> GPR08: 0000000000000000 0000000000000028 c000000000189f2c
>> c000000109a98600
>> GPR12: 0000000024004424 c00000000f602f80 00000000000041ff
>> 0000000000000001
>> GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44
>> 0000000000000000
>> GPR20: c00000010df77908 fffffffffffff000 0000000000010000
>> 00000000000041ff
>> GPR24: c00000010df77758 c000000109fa1800 c00000010df77908
>> c0000000ff236600
>> GPR28: 0000000000000028 0000000000000040 c000000000ca7b38
>> c000000000189f2c
>> NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
>> LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
>> Call Trace:
>> [c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4
>> (unreliable)
>> [c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
>> [c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
>> [c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
>> [c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
>> [c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
>> [c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
>> [c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
>> Instruction dump:
>> eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
>> 38000000 7c691b78 980d0214 800d0008<7d601829> 2c0b0000 40c20010 7c00192d
>> Oops: Weird page fault, sig: 11 [#2]
>>
>> Pls let me know if this back trace would help in analyzing further.
>> Meanwhile I shall do a git bisect and send the inputs.
>>
>> Thanks
>> Divya
>>
>>
>>
> Hi All,
>
> From the git bisect, it seems that commit
> 57439f878afafefad8836ebf5c49da2a0a746105 is the culprit for the above
> issue.

CC'ing Nick and Al.

--
Jens Axboe

2010-07-09 08:40:27

by Nick Piggin

Subject: Re: Oops while running fs_racer test on a POWER6 box against latest git

On Fri, Jul 09, 2010 at 09:34:16AM +0200, Jens Axboe wrote:
> On 2010-07-09 08:57, divya wrote:
> > On Friday 02 July 2010 12:16 PM, divya wrote:
> >> On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
> >>> On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
> >>>> While running fs_racer test from LTP on a POWER6 box against latest
> >>>> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the
> >>>> following
> >>>> warning followed by multiple oops.
> >>>>
> >>> I created a Bugzilla entry at
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=16324
> >>> for your bug report, please add your address to the CC list in there,
> >>> thanks!
> >>>
> >>>
> >> Here I find a cleaner back trace while running fs_racer test from LTP
> >> on a POWER6
> >> box against the latest git(2.6.35-rc3-git5 - commitid 980019d74e4b242)
> >>
> >> Badness at kernel/mutex-debug.c:64
> >> BUG: key (null) not in .data!
> >> NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> >> REGS: c00000010bb176f0 TRAP: 0700 Not tainted
> >> (2.6.35-rc3-git5-autotest)
> >> BUG: key 00000000000001d8 not in .data!
> >> BUG: key 00000000000001e0 not in .data!
> >> BUG: key 00000000000001e8 not in .data!
> >> MSR: 8000000000029032
> >> Unable to handle kernel paging request for data at address 0x00000028
> >> Faulting instruction address: 0xc0000000003ad0ec
> >> Oops: Kernel access of bad area, sig: 11 [#1]
> >> SMP NR_CPUS=1024 NUMA pSeries
> >> last sysfs file:
> >> /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> >> Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
> >> Modules linked in:
> >> NIP = fff9e98fc40 MSR = 800000004001d032
> >> ipv6 fuse loop
> >> Unable to handle kernel paging request for unknown fault
> >> dm_mod
> >> Faulting instruction address: 0xc00000000008d0f4
> >> sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic
> >> scsi_transport_srp scsi_tgt scsi_mod
> >> NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
> >> REGS: c000000109b4f610 TRAP: 0300 Not tainted
> >> (2.6.35-rc3-git5-autotest)
> >> MSR: 8000000000009032<EE,ME,IR,DR> CR: 88004484 XER: 00000001
> >> DAR: 0000000000000028, DSISR: 0000000040010000
> >> TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
> >> GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798
> >> 0000000000000028
> >> GPR04: 0000000000000000 0000000000000000 0000000000000000
> >> 0000000000000001
> >> GPR08: 0000000000000000 0000000000000028 c000000000189f2c
> >> c000000109a98600
> >> GPR12: 0000000024004424 c00000000f602f80 00000000000041ff
> >> 0000000000000001
> >> GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44
> >> 0000000000000000
> >> GPR20: c00000010df77908 fffffffffffff000 0000000000010000
> >> 00000000000041ff
> >> GPR24: c00000010df77758 c000000109fa1800 c00000010df77908
> >> c0000000ff236600
> >> GPR28: 0000000000000028 0000000000000040 c000000000ca7b38
> >> c000000000189f2c
> >> NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
> >> LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
> >> Call Trace:
> >> [c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4
> >> (unreliable)
> >> [c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
> >> [c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
> >> [c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
> >> [c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
> >> [c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
> >> [c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
> >> [c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> >> Instruction dump:
> >> eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
> >> 38000000 7c691b78 980d0214 800d0008<7d601829> 2c0b0000 40c20010 7c00192d
> >> Oops: Weird page fault, sig: 11 [#2]
> >>
> >> Pls let me know if this back trace would help in analyzing further.
> >> Meanwhile I shall do a git bisect and send the inputs.

The call stack for Badness at kernel/mutex-debug.c:64 (or whatever
explodes first) would be handy. This one seems jumbled still. What
spinlock is in the trace? inode_lock? That would indicate some random
corruption or breakage in the lock debugging.

> >>
> >> Thanks
> >> Divya
> >>
> >>
> >>
> > Hi All,
> >
> > From the git bisect, it seems that commit
> > 57439f878afafefad8836ebf5c49da2a0a746105 is the culprit for the above
> > issue.

Call me blind but I can't see the problem. Are you sure this commit
breaks it?