2012-09-28 01:56:07

by Alexander Graf

Subject: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

Howdy,

While running 3.6.0-rcX I am having a few issues with nfsd on my PPC970-based system. Every time I actually access an NFS share served from it, the machine crashes at random points, and all the logs look a lot like corrupted pointers. I also can't reproduce the oopses without nfsd in the game: running the same workload that crashes over NFS (git clone -ls) locally on the box works just fine.

The mount was done with no special parameters:

lychee:/home/agraf/release on /abuild/agraf/autotest_e500/lychee type nfs (rw,addr=10.10.1.189)

My exports file on the host is also quite simple:

/home/agraf/release *(async,rw)

Below are oops excerpts from the different rcs I tried. All of them crashed, all the way up to Linus' current master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.


Alex


############################################################################
# 3.6.0-rc1 #
############################################################################


Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a330 LR: c000000000608794 CTR: c00000000063fe78
REGS: c000000071b832f0 TRAP: 0300 Not tainted (3.6.0-rc1-00220-gb645f8b)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 22002044 XER: 20000000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 40000000
TASK = c00000007b3a0000[6061] 'nfsd' THREAD: c000000071b80000 CPU: 3
GPR00: 0000000000000000 c000000071b83570 c000000000b8e2a8 0000000000000280
GPR04: 00000000000102d0 c000000000609844 c00000000065418c c000000000ec55e0
GPR08: 000000000036ac63 0000000000477000 c000000000a91758 c000000000a4e5e0
GPR12: 0000000022000048 c00000000fff2680 0000000000010000 0000000000000010
GPR16: 0000000000000000 0000000000000001 00000000000005a8 0000000000000d10
GPR20: 0000000000008000 c000000071d39608 0000000000000d10 00000000000002f0
GPR24: c000000000609844 00000000000102d0 0000000000000280 0000000000000280
GPR28: c000000000609844 4e375f30f9fae38f c000000000adb9c8 c00000007b002b00
NIP [c00000000016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000071b83570] [c00000000016a370] .__kmalloc_track_caller+0x160/0x2ac (unreliable)
[c000000071b83620] [c000000000608794] .__kmalloc_reserve+0x44/0xbc
[c000000071b836c0] [c000000000609844] .__alloc_skb+0xb8/0x1d0
[c000000071b83780] [c00000000065418c] .sk_stream_alloc_skb+0x48/0x154
[c000000071b83810] [c000000000655308] .tcp_sendpage+0x1d0/0x7a0
[c000000071b83920] [c00000000067b864] .inet_sendpage+0x100/0x158
[c000000071b839d0] [c0000000005fb294] .kernel_sendpage+0x7c/0xc8
[c000000071b83a70] [c0000000006b72b4] .svc_send_common+0xc8/0x1a8
[c000000071b83b40] [c0000000006b74c8] .svc_sendto+0x134/0x15c
[c000000071b83c40] [c0000000006b7590] .svc_tcp_sendto+0x3c/0xc0
[c000000071b83cd0] [c0000000006c46dc] .svc_send+0xb0/0x118
[c000000071b83d70] [c0000000006b41a0] .svc_process+0x784/0x7c0
[c000000071b83e40] [d000000003032e34] .nfsd+0x138/0x1ec [nfsd]
[c000000071b83ed0] [c00000000009d050] .kthread+0xb0/0xbc
[c000000071b83f90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 9af22fc4dfe9499b ]---

Unable to handle kernel paging request for data at address 0x4e375f30f9fae38f
Faulting instruction address: 0xc00000000016a330
Oops: Kernel access of bad area, sig: 11 [#2]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a330 LR: c000000000608794 CTR: c0000000006a1b7c
REGS: c000000079e1b300 TRAP: 0300 Tainted: G D (3.6.0-rc1-00220-gb645f8b)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24082444 XER: 20000000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 40000000
TASK = c000000073d80000[10885] 'dhcpcd' THREAD: c000000079e18000 CPU: 3
GPR00: 0000000000000000 c000000079e1b580 c000000000b8e2a8 0000000000000300
GPR04: 00000000000106d0 c000000000609844 c000000000601c60 c000000000ec55e0
GPR08: 000000000036ac63 0000000000477000 c000000000a91758 c000000000a4e5e0
GPR12: 0000000000000000 c00000000fff2680 00000000201e6ad8 c000000079e1b930
GPR16: 0000000000000158 0000000000000000 0000000000000001 0000000000000004
GPR20: c000000079e1b818 c0000000711350f8 c0000000711350c0 0000000000000000
GPR24: c000000000609844 00000000000106d0 0000000000000300 0000000000000300
GPR28: c000000000609844 4e375f30f9fae38f c000000000adb9c8 c00000007b002b00
NIP [c00000000016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000079e1b630] [c000000000608794] .__kmalloc_reserve+0x44/0xbc
[c000000079e1b6d0] [c000000000609844] .__alloc_skb+0xb8/0x1d0
[c000000079e1b790] [c000000000601c60] .sock_alloc_send_pskb+0x124/0x410
[c000000079e1b8c0] [c0000000006a246c] .packet_sendmsg+0x8f0/0xc48
[c000000079e1ba10] [c0000000005fc024] .sock_sendmsg+0xbc/0x100
[c000000079e1bc10] [c0000000005fca54] .SyS_sendto+0xf4/0x13c
[c000000079e1bd90] [c00000000063321c] .compat_sys_socketcall+0x16c/0x288
[c000000079e1be30] [c0000000000098e0] syscall_exit+0x0/0x94
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 9af22fc4dfe9499c ]---

Unable to handle kernel paging request for data at address 0x4e375f30f9fae38f
Faulting instruction address: 0xc00000000016a330
Oops: Kernel access of bad area, sig: 11 [#3]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a330 LR: c000000000608794 CTR: 0000000000000000
REGS: c000000079e5b290 TRAP: 0300 Tainted: G D (3.6.0-rc1-00220-gb645f8b)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24024244 XER: 20000000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 40000000
TASK = c000000079bc1a30[4552] 'console-kit-dae' THREAD: c000000079e58000 CPU: 3
GPR00: 0000000000000000 c000000079e5b510 c000000000b8e2a8 0000000000000280
GPR04: 00000000000106d0 c000000000609844 c000000000601c60 c000000000ec55e0
GPR08: 000000000036ac63 0000000000477000 c000000000a91758 c000000000a4e5e0
GPR12: 100000000000d032 c00000000fff2680 0000000010146d80 c000000079e5b8c0
GPR16: 0000000000000094 0000000000000000 0000000000000001 0000000000000004
GPR20: c000000079e5b7a8 c000000077e02af8 c000000077e02ac0 0000000000000000
GPR24: c000000000609844 00000000000106d0 0000000000000280 0000000000000280
GPR28: c000000000609844 4e375f30f9fae38f c000000000adb9c8 c00000007b002b00
NIP [c00000000016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000079e5b510] [c000000079e58080] 0xc000000079e58080 (unreliable)
[c000000079e5b5c0] [c000000000608794] .__kmalloc_reserve+0x44/0xbc
[c000000079e5b660] [c000000000609844] .__alloc_skb+0xb8/0x1d0
[c000000079e5b720] [c000000000601c60] .sock_alloc_send_pskb+0x124/0x410
[c000000079e5b850] [c00000000069e284] .unix_stream_sendmsg+0x190/0x4e0
[c000000079e5b970] [c0000000005fb130] .sock_aio_write+0x16c/0x1a4
[c000000079e5baa0] [c000000000172bcc] .do_sync_readv_writev+0xb4/0x124
[c000000079e5bc40] [c0000000001c9070] .compat_do_readv_writev+0xf4/0x21c
[c000000079e5bd80] [c0000000001c932c] .compat_sys_writev+0x8c/0xe0
[c000000079e5be30] [c0000000000098e0] syscall_exit+0x0/0x94
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 9af22fc4dfe9499d ]---

Unable to handle kernel paging request for data at address 0x4e375f30f9fae38f
Faulting instruction address: 0xc00000000016a330
Oops: Kernel access of bad area, sig: 11 [#4]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a330 LR: c000000000608794 CTR: c00000000063fe78
REGS: c000000077e2b470 TRAP: 0300 Tainted: G D (3.6.0-rc1-00220-gb645f8b)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000084 XER: 20000000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 40000000
TASK = c000000072df1a30[24367] 'kworker/3:2' THREAD: c000000077e28000 CPU: 3
GPR00: 0000000000000000 c000000077e2b6f0 c000000000b8e2a8 0000000000000280
GPR04: 0000000000010220 c000000000609844 c000000000665550 c000000000ec55e0
GPR08: 000000000036ac63 0000000000477000 c000000000a91758 c000000000a4e5e0
GPR12: c00000007b0c7c70 c00000000fff2680 0000000000051b63 0000000000051a41
GPR16: 0000000000051b5b 0000000000d00000 c00000000090f038 c00000000092eca7
GPR20: 0000000000000001 c000000077e2be58 c000000000ec6c05 0000000000000000
GPR24: c000000000609844 0000000000010220 0000000000000280 0000000000000280
GPR28: c000000000609844 4e375f30f9fae38f c000000000adb9c8 c00000007b002b00
NIP [c00000000016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000077e2b6f0] [c000000077e2b770] 0xc000000077e2b770 (unreliable)
[c000000077e2b7a0] [c000000000608794] .__kmalloc_reserve+0x44/0xbc
[c000000077e2b840] [c000000000609844] .__alloc_skb+0xb8/0x1d0
[c000000077e2b900] [c000000000665550] .tcp_send_fin+0x90/0x210
[c000000077e2b990] [c000000000652f84] .tcp_close+0x260/0x4f4
[c000000077e2ba30] [c00000000067c1fc] .inet_release+0xd0/0xf4
[c000000077e2bac0] [c0000000005ff0d0] .sock_release+0x4c/0xf0
[c000000077e2bb50] [c0000000006ac94c] .xs_reset_transport+0x9c/0xc4
[c000000077e2bbf0] [c0000000006ac98c] .xs_close+0x18/0x88
[c000000077e2bc80] [c0000000006a9500] .xprt_autoclose+0x40/0xbc
[c000000077e2bd10] [c000000000094360] .process_one_work+0x34c/0x56c
[c000000077e2bdd0] [c000000000094b88] .worker_thread+0x32c/0x4b8
[c000000077e2bed0] [c00000000009d050] .kthread+0xb0/0xbc
[c000000077e2bf90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 9af22fc4dfe9499e ]---

Unable to handle kernel paging request for data at address 0xfffffffffffffff8
Faulting instruction address: 0xc00000000009c9e8
Oops: Kernel access of bad area, sig: 11 [#5]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000009c9e8 LR: c000000000097cf0 CTR: c0000000000b4ea0
REGS: c000000077e2ac70 TRAP: 0300 Tainted: G D (3.6.0-rc1-00220-gb645f8b)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 44000042 XER: 00000000
SOFTE: 0
DAR: fffffffffffffff8, DSISR: 40010000
TASK = c000000072df1a30[24367] 'kworker/3:2' THREAD: c000000077e28000 CPU: 3
GPR00: c0000000006d1c3c c000000077e2aef0 c000000000b8e2a8 c000000072df1a30
GPR04: 0000000000000003 0000000000000001 0000000000000000 c000000000a50370
GPR08: 000000898953db8d 0000000000000000 0000000000000000 0000000000000001
GPR12: c00000007b0c7c70 c00000000fff2680 0000000000051b63 0000000000051a41
GPR16: 0000000000051b5b 0000000000000001 0000000000000004 c000000077e2aff0
GPR20: c00000000007b7a0 c000000077e28080 0000000000000000 c000000077e28000
GPR24: 0000000000000003 c000000000ec4800 c000000072df1cd0 c000000072df1db8
GPR28: 0000000000000003 c000000000ec4800 c000000000ad6338 c000000072df1a30
NIP [c00000000009c9e8] .kthread_data+0x4/0xc
LR [c000000000097cf0] .wq_worker_sleeping+0x20/0xd0
Call Trace:
[c000000077e2aef0] [c0000000000afac4] .dequeue_task+0xd0/0xf4 (unreliable)
[c000000077e2af80] [c0000000006d1c3c] .__schedule+0x1c4/0x888
[c000000077e2b1d0] [c00000000007b7a0] .do_exit+0x8b4/0x8c0
[c000000077e2b2d0] [c00000000001c310] .die+0x3b4/0x3e4
[c000000077e2b380] [c000000000033014] .bad_page_fault+0xf0/0x10c
[c000000077e2b400] [c000000000005c04] handle_page_fault+0x2c/0x30
--- Exception: 300 at .__kmalloc_track_caller+0x120/0x2ac
LR = .__kmalloc_reserve+0x44/0xbc
[c000000077e2b6f0] [c000000077e2b770] 0xc000000077e2b770 (unreliable)
[c000000077e2b7a0] [c000000000608794] .__kmalloc_reserve+0x44/0xbc
[c000000077e2b840] [c000000000609844] .__alloc_skb+0xb8/0x1d0
[c000000077e2b900] [c000000000665550] .tcp_send_fin+0x90/0x210
[c000000077e2b990] [c000000000652f84] .tcp_close+0x260/0x4f4
[c000000077e2ba30] [c00000000067c1fc] .inet_release+0xd0/0xf4
[c000000077e2bac0] [c0000000005ff0d0] .sock_release+0x4c/0xf0
[c000000077e2bb50] [c0000000006ac94c] .xs_reset_transport+0x9c/0xc4
[c000000077e2bbf0] [c0000000006ac98c] .xs_close+0x18/0x88
[c000000077e2bc80] [c0000000006a9500] .xprt_autoclose+0x40/0xbc
[c000000077e2bd10] [c000000000094360] .process_one_work+0x34c/0x56c
[c000000077e2bdd0] [c000000000094b88] .worker_thread+0x32c/0x4b8
[c000000077e2bed0] [c00000000009d050] .kthread+0xb0/0xbc
[c000000077e2bf90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebe1fff8 7c0803a6 4e800020 e92d01c8
e9290340 e869fff2 4e800020 e9230340 <e869fff8> 4e800020 38600000 4e800020
---[ end trace 9af22fc4dfe9499f ]---

Fixing recursive fault but reboot is needed!





############################################################################
# 3.6.0-rc3 #
############################################################################






Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a570 LR: c000000000608fe4 CTR: c000000000640bc4
REGS: c0000000787e7170 TRAP: 0300 Not tainted (3.6.0-rc3-00363-g7d3c5fb-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 22002044 XER: 20000000
SOFTE: 1
DAR: 4146dc88b8105444, DSISR: 40000000
TASK = c000000078613460[6321] 'nfsd' THREAD: c0000000787e4000 CPU: 2
GPR00: 0000000000000000 c0000000787e73f0 c000000000b90428 0000000000000280
GPR04: 0000000000010220 c00000000060a094 c000000000654f74 c000000000e855e0
GPR08: 000000000016e882 0000000000435000 c000000000a93758 c000000000a505e0
GPR12: 0000000042002042 c00000000fff1f00 0000000000000000 0000000000000000
GPR16: fffffffffffffff6 0000000000000000 0000000000000001 0000000000010000
GPR20: 00000000000005a8 0000000000000003 c00000007b755bf8 c00000007b755c08
GPR24: c00000000060a094 0000000000010220 0000000000000280 0000000000000280
GPR28: c00000000060a094 4146dc88b8105444 c000000000add9f8 c00000007b002b00
NIP [c00000000016a570] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c0000000787e74a0] [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
[c0000000787e7540] [c00000000060a094] .__alloc_skb+0xb8/0x1d0
[c0000000787e7600] [c000000000654f74] .sk_stream_alloc_skb+0x48/0x154
[c0000000787e7690] [c0000000006660a4] .tcp_write_xmit+0x89c/0xb8c
[c0000000787e7790] [c000000000666408] .__tcp_push_pending_frames+0x40/0xb0
[c0000000787e7810] [c000000000656624] .tcp_sendpage+0x704/0x7a0
[c0000000787e7920] [c00000000067c744] .inet_sendpage+0x100/0x158
[c0000000787e79d0] [c0000000005fbad0] .kernel_sendpage+0x7c/0xc8
[c0000000787e7a70] [c0000000006b81f0] .svc_send_common+0xc8/0x1a8
[c0000000787e7b40] [c0000000006b8404] .svc_sendto+0x134/0x15c
[c0000000787e7c40] [c0000000006b84cc] .svc_tcp_sendto+0x3c/0xc0
[c0000000787e7cd0] [c0000000006c55f8] .svc_send+0xb8/0x128
[c0000000787e7d70] [c0000000006b50e4] .svc_process+0x784/0x7c0
[c0000000787e7e40] [d000000003033e34] .nfsd+0x138/0x1ec [nfsd]
[c0000000787e7ed0] [c00000000009d090] .kthread+0xb0/0xbc
[c0000000787e7f90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 64544462a57c5991 ]---

Unable to handle kernel paging request for data at address 0x4146dc88b8105444
Faulting instruction address: 0xc00000000016acc8
Oops: Kernel access of bad area, sig: 11 [#2]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016acc8 LR: c0000000001afe14 CTR: 0000000000000000
REGS: c00000007836eb50 TRAP: 0300 Tainted: G D (3.6.0-rc3-00363-g7d3c5fb-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000084 XER: 20000000
SOFTE: 1
DAR: 4146dc88b8105444, DSISR: 40000000
TASK = c000000078eace90[6322] 'nfsd' THREAD: c00000007836c000 CPU: 2
GPR00: 0000000000000000 c00000007836edd0 c000000000b90428 c00000007b002b00
GPR04: 0000000000011280 c00000007836ef80 c00000007b004860 c000000000e855e0
GPR08: 000000000016e882 0000000000435000 c000000000a93758 c000000000a505e0
GPR12: 0000000024000084 c00000000fff1f00 c00000007836f298 c000000000b2d0d8
GPR16: 0000000000000000 c00000007836f150 000000000000000c c00000007a2980e0
GPR20: c00000007a550000 00000000008473f0 0000000000001000 000000000000000c
GPR24: c0000000001afe14 0000000000011280 00000000000000d0 c00000007b002b00
GPR28: c00000007b004860 4146dc88b8105444 c000000000add9f8 c00000007836ef80
NIP [c00000000016acc8] .kmem_cache_alloc+0x5c/0x1ec
LR [c0000000001afe14] .bvec_alloc_bs+0xe0/0x124
Call Trace:
[c00000007836edd0] [c00000007836ee60] 0xc00000007836ee60 (unreliable)
[c00000007836ee80] [c0000000001afe14] .bvec_alloc_bs+0xe0/0x124
[c00000007836ef10] [c0000000001afef4] .bio_alloc_bioset+0x9c/0x114
[c00000007836efc0] [c0000000001afffc] .bio_alloc+0x24/0x48
[c00000007836f040] [c0000000001b6260] .mpage_alloc+0x3c/0xbc
[c00000007836f0e0] [c0000000001b70e0] .do_mpage_readpage+0x508/0x668
[c00000007836f220] [c0000000001b73bc] .mpage_readpages+0xfc/0x198
[c00000007836f380] [c0000000002260dc] .ext3_readpages+0x28/0x40
[c00000007836f400] [c00000000012e544] .__do_page_cache_readahead+0x1b8/0x2c4
[c00000007836f520] [c00000000012e678] .ra_submit+0x28/0x38
[c00000007836f590] [c0000000001a62c0] .__generic_file_splice_read+0x260/0x4d4
[c00000007836f840] [c0000000001a6590] .generic_file_splice_read+0x5c/0xac
[c00000007836f8d0] [c0000000001a4544] .do_splice_to+0xb0/0xdc
[c00000007836f970] [c0000000001a4e68] .splice_direct_to_actor+0xd0/0x230
[c00000007836fa50] [d000000003038f0c] .nfsd_vfs_read+0xa4/0x1b0 [nfsd]
[c00000007836fb30] [d000000003039a7c] .nfsd_read+0x290/0x35c [nfsd]
[c00000007836fc30] [d000000003041f38] .nfsd3_proc_read+0xd4/0x12c [nfsd]
[c00000007836fcd0] [d000000003033528] .nfsd_dispatch+0xd4/0x1fc [nfsd]
[c00000007836fd70] [c0000000006b4e54] .svc_process+0x4f4/0x7c0
[c00000007836fe40] [d000000003033e34] .nfsd+0x138/0x1ec [nfsd]
[c00000007836fed0] [c00000000009d090] .kthread+0xb0/0xbc
[c00000007836ff90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
f821ff51 ebc2b608 7c7b1b78 7c992378 e92d0040 e97b0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81b0022 <7f9d002a> 38000000 886d01f2 980d01f2
---[ end trace 64544462a57c5992 ]---

Unable to handle kernel paging request for data at address 0x4146dc88b8105444
Faulting instruction address: 0xc00000000016a570
Oops: Kernel access of bad area, sig: 11 [#3]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a570 LR: c000000000608fe4 CTR: c0000000006a2ab8
REGS: c000000077fe3300 TRAP: 0300 Tainted: G D (3.6.0-rc3-00363-g7d3c5fb-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24082444 XER: 20000000
SOFTE: 1
DAR: 4146dc88b8105444, DSISR: 40000000
TASK = c000000077ccb460[7508] 'dhcpcd' THREAD: c000000077fe0000 CPU: 2
GPR00: 0000000000000000 c000000077fe3580 c000000000b90428 0000000000000300
GPR04: 00000000000106d0 c00000000060a094 c0000000006024a4 c000000000e855e0
GPR08: 000000000016e882 0000000000435000 c000000000a93758 c000000000a505e0
GPR12: 0000000000000000 c00000000fff1f00 0000000020107ad8 c000000077fe3930
GPR16: 0000000000000158 0000000000000000 0000000000000001 0000000000000004
GPR20: c000000077fe3818 c00000007869e0f8 c00000007869e0c0 0000000000000000
GPR24: c00000000060a094 00000000000106d0 0000000000000300 0000000000000300
GPR28: c00000000060a094 4146dc88b8105444 c000000000add9f8 c00000007b002b00
NIP [c00000000016a570] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000077fe3630] [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
[c000000077fe36d0] [c00000000060a094] .__alloc_skb+0xb8/0x1d0
[c000000077fe3790] [c0000000006024a4] .sock_alloc_send_pskb+0x124/0x410
[c000000077fe38c0] [c0000000006a33a8] .packet_sendmsg+0x8f0/0xc48
[c000000077fe3a10] [c0000000005fc860] .sock_sendmsg+0xbc/0x100
[c000000077fe3c10] [c0000000005fd290] .SyS_sendto+0xf4/0x13c
[c000000077fe3d90] [c000000000633f08] .compat_sys_socketcall+0x16c/0x288
[c000000077fe3e30] [c0000000000098e0] syscall_exit+0x0/0x94
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 64544462a57c5993 ]---

Unable to handle kernel paging request for data at address 0x4146dc88b8105444
Faulting instruction address: 0xc00000000016a570
Oops: Kernel access of bad area, sig: 11 [#4]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016a570 LR: c000000000608fe4 CTR: 0000000000000000
REGS: c000000077f4b290 TRAP: 0300 Tainted: G D (3.6.0-rc3-00363-g7d3c5fb-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24024244 XER: 20000000
SOFTE: 1
DAR: 4146dc88b8105444, DSISR: 40000000
TASK = c00000007b348000[4536] 'console-kit-dae' THREAD: c000000077f48000 CPU: 2
GPR00: 0000000000000000 c000000077f4b510 c000000000b90428 0000000000000280
GPR04: 00000000000106d0 c00000000060a094 c0000000006024a4 c000000000e855e0
GPR08: 000000000016e882 0000000000435000 c000000000a93758 c000000000a505e0
GPR12: 100000000000d032 c00000000fff1f00 00000000102f4db0 c000000077f4b8c0
GPR16: 0000000000000094 0000000000000000 0000000000000001 0000000000000004
GPR20: c000000077f4b7a8 c0000000782a0ef8 c0000000782a0ec0 0000000000000000
GPR24: c00000000060a094 00000000000106d0 0000000000000280 0000000000000280
GPR28: c00000000060a094 4146dc88b8105444 c000000000add9f8 c00000007b002b00
NIP [c00000000016a570] .__kmalloc_track_caller+0x120/0x2ac
LR [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c000000077f4b510] [c000000077f48080] 0xc000000077f48080 (unreliable)
[c000000077f4b5c0] [c000000000608fe4] .__kmalloc_reserve+0x44/0xbc
[c000000077f4b660] [c00000000060a094] .__alloc_skb+0xb8/0x1d0
[c000000077f4b720] [c0000000006024a4] .sock_alloc_send_pskb+0x124/0x410
[c000000077f4b850] [c00000000069f1e8] .unix_stream_sendmsg+0x190/0x4e0
[c000000077f4b970] [c0000000005fb96c] .sock_aio_write+0x16c/0x1a4
[c000000077f4baa0] [c000000000172e40] .do_sync_readv_writev+0xb4/0x124
[c000000077f4bc40] [c0000000001c9354] .compat_do_readv_writev+0xf4/0x21c
[c000000077f4bd80] [c0000000001c9618] .compat_sys_writev+0x94/0xf0
[c000000077f4be30] [c0000000000098e0] syscall_exit+0x0/0x94
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace 64544462a57c5994 ]---

Unable to handle kernel paging request for data at address 0x4146dc88b8105444
Faulting instruction address: 0xc00000000016acc8
Oops: Kernel access of bad area, sig: 11 [#5]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart hwmon
NIP: c00000000016acc8 LR: c000000000072738 CTR: 0000000000000000
REGS: c00000007b0ab3a0 TRAP: 0300 Tainted: G D (3.6.0-rc3-00363-g7d3c5fb-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24002024 XER: 00000000
SOFTE: 1
DAR: 4146dc88b8105444, DSISR: 40000000
TASK = c00000007b099a30[6] 'kworker/u:0' THREAD: c00000007b0a8000 CPU: 2
GPR00: 0000000000000000 c00000007b0ab620 c000000000b90428 c00000007b002b00
GPR04: 00000000000080d0 0000000000000000 0000000000000000 c000000000e855e0
GPR08: 000000000016e882 0000000000435000 c000000000a93758 c000000000a505e0
GPR12: 0000000084002022 c00000000fff1f00 0000000000051b63 0000000000051a41
GPR16: 0000000000000000 0000000000000000 c00000007b0ab990 c00000007b0abc10
GPR20: 0000000000000000 c0000000000134f4 c0000000780bd140 0000000000000000
GPR24: c000000000072738 00000000000080d0 c000000000cb7b80 c00000007b002b00
GPR28: c0000000780bce90 4146dc88b8105444 c000000000add9f8 c000000000a61468
NIP [c00000000016acc8] .kmem_cache_alloc+0x5c/0x1ec
LR [c000000000072738] .copy_process+0x634/0x1114
Call Trace:
[c00000007b0ab620] [c00000000016ad48] .kmem_cache_alloc+0xdc/0x1ec (unreliable)
[c00000007b0ab6d0] [c000000000072738] .copy_process+0x634/0x1114
[c00000007b0ab7d0] [c000000000073510] .do_fork+0x118/0x400
[c00000007b0ab8b0] [c000000000012d14] .sys_clone+0x58/0x70
[c00000007b0ab920] [c000000000009b08] .ppc_clone+0x8/0xc
--- Exception: c00 at .kernel_thread+0x28/0x70
LR = .__call_usermodehelper+0x50/0x11c
[c00000007b0abc10] [c00000007b0abc90] 0xc00000007b0abc90 (unreliable)
[c00000007b0abc80] [c00000007b0abd10] 0xc00000007b0abd10
[c00000007b0abd10] [c000000000094370] .process_one_work+0x34c/0x56c
[c00000007b0abdd0] [c000000000094b98] .worker_thread+0x32c/0x4b8
[c00000007b0abed0] [c00000000009d090] .kthread+0xb0/0xbc
[c00000007b0abf90] [c00000000001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
f821ff51 ebc2b608 7c7b1b78 7c992378 e92d0040 e97b0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81b0022 <7f9d002a> 38000000 886d01f2 980d01f2
---[ end trace 64544462a57c5995 ]---

Unable to handle kernel paging request for data at address 0xfffffffffffffff8






############################################################################
# 3.6.0-rc5 #
############################################################################






Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 hwmon uninorth_agp agpgart
NIP: c0000000001b6a40 LR: c0000000001ade18 CTR: c0000000001b69b0
REGS: c00000000ffdf600 TRAP: 0300 Not tainted (3.6.0-rc5-00153-g518cd6a)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28ff4f48 XER: 00000000
SOFTE: 1
DAR: e0d668aeabc0e0eb, DSISR: 40000000
TASK = c00000007b0bb460[0] 'swapper/3' THREAD: c00000007b0c4000 CPU: 3
GPR00: 0000000000000000 c00000000ffdf880 c000000000b90460 e0d668aeabc0e0eb
GPR04: 0000000000000000 0000000000040000 0000000000040000 000000000000ba41
GPR08: 0000000000000200 0000000000000000 0000000000000001 c0000000786d3800
GPR12: 0000000048ffff48 c00000000fff2680 c00000007b0c7f90 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000200040
GPR20: c000000000a4f680 0000000000000003 c00000007872b170 0000000000000000
GPR24: 0000000000000000 0000000000040000 0000000002000000 0000000000000002
GPR28: 0000000000000008 c00000007929ce00 c000000000ae86f0 c0000000786d3be0
NIP [c0000000001b6a40] .mpage_end_io+0x90/0x150
LR [c0000000001ade18] .bio_endio+0x64/0x78
Call Trace:
[c00000000ffdf880] [c00000007872b170] 0xc00000007872b170 (unreliable)
[c00000000ffdf920] [c0000000001ade18] .bio_endio+0x64/0x78
[c00000000ffdf990] [c0000000003617b4] .req_bio_endio+0xe0/0x104
[c00000000ffdfa20] [c000000000361a64] .blk_update_request+0x28c/0x568
[c00000000ffdfae0] [c000000000361d68] .blk_update_bidi_request+0x28/0x98
[c00000000ffdfb70] [c00000000036396c] .blk_end_bidi_request+0x2c/0x8c
[c00000000ffdfc10] [c0000000004944e8] .scsi_io_completion+0x22c/0x5b8
[c00000000ffdfce0] [c00000000048c258] .scsi_finish_command+0xfc/0x120
[c00000000ffdfd80] [c0000000004949b4] .scsi_softirq_done+0x138/0x15c
[c00000000ffdfe10] [c00000000036a3ac] .blk_done_softirq+0xc4/0xf8
[c00000000ffdfeb0] [c00000000007ed50] .__do_softirq+0x1c8/0x398
[c00000000ffdff90] [c00000000001e6c8] .call_do_softirq+0x14/0x24
[c00000007b0c78f0] [c00000000000f51c] .do_softirq+0xac/0x154
[c00000007b0c7990] [c00000000007e7e4] .irq_exit+0x94/0x14c
[c00000007b0c7a20] [c00000000000f0b8] .do_IRQ+0x278/0x340
[c00000007b0c7ae0] [c000000000003b78] hardware_interrupt_common+0x178/0x180
--- Exception: 501 at .arch_local_irq_restore+0x74/0x98
LR = .arch_local_irq_restore+0x74/0x98
[c00000007b0c7dd0] [c00000000000f438] .arch_local_irq_restore+0x60/0x98 (unreliable)
[c00000007b0c7e40] [c000000000015144] .cpu_idle+0x18c/0x2bc
[c00000007b0c7ee0] [c0000000006da7ac] .start_secondary+0x378/0x384
[c00000007b0c7f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14
Instruction dump:
3bfffff0 7fbf0040 419c0014 e81f0000 2fa00000 419e0008 7c0001ec e81d0020
780907e1 4082004c 4192001c 7c2004ac <7c0018a8> 7c00e378 7c0019ad 40c2fff4
---[ end trace 21d19d3ed703ad1c ]---

Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 9 seconds..






############################################################################
# 3.6.0-rc7 #
############################################################################






Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc uninorth_agp tg3 agpgart hwmon
NIP: c00000000016a5bc LR: c00000000060910c CTR: c000000000640ca8
REGS: c00000007918b170 TRAP: 0300 Not tainted (3.6.0-rc7-00128-ge333037)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 22002048 XER: 20000000
SOFTE: 1
DAR: ce66ff186f457e7b, DSISR: 40010000
TASK = c000000079f18000[5953] 'nfsd' THREAD: c000000079188000 CPU: 3
GPR00: 0000000000000000 c00000007918b3f0 c000000000b914b0 0000000000000280
GPR04: 0000000000010220 c00000000060a1bc c000000000654f30 c000000000ec55e0
GPR08: 0000000000035f23 0000000000474000 c000000000a94758 c000000000a515e0
GPR12: 0000000042002048 c00000000fff2680 0000000000000000 0000000000000000
GPR16: fffffffffffffff6 0000000000000000 0000000000000001 0000000000010000
GPR20: 00000000000005a8 0000000000000003 c00000007b5962f8 c00000007b596308
GPR24: c00000000060a1bc 0000000000010220 0000000000000280 0000000000000280
GPR28: c00000000060a1bc ce66ff186f457e7b c000000000ade9d0 c00000007b002b00
NIP [c00000000016a5bc] .__kmalloc_track_caller+0x120/0x2ac
LR [c00000000060910c] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c00000007918b3f0] [c00000000016a63c] .__kmalloc_track_caller+0x1a0/0x2ac (unreliable)
[c00000007918b4a0] [c00000000060910c] .__kmalloc_reserve+0x44/0xbc
[c00000007918b540] [c00000000060a1bc] .__alloc_skb+0xb8/0x1d0
[c00000007918b600] [c000000000654f30] .sk_stream_alloc_skb+0x48/0x154
[c00000007918b690] [c000000000666070] .tcp_write_xmit+0x89c/0xb8c
[c00000007918b790] [c0000000006663d4] .__tcp_push_pending_frames+0x40/0xb0
[c00000007918b810] [c0000000006565e0] .tcp_sendpage+0x704/0x7a0
[c00000007918b920] [c00000000067c788] .inet_sendpage+0x100/0x158
[c00000007918b9d0] [c0000000005fbbac] .kernel_sendpage+0x7c/0xc8
[c00000007918ba70] [c0000000006b8328] .svc_send_common+0xc8/0x1a8
[c00000007918bb40] [c0000000006b853c] .svc_sendto+0x134/0x15c
[c00000007918bc40] [c0000000006b8604] .svc_tcp_sendto+0x3c/0xc0
[c00000007918bcd0] [c0000000006c5730] .svc_send+0xb8/0x128
[c00000007918bd70] [c0000000006b521c] .svc_process+0x784/0x7c0
[c00000007918be40] [d000000003033e34] .nfsd+0x138/0x1ec [nfsd]
[c00000007918bed0] [c00000000009d040] .kthread+0xb0/0xbc
[c00000007918bf90] [c00000000001e9d0] .kernel_thread+0x54/0x70
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace fe59661bbe31413f ]---

Unable to handle kernel paging request for data at address 0xce66ff186f457e7b
Faulting instruction address: 0xc00000000016a5bc
Oops: Kernel access of bad area, sig: 11 [#2]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc uninorth_agp tg3 agpgart hwmon
NIP: c00000000016a5bc LR: c00000000060910c CTR: c0000000006a2b68
REGS: c00000007b6c3300 TRAP: 0300 Tainted: G D (3.6.0-rc7-00128-ge333037)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24082448 XER: 20000000
SOFTE: 1
DAR: ce66ff186f457e7b, DSISR: 40010000
TASK = c0000000797a3460[5195] 'dhcpcd' THREAD: c00000007b6c0000 CPU: 3
GPR00: 0000000000000000 c00000007b6c3580 c000000000b914b0 0000000000000300
GPR04: 00000000000106d0 c00000000060a1bc c000000000602580 c000000000ec55e0
GPR08: 0000000000035f63 0000000000474000 c000000000a94758 c000000000a515e0
GPR12: 0000000000000000 c00000000fff2680 0000000020170ad8 c00000007b6c3930
GPR16: 0000000000000158 0000000000000000 0000000000000001 0000000000000004
GPR20: c00000007b6c3818 c0000000782c88f8 c0000000782c88c0 0000000000000000
GPR24: c00000000060a1bc 00000000000106d0 0000000000000300 0000000000000300
GPR28: c00000000060a1bc ce66ff186f457e7b c000000000ade9d0 c00000007b002b00
NIP [c00000000016a5bc] .__kmalloc_track_caller+0x120/0x2ac
LR [c00000000060910c] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c00000007b6c3630] [c00000000060910c] .__kmalloc_reserve+0x44/0xbc
[c00000007b6c36d0] [c00000000060a1bc] .__alloc_skb+0xb8/0x1d0
[c00000007b6c3790] [c000000000602580] .sock_alloc_send_pskb+0x124/0x410
[c00000007b6c38c0] [c0000000006a3458] .packet_sendmsg+0x8f0/0xc48
[c00000007b6c3a10] [c0000000005fc93c] .sock_sendmsg+0xbc/0x100
[c00000007b6c3c10] [c0000000005fd36c] .SyS_sendto+0xf4/0x13c
[c00000007b6c3d90] [c000000000633fc4] .compat_sys_socketcall+0x16c/0x288
[c00000007b6c3e30] [c0000000000098e0] syscall_exit+0x0/0x94
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f0000 7ceb4a14 e9070008
7fab482a 2fbd0000 419e0034 e81f0022 <7f7d002a> 38000000 886d01f2 980d01f2
---[ end trace fe59661bbe314140 ]---




2012-09-28 15:10:51

by J. Bruce Fields

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
>
> On 28.09.2012, at 04:04, Linus Torvalds wrote:
>
> > On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
> >>
> >> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
> >
> > Since you seem to be able to reproduce it easily (and apparently
> > reliably), any chance you could just bisect it?
> >
> > Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
> >
> > git bisect start
> > git bisect good v3.5
> > git bisect bad v3.6-rc1
> >
> > will get you started on your adventure..
>
> Heh, will give it a try :). The thing really does look quite bisectable.
>
>
> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.

I doubt it's anything special about that workload, but just for kicks I
tried a "git clone -ls" (cloning my linux tree to another directory on
the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
anything interesting (just an xfs lockdep warning that looks like this
one jlayton already reported:
http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
)

Any (even partial) bisection results would certainly be useful, thanks.

--b.

2012-09-28 02:20:01

by Alexander Graf

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


On 28.09.2012, at 04:04, Linus Torvalds wrote:

> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
>>
>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
>
> Since you seem to be able to reproduce it easily (and apparently
> reliably), any chance you could just bisect it?
>
> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
>
> git bisect start
> git bisect good v3.5
> git bisect bad v3.6-rc1
>
> will get you started on your adventure..

Heh, will give it a try :). The thing really does look quite bisectable.


It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.

Alex


2012-09-28 15:35:04

by Alexander Graf

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


On 28.09.2012, at 17:10, J. Bruce Fields wrote:

> On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
>>
>> On 28.09.2012, at 04:04, Linus Torvalds wrote:
>>
>>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
>>>>
>>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
>>>
>>> Since you seem to be able to reproduce it easily (and apparently
>>> reliably), any chance you could just bisect it?
>>>
>>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
>>>
>>> git bisect start
>>> git bisect good v3.5
>>> git bisect bad v3.6-rc1
>>>
>>> will get you started on your adventure..
>>
>> Heh, will give it a try :). The thing really does look quite bisectable.
>>
>>
>> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
>
> I doubt it's anything special about that workload, but just for kicks I
> tried a "git clone -ls" (cloning my linux tree to another directory on
> the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> anything interesting (just an xfs lockdep warning that looks like this
> one jlayton already reported:
> http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> )
>
> Any (even partial) bisection results would certainly be useful, thanks.

Yeah, still trying. Running the same workload in a PPC VM didn't show any badness. Then I tried again to bisect on the machine it broke on, and that commit failed even more badly than the previous ones, destroying my local git tree.

Trying to narrow down now in a slightly more contained environment :).


Alex


2012-09-28 02:04:54

by Linus Torvalds

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
>
> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.

Since you seem to be able to reproduce it easily (and apparently
reliably), any chance you could just bisect it?

Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple

git bisect start
git bisect good v3.5
git bisect bad v3.6-rc1

will get you started on your adventure..

Linus

2012-10-02 22:31:16

by Alexander Graf

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


On 03.10.2012, at 00:17, Nishanth Aravamudan wrote:

> On 02.10.2012 [23:47:39 +0200], Alexander Graf wrote:
>>
>> On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:
>>
>>> Hi Ben,
>>>
>>> On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
>>>> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
>>>>> Phew. Here we go :). It looks to be more of a PPC specific problem
>>>>> than it appeared at first:
>>>>
>>>> Ok, so I suspect the problem is the pushing down of the locks which
>>>> breaks with iommu backends that have a separate flush callback. In
>>>> that case, the flush moves out of the allocator lock.
>>>>
>>>> Now we do call flush before we return, still, but it becomes racy
>>>> I suspect, but somebody needs to give it a closer look. I'm hoping
>>>> Anton or Nish will later today.
>>>
>>> Started looking into this. If your suspicion were accurate, wouldn't the
>>> bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
>>> ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
>>> iommu_free")?
>>>
>>> Alex, the error is reproducible, right?
>>
>> Yes. I'm having a hard time figuring out whether the reason my U4-based
>> G5 Mac crashes and fails to read data is the same, since I don't have a
>> serial connection there, but I assume so.
>
> Ok, great, thanks. Yeah, that would imply (I think) that it isn't a
> lock-based race; I would have thought the lock pushdown in the above
> commit (or even in one of the others in Anton's series) would have been
> the real source if it was. But that's just my first sniff at what Ben
> was suggesting. Still reading/understanding the code.
>
>>> Does it go away by reverting
>>> that commit against mainline? Just trying to narrow down my focus.
>>
>> The patch doesn't revert that easily. Mind providing a revert patch
>> so I can try?
>
> The following at least builds on defconfig here:

Yes. With that patch applied, things work for me again.


Alex


2012-10-02 22:17:48

by Nishanth Aravamudan

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

On 02.10.2012 [23:47:39 +0200], Alexander Graf wrote:
>
> On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:
>
> > Hi Ben,
> >
> > On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
> >> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> >>> Phew. Here we go :). It looks to be more of a PPC specific problem
> >>> than it appeared at first:
> >>
> >> Ok, so I suspect the problem is the pushing down of the locks which
> >> breaks with iommu backends that have a separate flush callback. In
> >> that case, the flush moves out of the allocator lock.
> >>
> >> Now we do call flush before we return, still, but it becomes racy
> >> I suspect, but somebody needs to give it a closer look. I'm hoping
> >> Anton or Nish will later today.
> >
> > Started looking into this. If your suspicion were accurate, wouldn't the
> > bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
> > ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
> > iommu_free")?
> >
> > Alex, the error is reproducible, right?
>
> Yes. I'm having a hard time figuring out whether the reason my U4-based
> G5 Mac crashes and fails to read data is the same, since I don't have a
> serial connection there, but I assume so.

Ok, great, thanks. Yeah, that would imply (I think) that it isn't a
lock-based race; I would have thought the lock pushdown in the above
commit (or even in one of the others in Anton's series) would have been
the real source if it was. But that's just my first sniff at what Ben
was suggesting. Still reading/understanding the code.

> > Does it go away by reverting
> > that commit against mainline? Just trying to narrow down my focus.
>
The patch doesn't revert that easily. Mind providing a revert patch
so I can try?

The following at least builds on defconfig here:

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cbfe678..957a83f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -53,16 +53,6 @@ static __inline__ __attribute_const__ int get_iommu_order(unsigned long size)
*/
#define IOMAP_MAX_ORDER 13

-#define IOMMU_POOL_HASHBITS 2
-#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS)
-
-struct iommu_pool {
- unsigned long start;
- unsigned long end;
- unsigned long hint;
- spinlock_t lock;
-} ____cacheline_aligned_in_smp;
-
struct iommu_table {
unsigned long it_busno; /* Bus number this table belongs to */
unsigned long it_size; /* Size of iommu table in entries */
@@ -71,10 +61,10 @@ struct iommu_table {
unsigned long it_index; /* which iommu table this is */
unsigned long it_type; /* type: PCI or Virtual Bus */
unsigned long it_blocksize; /* Entries in each block (cacheline) */
- unsigned long poolsize;
- unsigned long nr_pools;
- struct iommu_pool large_pool;
- struct iommu_pool pools[IOMMU_NR_POOLS];
+ unsigned long it_hint; /* Hint for next alloc */
+ unsigned long it_largehint; /* Hint for large allocs */
+ unsigned long it_halfpoint; /* Breaking point for small/large allocs */
+ spinlock_t it_lock; /* Protects it_map */
unsigned long *it_map; /* A simple allocation bitmap for now */
};

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ff5a6ce..9a31f3c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -62,26 +62,6 @@ static int __init setup_iommu(char *str)

__setup("iommu=", setup_iommu);

-static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);
-
-/*
- * We precalculate the hash to avoid doing it on every allocation.
- *
- * The hash is important to spread CPUs across all the pools. For example,
- * on a POWER7 with 4 way SMT we want interrupts on the primary threads and
- * with 4 pools all primary threads would map to the same pool.
- */
-static int __init setup_iommu_pool_hash(void)
-{
- unsigned int i;
-
- for_each_possible_cpu(i)
- per_cpu(iommu_pool_hash, i) = hash_32(i, IOMMU_POOL_HASHBITS);
-
- return 0;
-}
-subsys_initcall(setup_iommu_pool_hash);
-
#ifdef CONFIG_FAIL_IOMMU

static DECLARE_FAULT_ATTR(fail_iommu);
@@ -184,8 +164,6 @@ static unsigned long iommu_range_alloc(struct device *dev,
unsigned long align_mask;
unsigned long boundary_size;
unsigned long flags;
- unsigned int pool_nr;
- struct iommu_pool *pool;

align_mask = 0xffffffffffffffffl >> (64 - align_order);

@@ -201,46 +179,38 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (should_fail_iommu(dev))
return DMA_ERROR_CODE;

- /*
- * We don't need to disable preemption here because any CPU can
- * safely use any IOMMU pool.
- */
- pool_nr = __raw_get_cpu_var(iommu_pool_hash) & (tbl->nr_pools - 1);
-
- if (largealloc)
- pool = &(tbl->large_pool);
- else
- pool = &(tbl->pools[pool_nr]);
-
- spin_lock_irqsave(&(pool->lock), flags);
+ spin_lock_irqsave(&(tbl->it_lock), flags);

-again:
- if ((pass == 0) && handle && *handle)
+ if (handle && *handle)
start = *handle;
else
- start = pool->hint;
+ start = largealloc ? tbl->it_largehint : tbl->it_hint;

- limit = pool->end;
+ /* Use only half of the table for small allocs (15 pages or less) */
+ limit = largealloc ? tbl->it_size : tbl->it_halfpoint;
+
+ if (largealloc && start < tbl->it_halfpoint)
+ start = tbl->it_halfpoint;

/* The case below can happen if we have a small segment appended
* to a large, or when the previous alloc was at the very end of
* the available space. If so, go back to the initial start.
*/
if (start >= limit)
- start = pool->start;
+ start = largealloc ? tbl->it_largehint : tbl->it_hint;
+
+ again:

if (limit + tbl->it_offset > mask) {
limit = mask - tbl->it_offset + 1;
/* If we're constrained on address range, first try
* at the masked hint to avoid O(n) search complexity,
- * but on second pass, start at 0 in pool 0.
+ * but on second pass, start at 0.
*/
- if ((start & mask) >= limit || pass > 0) {
- pool = &(tbl->pools[0]);
- start = pool->start;
- } else {
+ if ((start & mask) >= limit || pass > 0)
+ start = 0;
+ else
start &= mask;
- }
}

if (dev)
@@ -254,25 +224,17 @@ again:
tbl->it_offset, boundary_size >> IOMMU_PAGE_SHIFT,
align_mask);
if (n == -1) {
- if (likely(pass == 0)) {
- /* First try the pool from the start */
- pool->hint = pool->start;
- pass++;
- goto again;
-
- } else if (pass <= tbl->nr_pools) {
- /* Now try scanning all the other pools */
- spin_unlock(&(pool->lock));
- pool_nr = (pool_nr + 1) & (tbl->nr_pools - 1);
- pool = &tbl->pools[pool_nr];
- spin_lock(&(pool->lock));
- pool->hint = pool->start;
+ if (likely(pass < 2)) {
+ /* First failure, just rescan the half of the table.
+ * Second failure, rescan the other half of the table.
+ */
+ start = (largealloc ^ pass) ? tbl->it_halfpoint : 0;
+ limit = pass ? tbl->it_size : limit;
pass++;
goto again;
-
} else {
- /* Give up */
- spin_unlock_irqrestore(&(pool->lock), flags);
+ /* Third failure, give up */
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
return DMA_ERROR_CODE;
}
}
@@ -282,10 +244,10 @@ again:
/* Bump the hint to a new block for small allocs. */
if (largealloc) {
/* Don't bump to new block to avoid fragmentation */
- pool->hint = end;
+ tbl->it_largehint = end;
} else {
/* Overflow will be taken care of at the next allocation */
- pool->hint = (end + tbl->it_blocksize - 1) &
+ tbl->it_hint = (end + tbl->it_blocksize - 1) &
~(tbl->it_blocksize - 1);
}

@@ -293,8 +255,7 @@ again:
if (handle)
*handle = end;

- spin_unlock_irqrestore(&(pool->lock), flags);
-
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
return n;
}

@@ -369,45 +330,23 @@ static bool iommu_free_check(struct iommu_table *tbl, dma_addr_t dma_addr,
return true;
}

-static struct iommu_pool *get_pool(struct iommu_table *tbl,
- unsigned long entry)
-{
- struct iommu_pool *p;
- unsigned long largepool_start = tbl->large_pool.start;
-
- /* The large pool is the last pool at the top of the table */
- if (entry >= largepool_start) {
- p = &tbl->large_pool;
- } else {
- unsigned int pool_nr = entry / tbl->poolsize;
-
- BUG_ON(pool_nr > tbl->nr_pools);
- p = &tbl->pools[pool_nr];
- }
-
- return p;
-}
-
static void __iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
unsigned int npages)
{
unsigned long entry, free_entry;
unsigned long flags;
- struct iommu_pool *pool;

entry = dma_addr >> IOMMU_PAGE_SHIFT;
free_entry = entry - tbl->it_offset;

- pool = get_pool(tbl, free_entry);
-
if (!iommu_free_check(tbl, dma_addr, npages))
return;

ppc_md.tce_free(tbl, entry, npages);

- spin_lock_irqsave(&(pool->lock), flags);
+ spin_lock_irqsave(&(tbl->it_lock), flags);
bitmap_clear(tbl->it_map, free_entry, npages);
- spin_unlock_irqrestore(&(pool->lock), flags);
+ spin_unlock_irqrestore(&(tbl->it_lock), flags);
}

static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
@@ -649,8 +588,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
unsigned long sz;
static int welcomed = 0;
struct page *page;
- unsigned int i;
- struct iommu_pool *p;
+
+ /* Set aside 1/4 of the table for large allocations. */
+ tbl->it_halfpoint = tbl->it_size * 3 / 4;

/* number of bytes needed for the bitmap */
sz = (tbl->it_size + 7) >> 3;
@@ -669,28 +609,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);

- /* We only split the IOMMU table if we have 1GB or more of space */
- if ((tbl->it_size << IOMMU_PAGE_SHIFT) >= (1UL * 1024 * 1024 * 1024))
- tbl->nr_pools = IOMMU_NR_POOLS;
- else
- tbl->nr_pools = 1;
-
- /* We reserve the top 1/4 of the table for large allocations */
- tbl->poolsize = (tbl->it_size * 3 / 4) / tbl->nr_pools;
-
- for (i = 0; i < tbl->nr_pools; i++) {
- p = &tbl->pools[i];
- spin_lock_init(&(p->lock));
- p->start = tbl->poolsize * i;
- p->hint = p->start;
- p->end = p->start + tbl->poolsize;
- }
-
- p = &tbl->large_pool;
- spin_lock_init(&(p->lock));
- p->start = tbl->poolsize * i;
- p->hint = p->start;
- p->end = tbl->it_size;
+ tbl->it_hint = 0;
+ tbl->it_largehint = tbl->it_halfpoint;
+ spin_lock_init(&tbl->it_lock);

iommu_table_clear(tbl);

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index dca2136..b673200 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -518,6 +518,7 @@ cell_iommu_setup_window(struct cbe_iommu *iommu, struct device_node *np,
__set_bit(0, window->table.it_map);
tce_build_cell(&window->table, window->table.it_offset, 1,
(unsigned long)iommu->pad_page, DMA_TO_DEVICE, NULL);
+ window->table.it_hint = window->table.it_blocksize;

return window;
}


2012-10-01 14:03:19

by Alexander Graf

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


On 28.09.2012, at 17:10, J. Bruce Fields wrote:

> On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
>>
>> On 28.09.2012, at 04:04, Linus Torvalds wrote:
>>
>>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
>>>>
>>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
>>>
>>> Since you seem to be able to reproduce it easily (and apparently
>>> reliably), any chance you could just bisect it?
>>>
>>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
>>>
>>> git bisect start
>>> git bisect good v3.5
>>> git bisect bad v3.6-rc1
>>>
>>> will get you started on your adventure..
>>
>> Heh, will give it a try :). The thing really does look quite bisectable.
>>
>>
>> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
>
> I doubt it's anything special about that workload, but just for kicks I
> tried a "git clone -ls" (cloning my linux tree to another directory on
> the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> anything interesting (just an xfs lockdep warning that looks like this
> one jlayton already reported:
> http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> )
>
> Any (even partial) bisection results would certainly be useful, thanks.

Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared at first:


b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
Author: Anton Blanchard <[email protected]>
Date: Thu Jun 7 18:14:48 2012 +0000

powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance

At the moment all queues in a multiqueue adapter will serialise
against the IOMMU table lock. This is proving to be a big issue,
especially with 10Gbit ethernet.

This patch creates 4 pools and tries to spread the load across
them. If the table is under 1GB in size we revert back to the
original behaviour of 1 pool and 1 largealloc pool.

We create a hash to map CPUs to pools. Since we prefer interrupts to
be affinitised to primary CPUs, without some form of hashing we are
very likely to end up using the same pool. As an example, POWER7
has 4 way SMT and with 4 pools all primary threads will map to the
same pool.

The largealloc pool is reduced from 1/2 to 1/4 of the space to
partially offset the overhead of breaking the table up into pools.

Some performance numbers were obtained with a Chelsio T3 adapter on
two POWER7 boxes, running a 100 session TCP round robin test.

Performance improved 69% with this patch applied.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

:040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M arch


Alex


2012-10-02 21:43:39

by Nishanth Aravamudan

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

Hi Ben,

On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> > Phew. Here we go :). It looks to be more of a PPC specific problem
> > than it appeared at first:
>
> Ok, so I suspect the problem is the pushing down of the locks which
> breaks with iommu backends that have a separate flush callback. In
> that case, the flush moves out of the allocator lock.
>
> Now we do call flush before we return, still, but it becomes racy
> I suspect, but somebody needs to give it a closer look. I'm hoping
> Anton or Nish will later today.

Started looking into this. If your suspicion were accurate, wouldn't the
bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
iommu_free")?

Alex, the error is reproducible, right? Does it go away by reverting
that commit against mainline? Just trying to narrow down my focus.

Thanks,
Nish


2012-10-01 15:21:29

by J. Bruce Fields

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

On Mon, Oct 01, 2012 at 04:03:06PM +0200, Alexander Graf wrote:
>
> On 28.09.2012, at 17:10, J. Bruce Fields wrote:
>
> > On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
> >>
> >> On 28.09.2012, at 04:04, Linus Torvalds wrote:
> >>
> >>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <[email protected]> wrote:
> >>>>
> >>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
> >>>
> >>> Since you seem to be able to reproduce it easily (and apparently
> >>> reliably), any chance you could just bisect it?
> >>>
> >>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
> >>>
> >>> git bisect start
> >>> git bisect good v3.5
> >>> git bisect bad v3.6-rc1
> >>>
> >>> will get you started on your adventure..
> >>
> >> Heh, will give it a try :). The thing really does look quite bisectable.
> >>
> >>
> >> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
> >
> > I doubt it's anything special about that workload, but just for kicks I
> > tried a "git clone -ls" (cloning my linux tree to another directory on
> > the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> > anything interesting (just an xfs lockdep warning that looks like this
> > one jlayton already reported:
> > http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> > )
> >
> > Any (even partial) bisection results would certainly be useful, thanks.
>
> Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared at first:

Yep, thanks--I'll assume this is somebody else's problem until somebody
tells me otherwise!

--b.

>
>
> b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
> commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
> Author: Anton Blanchard <[email protected]>
> Date: Thu Jun 7 18:14:48 2012 +0000
>
> powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance
>
> At the moment all queues in a multiqueue adapter will serialise
> against the IOMMU table lock. This is proving to be a big issue,
> especially with 10Gbit ethernet.
>
> This patch creates 4 pools and tries to spread the load across
> them. If the table is under 1GB in size we revert back to the
> original behaviour of 1 pool and 1 largealloc pool.
>
> We create a hash to map CPUs to pools. Since we prefer interrupts to
> be affinitised to primary CPUs, without some form of hashing we are
> very likely to end up using the same pool. As an example, POWER7
> has 4 way SMT and with 4 pools all primary threads will map to the
> same pool.
>
> The largealloc pool is reduced from 1/2 to 1/4 of the space to
> partially offset the overhead of breaking the table up into pools.
>
> Some performance numbers were obtained with a Chelsio T3 adapter on
> two POWER7 boxes, running a 100 session TCP round robin test.
>
> Performance improved 69% with this patch applied.
>
> Signed-off-by: Anton Blanchard <[email protected]>
> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
>
> :040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M arch
>
>
> Alex
>

2012-10-04 10:54:58

by Alexander Graf

Subject: Re: [PATCH] powerpc/iommu: Fix multiple issues with IOMMU pools code

Hi Anton,

On 04.10.2012, at 06:57, Anton Blanchard wrote:

>
> Hi Alex,
>
> Looks to be a preempt issue with the iommu pools code. I did find a
> couple more bugs along the way too.
>
> Anton
> --
>
> There are a number of issues in the recent IOMMU pools code:
>
> - On a preempt kernel we might switch CPUs in the middle of building
> a scatter gather list. When this happens the handle hint passed in
> no longer falls within the local CPU's pool. Check for this and
> fall back to the pool hint.
>
> - We were missing a spin_unlock/spin_lock in one spot where we
> switch pools.
>
> - We need to provide locking around dart_tlb_invalidate_all and
> dart_tlb_invalidate_one now that the global lock is gone.
>
> Reported-by: Alexander Graf <[email protected]>
> Signed-off-by: Anton Blanchard <[email protected]>
> ---
>
> There is still an issue with the lazy u3 flushing, but I wanted
> to get this out for testing.

Yup. It fixes the nfs problem on my U4 based machine.

Tested-by: Alexander Graf <[email protected]>

Alex


2012-10-04 00:26:13

by Anton Blanchard

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


Hi,

> Yes. With that patch applied, things work for me again.

Thanks Alex, Nish. We can reproduce this on one of our Biminis, looking
into it now.

Anton

2012-10-02 00:59:57

by Benjamin Herrenschmidt

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
> Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared at first:

Ok, so I suspect the problem is the pushing down of the locks which
breaks with iommu backends that have a separate flush callback. In
that case, the flush moves out of the allocator lock.

Now we do call flush before we return, still, but it becomes racy
I suspect, but somebody needs to give it a closer look. I'm hoping
Anton or Nish will later today.

Cheers,
Ben.

>
> b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
> commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
> Author: Anton Blanchard <[email protected]>
> Date: Thu Jun 7 18:14:48 2012 +0000
>
> powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance
>
> At the moment all queues in a multiqueue adapter will serialise
> against the IOMMU table lock. This is proving to be a big issue,
> especially with 10Gbit ethernet.
>
> This patch creates 4 pools and tries to spread the load across
> them. If the table is under 1GB in size we revert back to the
> original behaviour of 1 pool and 1 largealloc pool.
>
> We create a hash to map CPUs to pools. Since we prefer interrupts to
> be affinitised to primary CPUs, without some form of hashing we are
> very likely to end up using the same pool. As an example, POWER7
> has 4 way SMT and with 4 pools all primary threads will map to the
> same pool.
>
> The largealloc pool is reduced from 1/2 to 1/4 of the space to
> partially offset the overhead of breaking the table up into pools.
>
> Some performance numbers were obtained with a Chelsio T3 adapter on
> two POWER7 boxes, running a 100 session TCP round robin test.
>
> Performance improved 69% with this patch applied.
>
> Signed-off-by: Anton Blanchard <[email protected]>
> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
>
> :040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M arch
>
>
> Alex



2012-10-04 04:57:22

by Anton Blanchard

Subject: [PATCH] powerpc/iommu: Fix multiple issues with IOMMU pools code


Hi Alex,

Looks to be a preempt issue with the iommu pools code. I did find a
couple more bugs along the way too.

Anton
--

There are a number of issues in the recent IOMMU pools code:

- On a preempt kernel we might switch CPUs in the middle of building
a scatter gather list. When this happens the handle hint passed in
no longer falls within the local CPU's pool. Check for this and
fall back to the pool hint.

- We were missing a spin_unlock/spin_lock in one spot where we
switch pools.

- We need to provide locking around dart_tlb_invalidate_all and
dart_tlb_invalidate_one now that the global lock is gone.

Reported-by: Alexander Graf <[email protected]>
Signed-off-by: Anton Blanchard <[email protected]>
---

There is still an issue with the lazy u3 flushing, but I wanted
to get this out for testing.

Index: b/arch/powerpc/kernel/iommu.c
===================================================================
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -215,7 +215,8 @@ static unsigned long iommu_range_alloc(s
 	spin_lock_irqsave(&(pool->lock), flags);
 
 again:
-	if ((pass == 0) && handle && *handle)
+	if ((pass == 0) && handle && *handle &&
+	    (*handle >= pool->start) && (*handle < pool->end))
 		start = *handle;
 	else
 		start = pool->hint;
@@ -236,7 +237,9 @@ again:
 	 * but on second pass, start at 0 in pool 0.
 	 */
 	if ((start & mask) >= limit || pass > 0) {
+		spin_unlock(&(pool->lock));
 		pool = &(tbl->pools[0]);
+		spin_lock(&(pool->lock));
 		start = pool->start;
 	} else {
 		start &= mask;
Index: b/arch/powerpc/sysdev/dart_iommu.c
===================================================================
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -74,11 +74,16 @@ static int dart_is_u4;
 
 #define DBG(...)
 
+static DEFINE_SPINLOCK(invalidate_lock);
+
 static inline void dart_tlb_invalidate_all(void)
 {
 	unsigned long l = 0;
 	unsigned int reg, inv_bit;
 	unsigned long limit;
+	unsigned long flags;
+
+	spin_lock_irqsave(&invalidate_lock, flags);
 
 	DBG("dart: flush\n");
 
@@ -111,12 +116,17 @@ retry:
 		panic("DART: TLB did not flush after waiting a long "
 		      "time. Buggy U3 ?");
 	}
+
+	spin_unlock_irqrestore(&invalidate_lock, flags);
 }
 
 static inline void dart_tlb_invalidate_one(unsigned long bus_rpn)
 {
 	unsigned int reg;
 	unsigned int l, limit;
+	unsigned long flags;
+
+	spin_lock_irqsave(&invalidate_lock, flags);
 
 	reg = DART_CNTL_U4_ENABLE | DART_CNTL_U4_IONE |
 	      (bus_rpn & DART_CNTL_U4_IONE_MASK);
@@ -138,6 +148,8 @@ wait_more:
 		panic("DART: TLB did not flush after waiting a long "
 		      "time. Buggy U4 ?");
 	}
+
+	spin_unlock_irqrestore(&invalidate_lock, flags);
 }
 
 static void dart_flush(struct iommu_table *tbl)

2012-10-02 21:47:53

by Alexander Graf

Subject: Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC


On 02.10.2012, at 23:43, Nishanth Aravamudan wrote:

> Hi Ben,
>
> On 02.10.2012 [10:58:29 +1000], Benjamin Herrenschmidt wrote:
>> On Mon, 2012-10-01 at 16:03 +0200, Alexander Graf wrote:
>>> Phew. Here we go :). It looks to be more of a PPC-specific problem
>>> than it appeared at first:
>>
>> Ok, so I suspect the problem is the pushing down of the locks, which
>> breaks with iommu backends that have a separate flush callback. In
>> that case, the flush moves out of the allocator lock.
>>
>> We do still call flush before we return, but I suspect it becomes
>> racy; somebody needs to give it a closer look. I'm hoping Anton or
>> Nish will take a look later today.
>
> Started looking into this. If your suspicion were accurate, wouldn't the
> bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
> ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
> iommu_free")?
>
> Alex, the error is reproducible, right?

Yes. I'm having a hard time figuring out whether my U4-based G5 Mac crashes and fails to read data for the same reason, since I don't have a serial connection there, but I assume so.

> Does it go away by reverting
> that commit against mainline? Just trying to narrow down my focus.

The patch doesn't revert that easily. Would you mind providing a revert patch so I can try?


Alex
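
[Editor's note: a revert patch like the one requested above can be generated
locally with git. This is a sketch, to be run from a kernel tree that contains
the commit named earlier in the thread; since the patch "doesn't revert that
easily", `git revert` may stop on conflicts that still need hand-editing, and
the output filename below is made up for the example.]

```shell
# Stage a revert of the suspect commit without committing it, then
# capture the staged change as a standalone patch file:
git revert --no-commit 0e4bc95d87394364f408627067238453830bdbf3
git diff --cached > revert-iommu-spinlock.patch
git reset --hard HEAD   # drop the staged revert, keeping only the patch file

# Or, equivalently, reverse-apply the commit's diff to the working tree:
git show 0e4bc95d87394364f408627067238453830bdbf3 | git apply -R
```

The `git apply -R` form avoids creating a commit at all, which is convenient
for a quick test build.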