2013-04-15 08:29:47

by Joakim Tjernlund

[permalink] [raw]
Subject: NFS loop on 3.4.39

I get som NFS loop which generates lots of (from tcpdump):
9:48:43.156252 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8406616, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156258 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8426888, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156669 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8445712, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156675 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8459880, win 32883, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156679 IP 192.168.201.44.nfs > 172.20.4.10.2889942938: reply ok
52 getattr ERROR: unk 10024
09:48:43.156704 IP 172.20.4.10.2923497370 > 192.168.201.44.nfs: 2892
getattr fh 0,0/22
09:48:43.156712 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8462776:8467120, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156719 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8467120:8471464, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156725 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8471464:8475808, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156731 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8475808:8480152, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156745 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8480152:8484496, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156748 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8484496:8487392, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156751 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8487392:8491736, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156755 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8491736:8496080, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156758 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8496080:8500424, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156764 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8500424:8504768, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156767 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8504768:8509112, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156770 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8509112:8512008, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156773 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8512008:8516352, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156778 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8516352:8520696, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156956 IP 192.168.201.44.nfs > 172.20.4.10.2906720154: reply ok
52 getattr ERROR: unk 10024
09:48:43.156969 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8520696:8525040, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156975 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8525040:8529384, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156980 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8529384:8533728, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156984 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8533728:8536624, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156988 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8536624:8540968, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156994 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8540968:8545312, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156999 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8545312:8549656, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157209 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8464224, win 32872, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157219 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8549656:8554000, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157225 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8554000:8558344, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157230 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [P.], seq
8558344:8558424, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 80
09:48:43.157235 IP 172.20.4.10.2940274586 > 192.168.201.44.nfs: 532
getattr fh 0,0/22
09:48:43.157460 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8485944, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157709 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8504768, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157715 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8526488, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158126 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8543864, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158133 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8558960, win 32883, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158138 IP 192.168.201.44.nfs > 172.20.4.10.2923497370: reply ok
52 getattr ERROR: unk 10024
09:48:43.158143 IP 192.168.201.44.nfs > 172.20.4.10.2940274586: reply ok
52 getattr ERROR: unk 10024
09:48:43.158149 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], ack
9745, win 23907, options [nop,nop,TS val 57419706 ecr 43795665], length 0
09:48:43.158174 IP 172.20.4.10.2957051802 > 192.168.201.44.nfs: 2892
getattr fh 0,0/22
09:48:43.158192 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8561856:8566200, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158197 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8566200:8570544, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158204 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8570544:8574888, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158210 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8574888:8579232, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158216 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8579232:8583576, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158222 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8583576:8586472, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.158226 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8586472:8590816, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158233 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8590816:8595160, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158239 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8595160:8599504, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344

I wonder if this is a variant on:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/nfs/nfs4proc.c?h=linux-3.6.y&id=9fa2b82e5592a7aa7a63b7f6a32c5aa9e580643a

which does seem to be in the 3.4 stable branch?

Jocke


2013-05-17 13:18:57

by Joakim Tjernlund

[permalink] [raw]
Subject: Re: NFS loop on 3.4.39

Back on this old problem now with some more input. Upgraded my client to
3.8.13(server is on 3.4.44)
and got the NFS4ERR_OLD_STATEID ping pong effect.
Firefox was locked with process state "Dl" and I got this dmesg:

INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000000
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6336 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6336 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6355 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6355 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6355 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6355 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
nfsd: last server has exited, flushing export cache
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
device eth0 entered promiscuous mode
device eth0 left promiscuous mode

Joakim Tjernlund/Transmode wrote on 2013/04/25 18:12:29:
>
> "Myklebust, Trond" <[email protected]> wrote on 2013/04/25
17:59:01:
> >
> > On Thu, 2013-04-25 at 17:31 +0200, Joakim Tjernlund wrote:
> > > Joakim Tjernlund/Transmode wrote on 2013/04/24 15:16:26:
> > > >
> > > > "Myklebust, Trond" <[email protected]> wrote on
2013/04/23
> > > 16:18:07:
> > > > >
> > > > > On Tue, 2013-04-23 at 16:14 +0200, Joakim Tjernlund wrote:
> > > > > > "Myklebust, Trond" <[email protected]> wrote on
2013/04/23
> > > > > > 15:52:06:
> > > > > > >
> > > > > > > On Tue, 2013-04-23 at 15:38 +0200, Joakim Tjernlund wrote:
> > > > > > > > So, it happened again. Just when hitting search on
> > > bugs.gentoo.org in
> > > > > > > > firefox 17.0.3
> > > > > > > >
> > > > > > > > This time I got a NFS loop with NFS4ERR_BAD_STATEID
looping over
> > > and
> > > > > > over
> > > > > > > > again and FF was hung. Not posting the logs as it does not

> > > appear to
> > > > > > > > do any good. Nothing in dmesg either.
> > > > > > > >
> > > > > > > > Noticed this patch on the NFS list:
> > > > > > > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> > > > > > > > I wonder if that could be a potential cure and if so,
could it
> > > be
> > > > > > > > backported to 3.4?
> > > > > > >
> > > > > > > It is in the testing branch on
> > > > > > >
> > > > > > >
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > > > > > >
> > > > > > > if you want to try it out. I'm not planning on backporting
> > > anything that
> > > > > > > hasn't been labelled with a Cc: stable in that branch.
> > > > > >
> > > > > > Well, we won't use tip of linus tree in production so there is
> > > > > > little point to use your testing branch. However it looks like
a
> > > trivial
> > > > > > backport so I can test it on my client easily.
> > > > >
> > > > > The point of testing would not be to discover if you can use
Linus'
> > > tree
> > > > > in production, but rather to see if the problem is already fixed
> > > > > upstream. If it is, we can bisect to figure out which patch is
the
> > > fix.
> > > > >
> > > > > > Even the NFS server if required, is the above referenced patch
for
> > > > > > NFS client/server or both? Any chance this is the culprit?
> > > > >
> > > > > That's a client patch.
> > >
> > > > Tried 3.4.41+above nfs patch and also 3.8.8, they both have the
> > > > NFS loop problem.
> > > >
> > > > Now I am at your
> > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
> > > > testing branch
> > > > With any luck the error will show soon.
> > > >
> > > > Question though the loop I see, could it be a NFS server bug ?
> > > > If so it does matter what I do on my client I guess.
> > >
> > > Ran http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
testing
> > > branch
> > > for a day without problem.
> > >
> > > Then I backed to 3.4.41 +
> > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2 +
> > > http://marc.info/?l=linux-nfs&m=136674349127504&w=2
> > > this morning, been using all day without problem. It is a good start
> > > but not conclusive yet.
> > >
> > > Is http://marc.info/?l=linux-nfs&m=136674349127504&w=2 supposed to
> > > fix my type of problem?
> >
> > No. That's a follow up patch to commit
> > 92b40e93849e29f9ca661de6442bb66282738bf7 (NFSv4: Use the open stateid
if
> > the delegation has the wrong mode).

> hmm, that commit is the first one I listed,
http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> and I know that using only that one does NOT fix the problem. I was
hoping that both of them could
> be the answer?
>
> Jocke