I get som NFS loop which generates lots of (from tcpdump):
9:48:43.156252 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8406616, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156258 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8426888, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156669 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8445712, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156675 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8459880, win 32883, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.156679 IP 192.168.201.44.nfs > 172.20.4.10.2889942938: reply ok
52 getattr ERROR: unk 10024
09:48:43.156704 IP 172.20.4.10.2923497370 > 192.168.201.44.nfs: 2892
getattr fh 0,0/22
09:48:43.156712 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8462776:8467120, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156719 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8467120:8471464, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156725 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8471464:8475808, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156731 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8475808:8480152, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156745 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8480152:8484496, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156748 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8484496:8487392, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156751 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8487392:8491736, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156755 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8491736:8496080, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156758 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8496080:8500424, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156764 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8500424:8504768, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156767 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8504768:8509112, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156770 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8509112:8512008, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156773 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8512008:8516352, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156778 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8516352:8520696, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156956 IP 192.168.201.44.nfs > 172.20.4.10.2906720154: reply ok
52 getattr ERROR: unk 10024
09:48:43.156969 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8520696:8525040, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156975 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8525040:8529384, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156980 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8529384:8533728, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156984 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8533728:8536624, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.156988 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8536624:8540968, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156994 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8540968:8545312, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.156999 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8545312:8549656, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157209 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8464224, win 32872, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157219 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8549656:8554000, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157225 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8554000:8558344, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.157230 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [P.], seq
8558344:8558424, ack 9577, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 80
09:48:43.157235 IP 172.20.4.10.2940274586 > 192.168.201.44.nfs: 532
getattr fh 0,0/22
09:48:43.157460 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8485944, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157709 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8504768, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.157715 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8526488, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158126 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8543864, win 32885, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158133 IP 192.168.201.44.nfs > 172.20.4.10.con: Flags [.], ack
8558960, win 32883, options [nop,nop,TS val 43795665 ecr 57419706], length
0
09:48:43.158138 IP 192.168.201.44.nfs > 172.20.4.10.2923497370: reply ok
52 getattr ERROR: unk 10024
09:48:43.158143 IP 192.168.201.44.nfs > 172.20.4.10.2940274586: reply ok
52 getattr ERROR: unk 10024
09:48:43.158149 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], ack
9745, win 23907, options [nop,nop,TS val 57419706 ecr 43795665], length 0
09:48:43.158174 IP 172.20.4.10.2957051802 > 192.168.201.44.nfs: 2892
getattr fh 0,0/22
09:48:43.158192 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8561856:8566200, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158197 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8566200:8570544, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158204 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8570544:8574888, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158210 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8574888:8579232, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158216 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8579232:8583576, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158222 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8583576:8586472, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 2896
09:48:43.158226 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8586472:8590816, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158233 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8590816:8595160, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
09:48:43.158239 IP 172.20.4.10.con > 192.168.201.44.nfs: Flags [.], seq
8595160:8599504, ack 9745, win 23907, options [nop,nop,TS val 57419706 ecr
43795665], length 4344
I wonder if this is a variant on:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/nfs/nfs4proc.c?h=linux-3.6.y&id=9fa2b82e5592a7aa7a63b7f6a32c5aa9e580643a
which does seem to be in the 3.4 stable branch?
Jocke
Back on this old problem now with some more input. Upgraded my client to
3.8.13(server is on 3.4.44)
and got the NFS4ERR_OLD_STATEID ping pong effect.
Firefox was locked with process state "Dl" and I got this dmesg:
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000000
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6336 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6336 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6355 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6355 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
INFO: task Gecko_IOThread:10655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10655 6355 0x00000004
f44a3400 00000086 ece3dea0 00000000 00000000 c10cb15b ef6fe030 c17e4280
f30c17c0 ece3de60 c17e4280 c17e4280 ef6fe030 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434638 f4434638
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
INFO: task Gecko_IOThread:10691 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Gecko_IOThread D 00000000 0 10691 6355 0x00000000
f44a3400 00000086 d0247ea0 00000000 00000000 c10cb15b ef317400 c17e4280
f31590c0 d0247e60 c17e4280 c17e4280 ef317400 c17e4280 c10cb284 00000002
0000000e 00000000 00000000 ffffffff 00000000 00000000 f4434ae8 f4434ae8
Call Trace:
[<c10cb15b>] ? tag_pages_for_writeback+0x5b/0xa0
[<c10cb284>] ? write_cache_pages+0xe4/0x340
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3eae>] ? filemap_fdatawait_range+0xde/0x170
[<c10c42cb>] ? filemap_write_and_wait_range+0x7b/0x90
[<c11ae336>] ? nfs_file_fsync+0x36/0xe0
[<c11ae300>] ? nfs_lock+0x150/0x150
[<c1125458>] ? vfs_fsync+0x48/0x60
[<c10fffee>] ? filp_close+0x2e/0x80
[<c1100094>] ? sys_close+0x54/0xa0
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
nfsd: last server has exited, flushing export cache
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
INFO: task mozStorage #2:6355 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mozStorage #2 D f5619280 0 6355 1 0x00000004
f446d440 00000086 f446d65c f5619280 f3745300 f5619280 ef0da810 c17e4280
f3745300 c15a897e c17e4280 c17e4280 ef0da810 c17e4280 ef0da810 c17e4280
f3745300 f3745300 c17e4280 c17e4280 ef0da810 c17e4280 f46124f8 f46124f8
Call Trace:
[<c15a897e>] ? __schedule+0x23e/0x660
[<c1088d70>] ? ktime_get_ts+0xf0/0x120
[<c15a907e>] ? io_schedule+0x6e/0xb0
[<c10c35f5>] ? sleep_on_page+0x5/0x10
[<c15a75b5>] ? __wait_on_bit+0x45/0x70
[<c10c35f0>] ? __lock_page+0x80/0x80
[<c10c383d>] ? wait_on_page_bit+0x8d/0xa0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c10c3a7c>] ? grab_cache_page_write_begin+0x9c/0xe0
[<c106ec10>] ? autoremove_wake_function+0x40/0x40
[<c11aea2b>] ? nfs_write_begin+0x5b/0x1c0
[<c11ae81f>] ? nfs_write_end+0x4f/0x200
[<c10c2e27>] ? generic_file_buffered_write+0xe7/0x230
[<c10c478e>] ? __generic_file_aio_write+0x27e/0x500
[<c15a7fe8>] ? __mutex_lock_slowpath+0x1d8/0x270
[<c10c4a84>] ? generic_file_aio_write+0x74/0xf0
[<c11aecc2>] ? nfs_file_write+0x92/0x1c0
[<c1101a6d>] ? do_sync_write+0xcd/0x110
[<c1052809>] ? kmap_atomic_prot+0xd9/0x100
[<c11021fa>] ? rw_verify_area+0x6a/0x130
[<c11019a0>] ? wait_on_retry_sync_kiocb+0x50/0x50
[<c1102560>] ? vfs_write+0xa0/0x140
[<c11018f3>] ? vfs_llseek+0x33/0x40
[<c1102801>] ? sys_write+0x41/0x80
[<c15aa4d8>] ? sysenter_do_call+0x12/0x28
[<c15a0000>] ? identify_cpu+0xc0/0x36c
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
Joakim Tjernlund/Transmode wrote on 2013/04/25 18:12:29:
>
> "Myklebust, Trond" <[email protected]> wrote on 2013/04/25
17:59:01:
> >
> > On Thu, 2013-04-25 at 17:31 +0200, Joakim Tjernlund wrote:
> > > Joakim Tjernlund/Transmode wrote on 2013/04/24 15:16:26:
> > > >
> > > > "Myklebust, Trond" <[email protected]> wrote on
2013/04/23
> > > 16:18:07:
> > > > >
> > > > > On Tue, 2013-04-23 at 16:14 +0200, Joakim Tjernlund wrote:
> > > > > > "Myklebust, Trond" <[email protected]> wrote on
2013/04/23
> > > > > > 15:52:06:
> > > > > > >
> > > > > > > On Tue, 2013-04-23 at 15:38 +0200, Joakim Tjernlund wrote:
> > > > > > > > So, it happened again. Just when hitting search on
> > > bugs.gentoo.org in
> > > > > > > > firefox 17.0.3
> > > > > > > >
> > > > > > > > This time I got a NFS loop with NFS4ERR_BAD_STATEID
looping over
> > > and
> > > > > > over
> > > > > > > > again and FF was hung. Not posting the logs as it does not
> > > appear to
> > > > > > > > do any good. Nothing in dmesg either.
> > > > > > > >
> > > > > > > > Noticed this patch on the NFS list:
> > > > > > > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> > > > > > > > I wonder if that could be a potential cure and if so,
could it
> > > be
> > > > > > > > backported to 3.4?
> > > > > > >
> > > > > > > It is in the testing branch on
> > > > > > >
> > > > > > >
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary
> > > > > > >
> > > > > > > if you want to try it out. I'm not planning on backporting
> > > anything that
> > > > > > > hasn't been labelled with a Cc: stable in that branch.
> > > > > >
> > > > > > Well, we won't use tip of linus tree in production so there is
> > > > > > little point to use your testing branch. However it looks like
a
> > > trivial
> > > > > > backport so I can test it on my client easily.
> > > > >
> > > > > The point of testing would not be to discover if you can use
Linus'
> > > tree
> > > > > in production, but rather to see if the problem is already fixed
> > > > > upstream. If it is, we can bisect to figure out which patch is
the
> > > fix.
> > > > >
> > > > > > Even the NFS server if required, is the above referenced patch
for
> > > > > > NFS client/server or both? Any chance this is the culprit?
> > > > >
> > > > > That's a client patch.
> > >
> > > > Tried 3.4.41+above nfs patch and also 3.8.8, they both have the
> > > > NFS loop problem.
> > > >
> > > > Now I am at your
> > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
> > > > testing branch
> > > > With any luck the error will show soon.
> > > >
> > > > Question though the loop I see, could it be a NFS server bug ?
> > > > If so it does matter what I do on my client I guess.
> > >
> > > Ran http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=summary,
testing
> > > branch
> > > for a day without problem.
> > >
> > > Then I backed to 3.4.41 +
> > > http://marc.info/?l=linux-nfs&m=136643651710066&w=2 +
> > > http://marc.info/?l=linux-nfs&m=136674349127504&w=2
> > > this morning, been using all day without problem. It is a good start
> > > but not conclusive yet.
> > >
> > > Is http://marc.info/?l=linux-nfs&m=136674349127504&w=2 supposed to
> > > fix my type of problem?
> >
> > No. That's a follow up patch to commit
> > 92b40e93849e29f9ca661de6442bb66282738bf7 (NFSv4: Use the open stateid
if
> > the delegation has the wrong mode).
> hmm, that commit is the first one I listed,
http://marc.info/?l=linux-nfs&m=136643651710066&w=2
> and I know that using only that one does NOT fix the problem. I was
hoping that both of them could
> be the answer?
>
> Jocke