2010-10-17 17:51:59

by Brian J. Murrell

[permalink] [raw]
Subject: Error: state manager failed on NFSv4 server linux with error 127

Hi.

Yesterday, on the Ubuntu 2.6.35-22-generic kernel, I got a spew of:

Oct 16 17:00:21 pc kernel: [415259.309475] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.311413] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.314178] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.316237] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.318169] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.320442] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.322989] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.324761] Error: state manager failed on NFSv4 server linux with error 127
Oct 16 17:00:21 pc kernel: [415259.326443] Error: state manager failed on NFSv4 server linux with error 127

At the same time, many processes were blocked with stacks similar to:

[415723.293796] SysRq : Show Blocked State
[415723.293803] task PC stack pid father
[415723.293848] gnome-setting D f368bd04 0 4136 1 0x00000000
[415723.293854] f368bd14 00200086 00000002 f368bd04 f8256917 c05d89e0 c08c3700 c08c3700
[415723.293859] d209ff9f 00017a15 c08c3700 c08c3700 d209dca8 00017a15 00000000 c08c3700
[415723.293864] c08c3700 f69bcc20 00000001 f7090000 f7090000 f368bd34 f368bd78 c05c72df
[415723.293868] Call Trace:
[415723.293916] [<f8256917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[415723.293924] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.293949] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
[415723.293954] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.293959] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.293972] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.293984] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.293997] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.294006] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294015] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294020] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.294023] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294027] [<c0223867>] link_path_walk+0x67/0x890
[415723.294032] [<c012ce18>] ? default_spin_lock_flags+0x8/0x10
[415723.294036] [<c02241b1>] path_walk+0x51/0xc0
[415723.294039] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294042] [<c0224e81>] user_path_at+0x41/0x80
[415723.294046] [<c01f46c4>] ? handle_mm_fault+0x2d4/0x400
[415723.294050] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.294054] [<c05cc3fd>] ? do_page_fault+0x1cd/0x440
[415723.294057] [<c021cb30>] vfs_stat+0x20/0x30
[415723.294060] [<c021cb59>] sys_stat64+0x19/0x30
[415723.294064] [<c0151519>] ? irq_exit+0x39/0x70
[415723.294068] [<c05c9a40>] ? do_device_not_available+0x0/0x60
[415723.294072] [<c0104279>] ? math_state_restore+0x39/0x60
[415723.294075] [<c05c9a8d>] ? do_device_not_available+0x4d/0x60
[415723.294078] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294087] pidgin D f373fcd4 0 4191 4057 0x00000000
[415723.294091] f373fd14 00000086 00000080 f373fcd4 c460e000 ffffff81 c08c3700 c08c3700
[415723.294095] 3084e28e 00017a17 c08c3700 c08c3700 00000001 00017a17 00000000 c08c3700
[415723.294100] c08c3700 f42f2610 f7090000 f7090000 f7090000 f373fd34 f373fd78 c05c72df
[415723.294104] Call Trace:
[415723.294108] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.294123] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
[415723.294126] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.294130] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.294143] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.294155] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.294168] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.294177] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294186] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294190] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.294193] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294196] [<c0223867>] link_path_walk+0x67/0x890
[415723.294201] [<c0139665>] ? update_curr+0x175/0x2a0
[415723.294204] [<c02241b1>] path_walk+0x51/0xc0
[415723.294208] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294211] [<c0224e81>] user_path_at+0x41/0x80
[415723.294216] [<c03551a3>] ? rb_insert_color+0xd3/0x110
[415723.294218] [<c05c8b6d>] ? _raw_spin_lock+0xd/0x10
[415723.294222] [<c01abfc3>] ? rcu_report_qs_rnp+0xb3/0x100
[415723.294225] [<c01ac6b7>] ? __rcu_process_callbacks+0x47/0x2d0
[415723.294228] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.294231] [<c021cb30>] vfs_stat+0x20/0x30
[415723.294234] [<c021cb59>] sys_stat64+0x19/0x30
[415723.294239] [<c016fb56>] ? getnstimeofday+0x56/0x120
[415723.294242] [<c035a249>] ? copy_to_user+0x39/0x130
[415723.294245] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[415723.294249] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[415723.294252] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294284] vinagre D d9423cd4 0 8801 1 0x00000000
[415723.294288] d9423d14 00200086 00000080 d9423cd4 c460c000 ffffff81 c08c3700 c08c3700
[415723.294293] 6ada7549 00017a16 c08c3700 c08c3700 00000000 00017a16 00000000 c08c3700
[415723.294297] c08c3700 efa6e580 c08fc500 c08fc500 c08fc500 d9423d34 d9423d78 c05c72df
[415723.294302] Call Trace:
[415723.294306] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.294310] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.294313] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.294328] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.294340] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.294353] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.294362] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294371] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294375] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.294378] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294381] [<c0223867>] link_path_walk+0x67/0x890
[415723.294385] [<c02241b1>] path_walk+0x51/0xc0
[415723.294388] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294391] [<c0224e81>] user_path_at+0x41/0x80
[415723.294395] [<c0142a11>] ? try_to_wake_up+0xa1/0x3b0
[415723.294399] [<c01222db>] ? lapic_next_event+0x1b/0x20
[415723.294402] [<c0175802>] ? tick_dev_program_event+0x42/0x150
[415723.294406] [<c0169bd2>] ? __run_hrtimer+0x92/0x190
[415723.294409] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.294411] [<c021cb30>] vfs_stat+0x20/0x30
[415723.294414] [<c021cb59>] sys_stat64+0x19/0x30
[415723.294418] [<c016fb56>] ? getnstimeofday+0x56/0x120
[415723.294421] [<c035a249>] ? copy_to_user+0x39/0x130
[415723.294424] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[415723.294427] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[415723.294430] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294433] rhythmbox D f7090000 0 27043 1 0x00000000
[415723.294437] d5dc7cac 00200086 c01588cc f7090000 d5dc7c70 c05d89e0 c08c3700 c08c3700
[415723.294441] a4861998 00017a18 c08c3700 c08c3700 a485c4cb 00017a18 00000000 c08c3700
[415723.294446] c08c3700 ef950000 d5dc7cb4 d5dc7ce0 00000000 d5dc7ce8 d5dc7cb4 f8256c0c
[415723.294450] Call Trace:
[415723.294453] [<c01588cc>] ? lock_timer_base+0x2c/0x60
[415723.294473] [<f8256c0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
[415723.294476] [<c05c761d>] __wait_on_bit+0x4d/0x70
[415723.294490] [<f8256bf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[415723.294502] [<f8256bf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[415723.294506] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
[415723.294510] [<c0165e60>] ? wake_bit_function+0x0/0x50
[415723.294525] [<f825731b>] __rpc_execute+0xdb/0x250 [sunrpc]
[415723.294538] [<f8256a17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
[415723.294552] [<f82574fe>] rpc_execute+0x6e/0x80 [sunrpc]
[415723.294563] [<f82509af>] rpc_run_task+0x1f/0x30 [sunrpc]
[415723.294575] [<f8250abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
[415723.294591] [<f882f5e2>] _nfs4_call_sync+0x22/0x30 [nfs]
[415723.294604] [<f882e3a3>] _nfs4_proc_access+0xe3/0x170 [nfs]
[415723.294617] [<f882e469>] nfs4_proc_access+0x39/0x60 [nfs]
[415723.294626] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294635] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294639] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.294642] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294645] [<c0223867>] link_path_walk+0x67/0x890
[415723.294649] [<c02241b1>] path_walk+0x51/0xc0
[415723.294652] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294655] [<c0224e81>] user_path_at+0x41/0x80
[415723.294659] [<c016ffbd>] ? ktime_get_ts+0xed/0x120
[415723.294662] [<c016fb56>] ? getnstimeofday+0x56/0x120
[415723.294666] [<c024ad63>] sys_inotify_add_watch+0x63/0x100
[415723.294669] [<c0228829>] ? sys_poll+0x59/0xc0
[415723.294672] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294681] clock-applet D ce1f5cd4 0 29323 1 0x00000000
[415723.294684] ce1f5d14 00200086 00000080 ce1f5cd4 f5fa6000 ffffff81 c08c3700 c08c3700
[415723.294689] e8a6c632 00017a18 c08c3700 c08c3700 00000000 00017a18 f5e67e00 c08c3700
[415723.294693] c08c3700 f68bd8d0 c08fc500 c08fc500 c08fc500 ce1f5d34 ce1f5d78 c05c72df
[415723.294698] Call Trace:
[415723.294701] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.294705] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.294709] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.294722] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.294734] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.294747] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.294756] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294765] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294770] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
[415723.294773] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294776] [<c0223867>] link_path_walk+0x67/0x890
[415723.294780] [<c02241b1>] path_walk+0x51/0xc0
[415723.294783] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294787] [<c0224e81>] user_path_at+0x41/0x80
[415723.294802] [<c04f5f4b>] ? net_rx_action+0x1ab/0x220
[415723.294805] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.294809] [<c0124fc4>] ? ack_apic_level+0x64/0x1f0
[415723.294812] [<c021cb30>] vfs_stat+0x20/0x30
[415723.294815] [<c021cb59>] sys_stat64+0x19/0x30
[415723.294818] [<c016fb56>] ? getnstimeofday+0x56/0x120
[415723.294821] [<c035a249>] ? copy_to_user+0x39/0x130
[415723.294824] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[415723.294828] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[415723.294831] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294833] evolution D f20e9cd4 0 29359 1 0x00000000
[415723.294837] f20e9d14 00000086 00000080 f20e9cd4 c4478000 ffffff81 c08c3700 c08c3700
[415723.294841] ee19e4de 00017a15 c08c3700 c08c3700 00000000 00017a15 00000000 c08c3700
[415723.294846] c08c3700 d049b2c0 c08fc500 c08fc500 c08fc500 f20e9d34 f20e9d78 c05c72df
[415723.294850] Call Trace:
[415723.294854] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.294857] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.294861] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.294874] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.294886] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.294899] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.294908] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.294917] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.294922] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.294926] [<c02212ea>] exec_permission+0x2a/0x90
[415723.294929] [<c0223867>] link_path_walk+0x67/0x890
[415723.294933] [<c02241b1>] path_walk+0x51/0xc0
[415723.294936] [<c0224339>] do_path_lookup+0x59/0x90
[415723.294939] [<c0224e81>] user_path_at+0x41/0x80
[415723.294942] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.294945] [<c021cb30>] vfs_stat+0x20/0x30
[415723.294948] [<c021cb59>] sys_stat64+0x19/0x30
[415723.294951] [<c05c90a4>] syscall_call+0x7/0xb
[415723.294954] [<c01698a8>] ? lock_hrtimer_base+0x28/0x50
[415723.294960] uptrack-upgra D e9fa5cfc 0 11865 11864 0x00000000
[415723.294964] e9fa5d0c 00000086 00000002 e9fa5cfc 00017a19 c05d89e0 c08c3700 c08c3700
[415723.294968] 1ed1c8f3 00017a19 c08c3700 c08c3700 1ed196bd 00017a19 00000000 c08c3700
[415723.294973] c08c3700 c5a558d0 00000001 e9fa5d40 00000000 e9fa5d48 e9fa5d14 f88198cc
[415723.294977] Call Trace:
[415723.294988] [<f88198cc>] nfs_wait_bit_killable+0x1c/0x40 [nfs]
[415723.294992] [<c05c761d>] __wait_on_bit+0x4d/0x70
[415723.295002] [<f88198b0>] ? nfs_wait_bit_killable+0x0/0x40 [nfs]
[415723.295011] [<f88198b0>] ? nfs_wait_bit_killable+0x0/0x40 [nfs]
[415723.295015] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
[415723.295018] [<c0165e60>] ? wake_bit_function+0x0/0x50
[415723.295031] [<f882c407>] nfs4_wait_clnt_recover+0x37/0x40 [nfs]
[415723.295044] [<f882c4d3>] nfs4_handle_exception+0xc3/0x180 [nfs]
[415723.295057] [<f883053e>] nfs4_do_open+0x8e/0xf0 [nfs]
[415723.295070] [<f8830896>] nfs4_atomic_open+0x96/0x180 [nfs]
[415723.295079] [<f8813173>] nfs_atomic_lookup+0x93/0x100 [nfs]
[415723.295082] [<c05c8b6d>] ? _raw_spin_lock+0xd/0x10
[415723.295085] [<c022b797>] ? d_alloc+0x117/0x170
[415723.295089] [<c022317b>] do_lookup+0x15b/0x1c0
[415723.295092] [<c022361d>] do_last+0x24d/0x3a0
[415723.295096] [<c02251bd>] do_filp_open+0x19d/0x4c0
[415723.295101] [<c0216a65>] do_sys_open+0x55/0x150
[415723.295105] [<c0216bce>] sys_open+0x2e/0x40
[415723.295107] [<c05c90a4>] syscall_call+0x7/0xb
[415723.295110] gnome-screens D ed361cd4 0 23902 4269 0x00000000
[415723.295114] ed361d14 00000086 00000080 ed361cd4 f5fa6000 ffffff81 c08c3700 c08c3700
[415723.295118] cc6ee762 00017a17 c08c3700 c08c3700 00000001 00017a17 f5e67e00 c08c3700
[415723.295123] c08c3700 f344f230 f7090000 f7090000 f7090000 ed361d34 ed361d78 c05c72df
[415723.295127] Call Trace:
[415723.295131] [<c05c72df>] schedule_timeout+0x12f/0x270
[415723.295144] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
[415723.295147] [<c01589c0>] ? process_timeout+0x0/0x10
[415723.295151] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[415723.295164] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
[415723.295176] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[415723.295189] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[415723.295197] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
[415723.295206] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
[415723.295210] [<c022309a>] ? do_lookup+0x7a/0x1c0
[415723.295213] [<c02212ea>] exec_permission+0x2a/0x90
[415723.295216] [<c0223867>] link_path_walk+0x67/0x890
[415723.295220] [<c02241b1>] path_walk+0x51/0xc0
[415723.295223] [<c0224339>] do_path_lookup+0x59/0x90
[415723.295227] [<c0224e81>] user_path_at+0x41/0x80
[415723.295230] [<c0218534>] ? do_sync_read+0xa4/0xe0
[415723.295234] [<c021c9da>] vfs_fstatat+0x3a/0x70
[415723.295236] [<c021cb30>] vfs_stat+0x20/0x30
[415723.295239] [<c021cb59>] sys_stat64+0x19/0x30
[415723.295243] [<c0219f9d>] ? fput+0x1d/0x30
[415723.295246] [<c021691c>] ? filp_close+0x4c/0x80
[415723.295249] [<c02169c5>] ? sys_close+0x75/0xc0
[415723.295252] [<c05c90a4>] syscall_call+0x7/0xb
[415723.295255] lsb_release D 00000095 0 30540 30529 0x00000000
[415723.295259] da947f2c 00000086 00000000 00000095 f42b0000 c2ce14c0 c08c3700 c08c3700
[415723.295263] 33bfe72a 000179a8 c08c3700 c08c3700 00000001 000179a8 f5db4c00 c08c3700
[415723.295268] c08c3700 f1733f70 dc1d4008 cafac954 ffffffff cafac95c da947f58 c05c7bbc
[415723.295272] Call Trace:
[415723.295276] [<c05c7bbc>] __mutex_lock_killable_slowpath+0xdc/0x160
[415723.295280] [<c05cc3fd>] ? do_page_fault+0x1cd/0x440
[415723.295283] [<c05c7c74>] mutex_lock_killable+0x34/0x40
[415723.295286] [<c0227494>] vfs_readdir+0x64/0xb0
[415723.295289] [<c0227190>] ? filldir64+0x0/0xf0
[415723.295292] [<c0227549>] sys_getdents64+0x69/0xc0
[415723.295294] [<c05c90a4>] syscall_call+0x7/0xb
[415723.295302] Sched Debug Version: v0.09, 2.6.35-22-generic #33-Ubuntu

A reboot was required to resolve this.

Today, I found similar (although no errors about the state manager):

[64676.088847] SysRq : Show Blocked State
[64676.092628] task PC stack pid father
[64676.092652] pulseaudio D f36b1d34 0 3323 1 0x00000000
[64676.092652] f36b1d44 00000086 00000002 f36b1d34 f879c917 c05d89e0 c08c3700 c08c3700
[64676.092652] 8e643a96 00003acf c08c3700 c08c3700 8e6404cd 00003acf 00000000 c08c3700
[64676.092652] c08c3700 f3418000 00000001 c08fc500 c08fc500 f36b1d64 f36b1da8 c05c72df
[64676.092652] Call Trace:
[64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
[64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
[64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
[64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
[64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
[64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
[64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
[64676.092652] [<c0223867>] link_path_walk+0x67/0x890
[64676.092652] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
[64676.092652] [<c022510d>] do_filp_open+0xed/0x4c0
[64676.092652] [<f8050b98>] ? nv_napi_poll+0x148/0x2e0 [forcedeth]
[64676.092652] [<c01512cc>] ? __do_softirq+0xec/0x1b0
[64676.092652] [<c022f0fd>] ? alloc_fd+0xbd/0xf0
[64676.092652] [<c0216a65>] do_sys_open+0x55/0x150
[64676.092652] [<c05cf665>] ? do_IRQ+0x55/0xc0
[64676.092652] [<c0216bce>] sys_open+0x2e/0x40
[64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
[64676.092652] rhythmbox D f3731d5c 0 3342 3227 0x00000000
[64676.092652] f3731d6c 00000086 00000002 f3731d5c f879c917 c05d89e0 c08c3700 c08c3700
[64676.092652] 6c3d30ab 00003ad2 c08c3700 c08c3700 6c3cf7c6 00003ad2 00000000 c08c3700
[64676.092652] c08c3700 f36f2610 00000001 c08fc500 c08fc500 f3731d8c f3731dd0 c05c72df
[64676.092652] Call Trace:
[64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
[64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
[64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
[64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
[64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
[64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
[64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
[64676.092652] [<c0223867>] link_path_walk+0x67/0x890
[64676.092652] [<c02241b1>] path_walk+0x51/0xc0
[64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
[64676.092652] [<c0224e81>] user_path_at+0x41/0x80
[64676.092652] [<c032e626>] ? apparmor_file_permission+0x16/0x20
[64676.092652] [<c016ffbd>] ? ktime_get_ts+0xed/0x120
[64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
[64676.092652] [<c024ad63>] sys_inotify_add_watch+0x63/0x100
[64676.092652] [<c0228829>] ? sys_poll+0x59/0xc0
[64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
[64676.092652] pidgin D f37b7d04 0 3354 3227 0x00000000
[64676.092652] f37b7d14 00000086 00000002 f37b7d04 f879c917 c05d89e0 c08c3700 c08c3700
[64676.092652] 680f2ffc 00003ad0 c08c3700 c08c3700 680efa47 00003ad0 00000000 c08c3700
[64676.092652] c08c3700 f36f58d0 00000001 c08fc500 c08fc500 f37b7d34 f37b7d78 c05c72df
[64676.092652] Call Trace:
[64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
[64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
[64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
[64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
[64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
[64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
[64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
[64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
[64676.092652] [<c0223867>] link_path_walk+0x67/0x890
[64676.092652] [<c02241b1>] path_walk+0x51/0xc0
[64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
[64676.092652] [<c0224e81>] user_path_at+0x41/0x80
[64676.092652] [<c021c9da>] vfs_fstatat+0x3a/0x70
[64676.092652] [<c021cb30>] vfs_stat+0x20/0x30
[64676.092652] [<c021cb59>] sys_stat64+0x19/0x30
[64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
[64676.092652] [<c035a249>] ? copy_to_user+0x39/0x130
[64676.092652] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[64676.092652] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
[64676.092652] sensors-apple D f0cd9c44 0 3778 1 0x00000000
[64676.092652] f0cd9c54 00000086 00000002 f0cd9c44 f0cd9c18 c05d89e0 c08c3700 c08c3700
[64676.092652] 5929818f 00003ad2 c08c3700 c08c3700 59294ecf 00003ad2 00000000 c08c3700
[64676.092652] c08c3700 f1abd8d0 00000001 f0cd9c88 00000000 f0cd9c90 f0cd9c5c f879cc0c
[64676.092652] Call Trace:
[64676.092652] [<f879cc0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
[64676.092652] [<c05c761d>] __wait_on_bit+0x4d/0x70
[64676.092652] [<f879cbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[64676.092652] [<f879cbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[64676.092652] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
[64676.092652] [<c0165e60>] ? wake_bit_function+0x0/0x50
[64676.092652] [<f879d31b>] __rpc_execute+0xdb/0x250 [sunrpc]
[64676.092652] [<f879ca17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
[64676.092652] [<c01449b8>] ? dequeue_entity+0x1c8/0x210
[64676.092652] [<f879d4fe>] rpc_execute+0x6e/0x80 [sunrpc]
[64676.092652] [<f87969af>] rpc_run_task+0x1f/0x30 [sunrpc]
[64676.092652] [<f8796abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
[64676.092652] [<f8c5d5e2>] _nfs4_call_sync+0x22/0x30 [nfs]
[64676.092652] [<f8c5c3a3>] _nfs4_proc_access+0xe3/0x170 [nfs]
[64676.092652] [<f8c5c469>] nfs4_proc_access+0x39/0x60 [nfs]
[64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
[64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
[64676.092652] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
[64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
[64676.092652] [<c0223867>] link_path_walk+0x67/0x890
[64676.092652] [<c02241b1>] path_walk+0x51/0xc0
[64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
[64676.092652] [<c0224e81>] user_path_at+0x41/0x80
[64676.092652] [<c0139665>] ? update_curr+0x175/0x2a0
[64676.092652] [<c016bf14>] ? sched_clock_local+0xa4/0x180
[64676.092652] [<c0139cb6>] ? __dequeue_entity+0x26/0x50
[64676.092652] [<c021c9da>] vfs_fstatat+0x3a/0x70
[64676.092652] [<c021cb30>] vfs_stat+0x20/0x30
[64676.092652] [<c021cb59>] sys_stat64+0x19/0x30
[64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
[64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
[64676.092652] Sched Debug Version: v0.09, 2.6.35-22-generic #33-Ubuntu

Any ideas what's going on here? If there isn't enough info here, I'd
more than happy to gather and post whatever is needed.

Thanx,
b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-10-30 17:53:07

by Myklebust, Trond

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 13:41 -0400, Brian J. Murrell wrote:
> On Sun, 2010-10-17 at 14:35 -0400, Trond Myklebust wrote:
> >
> > Error 127 is EKEYEXPIRED. It means that the RPCSEC_GSS context for at
> > least one of your threads has expired.
> >
> > Sigh... It looks as if we have a really poor handling of that in the
> > recovery threads.
> >
> > Does the following patch help?
>
> Well, while I don't seem to have had any of the state manager failed
> errors, I did have a number of processes blocked in NFS4 today with the
> patched kernel running:
>
> [316999.281908] SysRq : Show Blocked State
> [316999.285576] task PC stack pid father
> [316999.285576] rhythmbox D f32d1c9c 0 12726 12614 0x00000000
<snip>
>
> Maybe the above is an unrelated problem?

The processes appeared to be waiting for state recovery to complete, but
your trace didn't contain any evidence that a state recovery thread was
running. Did you over-edit the trace output, or was this really all?

Trond

2010-10-30 18:20:04

by Myklebust, Trond

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 13:59 -0400, Brian J. Murrell wrote:
> On Sat, 2010-10-30 at 13:52 -0400, Trond Myklebust wrote:
> >
> > The processes appeared to be waiting for state recovery to complete, but
> > your trace didn't contain any evidence that a state recovery thread was
> > running.
>
> To be clear, the trace was only that of blocked processes (echo w
> > /proc/sysrq-trigger). I'm not sure if we should have expected the
> state recovery thread to have been included (i.e. it was blocked too).
>
> > Did you over-edit the trace output, or was this really all?
>
> Other than the usual post-show-blocked-state output, yes, this really
> was all there was.
>
> What does the state recovery thread look like in the process table so
> that I can look for it next time?

It will be running a process with a name of the form "<server IP
address>-manager", and will be running the function
nfs4_run_state_manager().

BTW: Do you have the following patches applied?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c
and
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=ae1007d37e00144b72906a4bdc47d517ae91bcc1

They are not yet in the stable kernel series, AFAIK, but are slated to
be merged soon.

Cheers
Trond

2010-10-30 17:59:25

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 13:52 -0400, Trond Myklebust wrote:
>
> The processes appeared to be waiting for state recovery to complete, but
> your trace didn't contain any evidence that a state recovery thread was
> running.

To be clear, the trace was only that of blocked processes (echo w
> /proc/sysrq-trigger). I'm not sure if we should have expected the
state recovery thread to have been included (i.e. it was blocked too).

> Did you over-edit the trace output, or was this really all?

Other than the usual post-show-blocked-state output, yes, this really
was all there was.

What does the state recovery thread look like in the process table so
that I can look for it next time?

b.




Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-10-30 19:54:00

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 14:19 -0400, Trond Myklebust wrote:
>
> It will be running a process with a name of the form "<server IP
> address>-manager", and will be running the function
> nfs4_run_state_manager().

I will keep an eye out for that the next time I see this problem.

> BTW: Do you have the following patches applied?
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c
> and
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=ae1007d37e00144b72906a4bdc47d517ae91bcc1

Nope.

> They are not yet in the stable kernel series,

Which would be why I don't have them. I am working with a 2.6.35
kernel.

> AFAIK, but are slated to
> be merged soon.

I take that mentioning them in this context means they would be good to
have? Safe to apply to 2.6.35.4ish kernel?

b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-10-30 22:23:08

by Myklebust, Trond

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 17:46 -0400, Brian J. Murrell wrote:
> On Sat, 2010-10-30 at 17:41 -0400, Trond Myklebust wrote:
> >
> > There are 2 cases which can trigger recovery: server reboot, and network
> > partition (i.e. a networking fault that causes the client to be unable
> > to contact the server in time in order to renew its lease).
>
> Yes, seems two cases which I would suspect also.
>
> > If none of the above apply,
>
> None should be applicable. Of course I could never know if the network
> "blipped" (but highly doubt even that happened) but there should not
> have been an outtage long enough to prevent a lease renewal.
>
> > then we need to look at whether it is the
> > client or the server that is screwed up.
>
> Yes, fair enough. I suppose having more than 1 client using the same
> server could be sufficient to determine fault at the client or server?
> Or is there a more direct route?

How about something like the following patch? It should allow you to log
only the errors that lead to a state recovery situation.

Cheers
Trond

------------------------------------------------------------------------------

2010-10-30 20:24:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 15:53 -0400, Brian J. Murrell wrote:
> On Sat, 2010-10-30 at 14:19 -0400, Trond Myklebust wrote:
> >
> > It will be running a process with a name of the form "<server IP
> > address>-manager", and will be running the function
> > nfs4_run_state_manager().
>
> I will keep an eye out for that the next time I see this problem.
>
> > BTW: Do you have the following patches applied?
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c
> > and
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=ae1007d37e00144b72906a4bdc47d517ae91bcc1
>
> Nope.
>
> > They are not yet in the stable kernel series,
>
> Which would be why I don't have them. I am working with a 2.6.35
> kernel.
>
> > AFAIK, but are slated to
> > be merged soon.
>
> I take that mentioning them in this context means they would be good to
> have? Safe to apply to 2.6.35.4ish kernel?

Yes.


2010-10-17 18:35:24

by Myklebust, Trond

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sun, 2010-10-17 at 13:00 -0400, Brian J. Murrell wrote:
> Hi.
>
> Yesterday, on the Ubuntu 2.6.35-22-generic kernel, I got a spew of:
>
> Oct 16 17:00:21 pc kernel: [415259.309475] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.311413] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.314178] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.316237] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.318169] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.320442] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.322989] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.324761] Error: state manager failed on NFSv4 server linux with error 127
> Oct 16 17:00:21 pc kernel: [415259.326443] Error: state manager failed on NFSv4 server linux with error 127
>
> At the same time, many processes were blocked with stacks similar to:
>
> [415723.293796] SysRq : Show Blocked State
> [415723.293803] task PC stack pid father
> [415723.293848] gnome-setting D f368bd04 0 4136 1 0x00000000
> [415723.293854] f368bd14 00200086 00000002 f368bd04 f8256917 c05d89e0 c08c3700 c08c3700
> [415723.293859] d209ff9f 00017a15 c08c3700 c08c3700 d209dca8 00017a15 00000000 c08c3700
> [415723.293864] c08c3700 f69bcc20 00000001 f7090000 f7090000 f368bd34 f368bd78 c05c72df
> [415723.293868] Call Trace:
> [415723.293916] [<f8256917>] ? rpc_put_task+0x77/0x80 [sunrpc]
> [415723.293924] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.293949] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
> [415723.293954] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.293959] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.293972] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.293984] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.293997] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.294006] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294015] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294020] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.294023] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294027] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294032] [<c012ce18>] ? default_spin_lock_flags+0x8/0x10
> [415723.294036] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294039] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294042] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294046] [<c01f46c4>] ? handle_mm_fault+0x2d4/0x400
> [415723.294050] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.294054] [<c05cc3fd>] ? do_page_fault+0x1cd/0x440
> [415723.294057] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.294060] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.294064] [<c0151519>] ? irq_exit+0x39/0x70
> [415723.294068] [<c05c9a40>] ? do_device_not_available+0x0/0x60
> [415723.294072] [<c0104279>] ? math_state_restore+0x39/0x60
> [415723.294075] [<c05c9a8d>] ? do_device_not_available+0x4d/0x60
> [415723.294078] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294087] pidgin D f373fcd4 0 4191 4057 0x00000000
> [415723.294091] f373fd14 00000086 00000080 f373fcd4 c460e000 ffffff81 c08c3700 c08c3700
> [415723.294095] 3084e28e 00017a17 c08c3700 c08c3700 00000001 00017a17 00000000 c08c3700
> [415723.294100] c08c3700 f42f2610 f7090000 f7090000 f7090000 f373fd34 f373fd78 c05c72df
> [415723.294104] Call Trace:
> [415723.294108] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.294123] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
> [415723.294126] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.294130] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.294143] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.294155] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.294168] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.294177] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294186] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294190] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.294193] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294196] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294201] [<c0139665>] ? update_curr+0x175/0x2a0
> [415723.294204] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294208] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294211] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294216] [<c03551a3>] ? rb_insert_color+0xd3/0x110
> [415723.294218] [<c05c8b6d>] ? _raw_spin_lock+0xd/0x10
> [415723.294222] [<c01abfc3>] ? rcu_report_qs_rnp+0xb3/0x100
> [415723.294225] [<c01ac6b7>] ? __rcu_process_callbacks+0x47/0x2d0
> [415723.294228] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.294231] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.294234] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.294239] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [415723.294242] [<c035a249>] ? copy_to_user+0x39/0x130
> [415723.294245] [<c016fc76>] ? do_gettimeofday+0x16/0x40
> [415723.294249] [<c0150146>] ? sys_gettimeofday+0x36/0x70
> [415723.294252] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294284] vinagre D d9423cd4 0 8801 1 0x00000000
> [415723.294288] d9423d14 00200086 00000080 d9423cd4 c460c000 ffffff81 c08c3700 c08c3700
> [415723.294293] 6ada7549 00017a16 c08c3700 c08c3700 00000000 00017a16 00000000 c08c3700
> [415723.294297] c08c3700 efa6e580 c08fc500 c08fc500 c08fc500 d9423d34 d9423d78 c05c72df
> [415723.294302] Call Trace:
> [415723.294306] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.294310] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.294313] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.294328] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.294340] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.294353] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.294362] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294371] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294375] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.294378] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294381] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294385] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294388] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294391] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294395] [<c0142a11>] ? try_to_wake_up+0xa1/0x3b0
> [415723.294399] [<c01222db>] ? lapic_next_event+0x1b/0x20
> [415723.294402] [<c0175802>] ? tick_dev_program_event+0x42/0x150
> [415723.294406] [<c0169bd2>] ? __run_hrtimer+0x92/0x190
> [415723.294409] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.294411] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.294414] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.294418] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [415723.294421] [<c035a249>] ? copy_to_user+0x39/0x130
> [415723.294424] [<c016fc76>] ? do_gettimeofday+0x16/0x40
> [415723.294427] [<c0150146>] ? sys_gettimeofday+0x36/0x70
> [415723.294430] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294433] rhythmbox D f7090000 0 27043 1 0x00000000
> [415723.294437] d5dc7cac 00200086 c01588cc f7090000 d5dc7c70 c05d89e0 c08c3700 c08c3700
> [415723.294441] a4861998 00017a18 c08c3700 c08c3700 a485c4cb 00017a18 00000000 c08c3700
> [415723.294446] c08c3700 ef950000 d5dc7cb4 d5dc7ce0 00000000 d5dc7ce8 d5dc7cb4 f8256c0c
> [415723.294450] Call Trace:
> [415723.294453] [<c01588cc>] ? lock_timer_base+0x2c/0x60
> [415723.294473] [<f8256c0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
> [415723.294476] [<c05c761d>] __wait_on_bit+0x4d/0x70
> [415723.294490] [<f8256bf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [415723.294502] [<f8256bf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [415723.294506] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
> [415723.294510] [<c0165e60>] ? wake_bit_function+0x0/0x50
> [415723.294525] [<f825731b>] __rpc_execute+0xdb/0x250 [sunrpc]
> [415723.294538] [<f8256a17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
> [415723.294552] [<f82574fe>] rpc_execute+0x6e/0x80 [sunrpc]
> [415723.294563] [<f82509af>] rpc_run_task+0x1f/0x30 [sunrpc]
> [415723.294575] [<f8250abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
> [415723.294591] [<f882f5e2>] _nfs4_call_sync+0x22/0x30 [nfs]
> [415723.294604] [<f882e3a3>] _nfs4_proc_access+0xe3/0x170 [nfs]
> [415723.294617] [<f882e469>] nfs4_proc_access+0x39/0x60 [nfs]
> [415723.294626] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294635] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294639] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.294642] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294645] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294649] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294652] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294655] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294659] [<c016ffbd>] ? ktime_get_ts+0xed/0x120
> [415723.294662] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [415723.294666] [<c024ad63>] sys_inotify_add_watch+0x63/0x100
> [415723.294669] [<c0228829>] ? sys_poll+0x59/0xc0
> [415723.294672] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294681] clock-applet D ce1f5cd4 0 29323 1 0x00000000
> [415723.294684] ce1f5d14 00200086 00000080 ce1f5cd4 f5fa6000 ffffff81 c08c3700 c08c3700
> [415723.294689] e8a6c632 00017a18 c08c3700 c08c3700 00000000 00017a18 f5e67e00 c08c3700
> [415723.294693] c08c3700 f68bd8d0 c08fc500 c08fc500 c08fc500 ce1f5d34 ce1f5d78 c05c72df
> [415723.294698] Call Trace:
> [415723.294701] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.294705] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.294709] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.294722] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.294734] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.294747] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.294756] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294765] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294770] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
> [415723.294773] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294776] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294780] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294783] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294787] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294802] [<c04f5f4b>] ? net_rx_action+0x1ab/0x220
> [415723.294805] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.294809] [<c0124fc4>] ? ack_apic_level+0x64/0x1f0
> [415723.294812] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.294815] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.294818] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [415723.294821] [<c035a249>] ? copy_to_user+0x39/0x130
> [415723.294824] [<c016fc76>] ? do_gettimeofday+0x16/0x40
> [415723.294828] [<c0150146>] ? sys_gettimeofday+0x36/0x70
> [415723.294831] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294833] evolution D f20e9cd4 0 29359 1 0x00000000
> [415723.294837] f20e9d14 00000086 00000080 f20e9cd4 c4478000 ffffff81 c08c3700 c08c3700
> [415723.294841] ee19e4de 00017a15 c08c3700 c08c3700 00000000 00017a15 00000000 c08c3700
> [415723.294846] c08c3700 d049b2c0 c08fc500 c08fc500 c08fc500 f20e9d34 f20e9d78 c05c72df
> [415723.294850] Call Trace:
> [415723.294854] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.294857] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.294861] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.294874] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.294886] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.294899] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.294908] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.294917] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.294922] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.294926] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.294929] [<c0223867>] link_path_walk+0x67/0x890
> [415723.294933] [<c02241b1>] path_walk+0x51/0xc0
> [415723.294936] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.294939] [<c0224e81>] user_path_at+0x41/0x80
> [415723.294942] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.294945] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.294948] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.294951] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.294954] [<c01698a8>] ? lock_hrtimer_base+0x28/0x50
> [415723.294960] uptrack-upgra D e9fa5cfc 0 11865 11864 0x00000000
> [415723.294964] e9fa5d0c 00000086 00000002 e9fa5cfc 00017a19 c05d89e0 c08c3700 c08c3700
> [415723.294968] 1ed1c8f3 00017a19 c08c3700 c08c3700 1ed196bd 00017a19 00000000 c08c3700
> [415723.294973] c08c3700 c5a558d0 00000001 e9fa5d40 00000000 e9fa5d48 e9fa5d14 f88198cc
> [415723.294977] Call Trace:
> [415723.294988] [<f88198cc>] nfs_wait_bit_killable+0x1c/0x40 [nfs]
> [415723.294992] [<c05c761d>] __wait_on_bit+0x4d/0x70
> [415723.295002] [<f88198b0>] ? nfs_wait_bit_killable+0x0/0x40 [nfs]
> [415723.295011] [<f88198b0>] ? nfs_wait_bit_killable+0x0/0x40 [nfs]
> [415723.295015] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
> [415723.295018] [<c0165e60>] ? wake_bit_function+0x0/0x50
> [415723.295031] [<f882c407>] nfs4_wait_clnt_recover+0x37/0x40 [nfs]
> [415723.295044] [<f882c4d3>] nfs4_handle_exception+0xc3/0x180 [nfs]
> [415723.295057] [<f883053e>] nfs4_do_open+0x8e/0xf0 [nfs]
> [415723.295070] [<f8830896>] nfs4_atomic_open+0x96/0x180 [nfs]
> [415723.295079] [<f8813173>] nfs_atomic_lookup+0x93/0x100 [nfs]
> [415723.295082] [<c05c8b6d>] ? _raw_spin_lock+0xd/0x10
> [415723.295085] [<c022b797>] ? d_alloc+0x117/0x170
> [415723.295089] [<c022317b>] do_lookup+0x15b/0x1c0
> [415723.295092] [<c022361d>] do_last+0x24d/0x3a0
> [415723.295096] [<c02251bd>] do_filp_open+0x19d/0x4c0
> [415723.295101] [<c0216a65>] do_sys_open+0x55/0x150
> [415723.295105] [<c0216bce>] sys_open+0x2e/0x40
> [415723.295107] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.295110] gnome-screens D ed361cd4 0 23902 4269 0x00000000
> [415723.295114] ed361d14 00000086 00000080 ed361cd4 f5fa6000 ffffff81 c08c3700 c08c3700
> [415723.295118] cc6ee762 00017a17 c08c3700 c08c3700 00000001 00017a17 f5e67e00 c08c3700
> [415723.295123] c08c3700 f344f230 f7090000 f7090000 f7090000 ed361d34 ed361d78 c05c72df
> [415723.295127] Call Trace:
> [415723.295131] [<c05c72df>] schedule_timeout+0x12f/0x270
> [415723.295144] [<f882e3b1>] ? _nfs4_proc_access+0xf1/0x170 [nfs]
> [415723.295147] [<c01589c0>] ? process_timeout+0x0/0x10
> [415723.295151] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [415723.295164] [<f882c32d>] nfs4_delay+0x2d/0x70 [nfs]
> [415723.295176] [<f882c50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [415723.295189] [<f882e47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [415723.295197] [<f8813edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [415723.295206] [<f8813f98>] nfs_permission+0x78/0x1a0 [nfs]
> [415723.295210] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [415723.295213] [<c02212ea>] exec_permission+0x2a/0x90
> [415723.295216] [<c0223867>] link_path_walk+0x67/0x890
> [415723.295220] [<c02241b1>] path_walk+0x51/0xc0
> [415723.295223] [<c0224339>] do_path_lookup+0x59/0x90
> [415723.295227] [<c0224e81>] user_path_at+0x41/0x80
> [415723.295230] [<c0218534>] ? do_sync_read+0xa4/0xe0
> [415723.295234] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [415723.295236] [<c021cb30>] vfs_stat+0x20/0x30
> [415723.295239] [<c021cb59>] sys_stat64+0x19/0x30
> [415723.295243] [<c0219f9d>] ? fput+0x1d/0x30
> [415723.295246] [<c021691c>] ? filp_close+0x4c/0x80
> [415723.295249] [<c02169c5>] ? sys_close+0x75/0xc0
> [415723.295252] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.295255] lsb_release D 00000095 0 30540 30529 0x00000000
> [415723.295259] da947f2c 00000086 00000000 00000095 f42b0000 c2ce14c0 c08c3700 c08c3700
> [415723.295263] 33bfe72a 000179a8 c08c3700 c08c3700 00000001 000179a8 f5db4c00 c08c3700
> [415723.295268] c08c3700 f1733f70 dc1d4008 cafac954 ffffffff cafac95c da947f58 c05c7bbc
> [415723.295272] Call Trace:
> [415723.295276] [<c05c7bbc>] __mutex_lock_killable_slowpath+0xdc/0x160
> [415723.295280] [<c05cc3fd>] ? do_page_fault+0x1cd/0x440
> [415723.295283] [<c05c7c74>] mutex_lock_killable+0x34/0x40
> [415723.295286] [<c0227494>] vfs_readdir+0x64/0xb0
> [415723.295289] [<c0227190>] ? filldir64+0x0/0xf0
> [415723.295292] [<c0227549>] sys_getdents64+0x69/0xc0
> [415723.295294] [<c05c90a4>] syscall_call+0x7/0xb
> [415723.295302] Sched Debug Version: v0.09, 2.6.35-22-generic #33-Ubuntu
>
> A reboot was required to resolve this.
>
> Today, I found similar (although no errors about the state manager):
>
> [64676.088847] SysRq : Show Blocked State
> [64676.092628] task PC stack pid father
> [64676.092652] pulseaudio D f36b1d34 0 3323 1 0x00000000
> [64676.092652] f36b1d44 00000086 00000002 f36b1d34 f879c917 c05d89e0 c08c3700 c08c3700
> [64676.092652] 8e643a96 00003acf c08c3700 c08c3700 8e6404cd 00003acf 00000000 c08c3700
> [64676.092652] c08c3700 f3418000 00000001 c08fc500 c08fc500 f36b1d64 f36b1da8 c05c72df
> [64676.092652] Call Trace:
> [64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
> [64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
> [64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
> [64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
> [64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
> [64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
> [64676.092652] [<c0223867>] link_path_walk+0x67/0x890
> [64676.092652] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
> [64676.092652] [<c022510d>] do_filp_open+0xed/0x4c0
> [64676.092652] [<f8050b98>] ? nv_napi_poll+0x148/0x2e0 [forcedeth]
> [64676.092652] [<c01512cc>] ? __do_softirq+0xec/0x1b0
> [64676.092652] [<c022f0fd>] ? alloc_fd+0xbd/0xf0
> [64676.092652] [<c0216a65>] do_sys_open+0x55/0x150
> [64676.092652] [<c05cf665>] ? do_IRQ+0x55/0xc0
> [64676.092652] [<c0216bce>] sys_open+0x2e/0x40
> [64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
> [64676.092652] rhythmbox D f3731d5c 0 3342 3227 0x00000000
> [64676.092652] f3731d6c 00000086 00000002 f3731d5c f879c917 c05d89e0 c08c3700 c08c3700
> [64676.092652] 6c3d30ab 00003ad2 c08c3700 c08c3700 6c3cf7c6 00003ad2 00000000 c08c3700
> [64676.092652] c08c3700 f36f2610 00000001 c08fc500 c08fc500 f3731d8c f3731dd0 c05c72df
> [64676.092652] Call Trace:
> [64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
> [64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
> [64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
> [64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
> [64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
> [64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
> [64676.092652] [<c0223867>] link_path_walk+0x67/0x890
> [64676.092652] [<c02241b1>] path_walk+0x51/0xc0
> [64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
> [64676.092652] [<c0224e81>] user_path_at+0x41/0x80
> [64676.092652] [<c032e626>] ? apparmor_file_permission+0x16/0x20
> [64676.092652] [<c016ffbd>] ? ktime_get_ts+0xed/0x120
> [64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [64676.092652] [<c024ad63>] sys_inotify_add_watch+0x63/0x100
> [64676.092652] [<c0228829>] ? sys_poll+0x59/0xc0
> [64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
> [64676.092652] pidgin D f37b7d04 0 3354 3227 0x00000000
> [64676.092652] f37b7d14 00000086 00000002 f37b7d04 f879c917 c05d89e0 c08c3700 c08c3700
> [64676.092652] 680f2ffc 00003ad0 c08c3700 c08c3700 680efa47 00003ad0 00000000 c08c3700
> [64676.092652] c08c3700 f36f58d0 00000001 c08fc500 c08fc500 f37b7d34 f37b7d78 c05c72df
> [64676.092652] Call Trace:
> [64676.092652] [<f879c917>] ? rpc_put_task+0x77/0x80 [sunrpc]
> [64676.092652] [<c05c72df>] schedule_timeout+0x12f/0x270
> [64676.092652] [<c01589c0>] ? process_timeout+0x0/0x10
> [64676.092652] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
> [64676.092652] [<f8c5a32d>] nfs4_delay+0x2d/0x70 [nfs]
> [64676.092652] [<f8c5a50a>] nfs4_handle_exception+0xfa/0x180 [nfs]
> [64676.092652] [<f8c5c47e>] nfs4_proc_access+0x4e/0x60 [nfs]
> [64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
> [64676.092652] [<c022309a>] ? do_lookup+0x7a/0x1c0
> [64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
> [64676.092652] [<c0223867>] link_path_walk+0x67/0x890
> [64676.092652] [<c02241b1>] path_walk+0x51/0xc0
> [64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
> [64676.092652] [<c0224e81>] user_path_at+0x41/0x80
> [64676.092652] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [64676.092652] [<c021cb30>] vfs_stat+0x20/0x30
> [64676.092652] [<c021cb59>] sys_stat64+0x19/0x30
> [64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [64676.092652] [<c035a249>] ? copy_to_user+0x39/0x130
> [64676.092652] [<c016fc76>] ? do_gettimeofday+0x16/0x40
> [64676.092652] [<c0150146>] ? sys_gettimeofday+0x36/0x70
> [64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
> [64676.092652] sensors-apple D f0cd9c44 0 3778 1 0x00000000
> [64676.092652] f0cd9c54 00000086 00000002 f0cd9c44 f0cd9c18 c05d89e0 c08c3700 c08c3700
> [64676.092652] 5929818f 00003ad2 c08c3700 c08c3700 59294ecf 00003ad2 00000000 c08c3700
> [64676.092652] c08c3700 f1abd8d0 00000001 f0cd9c88 00000000 f0cd9c90 f0cd9c5c f879cc0c
> [64676.092652] Call Trace:
> [64676.092652] [<f879cc0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
> [64676.092652] [<c05c761d>] __wait_on_bit+0x4d/0x70
> [64676.092652] [<f879cbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [64676.092652] [<f879cbf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [64676.092652] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
> [64676.092652] [<c0165e60>] ? wake_bit_function+0x0/0x50
> [64676.092652] [<f879d31b>] __rpc_execute+0xdb/0x250 [sunrpc]
> [64676.092652] [<f879ca17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
> [64676.092652] [<c01449b8>] ? dequeue_entity+0x1c8/0x210
> [64676.092652] [<f879d4fe>] rpc_execute+0x6e/0x80 [sunrpc]
> [64676.092652] [<f87969af>] rpc_run_task+0x1f/0x30 [sunrpc]
> [64676.092652] [<f8796abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
> [64676.092652] [<f8c5d5e2>] _nfs4_call_sync+0x22/0x30 [nfs]
> [64676.092652] [<f8c5c3a3>] _nfs4_proc_access+0xe3/0x170 [nfs]
> [64676.092652] [<f8c5c469>] nfs4_proc_access+0x39/0x60 [nfs]
> [64676.092652] [<f8c41edd>] nfs_do_access+0x7d/0xc0 [nfs]
> [64676.092652] [<f8c41f98>] nfs_permission+0x78/0x1a0 [nfs]
> [64676.092652] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
> [64676.092652] [<c02212ea>] exec_permission+0x2a/0x90
> [64676.092652] [<c0223867>] link_path_walk+0x67/0x890
> [64676.092652] [<c02241b1>] path_walk+0x51/0xc0
> [64676.092652] [<c0224339>] do_path_lookup+0x59/0x90
> [64676.092652] [<c0224e81>] user_path_at+0x41/0x80
> [64676.092652] [<c0139665>] ? update_curr+0x175/0x2a0
> [64676.092652] [<c016bf14>] ? sched_clock_local+0xa4/0x180
> [64676.092652] [<c0139cb6>] ? __dequeue_entity+0x26/0x50
> [64676.092652] [<c021c9da>] vfs_fstatat+0x3a/0x70
> [64676.092652] [<c021cb30>] vfs_stat+0x20/0x30
> [64676.092652] [<c021cb59>] sys_stat64+0x19/0x30
> [64676.092652] [<c016fb56>] ? getnstimeofday+0x56/0x120
> [64676.092652] [<c05c90a4>] syscall_call+0x7/0xb
> [64676.092652] Sched Debug Version: v0.09, 2.6.35-22-generic #33-Ubuntu
>
> Any ideas what's going on here? If there isn't enough info here, I'd
> more than happy to gather and post whatever is needed.

Error 127 is EKEYEXPIRED. It means that the RPCSEC_GSS context for at
least one of your threads has expired.

Sigh... It looks as if we have a really poor handling of that in the
recovery threads.

Does the following patch help?

Cheers
Trond
-------------------------------------------------------------------------------
NFSv4: The state manager must ignore EKEYEXPIRED.

From: Trond Myklebust <[email protected]>

Otherwise, we cannot recover state correctly.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfs/nfs4proc.c | 27 ++++++++++++++++++---------
fs/nfs/nfs4state.c | 23 ++++++++++++++++++++++-
2 files changed, 40 insertions(+), 10 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 74aa54e..2edc3ec 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1188,7 +1188,7 @@ static int nfs4_do_open_reclaim(struct nfs_open_context *ctx, struct nfs4_state
int err;
do {
err = _nfs4_do_open_reclaim(ctx, state);
- if (err != -NFS4ERR_DELAY && err != -EKEYEXPIRED)
+ if (err != -NFS4ERR_DELAY)
break;
nfs4_handle_exception(server, err, &exception);
} while (exception.retry);
@@ -1258,6 +1258,13 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
case -NFS4ERR_ADMIN_REVOKED:
case -NFS4ERR_BAD_STATEID:
nfs4_state_mark_reclaim_nograce(server->nfs_client, state);
+ case -EKEYEXPIRED:
+ /*
+ * User RPCSEC_GSS context has expired.
+ * We cannot recover this stateid now, so
+ * skip it and allow recovery thread to
+ * proceed.
+ */
case -ENOMEM:
err = 0;
goto out;
@@ -1605,7 +1612,6 @@ static int nfs4_do_open_expired(struct nfs_open_context *ctx, struct nfs4_state
goto out;
case -NFS4ERR_GRACE:
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
nfs4_handle_exception(server, err, &exception);
err = 0;
}
@@ -3623,7 +3629,6 @@ int nfs4_proc_setclientid_confirm(struct nfs_client *clp,
case -NFS4ERR_RESOURCE:
/* The IBM lawyers misread another document! */
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
err = nfs4_delay(clp->cl_rpcclient, &timeout);
}
} while (err == 0);
@@ -4238,7 +4243,7 @@ static int nfs4_lock_reclaim(struct nfs4_state *state, struct file_lock *request
if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0)
return 0;
err = _nfs4_do_setlk(state, F_SETLK, request, NFS_LOCK_RECLAIM);
- if (err != -NFS4ERR_DELAY && err != -EKEYEXPIRED)
+ if (err != -NFS4ERR_DELAY)
break;
nfs4_handle_exception(server, err, &exception);
} while (exception.retry);
@@ -4263,7 +4268,6 @@ static int nfs4_lock_expired(struct nfs4_state *state, struct file_lock *request
goto out;
case -NFS4ERR_GRACE:
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
nfs4_handle_exception(server, err, &exception);
err = 0;
}
@@ -4409,13 +4413,21 @@ int nfs4_lock_delegation_recall(struct nfs4_state *state, struct file_lock *fl)
nfs4_state_mark_reclaim_nograce(server->nfs_client, state);
err = 0;
goto out;
+ case -EKEYEXPIRED:
+ /*
+ * User RPCSEC_GSS context has expired.
+ * We cannot recover this stateid now, so
+ * skip it and allow recovery thread to
+ * proceed.
+ */
+ err = 0;
+ goto out;
case -ENOMEM:
case -NFS4ERR_DENIED:
/* kill_proc(fl->fl_pid, SIGLOST, 1); */
err = 0;
goto out;
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
break;
}
err = nfs4_handle_exception(server, err, &exception);
@@ -4644,7 +4656,6 @@ static void nfs4_get_lease_time_done(struct rpc_task *task, void *calldata)
switch (task->tk_status) {
case -NFS4ERR_DELAY:
case -NFS4ERR_GRACE:
- case -EKEYEXPIRED:
dprintk("%s Retry: tk_status %d\n", __func__, task->tk_status);
rpc_delay(task, NFS4_POLL_RETRY_MIN);
task->tk_status = 0;
@@ -5108,7 +5119,6 @@ static int nfs41_sequence_handle_errors(struct rpc_task *task, struct nfs_client
{
switch(task->tk_status) {
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
rpc_delay(task, NFS4_POLL_RETRY_MAX);
return -EAGAIN;
default:
@@ -5251,7 +5261,6 @@ static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, struct nf
case -NFS4ERR_WRONG_CRED: /* What to do here? */
break;
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
rpc_delay(task, NFS4_POLL_RETRY_MAX);
return -EAGAIN;
default:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 940cf7c..40028ac 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1063,6 +1063,14 @@ restart:
/* Mark the file as being 'closed' */
state->state = 0;
break;
+ case -EKEYEXPIRED:
+ /*
+ * User RPCSEC_GSS context has expired.
+ * We cannot recover this stateid now, so
+ * skip it and allow recovery thread to
+ * proceed.
+ */
+ break;
case -NFS4ERR_ADMIN_REVOKED:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_BAD_STATEID:
@@ -1181,6 +1189,14 @@ static void nfs4_state_start_reclaim_nograce(struct nfs_client *clp)
nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
}

+static void nfs4_warn_keyexpired(const char *s)
+{
+ printk_ratelimited(KERN_WARNING "Error: state manager"
+ " encountered RPCSEC_GSS session"
+ " expired against NFSv4 server %s.\n",
+ s);
+}
+
static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
{
switch (error) {
@@ -1210,6 +1226,10 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
set_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state);
/* Zero session reset errors */
return 0;
+ case -EKEYEXPIRED:
+ /* Nothing we can do */
+ nfs4_warn_keyexpired(clp->cl_hostname);
+ return 0;
}
return error;
}
@@ -1420,9 +1440,10 @@ static void nfs4_set_lease_expired(struct nfs_client *clp, int status)
case -NFS4ERR_DELAY:
case -NFS4ERR_CLID_INUSE:
case -EAGAIN:
- case -EKEYEXPIRED:
break;

+ case -EKEYEXPIRED:
+ nfs4_warn_keyexpired(clp->cl_hostname);
case -NFS4ERR_NOT_SAME: /* FixMe: implement recovery
* in nfs4_exchange_id */
default:


2010-10-30 21:29:27

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 14:19 -0400, Trond Myklebust wrote:
>
> BTW: Do you have the following patches applied?
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c
> and
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=ae1007d37e00144b72906a4bdc47d517ae91bcc1

These patches both deal with recovery issues, which you mentioned seems
to be the state the previously posted stack traces where in also.

Since the server has not been rebooted or even had its export list
reread/reexported, I wonder why recovery would have been triggered, to
cause this client problem.

Does recovery actually get invoked on the client for events other than
an outright restart of NFS on the server? NFS on the server here should
have been entirely stable over the period of time in which this client
went bad.

b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-10-30 21:41:31

by Trond Myklebust

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 17:29 -0400, Brian J. Murrell wrote:
> On Sat, 2010-10-30 at 14:19 -0400, Trond Myklebust wrote:
> >
> > BTW: Do you have the following patches applied?
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c
> > and
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=ae1007d37e00144b72906a4bdc47d517ae91bcc1
>
> These patches both deal with recovery issues, which you mentioned seems
> to be the state the previously posted stack traces where in also.
>
> Since the server has not been rebooted or even had its export list
> reread/reexported, I wonder why recovery would have been triggered, to
> cause this client problem.
>
> Does recovery actually get invoked on the client for events other than
> an outright restart of NFS on the server? NFS on the server here should
> have been entirely stable over the period of time in which this client
> went bad.

There are 2 cases which can trigger recovery: server reboot, and network
partition (i.e. a networking fault that causes the client to be unable
to contact the server in time in order to renew its lease).

If none of the above apply, then we need to look at whether it is the
client or the server that is screwed up.

Trond


2010-10-30 21:46:58

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 17:41 -0400, Trond Myklebust wrote:
>
> There are 2 cases which can trigger recovery: server reboot, and network
> partition (i.e. a networking fault that causes the client to be unable
> to contact the server in time in order to renew its lease).

Yes, seems two cases which I would suspect also.

> If none of the above apply,

None should be applicable. Of course I could never know if the network
"blipped" (but highly doubt even that happened) but there should not
have been an outtage long enough to prevent a lease renewal.

> then we need to look at whether it is the
> client or the server that is screwed up.

Yes, fair enough. I suppose having more than 1 client using the same
server could be sufficient to determine fault at the client or server?
Or is there a more direct route?

b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-10-30 17:41:59

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sun, 2010-10-17 at 14:35 -0400, Trond Myklebust wrote:
>
> Error 127 is EKEYEXPIRED. It means that the RPCSEC_GSS context for at
> least one of your threads has expired.
>
> Sigh... It looks as if we have a really poor handling of that in the
> recovery threads.
>
> Does the following patch help?

Well, while I don't seem to have had any of the state manager failed
errors, I did have a number of processes blocked in NFS4 today with the
patched kernel running:

[316999.281908] SysRq : Show Blocked State
[316999.285576] task PC stack pid father
[316999.285576] rhythmbox D f32d1c9c 0 12726 12614 0x00000000
[316999.285576] f32d1cac 00000086 00000002 f32d1c9c f32d1c70 c05d89e0 c08c1700 c08c1700
[316999.285576] 0dbaa7a3 0001204f c08c1700 c08c1700 0dba7e9d 0001204f 00000000 c08c1700
[316999.285576] c08c1700 ef127230 00000001 f32d1ce0 00000000 f32d1ce8 f32d1cb4 f83eac0c
[316999.285576] Call Trace:
[316999.285576] [<f83eac0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
[316999.285576] [<c05c761d>] __wait_on_bit+0x4d/0x70
[316999.285576] [<f83eabf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[316999.285576] [<f83eabf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[316999.285576] [<c05c76eb>] out_of_line_wait_on_bit+0xab/0xc0
[316999.285576] [<c0165e60>] ? wake_bit_function+0x0/0x50
[316999.285576] [<f83eb31b>] __rpc_execute+0xdb/0x250 [sunrpc]
[316999.285576] [<f83eaa17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
[316999.285576] [<f83eb4fe>] rpc_execute+0x6e/0x80 [sunrpc]
[316999.285576] [<f83e49af>] rpc_run_task+0x1f/0x30 [sunrpc]
[316999.285576] [<f83e4abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
[316999.285576] [<f82345d2>] _nfs4_call_sync+0x22/0x30 [nfs]
[316999.285576] [<f82333a3>] _nfs4_proc_access+0xe3/0x170 [nfs]
[316999.285576] [<f8233469>] nfs4_proc_access+0x39/0x60 [nfs]
[316999.285576] [<f8218edd>] nfs_do_access+0x7d/0xc0 [nfs]
[316999.285576] [<f8218f98>] nfs_permission+0x78/0x1a0 [nfs]
[316999.285576] [<c022309a>] ? do_lookup+0x7a/0x1c0
[316999.285576] [<c02212ea>] exec_permission+0x2a/0x90
[316999.285576] [<c0223867>] link_path_walk+0x67/0x890
[316999.285576] [<c02241b1>] path_walk+0x51/0xc0
[316999.285576] [<c0224339>] do_path_lookup+0x59/0x90
[316999.285576] [<c0224e81>] user_path_at+0x41/0x80
[316999.285576] [<c032e626>] ? apparmor_file_permission+0x16/0x20
[316999.285576] [<c016ffbd>] ? ktime_get_ts+0xed/0x120
[316999.285576] [<c016fb56>] ? getnstimeofday+0x56/0x120
[316999.285576] [<c024ad63>] sys_inotify_add_watch+0x63/0x100
[316999.285576] [<c0228829>] ? sys_poll+0x59/0xc0
[316999.285576] [<c05c90a4>] syscall_call+0x7/0xb
[316999.285576] pidgin D dfa25d04 0 12742 12614 0x00000000
[316999.285576] dfa25d14 00000086 00000002 dfa25d04 f83ea917 c05d89e0 c08c1700 c08c1700
[316999.285576] d7f600c4 0001204c c08c1700 c08c1700 d7f5ce5a 0001204c 00000000 c08c1700
[316999.285576] c08c1700 efb46580 00000001 c08fa500 c08fa500 dfa25d34 dfa25d78 c05c72df
[316999.285576] Call Trace:
[316999.285576] [<f83ea917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[316999.285576] [<c05c72df>] schedule_timeout+0x12f/0x270
[316999.285576] [<c01589c0>] ? process_timeout+0x0/0x10
[316999.285576] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[316999.285576] [<f823132d>] nfs4_delay+0x2d/0x70 [nfs]
[316999.285576] [<f823150a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[316999.285576] [<f823347e>] nfs4_proc_access+0x4e/0x60 [nfs]
[316999.285576] [<f8218edd>] nfs_do_access+0x7d/0xc0 [nfs]
[316999.285576] [<f8218f98>] nfs_permission+0x78/0x1a0 [nfs]
[316999.285576] [<c022309a>] ? do_lookup+0x7a/0x1c0
[316999.285576] [<c02212ea>] exec_permission+0x2a/0x90
[316999.285576] [<c0223867>] link_path_walk+0x67/0x890
[316999.285576] [<c02241b1>] path_walk+0x51/0xc0
[316999.285576] [<c0224339>] do_path_lookup+0x59/0x90
[316999.285576] [<c0224e81>] user_path_at+0x41/0x80
[316999.285576] [<c021c9da>] vfs_fstatat+0x3a/0x70
[316999.285576] [<c021cb30>] vfs_stat+0x20/0x30
[316999.285576] [<c021cb59>] sys_stat64+0x19/0x30
[316999.285576] [<c016fb56>] ? getnstimeofday+0x56/0x120
[316999.285576] [<c035a249>] ? copy_to_user+0x39/0x130
[316999.285576] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[316999.285576] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[316999.285576] [<c05c90a4>] syscall_call+0x7/0xb
[316999.285576] sensors-apple D d1b21d04 0 13205 1 0x00000000
[316999.285576] d1b21d14 00000086 00000002 d1b21d04 f83ea917 c05d89e0 c08c1700 c08c1700
[316999.285576] 5a8cd7b3 0001204c c08c1700 c08c1700 5a8ca8c6 0001204c 00000000 c08c1700
[316999.285576] c08c1700 e02c2610 00000001 c08fa500 c08fa500 d1b21d34 d1b21d78 c05c72df
[316999.285576] Call Trace:
[316999.285576] [<f83ea917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[316999.285576] [<c05c72df>] schedule_timeout+0x12f/0x270
[316999.285576] [<c01589c0>] ? process_timeout+0x0/0x10
[316999.285576] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[316999.285576] [<f823132d>] nfs4_delay+0x2d/0x70 [nfs]
[316999.285576] [<f823150a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[316999.285576] [<f823347e>] nfs4_proc_access+0x4e/0x60 [nfs]
[316999.285576] [<f8218edd>] nfs_do_access+0x7d/0xc0 [nfs]
[316999.285576] [<f8218f98>] nfs_permission+0x78/0x1a0 [nfs]
[316999.285576] [<c02306cf>] ? mntput_no_expire+0x1f/0xd0
[316999.285576] [<c02212ea>] exec_permission+0x2a/0x90
[316999.285576] [<c0223867>] link_path_walk+0x67/0x890
[316999.285576] [<c02241b1>] path_walk+0x51/0xc0
[316999.285576] [<c0224339>] do_path_lookup+0x59/0x90
[316999.285576] [<c0224e81>] user_path_at+0x41/0x80
[316999.285576] [<c0218534>] ? do_sync_read+0xa4/0xe0
[316999.285576] [<c021c9da>] vfs_fstatat+0x3a/0x70
[316999.285576] [<c02185d2>] ? rw_verify_area+0x62/0xd0
[316999.285576] [<c021cb30>] vfs_stat+0x20/0x30
[316999.285576] [<c021cb59>] sys_stat64+0x19/0x30
[316999.285576] [<c016fb56>] ? getnstimeofday+0x56/0x120
[316999.285576] [<c035a249>] ? copy_to_user+0x39/0x130
[316999.285576] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[316999.285576] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[316999.285576] [<c05c90a4>] syscall_call+0x7/0xb
[316999.285576] update-notifi D eda75d04 0 13269 12614 0x00000000
[316999.285576] eda75d14 00200086 00000002 eda75d04 f83ea917 c05d89e0 c08c1700 c08c1700
[316999.285576] ca8d6759 0001204e c08c1700 c08c1700 ca8d321a 0001204e 00000000 c08c1700
[316999.285576] c08c1700 c0d8b2c0 00000001 f7090000 f7090000 eda75d34 eda75d78 c05c72df
[316999.285576] Call Trace:
[316999.285576] [<f83ea917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[316999.285576] [<c05c72df>] schedule_timeout+0x12f/0x270
[316999.285576] [<c01589c0>] ? process_timeout+0x0/0x10
[316999.285576] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[316999.285576] [<f823132d>] nfs4_delay+0x2d/0x70 [nfs]
[316999.285576] [<f823150a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[316999.285576] [<f823347e>] nfs4_proc_access+0x4e/0x60 [nfs]
[316999.285576] [<f8218edd>] nfs_do_access+0x7d/0xc0 [nfs]
[316999.285576] [<f8218f98>] nfs_permission+0x78/0x1a0 [nfs]
[316999.285576] [<c022309a>] ? do_lookup+0x7a/0x1c0
[316999.285576] [<c02212ea>] exec_permission+0x2a/0x90
[316999.285576] [<c0223867>] link_path_walk+0x67/0x890
[316999.285576] [<c02241b1>] path_walk+0x51/0xc0
[316999.285576] [<c0224339>] do_path_lookup+0x59/0x90
[316999.285576] [<c0224e81>] user_path_at+0x41/0x80
[316999.285576] [<c0218534>] ? do_sync_read+0xa4/0xe0
[316999.285576] [<c021c9da>] vfs_fstatat+0x3a/0x70
[316999.285576] [<c02185d2>] ? rw_verify_area+0x62/0xd0
[316999.285576] [<c021cb30>] vfs_stat+0x20/0x30
[316999.285576] [<c021cb59>] sys_stat64+0x19/0x30
[316999.285576] [<c016fb56>] ? getnstimeofday+0x56/0x120
[316999.285576] [<c035a249>] ? copy_to_user+0x39/0x130
[316999.285576] [<c016fc76>] ? do_gettimeofday+0x16/0x40
[316999.285576] [<c0150146>] ? sys_gettimeofday+0x36/0x70
[316999.285576] [<c05c90a4>] syscall_call+0x7/0xb
[316999.285576] gnome-screens D da83bd04 0 31067 12803 0x00000000
[316999.285576] da83bd14 00000086 00000002 da83bd04 f83ea917 c05d89e0 c08c1700 c08c1700
[316999.285576] a1a94a91 0001204d c08c1700 c08c1700 a1a91cd7 0001204d 00000000 c08c1700
[316999.285576] c08c1700 e02c3f70 00000001 c08fa500 c08fa500 da83bd34 da83bd78 c05c72df
[316999.285576] Call Trace:
[316999.285576] [<f83ea917>] ? rpc_put_task+0x77/0x80 [sunrpc]
[316999.285576] [<c05c72df>] schedule_timeout+0x12f/0x270
[316999.285576] [<c01589c0>] ? process_timeout+0x0/0x10
[316999.285576] [<c05c745a>] schedule_timeout_killable+0x1a/0x20
[316999.285576] [<f823132d>] nfs4_delay+0x2d/0x70 [nfs]
[316999.285576] [<f823150a>] nfs4_handle_exception+0xfa/0x180 [nfs]
[316999.285576] [<f823347e>] nfs4_proc_access+0x4e/0x60 [nfs]
[316999.285576] [<f8218edd>] nfs_do_access+0x7d/0xc0 [nfs]
[316999.285576] [<f8218f98>] nfs_permission+0x78/0x1a0 [nfs]
[316999.285576] [<c022309a>] ? do_lookup+0x7a/0x1c0
[316999.285576] [<c02212ea>] exec_permission+0x2a/0x90
[316999.285576] [<c0223867>] link_path_walk+0x67/0x890
[316999.285576] [<c02241b1>] path_walk+0x51/0xc0
[316999.285576] [<c0224339>] do_path_lookup+0x59/0x90
[316999.285576] [<c0224e81>] user_path_at+0x41/0x80
[316999.285576] [<c0218534>] ? do_sync_read+0xa4/0xe0
[316999.285576] [<c01abe9c>] ? rcu_start_gp+0x12c/0x1a0
[316999.285576] [<c021c9da>] vfs_fstatat+0x3a/0x70
[316999.285576] [<c021cb30>] vfs_stat+0x20/0x30
[316999.285576] [<c021cb59>] sys_stat64+0x19/0x30
[316999.285576] [<c0219f9d>] ? fput+0x1d/0x30
[316999.285576] [<c021691c>] ? filp_close+0x4c/0x80
[316999.285576] [<c02169c5>] ? sys_close+0x75/0xc0
[316999.285576] [<c05c90a4>] syscall_call+0x7/0xb

Maybe the above is an unrelated problem?

The way I resolved this was to kill autofs which took down the nfs4
mounts and then restart autofs which brought them back up, although,
TBH, I had to log back into my desktop session so the above processes
could have just died rather than have been resumed by NFS4 becoming
unblocked.

FWIW, the above stack traces look quite similar to issues I have seen
for the last year or so of running NFS4. :-(

Cheers,
b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-11-10 13:43:14

by Brian J. Murrell

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Sat, 2010-10-30 at 18:22 -0400, Trond Myklebust wrote:
>
> How about something like the following patch? It should allow you to log
> only the errors that lead to a state recovery situation.

So, I came to find my machine looking locked up this morning however
eventually it seemed to release itself. Looking in dmesg I can see:

[350230.265829] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350230.297307] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350235.269660] nfs4_warn_keyexpired: 2075 callbacks suppressed
[350235.269665] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350235.290167] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350240.272110] nfs4_warn_keyexpired: 2093 callbacks suppressed
[350240.272115] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350240.293239] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350245.276176] nfs4_warn_keyexpired: 2121 callbacks suppressed
[350245.276181] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350245.299625] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350250.280834] nfs4_warn_keyexpired: 2082 callbacks suppressed
[350250.280839] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350250.302470] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350255.285567] nfs4_warn_keyexpired: 2110 callbacks suppressed
[350255.285572] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350255.307410] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350260.290107] nfs4_warn_keyexpired: 2118 callbacks suppressed
[350260.290111] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350260.315070] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350265.296668] nfs4_warn_keyexpired: 2107 callbacks suppressed
[350265.296673] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350265.317877] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350270.302257] nfs4_warn_keyexpired: 2128 callbacks suppressed
[350270.302262] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350270.323727] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350275.310073] nfs4_warn_keyexpired: 2100 callbacks suppressed
[350275.310078] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350275.332569] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350280.316868] nfs4_warn_keyexpired: 2085 callbacks suppressed
[350280.316873] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350280.338332] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350285.322043] nfs4_warn_keyexpired: 2119 callbacks suppressed
[350285.322049] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350285.343702] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350290.328213] nfs4_warn_keyexpired: 2088 callbacks suppressed
[350290.328218] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350290.350477] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350295.332510] nfs4_warn_keyexpired: 2122 callbacks suppressed
[350295.332515] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350295.352991] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350300.337953] nfs4_warn_keyexpired: 2077 callbacks suppressed
[350300.337959] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350300.359669] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350305.345596] nfs4_warn_keyexpired: 2094 callbacks suppressed
[350305.345602] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350305.380263] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350310.352954] nfs4_warn_keyexpired: 2081 callbacks suppressed
[350310.352959] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350310.373999] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350315.357615] nfs4_warn_keyexpired: 2087 callbacks suppressed
[350315.357620] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350315.380362] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
[350320.362030] nfs4_warn_keyexpired: 2048 callbacks suppressed
[350320.362035] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
repeated 8 more times
[350320.383488] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.

b.


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-11-11 05:19:47

by Trond Myklebust

[permalink] [raw]
Subject: Re: Error: state manager failed on NFSv4 server linux with error 127

On Wed, 2010-11-10 at 08:43 -0500, Brian J. Murrell wrote:
> On Sat, 2010-10-30 at 18:22 -0400, Trond Myklebust wrote:
> >
> > How about something like the following patch? It should allow you to log
> > only the errors that lead to a state recovery situation.
>
> So, I came to find my machine looking locked up this morning however
> eventually it seemed to release itself. Looking in dmesg I can see:
>
> [350230.265829] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350230.297307] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350235.269660] nfs4_warn_keyexpired: 2075 callbacks suppressed
> [350235.269665] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350235.290167] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350240.272110] nfs4_warn_keyexpired: 2093 callbacks suppressed
> [350240.272115] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350240.293239] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350245.276176] nfs4_warn_keyexpired: 2121 callbacks suppressed
> [350245.276181] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350245.299625] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350250.280834] nfs4_warn_keyexpired: 2082 callbacks suppressed
> [350250.280839] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350250.302470] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350255.285567] nfs4_warn_keyexpired: 2110 callbacks suppressed
> [350255.285572] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350255.307410] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350260.290107] nfs4_warn_keyexpired: 2118 callbacks suppressed
> [350260.290111] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350260.315070] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350265.296668] nfs4_warn_keyexpired: 2107 callbacks suppressed
> [350265.296673] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350265.317877] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350270.302257] nfs4_warn_keyexpired: 2128 callbacks suppressed
> [350270.302262] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350270.323727] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350275.310073] nfs4_warn_keyexpired: 2100 callbacks suppressed
> [350275.310078] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350275.332569] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350280.316868] nfs4_warn_keyexpired: 2085 callbacks suppressed
> [350280.316873] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350280.338332] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350285.322043] nfs4_warn_keyexpired: 2119 callbacks suppressed
> [350285.322049] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350285.343702] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350290.328213] nfs4_warn_keyexpired: 2088 callbacks suppressed
> [350290.328218] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350290.350477] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350295.332510] nfs4_warn_keyexpired: 2122 callbacks suppressed
> [350295.332515] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350295.352991] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350300.337953] nfs4_warn_keyexpired: 2077 callbacks suppressed
> [350300.337959] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350300.359669] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350305.345596] nfs4_warn_keyexpired: 2094 callbacks suppressed
> [350305.345602] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350305.380263] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350310.352954] nfs4_warn_keyexpired: 2081 callbacks suppressed
> [350310.352959] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350310.373999] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350315.357615] nfs4_warn_keyexpired: 2087 callbacks suppressed
> [350315.357620] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350315.380362] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> [350320.362030] nfs4_warn_keyexpired: 2048 callbacks suppressed
> [350320.362035] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
> repeated 8 more times
> [350320.383488] Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server linux.
>
> b.
>

Which simply means that you left your machine and let the kerberos
ticket expire. When that happened, the machine was unable to recover
state for the affected applications.

Cheers
Trond