Hi,
I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
I'm not sure who ran which tests (be useful to know) that caused these oops.
Here is the stack traces:
[286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
[286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
[286123.155872] preempt_count: 1, expected: 0
[286123.156443] RCU nest depth: 0, expected: 0
[286123.156771] 1 lock held by nfsd/3983:
[286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
[286123.156949] Preemption disabled at:
[286123.156961] [<0000000000000000>] 0x0
[286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
[286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.157552] Call Trace:
[286123.157565] <TASK>
[286123.157581] dump_stack_lvl+0x57/0x7d
[286123.157609] __might_resched.cold+0x222/0x26b
[286123.157644] down_read_nested+0x68/0x420
[286123.157671] ? down_write_nested+0x130/0x130
[286123.157686] ? rwlock_bug.part.0+0x90/0x90
[286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
[286123.157749] xfs_file_fsync+0x3b9/0x820
[286123.157776] ? lock_downgrade+0x680/0x680
[286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
[286123.158088] check_for_locks+0x152/0x200 [nfsd]
[286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
[286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
[286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.159043] svc_process+0x361/0x4f0 [sunrpc]
[286123.159043] nfsd+0x2d6/0x570 [nfsd]
[286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.159043] kthread+0x29f/0x340
[286123.159043] ? kthread_complete_and_exit+0x20/0x20
[286123.159043] ret_from_fork+0x22/0x30
[286123.159043] </TASK>
[286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
[286123.187551] INFO: lockdep is turned off.
[286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
[286123.188527] Preemption disabled at:
[286123.188535] [<0000000000000000>] 0x0
[286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G W 5.18.0-rc4+ #1
[286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[286123.190910] Call Trace:
[286123.190910] <TASK>
[286123.190910] dump_stack_lvl+0x57/0x7d
[286123.190910] __schedule_bug.cold+0x133/0x143
[286123.190910] __schedule+0x16c9/0x20a0
[286123.190910] ? schedule_timeout+0x314/0x510
[286123.190910] ? __sched_text_start+0x8/0x8
[286123.190910] ? internal_add_timer+0xa4/0xe0
[286123.190910] schedule+0xd7/0x1f0
[286123.190910] schedule_timeout+0x319/0x510
[286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
[286123.190910] ? usleep_range_state+0x150/0x150
[286123.190910] ? lock_acquire+0x331/0x490
[286123.190910] ? init_timer_on_stack_key+0x50/0x50
[286123.190910] ? do_raw_spin_lock+0x116/0x260
[286123.190910] ? rwlock_bug.part.0+0x90/0x90
[286123.190910] io_schedule_timeout+0x26/0x80
[286123.190910] wait_for_completion_io_timeout+0x16a/0x340
[286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910] ? wait_for_completion+0x330/0x330
[286123.190910] submit_bio_wait+0x135/0x1d0
[286123.190910] ? submit_bio_wait_endio+0x40/0x40
[286123.190910] ? xfs_iunlock+0xd5/0x300
[286123.190910] ? bio_init+0x295/0x470
[286123.190910] blkdev_issue_flush+0x69/0x80
[286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
[286123.190910] ? bio_kmalloc+0x90/0x90
[286123.190910] ? xfs_iunlock+0x1b4/0x300
[286123.190910] xfs_file_fsync+0x354/0x820
[286123.190910] ? lock_downgrade+0x680/0x680
[286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
[286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
[286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
[286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
[286123.190910] check_for_locks+0x152/0x200 [nfsd]
[286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
[286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
[286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
[286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
[286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
[286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
[286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
[286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
[286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
[286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
[286123.190910] svc_process+0x361/0x4f0 [sunrpc]
[286123.190910] nfsd+0x2d6/0x570 [nfsd]
[286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
[286123.190910] kthread+0x29f/0x340
[286123.190910] ? kthread_complete_and_exit+0x20/0x20
[286123.190910] ret_from_fork+0x22/0x30
[286123.190910] </TASK>
The problem is the process tries to sleep while holding the
cl_lock by nfsd4_release_lockowner. I think the problem was
introduced with the filemap_flush in nfsd_file_put since
'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
The filemap_flush is later replaced by nfsd_file_flush by
'6b8a94332ee4f nfsd: Fix a write performance regression'.
-Dai
> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>
> Hi,
>
> I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
> I'm not sure who ran which tests (be useful to know) that caused these oops.
>
> Here is the stack traces:
>
> [286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
> [286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
> [286123.155872] preempt_count: 1, expected: 0
> [286123.156443] RCU nest depth: 0, expected: 0
> [286123.156771] 1 lock held by nfsd/3983:
> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
> [286123.156949] Preemption disabled at:
> [286123.156961] [<0000000000000000>] 0x0
> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
> [286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [286123.157552] Call Trace:
> [286123.157565] <TASK>
> [286123.157581] dump_stack_lvl+0x57/0x7d
> [286123.157609] __might_resched.cold+0x222/0x26b
> [286123.157644] down_read_nested+0x68/0x420
> [286123.157671] ? down_write_nested+0x130/0x130
> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
> [286123.157749] xfs_file_fsync+0x3b9/0x820
> [286123.157776] ? lock_downgrade+0x680/0x680
> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> [286123.159043] kthread+0x29f/0x340
> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
> [286123.159043] ret_from_fork+0x22/0x30
> [286123.159043] </TASK>
> [286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
> [286123.187551] INFO: lockdep is turned off.
> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> [286123.188527] Preemption disabled at:
> [286123.188535] [<0000000000000000>] 0x0
> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G W 5.18.0-rc4+ #1
> [286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [286123.190910] Call Trace:
> [286123.190910] <TASK>
> [286123.190910] dump_stack_lvl+0x57/0x7d
> [286123.190910] __schedule_bug.cold+0x133/0x143
> [286123.190910] __schedule+0x16c9/0x20a0
> [286123.190910] ? schedule_timeout+0x314/0x510
> [286123.190910] ? __sched_text_start+0x8/0x8
> [286123.190910] ? internal_add_timer+0xa4/0xe0
> [286123.190910] schedule+0xd7/0x1f0
> [286123.190910] schedule_timeout+0x319/0x510
> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
> [286123.190910] ? usleep_range_state+0x150/0x150
> [286123.190910] ? lock_acquire+0x331/0x490
> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
> [286123.190910] ? do_raw_spin_lock+0x116/0x260
> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
> [286123.190910] io_schedule_timeout+0x26/0x80
> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> [286123.190910] ? wait_for_completion+0x330/0x330
> [286123.190910] submit_bio_wait+0x135/0x1d0
> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
> [286123.190910] ? xfs_iunlock+0xd5/0x300
> [286123.190910] ? bio_init+0x295/0x470
> [286123.190910] blkdev_issue_flush+0x69/0x80
> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
> [286123.190910] ? bio_kmalloc+0x90/0x90
> [286123.190910] ? xfs_iunlock+0x1b4/0x300
> [286123.190910] xfs_file_fsync+0x354/0x820
> [286123.190910] ? lock_downgrade+0x680/0x680
> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> [286123.190910] kthread+0x29f/0x340
> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
> [286123.190910] ret_from_fork+0x22/0x30
> [286123.190910] </TASK>
>
> The problem is the process tries to sleep while holding the
> cl_lock by nfsd4_release_lockowner. I think the problem was
> introduced with the filemap_flush in nfsd_file_put since
> 'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
> The filemap_flush is later replaced by nfsd_file_flush by
> '6b8a94332ee4f nfsd: Fix a write performance regression'.
That seems plausible, given the traces above.
But it begs the question: why was a vfs_fsync() needed by
RELEASE_LOCKOWNER in this case? I've tried to reproduce the
problem, and even added a might_sleep() call in nfsd_file_flush()
but haven't been able to reproduce.
Since this was 5.18-rc4, would you open a bug report on
bugzilla.linux-nfs.org and copy in all this detail?
--
Chuck Lever
On 5/10/22 7:24 AM, Chuck Lever III wrote:
>
>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>
>> Hi,
>>
>> I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
>> I'm not sure who ran which tests (be useful to know) that caused these oops.
>>
>> Here is the stack traces:
>>
>> [286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
>> [286123.155872] preempt_count: 1, expected: 0
>> [286123.156443] RCU nest depth: 0, expected: 0
>> [286123.156771] 1 lock held by nfsd/3983:
>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>> [286123.156949] Preemption disabled at:
>> [286123.156961] [<0000000000000000>] 0x0
>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
>> [286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> [286123.157552] Call Trace:
>> [286123.157565] <TASK>
>> [286123.157581] dump_stack_lvl+0x57/0x7d
>> [286123.157609] __might_resched.cold+0x222/0x26b
>> [286123.157644] down_read_nested+0x68/0x420
>> [286123.157671] ? down_write_nested+0x130/0x130
>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>> [286123.157776] ? lock_downgrade+0x680/0x680
>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>> [286123.159043] kthread+0x29f/0x340
>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>> [286123.159043] ret_from_fork+0x22/0x30
>> [286123.159043] </TASK>
>> [286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
>> [286123.187551] INFO: lockdep is turned off.
>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
>> [286123.188527] Preemption disabled at:
>> [286123.188535] [<0000000000000000>] 0x0
>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G W 5.18.0-rc4+ #1
>> [286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> [286123.190910] Call Trace:
>> [286123.190910] <TASK>
>> [286123.190910] dump_stack_lvl+0x57/0x7d
>> [286123.190910] __schedule_bug.cold+0x133/0x143
>> [286123.190910] __schedule+0x16c9/0x20a0
>> [286123.190910] ? schedule_timeout+0x314/0x510
>> [286123.190910] ? __sched_text_start+0x8/0x8
>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>> [286123.190910] schedule+0xd7/0x1f0
>> [286123.190910] schedule_timeout+0x319/0x510
>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>> [286123.190910] ? usleep_range_state+0x150/0x150
>> [286123.190910] ? lock_acquire+0x331/0x490
>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>> [286123.190910] io_schedule_timeout+0x26/0x80
>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] ? wait_for_completion+0x330/0x330
>> [286123.190910] submit_bio_wait+0x135/0x1d0
>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>> [286123.190910] ? bio_init+0x295/0x470
>> [286123.190910] blkdev_issue_flush+0x69/0x80
>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>> [286123.190910] ? bio_kmalloc+0x90/0x90
>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>> [286123.190910] xfs_file_fsync+0x354/0x820
>> [286123.190910] ? lock_downgrade+0x680/0x680
>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>> [286123.190910] kthread+0x29f/0x340
>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>> [286123.190910] ret_from_fork+0x22/0x30
>> [286123.190910] </TASK>
>>
>> The problem is the process tries to sleep while holding the
>> cl_lock by nfsd4_release_lockowner. I think the problem was
>> introduced with the filemap_flush in nfsd_file_put since
>> 'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
>> The filemap_flush is later replaced by nfsd_file_flush by
>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
> That seems plausible, given the traces above.
>
> But it begs the question: why was a vfs_fsync() needed by
> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
> problem, and even added a might_sleep() call in nfsd_file_flush()
> but haven't been able to reproduce.
I haven't found a way to force the NFSD_FILE_HASHED bit to be
cleared when the client sends RELEASE_LOCKOWNER so nfsd_file_flush
is called. But if you add might_sleep() to nfsd_file_put() and
run pynfs 4.0 RLOWN1 test, the problem will happen.
>
> Since this was 5.18-rc4, would you open a bug report on
> bugzilla.linux-nfs.org and copy in all this detail?
will do. Note that the kernel on Oracle6 is 5.18-rc4 with
Courteous server patches v24 but I don't think it's related
to this problem.
-Dai
>
>
> --
> Chuck Lever
>
>
>
> On May 11, 2022, at 12:09 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
>>
>>
>>> On May 10, 2022, at 10:24 AM, Chuck Lever III
>>> <[email protected]> wrote:
>>>
>>>
>>>
>>>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>>>
>>>> Hi,
>>>>
>>>> I just noticed there were couple of oops in my Oracle6 in
>>>> nfs4.dev network.
>>>> I'm not sure who ran which tests (be useful to know) that caused
>>>> these oops.
>>>>
>>>> Here is the stack traces:
>>>>
>>>> [286123.154006] BUG: sleeping function called from invalid
>>>> context at kernel/locking/rwsem.c:1585
>>>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0,
>>>> pid: 3983, name: nfsd
>>>> [286123.155872] preempt_count: 1, expected: 0
>>>> [286123.156443] RCU nest depth: 0, expected: 0
>>>> [286123.156771] 1 lock held by nfsd/3983:
>>>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
>>>> {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>>>> [286123.156949] Preemption disabled at:
>>>> [286123.156961] [<0000000000000000>] 0x0
>>>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
>>>> tainted 5.18.0-rc4+ #1
>>>> [286123.157539] Hardware name: innotek GmbH
>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>> [286123.157552] Call Trace:
>>>> [286123.157565] <TASK>
>>>> [286123.157581] dump_stack_lvl+0x57/0x7d
>>>> [286123.157609] __might_resched.cold+0x222/0x26b
>>>> [286123.157644] down_read_nested+0x68/0x420
>>>> [286123.157671] ? down_write_nested+0x130/0x130
>>>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>>>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>>>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>>>> [286123.157776] ? lock_downgrade+0x680/0x680
>>>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>>>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>>>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>>>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>>>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>>>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>>>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>>>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>> [286123.159043] kthread+0x29f/0x340
>>>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>>>> [286123.159043] ret_from_fork+0x22/0x30
>>>> [286123.159043] </TASK>
>>>> [286123.187052] BUG: scheduling while atomic:
>>>> nfsd/3983/0x00000002
>>>> [286123.187551] INFO: lockdep is turned off.
>>>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd
>>>> grace sunrpc
>>>> [286123.188527] Preemption disabled at:
>>>> [286123.188535] [<0000000000000000>] 0x0
>>>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
>>>> Tainted: G W 5.18.0-rc4+ #1
>>>> [286123.190233] Hardware name: innotek GmbH
>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>> [286123.190910] Call Trace:
>>>> [286123.190910] <TASK>
>>>> [286123.190910] dump_stack_lvl+0x57/0x7d
>>>> [286123.190910] __schedule_bug.cold+0x133/0x143
>>>> [286123.190910] __schedule+0x16c9/0x20a0
>>>> [286123.190910] ? schedule_timeout+0x314/0x510
>>>> [286123.190910] ? __sched_text_start+0x8/0x8
>>>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>>>> [286123.190910] schedule+0xd7/0x1f0
>>>> [286123.190910] schedule_timeout+0x319/0x510
>>>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>>>> [286123.190910] ? usleep_range_state+0x150/0x150
>>>> [286123.190910] ? lock_acquire+0x331/0x490
>>>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>>>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>>>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>>>> [286123.190910] io_schedule_timeout+0x26/0x80
>>>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>> [286123.190910] ? wait_for_completion+0x330/0x330
>>>> [286123.190910] submit_bio_wait+0x135/0x1d0
>>>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>>>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>>>> [286123.190910] ? bio_init+0x295/0x470
>>>> [286123.190910] blkdev_issue_flush+0x69/0x80
>>>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>>>> [286123.190910] ? bio_kmalloc+0x90/0x90
>>>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>>>> [286123.190910] xfs_file_fsync+0x354/0x820
>>>> [286123.190910] ? lock_downgrade+0x680/0x680
>>>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>>>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>>>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>>>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>>>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>>>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>> [286123.190910] kthread+0x29f/0x340
>>>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>>>> [286123.190910] ret_from_fork+0x22/0x30
>>>> [286123.190910] </TASK>
>>>>
>>>> The problem is the process tries to sleep while holding the
>>>> cl_lock by nfsd4_release_lockowner. I think the problem was
>>>> introduced with the filemap_flush in nfsd_file_put since
>>>> 'b6669305d35a nfsd: Reduce the number of calls to
>>>> nfsd_file_gc()'.
>>>> The filemap_flush is later replaced by nfsd_file_flush by
>>>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
>>>
>>> That seems plausible, given the traces above.
>>>
>>> But it begs the question: why was a vfs_fsync() needed by
>>> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
>>> problem, and even added a might_sleep() call in nfsd_file_flush()
>>> but haven't been able to reproduce.
>>
>> Trond, I'm assuming you switched to a synchronous flush here to
>> capture writeback errors. There's no other requirement for
>> waiting for the flush to complete, right?
>
> It's because the file is unhashed, so it is about to be closed and
> garbage collected as soon as the refcount goes to zero.
>
>>
>> To enable nfsd_file_put() to be invoked in atomic contexts again,
>> would the following be a reasonable short term fix:
>>
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 2c1b027774d4..96c8d07788f4 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
>> {
>> set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>> - nfsd_file_flush(nf);
>> + filemap_flush(nf->nf_file->f_mapping);
>> nfsd_file_put_noref(nf);
>> } else {
>> nfsd_file_put_noref(nf);
>>
>>
>
> filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can call
> filp_close() + fput()), so this kind of change isn't going to work no
> matter how you massage it.
Er. Wouldn't that mean we would have seen "sleep while spinlock is
held" BUGs since nfsd4_release_lockowner() was added? It's done
at least an fput() while holding clp->cl_lock since it was added,
I think.
> Is there any reason why you need a reference to the nfs_file there?
> Wouldn't holding the fp->fi_lock be sufficient, since you're already in
> an atomic context? As long as one of the entries in fp->fi_fds[] is
> non-zero then you should be safe.
Sure, check_for_locks() seems to be the only problematic caller.
Other callers appear to be careful to call nfsd_file_put() only
after releasing spinlocks.
I would like a fix that can be backported without fuss. I was
thinking changing check_for_locks() might get a little too
hairy for that, but I'll check it out.
--
Chuck Lever
On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
>
>
> > On May 10, 2022, at 10:24 AM, Chuck Lever III
> > <[email protected]> wrote:
> >
> >
> >
> > > On May 3, 2022, at 3:11 PM, [email protected] wrote:
> > >
> > > Hi,
> > >
> > > I just noticed there were couple of oops in my Oracle6 in
> > > nfs4.dev network.
> > > I'm not sure who ran which tests (be useful to know) that caused
> > > these oops.
> > >
> > > Here is the stack traces:
> > >
> > > [286123.154006] BUG: sleeping function called from invalid
> > > context at kernel/locking/rwsem.c:1585
> > > [286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0,
> > > pid: 3983, name: nfsd
> > > [286123.155872] preempt_count: 1, expected: 0
> > > [286123.156443] RCU nest depth: 0, expected: 0
> > > [286123.156771] 1 lock held by nfsd/3983:
> > > [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
> > > {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
> > > [286123.156949] Preemption disabled at:
> > > [286123.156961] [<0000000000000000>] 0x0
> > > [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
> > > tainted 5.18.0-rc4+ #1
> > > [286123.157539] Hardware name: innotek GmbH
> > > VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > > [286123.157552] Call Trace:
> > > [286123.157565] <TASK>
> > > [286123.157581] dump_stack_lvl+0x57/0x7d
> > > [286123.157609] __might_resched.cold+0x222/0x26b
> > > [286123.157644] down_read_nested+0x68/0x420
> > > [286123.157671] ? down_write_nested+0x130/0x130
> > > [286123.157686] ? rwlock_bug.part.0+0x90/0x90
> > > [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
> > > [286123.157749] xfs_file_fsync+0x3b9/0x820
> > > [286123.157776] ? lock_downgrade+0x680/0x680
> > > [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> > > [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
> > > [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> > > [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
> > > [286123.158088] check_for_locks+0x152/0x200 [nfsd]
> > > [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> > > [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> > > [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
> > > [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> > > [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> > > [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
> > > [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
> > > [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
> > > [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
> > > [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> > > [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
> > > [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
> > > [286123.159043] nfsd+0x2d6/0x570 [nfsd]
> > > [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> > > [286123.159043] kthread+0x29f/0x340
> > > [286123.159043] ? kthread_complete_and_exit+0x20/0x20
> > > [286123.159043] ret_from_fork+0x22/0x30
> > > [286123.159043] </TASK>
> > > [286123.187052] BUG: scheduling while atomic:
> > > nfsd/3983/0x00000002
> > > [286123.187551] INFO: lockdep is turned off.
> > > [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd
> > > grace sunrpc
> > > [286123.188527] Preemption disabled at:
> > > [286123.188535] [<0000000000000000>] 0x0
> > > [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
> > > Tainted: G W 5.18.0-rc4+ #1
> > > [286123.190233] Hardware name: innotek GmbH
> > > VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > > [286123.190910] Call Trace:
> > > [286123.190910] <TASK>
> > > [286123.190910] dump_stack_lvl+0x57/0x7d
> > > [286123.190910] __schedule_bug.cold+0x133/0x143
> > > [286123.190910] __schedule+0x16c9/0x20a0
> > > [286123.190910] ? schedule_timeout+0x314/0x510
> > > [286123.190910] ? __sched_text_start+0x8/0x8
> > > [286123.190910] ? internal_add_timer+0xa4/0xe0
> > > [286123.190910] schedule+0xd7/0x1f0
> > > [286123.190910] schedule_timeout+0x319/0x510
> > > [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
> > > [286123.190910] ? usleep_range_state+0x150/0x150
> > > [286123.190910] ? lock_acquire+0x331/0x490
> > > [286123.190910] ? init_timer_on_stack_key+0x50/0x50
> > > [286123.190910] ? do_raw_spin_lock+0x116/0x260
> > > [286123.190910] ? rwlock_bug.part.0+0x90/0x90
> > > [286123.190910] io_schedule_timeout+0x26/0x80
> > > [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
> > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > [286123.190910] ? wait_for_completion+0x330/0x330
> > > [286123.190910] submit_bio_wait+0x135/0x1d0
> > > [286123.190910] ? submit_bio_wait_endio+0x40/0x40
> > > [286123.190910] ? xfs_iunlock+0xd5/0x300
> > > [286123.190910] ? bio_init+0x295/0x470
> > > [286123.190910] blkdev_issue_flush+0x69/0x80
> > > [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
> > > [286123.190910] ? bio_kmalloc+0x90/0x90
> > > [286123.190910] ? xfs_iunlock+0x1b4/0x300
> > > [286123.190910] xfs_file_fsync+0x354/0x820
> > > [286123.190910] ? lock_downgrade+0x680/0x680
> > > [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> > > [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
> > > [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> > > [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
> > > [286123.190910] check_for_locks+0x152/0x200 [nfsd]
> > > [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> > > [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> > > [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
> > > [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
> > > [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
> > > [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> > > [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
> > > [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
> > > [286123.190910] nfsd+0x2d6/0x570 [nfsd]
> > > [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> > > [286123.190910] kthread+0x29f/0x340
> > > [286123.190910] ? kthread_complete_and_exit+0x20/0x20
> > > [286123.190910] ret_from_fork+0x22/0x30
> > > [286123.190910] </TASK>
> > >
> > > The problem is the process tries to sleep while holding the
> > > cl_lock by nfsd4_release_lockowner. I think the problem was
> > > introduced with the filemap_flush in nfsd_file_put since
> > > 'b6669305d35a nfsd: Reduce the number of calls to
> > > nfsd_file_gc()'.
> > > The filemap_flush is later replaced by nfsd_file_flush by
> > > '6b8a94332ee4f nfsd: Fix a write performance regression'.
> >
> > That seems plausible, given the traces above.
> >
> > But it begs the question: why was a vfs_fsync() needed by
> > RELEASE_LOCKOWNER in this case? I've tried to reproduce the
> > problem, and even added a might_sleep() call in nfsd_file_flush()
> > but haven't been able to reproduce.
>
> Trond, I'm assuming you switched to a synchronous flush here to
> capture writeback errors. There's no other requirement for
> waiting for the flush to complete, right?
It's because the file is unhashed, so it is about to be closed and
garbage collected as soon as the refcount goes to zero.
>
> To enable nfsd_file_put() to be invoked in atomic contexts again,
> would the following be a reasonable short term fix:
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 2c1b027774d4..96c8d07788f4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
> {
> set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
> - nfsd_file_flush(nf);
> + filemap_flush(nf->nf_file->f_mapping);
> nfsd_file_put_noref(nf);
> } else {
> nfsd_file_put_noref(nf);
>
>
filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can call
filp_close() + fput()), so this kind of change isn't going to work no
matter how you massage it.
Is there any reason why you need a reference to the nfs_file there?
Wouldn't holding the fp->fi_lock be sufficient, since you're already in
an atomic context? As long as one of the entries in fp->fi_fds[] is
non-zero then you should be safe.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On May 11, 2022, at 1:31 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2022-05-11 at 16:27 +0000, Chuck Lever III wrote:
>>
>>
>>> On May 11, 2022, at 12:09 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On May 10, 2022, at 10:24 AM, Chuck Lever III
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I just noticed there were couple of oops in my Oracle6 in
>>>>>> nfs4.dev network.
>>>>>> I'm not sure who ran which tests (be useful to know) that
>>>>>> caused
>>>>>> these oops.
>>>>>>
>>>>>> Here is the stack traces:
>>>>>>
>>>>>> [286123.154006] BUG: sleeping function called from invalid
>>>>>> context at kernel/locking/rwsem.c:1585
>>>>>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0,
>>>>>> non_block: 0,
>>>>>> pid: 3983, name: nfsd
>>>>>> [286123.155872] preempt_count: 1, expected: 0
>>>>>> [286123.156443] RCU nest depth: 0, expected: 0
>>>>>> [286123.156771] 1 lock held by nfsd/3983:
>>>>>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
>>>>>> {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>>>>>> [286123.156949] Preemption disabled at:
>>>>>> [286123.156961] [<0000000000000000>] 0x0
>>>>>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
>>>>>> tainted 5.18.0-rc4+ #1
>>>>>> [286123.157539] Hardware name: innotek GmbH
>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>> [286123.157552] Call Trace:
>>>>>> [286123.157565] <TASK>
>>>>>> [286123.157581] dump_stack_lvl+0x57/0x7d
>>>>>> [286123.157609] __might_resched.cold+0x222/0x26b
>>>>>> [286123.157644] down_read_nested+0x68/0x420
>>>>>> [286123.157671] ? down_write_nested+0x130/0x130
>>>>>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>>>>>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>>>>>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>>>>>> [286123.157776] ? lock_downgrade+0x680/0x680
>>>>>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>>>>>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>> [sunrpc]
>>>>>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>>>>>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>>>>>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>> [286123.159043] kthread+0x29f/0x340
>>>>>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>>>>>> [286123.159043] ret_from_fork+0x22/0x30
>>>>>> [286123.159043] </TASK>
>>>>>> [286123.187052] BUG: scheduling while atomic:
>>>>>> nfsd/3983/0x00000002
>>>>>> [286123.187551] INFO: lockdep is turned off.
>>>>>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl
>>>>>> lockd
>>>>>> grace sunrpc
>>>>>> [286123.188527] Preemption disabled at:
>>>>>> [286123.188535] [<0000000000000000>] 0x0
>>>>>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
>>>>>> Tainted: G W 5.18.0-rc4+ #1
>>>>>> [286123.190233] Hardware name: innotek GmbH
>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>> [286123.190910] Call Trace:
>>>>>> [286123.190910] <TASK>
>>>>>> [286123.190910] dump_stack_lvl+0x57/0x7d
>>>>>> [286123.190910] __schedule_bug.cold+0x133/0x143
>>>>>> [286123.190910] __schedule+0x16c9/0x20a0
>>>>>> [286123.190910] ? schedule_timeout+0x314/0x510
>>>>>> [286123.190910] ? __sched_text_start+0x8/0x8
>>>>>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>>>>>> [286123.190910] schedule+0xd7/0x1f0
>>>>>> [286123.190910] schedule_timeout+0x319/0x510
>>>>>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>>>>>> [286123.190910] ? usleep_range_state+0x150/0x150
>>>>>> [286123.190910] ? lock_acquire+0x331/0x490
>>>>>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>>>>>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>>>>>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>>>>>> [286123.190910] io_schedule_timeout+0x26/0x80
>>>>>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>> [286123.190910] ? wait_for_completion+0x330/0x330
>>>>>> [286123.190910] submit_bio_wait+0x135/0x1d0
>>>>>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>>>>>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>>>>>> [286123.190910] ? bio_init+0x295/0x470
>>>>>> [286123.190910] blkdev_issue_flush+0x69/0x80
>>>>>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>>>>>> [286123.190910] ? bio_kmalloc+0x90/0x90
>>>>>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>>>>>> [286123.190910] xfs_file_fsync+0x354/0x820
>>>>>> [286123.190910] ? lock_downgrade+0x680/0x680
>>>>>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>>>>>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>> [sunrpc]
>>>>>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>>>>>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>>>>>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>> [286123.190910] kthread+0x29f/0x340
>>>>>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>>>>>> [286123.190910] ret_from_fork+0x22/0x30
>>>>>> [286123.190910] </TASK>
>>>>>>
>>>>>> The problem is the process tries to sleep while holding the
>>>>>> cl_lock by nfsd4_release_lockowner. I think the problem was
>>>>>> introduced with the filemap_flush in nfsd_file_put since
>>>>>> 'b6669305d35a nfsd: Reduce the number of calls to
>>>>>> nfsd_file_gc()'.
>>>>>> The filemap_flush is later replaced by nfsd_file_flush by
>>>>>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
>>>>>
>>>>> That seems plausible, given the traces above.
>>>>>
>>>>> But it begs the question: why was a vfs_fsync() needed by
>>>>> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
>>>>> problem, and even added a might_sleep() call in
>>>>> nfsd_file_flush()
>>>>> but haven't been able to reproduce.
>>>>
>>>> Trond, I'm assuming you switched to a synchronous flush here to
>>>> capture writeback errors. There's no other requirement for
>>>> waiting for the flush to complete, right?
>>>
>>> It's because the file is unhashed, so it is about to be closed and
>>> garbage collected as soon as the refcount goes to zero.
>>>
>>>>
>>>> To enable nfsd_file_put() to be invoked in atomic contexts again,
>>>> would the following be a reasonable short term fix:
>>>>
>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>> index 2c1b027774d4..96c8d07788f4 100644
>>>> --- a/fs/nfsd/filecache.c
>>>> +++ b/fs/nfsd/filecache.c
>>>> @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
>>>> {
>>>> set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>>>> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>>>> - nfsd_file_flush(nf);
>>>> + filemap_flush(nf->nf_file->f_mapping);
>>>> nfsd_file_put_noref(nf);
>>>> } else {
>>>> nfsd_file_put_noref(nf);
>>>>
>>>>
>>>
>>> filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can
>>> call
>>> filp_close() + fput()), so this kind of change isn't going to work
>>> no
>>> matter how you massage it.
>>
>> Er. Wouldn't that mean we would have seen "sleep while spinlock is
>> held" BUGs since nfsd4_release_lockowner() was added? It's done
>> at least an fput() while holding clp->cl_lock since it was added,
>> I think.
>
>
> There is nothing magical about using WB_SYNC_NONE as far as the
> writeback code is concerned. write_cache_pages() will still happily
> call lock_page() and sleep on that lock if it is contended. The
> writepage/writepages code will happily try to allocate memory if
> necessary.
>
> The only difference is that it won't sleep waiting for the PG_writeback
> bit.
>
> So, no, you can't safely call filemap_flush() from a spin locked
> context, and
> yes, the bug has been there from day 1. It was not introduced by me.
>
> Also, as I said, nfsd_file_put_noref() is not safe to call from a spin
> locked context. Again, this was not introduced any time recently.
OK. I'm trying to figure out how bad the problem is and how
far back to apply the fix.
I agree that the arrangement of the code path means
nfsd4_release_lockowner() has always called fput() or
filemap_flush() while cl_lock was held.
But again, I'm not aware of recent instances of this particular
splat. So I'm wondering if a recent change has made this issue
easier to hit. We might not be able to answer that until we
find out how the bake-a-thon testers managed to trigger the
issue on Dai's server.
>>> Is there any reason why you need a reference to the nfs_file there?
>>> Wouldn't holding the fp->fi_lock be sufficient, since you're
>>> already in
>>> an atomic context? As long as one of the entries in fp->fi_fds[] is
>>> non-zero then you should be safe.
>>
>> Sure, check_for_locks() seems to be the only problematic caller.
>> Other callers appear to be careful to call nfsd_file_put() only
>> after releasing spinlocks.
>>
>> I would like a fix that can be backported without fuss. I was
>> thinking changing check_for_locks() might get a little too
>> hairy for that, but I'll check it out.
Notably: check_for_locks() needs to drop fi_lock before taking
flc_lock because the OPEN path can take flc_lock first, then
fi_lock, via nfsd_break_deleg_cb(). Holding a reference to the
nfsd_file guarantees that the inode won't go away while
check_for_locks() examines the flc_posix list without holding
fi_lock.
So my first take on this was we need nfsd4_release_lockowner()
to drop cl_lock before check_for_locks() is called.
--
Chuck Lever
> On May 10, 2022, at 10:24 AM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>
>> Hi,
>>
>> I just noticed there were couple of oops in my Oracle6 in nfs4.dev network.
>> I'm not sure who ran which tests (be useful to know) that caused these oops.
>>
>> Here is the stack traces:
>>
>> [286123.154006] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1585
>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3983, name: nfsd
>> [286123.155872] preempt_count: 1, expected: 0
>> [286123.156443] RCU nest depth: 0, expected: 0
>> [286123.156771] 1 lock held by nfsd/3983:
>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>> [286123.156949] Preemption disabled at:
>> [286123.156961] [<0000000000000000>] 0x0
>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not tainted 5.18.0-rc4+ #1
>> [286123.157539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> [286123.157552] Call Trace:
>> [286123.157565] <TASK>
>> [286123.157581] dump_stack_lvl+0x57/0x7d
>> [286123.157609] __might_resched.cold+0x222/0x26b
>> [286123.157644] down_read_nested+0x68/0x420
>> [286123.157671] ? down_write_nested+0x130/0x130
>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>> [286123.157776] ? lock_downgrade+0x680/0x680
>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>> [286123.159043] kthread+0x29f/0x340
>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>> [286123.159043] ret_from_fork+0x22/0x30
>> [286123.159043] </TASK>
>> [286123.187052] BUG: scheduling while atomic: nfsd/3983/0x00000002
>> [286123.187551] INFO: lockdep is turned off.
>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc
>> [286123.188527] Preemption disabled at:
>> [286123.188535] [<0000000000000000>] 0x0
>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Tainted: G W 5.18.0-rc4+ #1
>> [286123.190233] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> [286123.190910] Call Trace:
>> [286123.190910] <TASK>
>> [286123.190910] dump_stack_lvl+0x57/0x7d
>> [286123.190910] __schedule_bug.cold+0x133/0x143
>> [286123.190910] __schedule+0x16c9/0x20a0
>> [286123.190910] ? schedule_timeout+0x314/0x510
>> [286123.190910] ? __sched_text_start+0x8/0x8
>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>> [286123.190910] schedule+0xd7/0x1f0
>> [286123.190910] schedule_timeout+0x319/0x510
>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>> [286123.190910] ? usleep_range_state+0x150/0x150
>> [286123.190910] ? lock_acquire+0x331/0x490
>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>> [286123.190910] io_schedule_timeout+0x26/0x80
>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] ? wait_for_completion+0x330/0x330
>> [286123.190910] submit_bio_wait+0x135/0x1d0
>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>> [286123.190910] ? bio_init+0x295/0x470
>> [286123.190910] blkdev_issue_flush+0x69/0x80
>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>> [286123.190910] ? bio_kmalloc+0x90/0x90
>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>> [286123.190910] xfs_file_fsync+0x354/0x820
>> [286123.190910] ? lock_downgrade+0x680/0x680
>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>> [286123.190910] kthread+0x29f/0x340
>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>> [286123.190910] ret_from_fork+0x22/0x30
>> [286123.190910] </TASK>
>>
>> The problem is the process tries to sleep while holding the
>> cl_lock by nfsd4_release_lockowner. I think the problem was
>> introduced with the filemap_flush in nfsd_file_put since
>> 'b6669305d35a nfsd: Reduce the number of calls to nfsd_file_gc()'.
>> The filemap_flush is later replaced by nfsd_file_flush by
>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
>
> That seems plausible, given the traces above.
>
> But it begs the question: why was a vfs_fsync() needed by
> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
> problem, and even added a might_sleep() call in nfsd_file_flush()
> but haven't been able to reproduce.
Trond, I'm assuming you switched to a synchronous flush here to
capture writeback errors. There's no other requirement for
waiting for the flush to complete, right?
To enable nfsd_file_put() to be invoked in atomic contexts again,
would the following be a reasonable short term fix:
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2c1b027774d4..96c8d07788f4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
{
set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
- nfsd_file_flush(nf);
+ filemap_flush(nf->nf_file->f_mapping);
nfsd_file_put_noref(nf);
} else {
nfsd_file_put_noref(nf);
--
Chuck Lever
On Wed, 2022-05-11 at 16:27 +0000, Chuck Lever III wrote:
>
>
> > On May 11, 2022, at 12:09 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On May 10, 2022, at 10:24 AM, Chuck Lever III
> > > > <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > On May 3, 2022, at 3:11 PM, [email protected] wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I just noticed there were couple of oops in my Oracle6 in
> > > > > nfs4.dev network.
> > > > > I'm not sure who ran which tests (be useful to know) that
> > > > > caused
> > > > > these oops.
> > > > >
> > > > > Here is the stack traces:
> > > > >
> > > > > [286123.154006] BUG: sleeping function called from invalid
> > > > > context at kernel/locking/rwsem.c:1585
> > > > > [286123.155126] in_atomic(): 1, irqs_disabled(): 0,
> > > > > non_block: 0,
> > > > > pid: 3983, name: nfsd
> > > > > [286123.155872] preempt_count: 1, expected: 0
> > > > > [286123.156443] RCU nest depth: 0, expected: 0
> > > > > [286123.156771] 1 lock held by nfsd/3983:
> > > > > [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
> > > > > {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
> > > > > [286123.156949] Preemption disabled at:
> > > > > [286123.156961] [<0000000000000000>] 0x0
> > > > > [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
> > > > > tainted 5.18.0-rc4+ #1
> > > > > [286123.157539] Hardware name: innotek GmbH
> > > > > VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > > > > [286123.157552] Call Trace:
> > > > > [286123.157565] <TASK>
> > > > > [286123.157581] dump_stack_lvl+0x57/0x7d
> > > > > [286123.157609] __might_resched.cold+0x222/0x26b
> > > > > [286123.157644] down_read_nested+0x68/0x420
> > > > > [286123.157671] ? down_write_nested+0x130/0x130
> > > > > [286123.157686] ? rwlock_bug.part.0+0x90/0x90
> > > > > [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
> > > > > [286123.157749] xfs_file_fsync+0x3b9/0x820
> > > > > [286123.157776] ? lock_downgrade+0x680/0x680
> > > > > [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> > > > > [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
> > > > > [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> > > > > [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
> > > > > [286123.158088] check_for_locks+0x152/0x200 [nfsd]
> > > > > [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> > > > > [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> > > > > [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
> > > > > [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> > > > > [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> > > > > [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
> > > > > [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
> > > > > [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450
> > > > > [sunrpc]
> > > > > [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
> > > > > [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> > > > > [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
> > > > > [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
> > > > > [286123.159043] nfsd+0x2d6/0x570 [nfsd]
> > > > > [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> > > > > [286123.159043] kthread+0x29f/0x340
> > > > > [286123.159043] ? kthread_complete_and_exit+0x20/0x20
> > > > > [286123.159043] ret_from_fork+0x22/0x30
> > > > > [286123.159043] </TASK>
> > > > > [286123.187052] BUG: scheduling while atomic:
> > > > > nfsd/3983/0x00000002
> > > > > [286123.187551] INFO: lockdep is turned off.
> > > > > [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl
> > > > > lockd
> > > > > grace sunrpc
> > > > > [286123.188527] Preemption disabled at:
> > > > > [286123.188535] [<0000000000000000>] 0x0
> > > > > [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
> > > > > Tainted: G W 5.18.0-rc4+ #1
> > > > > [286123.190233] Hardware name: innotek GmbH
> > > > > VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > > > > [286123.190910] Call Trace:
> > > > > [286123.190910] <TASK>
> > > > > [286123.190910] dump_stack_lvl+0x57/0x7d
> > > > > [286123.190910] __schedule_bug.cold+0x133/0x143
> > > > > [286123.190910] __schedule+0x16c9/0x20a0
> > > > > [286123.190910] ? schedule_timeout+0x314/0x510
> > > > > [286123.190910] ? __sched_text_start+0x8/0x8
> > > > > [286123.190910] ? internal_add_timer+0xa4/0xe0
> > > > > [286123.190910] schedule+0xd7/0x1f0
> > > > > [286123.190910] schedule_timeout+0x319/0x510
> > > > > [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
> > > > > [286123.190910] ? usleep_range_state+0x150/0x150
> > > > > [286123.190910] ? lock_acquire+0x331/0x490
> > > > > [286123.190910] ? init_timer_on_stack_key+0x50/0x50
> > > > > [286123.190910] ? do_raw_spin_lock+0x116/0x260
> > > > > [286123.190910] ? rwlock_bug.part.0+0x90/0x90
> > > > > [286123.190910] io_schedule_timeout+0x26/0x80
> > > > > [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
> > > > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > > > [286123.190910] ? wait_for_completion+0x330/0x330
> > > > > [286123.190910] submit_bio_wait+0x135/0x1d0
> > > > > [286123.190910] ? submit_bio_wait_endio+0x40/0x40
> > > > > [286123.190910] ? xfs_iunlock+0xd5/0x300
> > > > > [286123.190910] ? bio_init+0x295/0x470
> > > > > [286123.190910] blkdev_issue_flush+0x69/0x80
> > > > > [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
> > > > > [286123.190910] ? bio_kmalloc+0x90/0x90
> > > > > [286123.190910] ? xfs_iunlock+0x1b4/0x300
> > > > > [286123.190910] xfs_file_fsync+0x354/0x820
> > > > > [286123.190910] ? lock_downgrade+0x680/0x680
> > > > > [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
> > > > > [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
> > > > > [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
> > > > > [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
> > > > > [286123.190910] check_for_locks+0x152/0x200 [nfsd]
> > > > > [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
> > > > > [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
> > > > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > > > [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
> > > > > [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
> > > > > [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
> > > > > [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
> > > > > [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450
> > > > > [sunrpc]
> > > > > [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
> > > > > [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
> > > > > [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
> > > > > [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
> > > > > [286123.190910] nfsd+0x2d6/0x570 [nfsd]
> > > > > [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
> > > > > [286123.190910] kthread+0x29f/0x340
> > > > > [286123.190910] ? kthread_complete_and_exit+0x20/0x20
> > > > > [286123.190910] ret_from_fork+0x22/0x30
> > > > > [286123.190910] </TASK>
> > > > >
> > > > > The problem is the process tries to sleep while holding the
> > > > > cl_lock by nfsd4_release_lockowner. I think the problem was
> > > > > introduced with the filemap_flush in nfsd_file_put since
> > > > > 'b6669305d35a nfsd: Reduce the number of calls to
> > > > > nfsd_file_gc()'.
> > > > > The filemap_flush is later replaced by nfsd_file_flush by
> > > > > '6b8a94332ee4f nfsd: Fix a write performance regression'.
> > > >
> > > > That seems plausible, given the traces above.
> > > >
> > > > But it begs the question: why was a vfs_fsync() needed by
> > > > RELEASE_LOCKOWNER in this case? I've tried to reproduce the
> > > > problem, and even added a might_sleep() call in
> > > > nfsd_file_flush()
> > > > but haven't been able to reproduce.
> > >
> > > Trond, I'm assuming you switched to a synchronous flush here to
> > > capture writeback errors. There's no other requirement for
> > > waiting for the flush to complete, right?
> >
> > It's because the file is unhashed, so it is about to be closed and
> > garbage collected as soon as the refcount goes to zero.
> >
> > >
> > > To enable nfsd_file_put() to be invoked in atomic contexts again,
> > > would the following be a reasonable short term fix:
> > >
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 2c1b027774d4..96c8d07788f4 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
> > > {
> > > set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
> > > - nfsd_file_flush(nf);
> > > + filemap_flush(nf->nf_file->f_mapping);
> > > nfsd_file_put_noref(nf);
> > > } else {
> > > nfsd_file_put_noref(nf);
> > >
> > >
> >
> > filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can
> > call
> > filp_close() + fput()), so this kind of change isn't going to work
> > no
> > matter how you massage it.
>
> Er. Wouldn't that mean we would have seen "sleep while spinlock is
> held" BUGs since nfsd4_release_lockowner() was added? It's done
> at least an fput() while holding clp->cl_lock since it was added,
> I think.
There is nothing magical about using WB_SYNC_NONE as far as the
writeback code is concerned. write_cache_pages() will still happily
call lock_page() and sleep on that lock if it is contended. The
writepage/writepages code will happily try to allocate memory if
necessary.
The only difference is that it won't sleep waiting for the PG_writeback
bit.
So, no, you can't safely call filemap_flush() from a spin locked
context, and
yes, the bug has been there from day 1. It was not introduced by me.
Also, as I said, nfsd_file_put_noref() is not safe to call from a spin
locked context. Again, this was not introduced any time recently.
>
> > Is there any reason why you need a reference to the nfs_file there?
> > Wouldn't holding the fp->fi_lock be sufficient, since you're
> > already in
> > an atomic context? As long as one of the entries in fp->fi_fds[] is
> > non-zero then you should be safe.
>
> Sure, check_for_locks() seems to be the only problematic caller.
> Other callers appear to be careful to call nfsd_file_put() only
> after releasing spinlocks.
>
> I would like a fix that can be backported without fuss. I was
> thinking changing check_for_locks() might get a little too
> hairy for that, but I'll check it out.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 5/11/22 11:57 AM, Chuck Lever III wrote:
>
>> On May 11, 2022, at 1:31 PM, Trond Myklebust <[email protected]> wrote:
>>
>> On Wed, 2022-05-11 at 16:27 +0000, Chuck Lever III wrote:
>>>
>>>> On May 11, 2022, at 12:09 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>
>>>> On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
>>>>>
>>>>>> On May 10, 2022, at 10:24 AM, Chuck Lever III
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I just noticed there were couple of oops in my Oracle6 in
>>>>>>> nfs4.dev network.
>>>>>>> I'm not sure who ran which tests (be useful to know) that
>>>>>>> caused
>>>>>>> these oops.
>>>>>>>
>>>>>>> Here is the stack traces:
>>>>>>>
>>>>>>> [286123.154006] BUG: sleeping function called from invalid
>>>>>>> context at kernel/locking/rwsem.c:1585
>>>>>>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0,
>>>>>>> non_block: 0,
>>>>>>> pid: 3983, name: nfsd
>>>>>>> [286123.155872] preempt_count: 1, expected: 0
>>>>>>> [286123.156443] RCU nest depth: 0, expected: 0
>>>>>>> [286123.156771] 1 lock held by nfsd/3983:
>>>>>>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
>>>>>>> {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>>>>>>> [286123.156949] Preemption disabled at:
>>>>>>> [286123.156961] [<0000000000000000>] 0x0
>>>>>>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
>>>>>>> tainted 5.18.0-rc4+ #1
>>>>>>> [286123.157539] Hardware name: innotek GmbH
>>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>>> [286123.157552] Call Trace:
>>>>>>> [286123.157565] <TASK>
>>>>>>> [286123.157581] dump_stack_lvl+0x57/0x7d
>>>>>>> [286123.157609] __might_resched.cold+0x222/0x26b
>>>>>>> [286123.157644] down_read_nested+0x68/0x420
>>>>>>> [286123.157671] ? down_write_nested+0x130/0x130
>>>>>>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>>>>>>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>>>>>>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>>>>>>> [286123.157776] ? lock_downgrade+0x680/0x680
>>>>>>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>>>>>>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>>> [sunrpc]
>>>>>>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>>>>>>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>>>>>>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>>> [286123.159043] kthread+0x29f/0x340
>>>>>>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [286123.159043] ret_from_fork+0x22/0x30
>>>>>>> [286123.159043] </TASK>
>>>>>>> [286123.187052] BUG: scheduling while atomic:
>>>>>>> nfsd/3983/0x00000002
>>>>>>> [286123.187551] INFO: lockdep is turned off.
>>>>>>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl
>>>>>>> lockd
>>>>>>> grace sunrpc
>>>>>>> [286123.188527] Preemption disabled at:
>>>>>>> [286123.188535] [<0000000000000000>] 0x0
>>>>>>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
>>>>>>> Tainted: G W 5.18.0-rc4+ #1
>>>>>>> [286123.190233] Hardware name: innotek GmbH
>>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>>> [286123.190910] Call Trace:
>>>>>>> [286123.190910] <TASK>
>>>>>>> [286123.190910] dump_stack_lvl+0x57/0x7d
>>>>>>> [286123.190910] __schedule_bug.cold+0x133/0x143
>>>>>>> [286123.190910] __schedule+0x16c9/0x20a0
>>>>>>> [286123.190910] ? schedule_timeout+0x314/0x510
>>>>>>> [286123.190910] ? __sched_text_start+0x8/0x8
>>>>>>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>>>>>>> [286123.190910] schedule+0xd7/0x1f0
>>>>>>> [286123.190910] schedule_timeout+0x319/0x510
>>>>>>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>>>>>>> [286123.190910] ? usleep_range_state+0x150/0x150
>>>>>>> [286123.190910] ? lock_acquire+0x331/0x490
>>>>>>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>>>>>>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>>>>>>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>>>>>>> [286123.190910] io_schedule_timeout+0x26/0x80
>>>>>>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] ? wait_for_completion+0x330/0x330
>>>>>>> [286123.190910] submit_bio_wait+0x135/0x1d0
>>>>>>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>>>>>>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>>>>>>> [286123.190910] ? bio_init+0x295/0x470
>>>>>>> [286123.190910] blkdev_issue_flush+0x69/0x80
>>>>>>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>>>>>>> [286123.190910] ? bio_kmalloc+0x90/0x90
>>>>>>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>>>>>>> [286123.190910] xfs_file_fsync+0x354/0x820
>>>>>>> [286123.190910] ? lock_downgrade+0x680/0x680
>>>>>>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>>>>>>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>>> [sunrpc]
>>>>>>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>>>>>>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>>>>>>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>>> [286123.190910] kthread+0x29f/0x340
>>>>>>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [286123.190910] ret_from_fork+0x22/0x30
>>>>>>> [286123.190910] </TASK>
>>>>>>>
>>>>>>> The problem is the process tries to sleep while holding the
>>>>>>> cl_lock by nfsd4_release_lockowner. I think the problem was
>>>>>>> introduced with the filemap_flush in nfsd_file_put since
>>>>>>> 'b6669305d35a nfsd: Reduce the number of calls to
>>>>>>> nfsd_file_gc()'.
>>>>>>> The filemap_flush is later replaced by nfsd_file_flush by
>>>>>>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
>>>>>> That seems plausible, given the traces above.
>>>>>>
>>>>>> But it begs the question: why was a vfs_fsync() needed by
>>>>>> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
>>>>>> problem, and even added a might_sleep() call in
>>>>>> nfsd_file_flush()
>>>>>> but haven't been able to reproduce.
>>>>> Trond, I'm assuming you switched to a synchronous flush here to
>>>>> capture writeback errors. There's no other requirement for
>>>>> waiting for the flush to complete, right?
>>>> It's because the file is unhashed, so it is about to be closed and
>>>> garbage collected as soon as the refcount goes to zero.
>>>>
>>>>> To enable nfsd_file_put() to be invoked in atomic contexts again,
>>>>> would the following be a reasonable short term fix:
>>>>>
>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>> index 2c1b027774d4..96c8d07788f4 100644
>>>>> --- a/fs/nfsd/filecache.c
>>>>> +++ b/fs/nfsd/filecache.c
>>>>> @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>> {
>>>>> set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>>>>> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>>>>> - nfsd_file_flush(nf);
>>>>> + filemap_flush(nf->nf_file->f_mapping);
>>>>> nfsd_file_put_noref(nf);
>>>>> } else {
>>>>> nfsd_file_put_noref(nf);
>>>>>
>>>>>
>>>> filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can
>>>> call
>>>> filp_close() + fput()), so this kind of change isn't going to work
>>>> no
>>>> matter how you massage it.
>>> Er. Wouldn't that mean we would have seen "sleep while spinlock is
>>> held" BUGs since nfsd4_release_lockowner() was added? It's done
>>> at least an fput() while holding clp->cl_lock since it was added,
>>> I think.
>>
>> There is nothing magical about using WB_SYNC_NONE as far as the
>> writeback code is concerned. write_cache_pages() will still happily
>> call lock_page() and sleep on that lock if it is contended. The
>> writepage/writepages code will happily try to allocate memory if
>> necessary.
>>
>> The only difference is that it won't sleep waiting for the PG_writeback
>> bit.
>>
>> So, no, you can't safely call filemap_flush() from a spin locked
>> context, and
>> yes, the bug has been there from day 1. It was not introduced by me.
>>
>> Also, as I said, nfsd_file_put_noref() is not safe to call from a spin
>> locked context. Again, this was not introduced any time recently.
> OK. I'm trying to figure out how bad the problem is and how
> far back to apply the fix.
>
> I agree that the arrangement of the code path means
> nfsd4_release_lockowner() has always called fput() or
> filemap_flush() while cl_lock was held.
>
> But again, I'm not aware of recent instances of this particular
> splat. So I'm wondering if a recent change has made this issue
> easier to hit. We might not be able to answer that until we
> find out how the bake-a-thon testers managed to trigger the
> issue on Dai's server.
I'm able to reproduce this problem with unmodified 5.18.0-rc6
by enhancing the pynfs st_releaselockowner.py to remove the
file before sending the RELEASE_LOCKOWNER.
Here is the diff of the test:
[root@nfsvmf25 pynfs]# git diff nfs4.0/servertests/st_releaselockowner.py
diff --git a/nfs4.0/servertests/st_releaselockowner.py b/nfs4.0/servertests/st_releaselockowner.py
index ccd10fff3ffa..a32f8ed70634 100644
--- a/nfs4.0/servertests/st_releaselockowner.py
+++ b/nfs4.0/servertests/st_releaselockowner.py
@@ -19,8 +19,12 @@ def testFile(t, env):
res = c.unlock_file(1, fh, res.lockid)
check(res)
+ ops = c.use_obj(c.homedir) + [op.remove(t.word())]
+ res = c.compound(ops)
+ check(res)
# Release lockowner
owner = lock_owner4(c.clientid, b"lockowner_RLOWN1")
res = c.compound([op.release_lockowner(owner)])
check(res)
[root@nfsvmf25 pynfs]#
Here is the stack traces of the problem:
May 11 18:11:54 localhost kernel: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1513
May 11 18:11:54 localhost kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4234, name: nfsd
May 11 18:11:54 localhost kernel: preempt_count: 1, expected: 0
May 11 18:11:54 localhost kernel: RCU nest depth: 0, expected: 0
May 11 18:11:54 localhost kernel: INFO: lockdep is turned off.
May 11 18:11:54 localhost kernel: Preemption disabled at:
May 11 18:11:54 localhost kernel: [<0000000000000000>] 0x0
May 11 18:11:54 localhost kernel: CPU: 0 PID: 4234 Comm: nfsd Kdump: loaded Tainted: G W 5.18.0-rc6 #1
May 11 18:11:54 localhost kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
May 11 18:11:54 localhost kernel: Call Trace:
May 11 18:11:54 localhost kernel: <TASK>
May 11 18:11:54 localhost kernel: dump_stack_lvl+0x57/0x7d
May 11 18:11:54 localhost kernel: __might_resched.cold+0x222/0x26b
May 11 18:11:54 localhost kernel: down_write+0x61/0x130
May 11 18:11:54 localhost kernel: ? down_write_killable_nested+0x150/0x150
May 11 18:11:54 localhost kernel: btrfs_inode_lock+0x38/0x70
May 11 18:11:54 localhost kernel: btrfs_sync_file+0x291/0x1020
May 11 18:11:54 localhost kernel: ? rcu_read_lock_bh_held+0x90/0x90
May 11 18:11:54 localhost kernel: ? lock_acquire+0x2f1/0x490
May 11 18:11:54 localhost kernel: ? start_ordered_ops.constprop.0+0xe0/0xe0
May 11 18:11:54 localhost kernel: ? find_any_file+0x64/0x100 [nfsd]
May 11 18:11:54 localhost kernel: ? nfsd_file_put+0x100/0x100 [nfsd]
May 11 18:11:54 localhost kernel: nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
May 11 18:11:54 localhost kernel: nfsd_file_put+0x79/0x100 [nfsd]
May 11 18:11:54 localhost kernel: check_for_locks+0x152/0x200 [nfsd]
May 11 18:11:54 localhost kernel: nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
May 11 18:11:54 localhost kernel: ? lock_downgrade+0x680/0x680
May 11 18:11:54 localhost kernel: ? nfsd4_locku+0xd10/0xd10 [nfsd]
May 11 18:11:54 localhost kernel: ? rcu_read_lock_bh_held+0x90/0x90
May 11 18:11:54 localhost kernel: nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
May 11 18:11:54 localhost kernel: ? rcu_read_lock_held_common+0xe/0xa0
May 11 18:11:54 localhost kernel: nfsd_dispatch+0x4ed/0xc30 [nfsd]
May 11 18:11:54 localhost kernel: ? svc_reserve+0xb5/0x130 [sunrpc]
May 11 18:11:54 localhost kernel: svc_process_common+0xd8e/0x1b20 [sunrpc]
May 11 18:11:54 localhost kernel: ? svc_generic_rpcbind_set+0x450/0x450 [sunrpc]
May 11 18:11:54 localhost kernel: ? nfsd_svc+0xc50/0xc50 [nfsd]
May 11 18:11:54 localhost kernel: ? svc_sock_secure_port+0x36/0x40 [sunrpc]
May 11 18:11:54 localhost kernel: ? svc_recv+0x1100/0x2390 [sunrpc]
May 11 18:11:54 localhost kernel: svc_process+0x361/0x4f0 [sunrpc]
May 11 18:11:54 localhost kernel: nfsd+0x2d6/0x570 [nfsd]
May 11 18:11:54 localhost kernel: ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
May 11 18:11:54 localhost kernel: kthread+0x29f/0x340
May 11 18:11:54 localhost kernel: ? kthread_complete_and_exit+0x20/0x20
May 11 18:11:54 localhost kernel: ret_from_fork+0x22/0x30
May 11 18:11:54 localhost kernel: </TASK>
-Dai
>
>
>>>> Is there any reason why you need a reference to the nfs_file there?
>>>> Wouldn't holding the fp->fi_lock be sufficient, since you're
>>>> already in
>>>> an atomic context? As long as one of the entries in fp->fi_fds[] is
>>>> non-zero then you should be safe.
>>> Sure, check_for_locks() seems to be the only problematic caller.
>>> Other callers appear to be careful to call nfsd_file_put() only
>>> after releasing spinlocks.
>>>
>>> I would like a fix that can be backported without fuss. I was
>>> thinking changing check_for_locks() might get a little too
>>> hairy for that, but I'll check it out.
> Notably: check_for_locks() needs to drop fi_lock before taking
> flc_lock because the OPEN path can take flc_lock first, then
> fi_lock, via nfsd_break_deleg_cb(). Holding a reference to the
> nfsd_file guarantees that the inode won't go away while
> check_for_locks() examines the flc_posix list without holding
> fi_lock.
>
> So my first take on this was we need nfsd4_release_lockowner()
> to drop cl_lock before check_for_locks() is called.
>
>
> --
> Chuck Lever
>
>
>
> On May 11, 2022, at 2:57 PM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On May 11, 2022, at 1:31 PM, Trond Myklebust <[email protected]> wrote:
>>
>> On Wed, 2022-05-11 at 16:27 +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On May 11, 2022, at 12:09 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>
>>>> On Wed, 2022-05-11 at 14:57 +0000, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> On May 10, 2022, at 10:24 AM, Chuck Lever III
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On May 3, 2022, at 3:11 PM, [email protected] wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I just noticed there were couple of oops in my Oracle6 in
>>>>>>> nfs4.dev network.
>>>>>>> I'm not sure who ran which tests (be useful to know) that
>>>>>>> caused
>>>>>>> these oops.
>>>>>>>
>>>>>>> Here is the stack traces:
>>>>>>>
>>>>>>> [286123.154006] BUG: sleeping function called from invalid
>>>>>>> context at kernel/locking/rwsem.c:1585
>>>>>>> [286123.155126] in_atomic(): 1, irqs_disabled(): 0,
>>>>>>> non_block: 0,
>>>>>>> pid: 3983, name: nfsd
>>>>>>> [286123.155872] preempt_count: 1, expected: 0
>>>>>>> [286123.156443] RCU nest depth: 0, expected: 0
>>>>>>> [286123.156771] 1 lock held by nfsd/3983:
>>>>>>> [286123.156786] #0: ffff888006762520 (&clp->cl_lock){+.+.}-
>>>>>>> {2:2}, at: nfsd4_release_lockowner+0x306/0x850 [nfsd]
>>>>>>> [286123.156949] Preemption disabled at:
>>>>>>> [286123.156961] [<0000000000000000>] 0x0
>>>>>>> [286123.157520] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded Not
>>>>>>> tainted 5.18.0-rc4+ #1
>>>>>>> [286123.157539] Hardware name: innotek GmbH
>>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>>> [286123.157552] Call Trace:
>>>>>>> [286123.157565] <TASK>
>>>>>>> [286123.157581] dump_stack_lvl+0x57/0x7d
>>>>>>> [286123.157609] __might_resched.cold+0x222/0x26b
>>>>>>> [286123.157644] down_read_nested+0x68/0x420
>>>>>>> [286123.157671] ? down_write_nested+0x130/0x130
>>>>>>> [286123.157686] ? rwlock_bug.part.0+0x90/0x90
>>>>>>> [286123.157705] ? rcu_read_lock_sched_held+0x81/0xb0
>>>>>>> [286123.157749] xfs_file_fsync+0x3b9/0x820
>>>>>>> [286123.157776] ? lock_downgrade+0x680/0x680
>>>>>>> [286123.157798] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>>> [286123.157823] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>>> [286123.157921] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>>> [286123.158007] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>>> [286123.158088] check_for_locks+0x152/0x200 [nfsd]
>>>>>>> [286123.158191] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>>> [286123.158393] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>>> [286123.158488] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.158525] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>>> [286123.158699] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>>> [286123.158974] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.159010] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>>> [286123.159043] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>>> [sunrpc]
>>>>>>> [286123.159043] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>>> [286123.159043] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>>> [286123.159043] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>>> [286123.159043] svc_process+0x361/0x4f0 [sunrpc]
>>>>>>> [286123.159043] nfsd+0x2d6/0x570 [nfsd]
>>>>>>> [286123.159043] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>>> [286123.159043] kthread+0x29f/0x340
>>>>>>> [286123.159043] ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [286123.159043] ret_from_fork+0x22/0x30
>>>>>>> [286123.159043] </TASK>
>>>>>>> [286123.187052] BUG: scheduling while atomic:
>>>>>>> nfsd/3983/0x00000002
>>>>>>> [286123.187551] INFO: lockdep is turned off.
>>>>>>> [286123.187918] Modules linked in: nfsd auth_rpcgss nfs_acl
>>>>>>> lockd
>>>>>>> grace sunrpc
>>>>>>> [286123.188527] Preemption disabled at:
>>>>>>> [286123.188535] [<0000000000000000>] 0x0
>>>>>>> [286123.189255] CPU: 1 PID: 3983 Comm: nfsd Kdump: loaded
>>>>>>> Tainted: G W 5.18.0-rc4+ #1
>>>>>>> [286123.190233] Hardware name: innotek GmbH
>>>>>>> VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>>>>>>> [286123.190910] Call Trace:
>>>>>>> [286123.190910] <TASK>
>>>>>>> [286123.190910] dump_stack_lvl+0x57/0x7d
>>>>>>> [286123.190910] __schedule_bug.cold+0x133/0x143
>>>>>>> [286123.190910] __schedule+0x16c9/0x20a0
>>>>>>> [286123.190910] ? schedule_timeout+0x314/0x510
>>>>>>> [286123.190910] ? __sched_text_start+0x8/0x8
>>>>>>> [286123.190910] ? internal_add_timer+0xa4/0xe0
>>>>>>> [286123.190910] schedule+0xd7/0x1f0
>>>>>>> [286123.190910] schedule_timeout+0x319/0x510
>>>>>>> [286123.190910] ? rcu_read_lock_held_common+0xe/0xa0
>>>>>>> [286123.190910] ? usleep_range_state+0x150/0x150
>>>>>>> [286123.190910] ? lock_acquire+0x331/0x490
>>>>>>> [286123.190910] ? init_timer_on_stack_key+0x50/0x50
>>>>>>> [286123.190910] ? do_raw_spin_lock+0x116/0x260
>>>>>>> [286123.190910] ? rwlock_bug.part.0+0x90/0x90
>>>>>>> [286123.190910] io_schedule_timeout+0x26/0x80
>>>>>>> [286123.190910] wait_for_completion_io_timeout+0x16a/0x340
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] ? wait_for_completion+0x330/0x330
>>>>>>> [286123.190910] submit_bio_wait+0x135/0x1d0
>>>>>>> [286123.190910] ? submit_bio_wait_endio+0x40/0x40
>>>>>>> [286123.190910] ? xfs_iunlock+0xd5/0x300
>>>>>>> [286123.190910] ? bio_init+0x295/0x470
>>>>>>> [286123.190910] blkdev_issue_flush+0x69/0x80
>>>>>>> [286123.190910] ? blk_unregister_queue+0x1e0/0x1e0
>>>>>>> [286123.190910] ? bio_kmalloc+0x90/0x90
>>>>>>> [286123.190910] ? xfs_iunlock+0x1b4/0x300
>>>>>>> [286123.190910] xfs_file_fsync+0x354/0x820
>>>>>>> [286123.190910] ? lock_downgrade+0x680/0x680
>>>>>>> [286123.190910] ? xfs_filemap_pfn_mkwrite+0x10/0x10
>>>>>>> [286123.190910] ? nfsd_file_put+0x100/0x100 [nfsd]
>>>>>>> [286123.190910] nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>>>>>>> [286123.190910] nfsd_file_put+0x79/0x100 [nfsd]
>>>>>>> [286123.190910] check_for_locks+0x152/0x200 [nfsd]
>>>>>>> [286123.190910] nfsd4_release_lockowner+0x4cf/0x850 [nfsd]
>>>>>>> [286123.190910] ? nfsd4_locku+0xd10/0xd10 [nfsd]
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] nfsd4_proc_compound+0xd15/0x25a0 [nfsd]
>>>>>>> [286123.190910] nfsd_dispatch+0x4ed/0xc30 [nfsd]
>>>>>>> [286123.190910] ? rcu_read_lock_bh_held+0x90/0x90
>>>>>>> [286123.190910] svc_process_common+0xd8e/0x1b20 [sunrpc]
>>>>>>> [286123.190910] ? svc_generic_rpcbind_set+0x450/0x450
>>>>>>> [sunrpc]
>>>>>>> [286123.190910] ? nfsd_svc+0xc50/0xc50 [nfsd]
>>>>>>> [286123.190910] ? svc_sock_secure_port+0x27/0x40 [sunrpc]
>>>>>>> [286123.190910] ? svc_recv+0x1100/0x2390 [sunrpc]
>>>>>>> [286123.190910] svc_process+0x361/0x4f0 [sunrpc]
>>>>>>> [286123.190910] nfsd+0x2d6/0x570 [nfsd]
>>>>>>> [286123.190910] ? nfsd_shutdown_threads+0x2a0/0x2a0 [nfsd]
>>>>>>> [286123.190910] kthread+0x29f/0x340
>>>>>>> [286123.190910] ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [286123.190910] ret_from_fork+0x22/0x30
>>>>>>> [286123.190910] </TASK>
>>>>>>>
>>>>>>> The problem is the process tries to sleep while holding the
>>>>>>> cl_lock by nfsd4_release_lockowner. I think the problem was
>>>>>>> introduced with the filemap_flush in nfsd_file_put since
>>>>>>> 'b6669305d35a nfsd: Reduce the number of calls to
>>>>>>> nfsd_file_gc()'.
>>>>>>> The filemap_flush is later replaced by nfsd_file_flush by
>>>>>>> '6b8a94332ee4f nfsd: Fix a write performance regression'.
>>>>>>
>>>>>> That seems plausible, given the traces above.
>>>>>>
>>>>>> But it begs the question: why was a vfs_fsync() needed by
>>>>>> RELEASE_LOCKOWNER in this case? I've tried to reproduce the
>>>>>> problem, and even added a might_sleep() call in
>>>>>> nfsd_file_flush()
>>>>>> but haven't been able to reproduce.
>>>>>
>>>>> Trond, I'm assuming you switched to a synchronous flush here to
>>>>> capture writeback errors. There's no other requirement for
>>>>> waiting for the flush to complete, right?
>>>>
>>>> It's because the file is unhashed, so it is about to be closed and
>>>> garbage collected as soon as the refcount goes to zero.
>>>>
>>>>>
>>>>> To enable nfsd_file_put() to be invoked in atomic contexts again,
>>>>> would the following be a reasonable short term fix:
>>>>>
>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>> index 2c1b027774d4..96c8d07788f4 100644
>>>>> --- a/fs/nfsd/filecache.c
>>>>> +++ b/fs/nfsd/filecache.c
>>>>> @@ -304,7 +304,7 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>> {
>>>>> set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
>>>>> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
>>>>> - nfsd_file_flush(nf);
>>>>> + filemap_flush(nf->nf_file->f_mapping);
>>>>> nfsd_file_put_noref(nf);
>>>>> } else {
>>>>> nfsd_file_put_noref(nf);
>>>>>
>>>>>
>>>>
>>>> filemap_flush() sleeps, and so does nfsd_file_put_noref() (it can
>>>> call
>>>> filp_close() + fput()), so this kind of change isn't going to work
>>>> no
>>>> matter how you massage it.
>>>
>>> Er. Wouldn't that mean we would have seen "sleep while spinlock is
>>> held" BUGs since nfsd4_release_lockowner() was added? It's done
>>> at least an fput() while holding clp->cl_lock since it was added,
>>> I think.
>>
>>
>> There is nothing magical about using WB_SYNC_NONE as far as the
>> writeback code is concerned. write_cache_pages() will still happily
>> call lock_page() and sleep on that lock if it is contended. The
>> writepage/writepages code will happily try to allocate memory if
>> necessary.
>>
>> The only difference is that it won't sleep waiting for the PG_writeback
>> bit.
>>
>> So, no, you can't safely call filemap_flush() from a spin locked
>> context, and
>> yes, the bug has been there from day 1. It was not introduced by me.
>>
>> Also, as I said, nfsd_file_put_noref() is not safe to call from a spin
>> locked context. Again, this was not introduced any time recently.
>
> OK. I'm trying to figure out how bad the problem is and how
> far back to apply the fix.
>
> I agree that the arrangement of the code path means
> nfsd4_release_lockowner() has always called fput() or
> filemap_flush() while cl_lock was held.
>
> But again, I'm not aware of recent instances of this particular
> splat. So I'm wondering if a recent change has made this issue
> easier to hit. We might not be able to answer that until we
> find out how the bake-a-thon testers managed to trigger the
> issue on Dai's server.
>
>
>>>> Is there any reason why you need a reference to the nfs_file there?
>>>> Wouldn't holding the fp->fi_lock be sufficient, since you're
>>>> already in
>>>> an atomic context? As long as one of the entries in fp->fi_fds[] is
>>>> non-zero then you should be safe.
>>>
>>> Sure, check_for_locks() seems to be the only problematic caller.
>>> Other callers appear to be careful to call nfsd_file_put() only
>>> after releasing spinlocks.
>>>
>>> I would like a fix that can be backported without fuss. I was
>>> thinking changing check_for_locks() might get a little too
>>> hairy for that, but I'll check it out.
>
> Notably: check_for_locks() needs to drop fi_lock before taking
> flc_lock because the OPEN path can take flc_lock first, then
Bzzt. Not OPEN, but rather nfsd_file_acquire().
> fi_lock, via nfsd_break_deleg_cb(). Holding a reference to the
> nfsd_file guarantees that the inode won't go away while
> check_for_locks() examines the flc_posix list without holding
> fi_lock.
>
> So my first take on this was we need nfsd4_release_lockowner()
> to drop cl_lock before check_for_locks() is called.
>
>
> --
> Chuck Lever
--
Chuck Lever