2022-01-12 23:53:22

by J. Bruce Fields

Subject: Re: [PATCH RFC v9 0/2] nfsd: Initial implementation of NFSv4 Courteous Server

Could you look back over previous comments? I notice there's a couple
unaddressed (circular locking dependency, Documentation/filesystems/).

I agree with Chuck that we don't need to reschedule the laundromat, it's
OK if it takes longer to get around to cleaning up a dead client.

--b.

On Mon, Jan 10, 2022 at 10:50:51AM -0800, Dai Ngo wrote:
> Hi Bruce, Chuck
>
> This series of patches implements the NFSv4 Courteous Server.
>
> A server which does not immediately expunge the state on lease expiration
> is known as a Courteous Server. A Courteous Server continues to recognize
> previously generated state tokens as valid until conflict arises between
> the expired state and the requests from another client, or the server
> reboots.
>
> The v2 patch includes the following:
>
> . add new callback, lm_expire_lock, to lock_manager_operations to
> allow the lock manager to take appropriate action on a conflicting lock.
>
> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.
>
> . expire courtesy client after 24hr if client has not reconnected.
>
> . do not allow expired client to become courtesy client if there are
> waiters for client's locks.
>
> . modify client_info_show to show courtesy client and seconds from
> last renew.
>
> . fix a problem with the NFSv4.1 server where it keeps returning
> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after
> the courtesy client reconnects, causing the client to keep sending
> BCTS requests to the server.
>
> The v3 patch includes the following:
>
> . modified posix_test_lock to check and resolve conflict locks
> to handle NLM TEST and NFSv4 LOCKT requests.
>
> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>
> The v4 patch includes:
>
> . rework nfsd_check_courtesy to avoid deadlock of fl_lock and client_lock
> by asking the laundromat thread to destroy the courtesy client.
>
> . handle NFSv4 share reservation conflicts with courtesy client. This
> includes conflicts between access mode and deny mode and vice versa.
>
> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>
> The v5 patch includes:
>
> . fix recursive locking of file_rwsem from posix_lock_file.
>
> . retest with LOCKDEP enabled.
>
> The v6 patch includes:
>
> . merge with 5.15-rc7
>
> . fix a bug in nfs4_check_deny_bmap that did not check for matched
> nfs4_file before checking for access/deny conflict. This bug caused
> pynfs OPEN18 to fail because the server took too long to release
> the state of many non-conflicting clients.
>
> . enhance share reservation conflict handler to handle the case where
> a large number of conflicting courtesy clients need to be expired.
> The first 100 clients are expired synchronously and the rest are
> expired in the background by the laundromat, and NFS4ERR_DELAY
> is returned to the NFS client. This is needed to prevent the
> NFS client from timing out waiting for the reply.
>
> The v7 patch includes:
>
> . Fix race condition in posix_test_lock and posix_lock_inode after
> dropping spinlock.
>
> . Enhance nfsd4_fl_expire_lock to work with the new lm_expire_lock
> callback.
>
> . Always resolve share reservation conflicts asynchronously.
>
> . Fix bug in nfs4_laundromat where spinlock is not used when
> scanning cl_ownerstr_hashtbl.
>
> . Fix bug in nfs4_laundromat where idr_get_next was called
> with incorrect 'id'.
>
> . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock.
>
> The v8 patch includes:
>
> . Fix warning in nfsd4_fl_expire_lock reported by test robot.
>
> The v9 patch includes:
>
> . Simplify lm_expire_lock API by (1) removing the 'testonly' flag
> and (2) specifying the return value as true/false to indicate
> whether the conflict was successfully resolved.
>
> . Rework nfsd4_fl_expire_lock to mark client with
> NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire
> the client in the background.
>
> . Add a spinlock in nfs4_client to synchronize access to the
> NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flags to
> handle race conditions when resolving lock and share reservation
> conflicts.
>
> . A courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT
> is now considered 'dead', waiting for the laundromat to expire
> it. This client is no longer allowed to use its state if it
> reconnects before the laundromat finishes expiring the client.
>
> For a v4.1 client, the detection is done in the processing of the
> SEQUENCE op, which returns NFS4ERR_BAD_SESSION to force the client
> to re-establish a new clientid and session.
> For a v4.0 client, the detection is done in the processing of the
> RENEW and state-related ops, which return NFS4ERR_EXPIRED to force
> the client to re-establish a new clientid.
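
For readers following along, the v9 items at the end of the quoted changelog can be pictured with a small userspace toy model. This is only a sketch under stated assumptions, not the nfsd code: the Client and Server classes and their method names are hypothetical, a threading.Lock stands in for the new spinlock in nfs4_client, the flag values are arbitrary, and the choice to return True whenever the conflicting owner is a courtesy client is a guess at the semantics described in the cover letter. NFS4ERR_EXPIRED is the protocol's name for the v4.0 error.

```python
import threading

# Flag bits; the names echo the cover letter, the values are arbitrary here.
COURTESY_CLIENT = 0x1
DESTROY_COURTESY_CLIENT = 0x2


class Client:
    """Hypothetical stand-in for nfs4_client."""

    def __init__(self, name, minorversion):
        self.name = name
        self.minorversion = minorversion  # 0 for NFSv4.0, 1 for NFSv4.1
        self.flags = 0
        self.lock = threading.Lock()  # stand-in for the per-client spinlock


class Server:
    """Hypothetical stand-in for the server-side state handling."""

    def __init__(self):
        self.clients = {}
        self.laundromat_requested = False

    def add(self, client):
        self.clients[client.name] = client
        return client

    def lease_expired(self, client):
        """Keep the state of an expired client around as a courtesy."""
        with client.lock:
            client.flags |= COURTESY_CLIENT

    def expire_lock(self, client):
        """Toy lm_expire_lock: True if the conflicting lock owner is a
        courtesy client and has been marked for destruction."""
        with client.lock:
            if not client.flags & (COURTESY_CLIENT | DESTROY_COURTESY_CLIENT):
                return False  # active client: a genuine conflict
            client.flags |= DESTROY_COURTESY_CLIENT
        self.laundromat_requested = True  # cleanup happens later, in the background
        return True

    def reconnect(self, client):
        """Status for SEQUENCE (v4.1) or RENEW (v4.0) from a returning client."""
        with client.lock:
            if client.flags & DESTROY_COURTESY_CLIENT:
                if client.minorversion >= 1:
                    return "NFS4ERR_BAD_SESSION"
                return "NFS4ERR_EXPIRED"
            client.flags &= ~COURTESY_CLIENT  # back to being an active client
        return "NFS4_OK"

    def laundromat(self):
        """Expire every client marked dead and return their names."""
        dead = []
        for name, client in list(self.clients.items()):
            with client.lock:
                if client.flags & DESTROY_COURTESY_CLIENT:
                    dead.append(name)
        for name in dead:
            del self.clients[name]
        self.laundromat_requested = False
        return dead
```

The point of the model is the ordering: the conflict path only sets a flag under the per-client lock and never tears down state itself, so a client that reconnects in the window before the laundromat runs is still refused.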


2022-01-12 23:53:31

by Dai Ngo

Subject: Re: [PATCH RFC v9 0/2] nfsd: Initial implementation of NFSv4 Courteous Server


On 1/12/22 10:59 AM, J. Bruce Fields wrote:
> Could you look back over previous comments? I notice there's a couple
> unaddressed (circular locking dependency, Documentation/filesystems/).

I think v9 addresses the circular locking dependency. I will update
Documentation/filesystems/locking.rst in v10.

>
> I agree with Chuck that we don't need to reschedule the laundromat, it's
> OK if it takes longer to get around to cleaning up a dead client.

Yes, it is now implemented for lock conflict and share reservation
resolution. I'm doing the same for delegation conflict.

-Dai

>
> --b.
>
> On Mon, Jan 10, 2022 at 10:50:51AM -0800, Dai Ngo wrote:
>> Hi Bruce, Chuck
>>
>> This series of patches implements the NFSv4 Courteous Server.
>>
>> A server which does not immediately expunge the state on lease expiration
>> is known as a Courteous Server. A Courteous Server continues to recognize
>> previously generated state tokens as valid until conflict arises between
>> the expired state and the requests from another client, or the server
>> reboots.
>>
>> The v2 patch includes the following:
>>
>> . add new callback, lm_expire_lock, to lock_manager_operations to
>> allow the lock manager to take appropriate action on a conflicting lock.
>>
>> . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks.
>>
>> . expire courtesy client after 24hr if client has not reconnected.
>>
>> . do not allow expired client to become courtesy client if there are
>> waiters for client's locks.
>>
>> . modify client_info_show to show courtesy client and seconds from
>> last renew.
>>
>> . fix a problem with the NFSv4.1 server where it keeps returning
>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after
>> the courtesy client reconnects, causing the client to keep sending
>> BCTS requests to the server.
>>
>> The v3 patch includes the following:
>>
>> . modified posix_test_lock to check and resolve conflict locks
>> to handle NLM TEST and NFSv4 LOCKT requests.
>>
>> . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>>
>> The v4 patch includes:
>>
>> . rework nfsd_check_courtesy to avoid deadlock of fl_lock and client_lock
>> by asking the laundromat thread to destroy the courtesy client.
>>
>> . handle NFSv4 share reservation conflicts with courtesy client. This
>> includes conflicts between access mode and deny mode and vice versa.
>>
>> . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN.
>>
>> The v5 patch includes:
>>
>> . fix recursive locking of file_rwsem from posix_lock_file.
>>
>> . retest with LOCKDEP enabled.
>>
>> The v6 patch includes:
>>
>> . merge with 5.15-rc7
>>
>> . fix a bug in nfs4_check_deny_bmap that did not check for matched
>> nfs4_file before checking for access/deny conflict. This bug caused
>> pynfs OPEN18 to fail because the server took too long to release
>> the state of many non-conflicting clients.
>>
>> . enhance share reservation conflict handler to handle the case where
>> a large number of conflicting courtesy clients need to be expired.
>> The first 100 clients are expired synchronously and the rest are
>> expired in the background by the laundromat, and NFS4ERR_DELAY
>> is returned to the NFS client. This is needed to prevent the
>> NFS client from timing out waiting for the reply.
>>
>> The v7 patch includes:
>>
>> . Fix race condition in posix_test_lock and posix_lock_inode after
>> dropping spinlock.
>>
>> . Enhance nfsd4_fl_expire_lock to work with the new lm_expire_lock
>> callback.
>>
>> . Always resolve share reservation conflicts asynchronously.
>>
>> . Fix bug in nfs4_laundromat where spinlock is not used when
>> scanning cl_ownerstr_hashtbl.
>>
>> . Fix bug in nfs4_laundromat where idr_get_next was called
>> with incorrect 'id'.
>>
>> . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock.
>>
>> The v8 patch includes:
>>
>> . Fix warning in nfsd4_fl_expire_lock reported by test robot.
>>
>> The v9 patch includes:
>>
>> . Simplify lm_expire_lock API by (1) removing the 'testonly' flag
>> and (2) specifying the return value as true/false to indicate
>> whether the conflict was successfully resolved.
>>
>> . Rework nfsd4_fl_expire_lock to mark client with
>> NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire
>> the client in the background.
>>
>> . Add a spinlock in nfs4_client to synchronize access to the
>> NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flags to
>> handle race conditions when resolving lock and share reservation
>> conflicts.
>>
>> . A courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT
>> is now considered 'dead', waiting for the laundromat to expire
>> it. This client is no longer allowed to use its state if it
>> reconnects before the laundromat finishes expiring the client.
>>
>> For a v4.1 client, the detection is done in the processing of the
>> SEQUENCE op, which returns NFS4ERR_BAD_SESSION to force the client
>> to re-establish a new clientid and session.
>> For a v4.0 client, the detection is done in the processing of the
>> RENEW and state-related ops, which return NFS4ERR_EXPIRED to force
>> the client to re-establish a new clientid.
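
As an aside on the v4 item in the quoted changelog ("conflicts between access mode and deny mode and vice versa"), the share reservation check itself is a pair of bitmask tests. The sketch below is a userspace illustration only: the constant values follow the NFSv4 protocol definitions, but share_conflict and courtesy_clients_to_expire are hypothetical helpers, and the (name, is_courtesy, access, deny) tuple layout is an assumption made for the example rather than anything taken from the patches.

```python
# Share access/deny bits as defined by the NFSv4 protocol.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2


def share_conflict(new_access, new_deny, held_access, held_deny):
    """True if a new OPEN conflicts with an existing one on the same file.

    Checked in both directions: the new access against the held deny mode,
    and the new deny mode against the held access.
    """
    return bool((new_access & held_deny) or (new_deny & held_access))


def courtesy_clients_to_expire(new_access, new_deny, opens):
    """Hypothetical helper. `opens` is a list of
    (client_name, is_courtesy, access, deny) tuples for one file.

    Returns (denied, names): denied is True if an active client holds a
    conflicting reservation; names lists the courtesy clients whose
    conflicting state would have to be expired first.
    """
    denied = False
    names = []
    for name, is_courtesy, access, deny in opens:
        if not share_conflict(new_access, new_deny, access, deny):
            continue
        if is_courtesy:
            names.append(name)
        else:
            denied = True
    return denied, names
```

For example, a new OPEN for write with no deny bits conflicts with a courtesy client holding DENY_WRITE, while the same OPEN for read only does not.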