Return-Path:
Received: from mx1.redhat.com ([209.132.183.28]:33372 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751599AbeDKRdj (ORCPT ); Wed, 11 Apr 2018 13:33:39 -0400
Date: Wed, 11 Apr 2018 13:33:37 -0400
From: "J. Bruce Fields"
To: "J. Bruce Fields"
Cc: Olga Kornievskaia, Olga Kornievskaia, linux-nfs
Subject: Re: [PATCH v7 10/10] NFSD stop queued async copies on client shutdown
Message-ID: <20180411173337.GQ16717@parsley.fieldses.org>
References: <20180220164229.65404-1-kolga@netapp.com>
	<20180220164229.65404-11-kolga@netapp.com>
	<20180308170511.GF10782@fieldses.org>
	<20180410202123.GB5685@parsley.fieldses.org>
	<20180410211307.GB314@fieldses.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180410211307.GB314@fieldses.org>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Tue, Apr 10, 2018 at 05:13:07PM -0400, J. Bruce Fields wrote:
> On Tue, Apr 10, 2018 at 05:07:02PM -0400, Olga Kornievskaia wrote:
> > On Tue, Apr 10, 2018 at 4:21 PM, J. Bruce Fields wrote:
> > > DESTROY_CLIENTID doesn't throw away all the client's state for it, it's
> > > only meant to be called after the client has already cleaned up
> > > everything else. So:
> > >
> > >     https://tools.ietf.org/html/rfc5661#section-18.50.3
> > >
> > >     If there are sessions (both idle and non-idle), opens, locks,
> > >     delegations, layouts, and/or wants (Section 18.49) associated
> > >     with the unexpired lease of the client ID, the server MUST
> > >     return NFS4ERR_CLIENTID_BUSY.
> > >
> > > My feeling is that "ongoing copies" also belongs on that list.

And come to think of it we should actually be adding that check to
client_has_state()--it should return clientid_busy if there are any
copies in progress.

> > > So the server behavior you're seeing sounds correct to me--the client
> > > should cancel any ongoing copies before calling DESTROY_CLIENTID.
> >
> > If the behavior of returning ERR_DELAY until the copy is done is
> > correct one, then I don't think I need this patch at all. Since copy
> > takes a reference on the nfs4_client structure, then in
> > __destroy_client() where nfsd4_shutdown_copy() is called the list will
> > always be empty.
>
> Actually I guess it should be returning CLIENTID_BUSY. Maybe that's a
> preexisting bug.

So the copy should be caught by the earlier client_has_state() check
before it gets to the later mark_client_expired_locked().

And after reminding myself how this works....

We only hold references on clients temporarily such as while we're
actually processing an RPC from a client.

An elevated cl_refcount prevents the server from removing the client
even after the lease expires, or after the client reboots and attempts
to clear its old state with a new EXCHANGE_ID/CREATE_SESSION.

I don't think that's what we want. Clients still need to renew their
lease in the usual way, a long-running async copy doesn't keep the lease
renewed automatically.

So, the asynchronous copy shouldn't hold a reference on the client. The
copy thread can still safely use the client while it's running, because
it knows that anyone destroying the client will first cancel the copy
and wait for the thread to die.

--b.
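
To make the intended behavior concrete, here is a rough userspace model
of the rules described above. It is only a sketch and not nfsd code:
every model_* name, the ST_* codes, and the struct layouts are made up
for illustration, and "cancel the copy and wait for the thread to die"
is reduced to unlinking the copy and setting a flag. It assumes three
things taken from the discussion: an in-progress copy counts as client
state for DESTROY_CLIENTID, the copy does not bump the client's
refcount, and whoever tears the client down cancels its copies first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; names echo the NFSv4.1 errors, values are arbitrary. */
enum status { ST_OK = 0, ST_CLIENTID_BUSY = 1 };

struct model_copy {
	struct model_copy *next;
	bool cancelled;
};

struct model_client {
	int refcount;              /* elevated only while an RPC is being processed */
	int nr_opens;              /* stand-in for opens, locks, delegations, ... */
	struct model_copy *copies; /* async copies currently in progress */
};

/* Start an async copy: link it to the client, but take no reference. */
static void model_start_copy(struct model_client *clp, struct model_copy *cp)
{
	cp->cancelled = false;
	cp->next = clp->copies;
	clp->copies = cp;
	/* deliberately no clp->refcount++ here */
}

/* Cancel every copy; real code would also wait for each copy thread to exit. */
static void model_cancel_copies(struct model_client *clp)
{
	while (clp->copies) {
		struct model_copy *cp = clp->copies;

		clp->copies = cp->next;
		cp->next = NULL;
		cp->cancelled = true;
	}
}

/* An ongoing copy counts as state, just like an open or a lock. */
static bool model_client_has_state(const struct model_client *clp)
{
	return clp->nr_opens > 0 || clp->copies != NULL;
}

/* DESTROY_CLIENTID: refuse while the client still has state. */
static enum status model_destroy_clientid(struct model_client *clp)
{
	if (model_client_has_state(clp))
		return ST_CLIENTID_BUSY;
	assert(clp->copies == NULL); /* nothing left for teardown to cancel */
	return ST_OK;
}

/* Lease expiry: only an in-flight RPC (refcount > 0) blocks it, not a copy. */
static bool model_expire_client(struct model_client *clp)
{
	if (clp->refcount > 0)
		return false;
	model_cancel_copies(clp);
	return true;
}
```

In this model a client with a running copy gets ST_CLIENTID_BUSY from
model_destroy_clientid(), but model_expire_client() can still remove it,
because the copy never touched the refcount.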