Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail-vc0-f181.google.com ([209.85.220.181]:41720 "EHLO mail-vc0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750725AbaIXWpy (ORCPT ); Wed, 24 Sep 2014 18:45:54 -0400 Received: by mail-vc0-f181.google.com with SMTP id hq12so868143vcb.12 for ; Wed, 24 Sep 2014 15:45:53 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: Date: Wed, 24 Sep 2014 18:45:53 -0400 Message-ID: Subject: Re: nfs4_lock_delegation_recall() improperly handles errors such as ERROR_GRACE From: Trond Myklebust To: Olga Kornievskaia Cc: linux-nfs Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, Sep 24, 2014 at 6:31 PM, Olga Kornievskaia wrote: > On Wed, Sep 24, 2014 at 3:57 PM, Trond Myklebust > wrote: >> Hi Olga, >> >> On Wed, Sep 24, 2014 at 2:20 PM, Olga Kornievskaia wrote: >>> Hi Trond, >>> >>> nfs_delegation_claim_opens() return EAGAIN to nfs_end_delegation_return(). >>> issync is always 0 (as its called by the >>> nfs_client_return_marked_delegations) and it breaks out of the loop... >>> as a result the error just doesn't get handled. >> >> Ah. OK, so this is being called from >> nfs_client_return_marked_delegations. That makes sense. >> >> So for that case, I'd expect the call to return to the loop in >> nfs4_state_manager(), and then to retry through that after doing >> whatever is needed to recover. >> Essentially, we should be setting NFS4CLNT_DELEGRETURN again, and then >> bouncing back into nfs_client_return_marked_delegations (after all the >> recovery work has been done). > > Yes I don't fully understand what it should be. It never does anything > about recovering from the lock error and simply returns the > delegation. Ok I don't know if it means anything to you, but the 2nd > time around (when it returns the delegation even though it hasn't > recovered the lock), it never goes into the > nfs4_open_delegation_recall() because stateid condition doesn't hold > true. > > If it's not too much trouble, could you explain why lock error > shouldn't be handled as I suggested instead of resending the open with > claim_cur over again. As I understand in your case, it'll be a series > of successful open with claim_cur paired with a failed lock with > err_grace. In my case, it'll be one open with claim_cur and a number > of lock with err_grace. There is only 1 state manager thread allowed per nfs_client (i.e. per server) and so we want to avoid having it busy wait in any one state handler. Doing so would basically mean that all other state recovery on that nfs_client is on hold; i.e. we could not deal with exceptions like ADMIN_REVOKED, CB_PATH_DOWN, etc until the busy wait is over. This is why that code has been designed to fall all the way back to nfs4_state_manager() in the event of any error/exception. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com