Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail-ig0-f175.google.com ([209.85.213.175]:52201 "EHLO mail-ig0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750793AbaIXXU5 (ORCPT ); Wed, 24 Sep 2014 19:20:57 -0400 Received: by mail-ig0-f175.google.com with SMTP id h18so7507887igc.2 for ; Wed, 24 Sep 2014 16:20:57 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: Date: Wed, 24 Sep 2014 19:20:57 -0400 Message-ID: Subject: Re: nfs4_lock_delegation_recall() improperly handles errors such as ERROR_GRACE From: Olga Kornievskaia To: Trond Myklebust Cc: linux-nfs Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, Sep 24, 2014 at 6:45 PM, Trond Myklebust wrote: > On Wed, Sep 24, 2014 at 6:31 PM, Olga Kornievskaia wrote: >> On Wed, Sep 24, 2014 at 3:57 PM, Trond Myklebust >> wrote: >>> Hi Olga, >>> >>> On Wed, Sep 24, 2014 at 2:20 PM, Olga Kornievskaia wrote: >>>> Hi Trond, >>>> >>>> nfs_delegation_claim_opens() return EAGAIN to nfs_end_delegation_return(). >>>> issync is always 0 (as its called by the >>>> nfs_client_return_marked_delegations) and it breaks out of the loop... >>>> as a result the error just doesn't get handled. >>> >>> Ah. OK, so this is being called from >>> nfs_client_return_marked_delegations. That makes sense. >>> >>> So for that case, I'd expect the call to return to the loop in >>> nfs4_state_manager(), and then to retry through that after doing >>> whatever is needed to recover. >>> Essentially, we should be setting NFS4CLNT_DELEGRETURN again, and then >>> bouncing back into nfs_client_return_marked_delegations (after all the >>> recovery work has been done). >> >> Yes I don't fully understand what it should be. It never does anything >> about recovering from the lock error and simply returns the >> delegation. Ok I don't know if it means anything to you, but the 2nd >> time around (when it returns the delegation even though it hasn't >> recovered the lock), it never goes into the >> nfs4_open_delegation_recall() because stateid condition doesn't hold >> true. >> >> If it's not too much trouble, could you explain why lock error >> shouldn't be handled as I suggested instead of resending the open with >> claim_cur over again. As I understand in your case, it'll be a series >> of successful open with claim_cur paired with a failed lock with >> err_grace. In my case, it'll be one open with claim_cur and a number >> of lock with err_grace. > > There is only 1 state manager thread allowed per nfs_client (i.e. per > server) and so we want to avoid having it busy wait in any one state > handler. Doing so would basically mean that all other state recovery > on that nfs_client is on hold; i.e. we could not deal with exceptions > like ADMIN_REVOKED, CB_PATH_DOWN, etc until the busy wait is over. > This is why that code has been designed to fall all the way back to > nfs4_state_manager() in the event of any error/exception. Ok, thanks. It make sense. And makes things complicated. I'm sure you'll beat me to figuring out why the error is not handled but I'll keep trying. > > -- > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@primarydata.com