Return-Path: Received: from mail-it0-f45.google.com ([209.85.214.45]:37332 "EHLO mail-it0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751502AbdDNVRn (ORCPT ); Fri, 14 Apr 2017 17:17:43 -0400 Received: by mail-it0-f45.google.com with SMTP id a140so1131167ita.0 for ; Fri, 14 Apr 2017 14:17:43 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <1492197391.123147.1.camel@primarydata.com> References: <1492109480.7917.1.camel@primarydata.com> <9eb6ccc9-501a-5635-05f9-7bd2fd4f4563@Netapp.com> <1492116646.11542.1.camel@primarydata.com> <1492197391.123147.1.camel@primarydata.com> From: Olga Kornievskaia Date: Fri, 14 Apr 2017 17:17:42 -0400 Message-ID: Subject: Re: RFC: fixing kernel oops on interrupted COMMIT from nfs_commit_file To: Trond Myklebust Cc: "Anna.Schumaker@netapp.com" , "linux-nfs@vger.kernel.org" , "bjschuma@netapp.com" Content-Type: text/plain; charset=UTF-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Fri, Apr 14, 2017 at 3:16 PM, Trond Myklebust wrote: > On Fri, 2017-04-14 at 11:56 -0400, Olga Kornievskaia wrote: >> On Thu, Apr 13, 2017 at 4:50 PM, Trond Myklebust >> wrote: >> > >> > Secondly, if you want to release the request and you are not sure >> > whether or not it got cleared off the inode's cinfo commit list >> > yet, >> > you may also need to lock that request and call >> > nfs_clear_request_commit(). >> >> Looking at what the function does, I don't see why this is needed. >> "wb_page" is NULL for this type of commit and there is no pnfs in >> this >> case. > > Something needs to ensure that the request is not sitting on a commit > list. That can happen if the commit succeeded, but returned a different > verifier, or it can happen if nfs_scan_commit() exits early. First point I'd like you to consider is that your last statement as a separate problem that needs to be solved with a different patch. Secondly, I don't understand how to determine "if nfs_commit_file() isn't sure whether or not request got cleared off the inode's cinfo commit list". There are two cases you pointed to 1) verifier mismatch and 2) nfs_scan_commit didn't even get to the request (caz INT_MAX commits were queued before that)? For the verifier mismatch: "NFS_CONTEXT_RESEND_WRITES" is set. So nfs_commit_file() can check for that. But I guess I'm not sure what is suppose to be done? Resend the commit or resend the copy? I don't really understand how to handle #2. nfs_commit_inode() return from doing INT_MAX commits but not doing the commit that it was suppose to do nfs_commit_file() asked and I'm at a loss how to determine that. Shouldn't there have been a loop somewhere that handles *all* commit requests and not just INT_MAX requests. Is this problem not a problem from the nfs_commit_file() but rather a generic problem?