Return-Path: Date: Thu, 14 Dec 2017 13:55:14 -0500 From: "J. Bruce Fields" To: Benjamin Coddington Cc: linux-nfs@vger.kernel.org, hch@infradead.org Subject: Re: spurious sillyrename after O_DIRECT writes get ENOSPC Message-ID: <20171214185514.GE9205@fieldses.org> References: <20171208221626.GB22508@fieldses.org> <20171213171815.GB9205@fieldses.org> <9383ABA6-B994-4DB3-95F4-C0D6F59580F9@redhat.com> <20171214163654.GD9205@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20171214163654.GD9205@fieldses.org> List-ID: On Thu, Dec 14, 2017 at 11:36:54AM -0500, J. Bruce Fields wrote: > On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote: > > > > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > > > > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > > >> Last year Christoph noticed a bug that could result in a file being > > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > > >> > > >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > > >> > > >> It's reproduceable on upstream, over v3 or v4. > > >> > > >> I looked into it some more, and it seems to reproduce whenever a write > > >> system call results in multiple WRITE calls, only some of which receive > > >> ENOSPC. I think that's resulting in a leak of the wb_kref on some > > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > > >> a "rm" on the client (even after the file is closed) results in a > > >> sillyrename. > > >> > > >> I'll keep looking at this, but the relevant code is pretty opaque to me > > >> so far. Any ideas welcomed. > > > > > > Actually it looks like a leak of dreq->io_count? That prevents commits > > > from being sent (which I'm also seeing in network traces--the succesfull > > > WRITEs are unstable but never get committed), which means > > > nfs_direct_commit_complete() is never called, and the reference taken on > > > wb_kref in the request_commit case of nfs_direct_write_completion is > > > never put. > > > > This sounds to me like the problem Scott's working - he sent a patch > > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the > > mds". > > > > I think the the rule should be that once we call > > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. > > However, there are some paths where that is not the case. > > Yes, I wondered about that.... > > Unfortunately after some more tests now I was think I was wrong, the > dreq_get()s and put()s are balanced and the bug is somewhere else--in > the case of my particular reproducer. Argh. But yes I can easily > believe there could be a leak there. So actually what happens is if you do a direct io write where some WRITEs succeed and the one fails, then this: if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { dreq->flags = 0; dreq->error = hdr->error; } clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete never scheduels the commit calls. It looks like that leaves a bunch of nfs_pages on some to-be-committed list, so we end up leaking a bunch of stuff, with the most visible symptom being an unnecessarily sillyrename on close. I can just remove that clear of dreq_flags and that fixes the problem, but I doubt that's correct. --b.