Return-Path: Received: from linuxhacker.ru ([217.76.32.60]:56892 "EHLO fiona.linuxhacker.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750899AbcGGQw2 convert rfc822-to-8bit (ORCPT ); Thu, 7 Jul 2016 12:52:28 -0400 Subject: Re: [PATCH 1/2] nfs: Fix spurios EPERM when mkdir of existing dentry Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=windows-1252 From: Oleg Drokin In-Reply-To: <9F6E5530-CBA4-4A0A-BAE1-72AAC28DB4D1@primarydata.com> Date: Thu, 7 Jul 2016 12:52:12 -0400 Cc: Jeff Layton , Viro Alexander , "linux-fsdevel@vger.kernel.org" , "linux-nfs@vger.kernel.org" Message-Id: References: <1467870827-2959489-1-git-send-email-green@linuxhacker.ru> <1467870827-2959489-2-git-send-email-green@linuxhacker.ru> <9F6E5530-CBA4-4A0A-BAE1-72AAC28DB4D1@primarydata.com> To: Trond Myklebust Sender: linux-nfs-owner@vger.kernel.org List-ID: On Jul 7, 2016, at 12:16 PM, Trond Myklebust wrote: > >> On Jul 7, 2016, at 01:53, Oleg Drokin wrote: >> >> It's great when we can shave an extra RPC, but not at the expense >> of correctness. >> We should not return EPERM (from vfs_create/mknod/mkdir) if the >> name already exists, even if we have no write access in parent. >> >> Since the check in nfs_permission is clearly not enough to stave >> off this, just throw in the extra READ access to actually >> go through. >> >> Signed-off-by: Oleg Drokin >> --- >> fs/nfs/dir.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c >> index d8015a0..8c7835b 100644 >> --- a/fs/nfs/dir.c >> +++ b/fs/nfs/dir.c >> @@ -1383,8 +1383,10 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in >> /* >> * If we're doing an exclusive create, optimize away the lookup >> * but don't hash the dentry. >> + * This optimization only works if we can write in the parent. >> */ >> - if (nfs_is_exclusive_create(dir, flags)) >> + if (nfs_is_exclusive_create(dir, flags) && >> + (inode_permission(dir, MAY_WRITE | MAY_READ | MAY_EXEC) == 0)) >> return NULL; >> > > NACK. The only write permission we should care about on the client side is whether or not the filesystem is mounted read-only. All other permissions are checked by the server. Right. This was mostly a discussion piece. The problem here is nfs_permission() returns 0 if you check for inode_permission(dir, MAY_WRITE | MAY_EXEC) (as in may_create), but then some other checks in the kernel still catch on to the fact that the directory is not writeable, so we have a premature failure with EPERM and server never sees this request which breaks things. (the read-only mount is not handled as well at the moment of course and my patch does not address this issue either, but it's easier to address in the VFS, like in filename_create() or something). I see that two major consumers of this nfs_permission MAY_WRITE|!MAY_READ check are creates and deletes and with deletes we had a lookup already, so it already looked up the child and revalidated the parent. For creates, a revalidation still might be needed, I guess and that was the main driver behind this check? And that only when you do current dir creates, because otherwise the parent would have been revalidated in lookup? Is this the major case why that check is actually there? Just trying to see how to approach this better without breaking the applications.