Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp419378ybk; Fri, 15 May 2020 04:19:51 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy9ZfDqufF/4/BtxcL1zk/bmCERkZ3oGfQ4WQeMEkpuNs/8W5KAmdM9V7BivpVgKWcKjSiS X-Received: by 2002:a17:906:c838:: with SMTP id dd24mr2310046ejb.28.1589541591552; Fri, 15 May 2020 04:19:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589541591; cv=none; d=google.com; s=arc-20160816; b=E5eDz1/B6hlKxSHW9dDuCsb5WWRgjKE/wNGNadDfjUVeOgddGpzuf3px5U/CFa75Bq KEYzq0C0SQO2MHdOLMcTM+6pOwAZh61skA0m/ADmSsfBm7fa0hyV8vxA0QEwZ8whO8cI 3BEFc0aW3+0rbVAv58hnnnPdg1Wt1rqHGJveSU/pEGTEYlXsdrTsnhF46Muvpb+q0qUQ v567QqfYWkyxHh0fJKpszOoqIuY0QdPzFWvAylp/OhdoVBX9CufelhxSCwrZFPs8S31f ZU1ILjdQhneS+1+AWhBflYiIA9mboIHhzP6jl/oxVkNMsq1WF8k/i5qkWc3+sMPMi0QW MRhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=siE2LOwOElQSUR3WnlJmVayOSLe+Smsh4bIwDrofu2g=; b=1GS/qsDwhGsIGC+xM1ds+Bbc1Z6qYEjDlyv/aW9a1oQ9jLGvZy6R6ee3fpAA9ND5sP vLiAyutneQzFGmNDc3vVnl4ptMt3HQgnrol/qamT/JtKFOLN2DCZBsMQqEgS+bQEtqMK pE/j/OPJkn6F4LbMuJEbMHJLGktS8gJr2lBVF01REStCcuUmwLO1jsM0GAQigFAjjSEE XBza6Kb6DoWeoSdsOP4eNmjgoY0VFz8xr8LYJC6oiOw+ogVhRBnyQiLrrw2CC6OuaN2K HWmQHT8pTPQjUXQH7laPdNjB6grqBSs+vLFeAYp3Wd2TZlc2slcXn83zbseeohyvYtBO pzpg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id i13si1023343ejg.502.2020.05.15.04.19.27; Fri, 15 May 2020 04:19:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726188AbgEOLPy (ORCPT + 99 others); Fri, 15 May 2020 07:15:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:38828 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726084AbgEOLPx (ORCPT ); Fri, 15 May 2020 07:15:53 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id BC8B5ABBD; Fri, 15 May 2020 11:15:52 +0000 (UTC) Received: from localhost (webern.olymp [local]) by webern.olymp (OpenSMTPD) with ESMTPA id 57d1585e; Fri, 15 May 2020 12:15:48 +0100 (WEST) Date: Fri, 15 May 2020 12:15:48 +0100 From: Luis Henriques To: Amir Goldstein Cc: Jeff Layton , Ilya Dryomov , ceph-devel@vger.kernel.org, linux-kernel , fstests , Dave Chinner , Christoph Hellwig , Miklos Szeredi Subject: Re: [PATCH] ceph: don't return -ESTALE if there's still an open file Message-ID: <20200515111548.GA54598@suse.com> References: <20200514111453.GA99187@suse.com> <8497fe9a11ac1837813ee5f14b6ebae8fa6bf707.camel@kernel.org> <20200514124845.GA12559@suse.com> <4e5bf0e3bf055e53a342b19d168f6cf441781973.camel@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote: > +CC: fstests > > On Thu, May 14, 2020 at 4:15 PM Jeff Layton wrote: > > > > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote: > > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote: > > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote: > > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting > > > > > a file handle to dentry"), this fixes another corner case with > > > > > name_to_handle_at/open_by_handle_at. The issue has been detected by > > > > > xfstest generic/467, when doing: > > > > > > > > > > - name_to_handle_at("/cephfs/myfile") > > > > > - open("/cephfs/myfile") > > > > > - unlink("/cephfs/myfile") > > > > > - open_by_handle_at() > > > > > > > > > > The call to open_by_handle_at should not fail because the file still > > > > > exists and we do have a valid handle to it. > > > > > > > > > > Signed-off-by: Luis Henriques > > > > > --- > > > > > fs/ceph/export.c | 13 +++++++++++-- > > > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c > > > > > index 79dc06881e78..8556df9d94d0 100644 > > > > > --- a/fs/ceph/export.c > > > > > +++ b/fs/ceph/export.c > > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino) > > > > > > > > > > static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino) > > > > > { > > > > > + struct ceph_inode_info *ci; > > > > > struct inode *inode = __lookup_inode(sb, ino); > > > > > + > > > > > if (IS_ERR(inode)) > > > > > return ERR_CAST(inode); > > > > > if (inode->i_nlink == 0) { > > > > > - iput(inode); > > > > > - return ERR_PTR(-ESTALE); > > > > > + bool is_open; > > > > > + ci = ceph_inode(inode); > > > > > + spin_lock(&ci->i_ceph_lock); > > > > > + is_open = __ceph_is_file_opened(ci); > > > > > + spin_unlock(&ci->i_ceph_lock); > > > > > + if (!is_open) { > > > > > + iput(inode); > > > > > + return ERR_PTR(-ESTALE); > > > > > + } > > > > > } > > > > > return d_obtain_alias(inode); > > > > > } > > > > > > > > Thanks Luis. Out of curiousity, is there any reason we shouldn't ignore > > > > the i_nlink value here? Does anything obviously break if we do? > > > > > > Yes, the scenario described in commit 03f219041fdb is still valid, which > > > is basically the same but without the extra open(2): > > > > > > - name_to_handle_at("/cephfs/myfile") > > > - unlink("/cephfs/myfile") > > > - open_by_handle_at() > > > > > > > Ok, I guess we end up doing some delayed cleanup, and that allows the > > inode to be found in that situation. > > > > > The open_by_handle_at man page isn't really clear about these 2 scenarios, > > > but generic/426 will fail if -ESTALE isn't returned. Want me to add a > > > comment to the code, describing these 2 scenarios? > > > > > > > (cc'ing Amir since he added this test) > > > > I don't think there is any hard requirement that open_by_handle_at > > should fail in that situation. It generally does for most filesystems > > due to the way they handle cl794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters > eaning up unlinked inodes, but I don't > > think it's technically illegal to allow the inode to still be found. If > > the caller cares about whether it has been unlinked it can always test > > i_nlink itself. > > > > Amir, is this required for some reason that I'm not aware of? > > Hi Jeff, > > The origin of this test is in fstests commit: > 794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters > > It was introduced to catch an xfs bug, so this behavior is the expectation > of xfs filesystem, but note that it is not a general expectation to fail > open_by_handle() after unlink(), it is an expectation to fail open_by_handle() > after unlink() + sync() + drop_caches. Yes, sorry I should have mentioned the sync+drop_caches in the description. > I have later converted the test to generic, because I needed to check the > same expectation for overlayfs use case, which is: > The original inode is always there (in lower layer), unlink creates a whiteout > mark and open_by_handle should treat that as ESTALE, otherwise the > unlinked files would be accessible to nfs clients forever. > > In overlayfs, we handle the open file case by returning a dentry only > in case the inode with deletion mark in question is already in inode cache, > but we take care not to populate inode cache with the check. > It is easier, because we do not need to get inode into cache for checking > the delete marker. > > Maybe you could instead check in __fh_to_dentry(): > > if (inode->i_nlink == 0 && atomic_read(&inode->i_count) == 1)) { > iput(inode); > return ERR_PTR(-ESTALE); > } > > The above is untested, so I don't know if it's enough to pass generic/426. Yes, I can confirm that this also fixes the issue -- both tests pass. __ceph_is_file_opened() uses some internal counters per inode, incremented each time a file is open in a specific mode. The problem is that these counters require some extra locking (maybe they should be atomic_t?), so you're suggestion is probably better. > Note that generic/467 also checks the same behavior for rmdir(). Yeah, but the only test-case failing with cephfs is the one described above (i.e. "open_by_handle -dkr ..."). > If you decide that ceph does not need to comply to this behavior, > then we probably need to whitelist/blocklist the filesystems that > want to test this behavior, which will be a shame. Unless Jeff has any objection, I'm happy sending v2, simplifying the patch to use your simpler solution (and mentioning sync+drop_caches in the commit message). Cheers, -- Luis