Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp557743pxb; Thu, 20 Jan 2022 20:18:54 -0800 (PST) X-Google-Smtp-Source: ABdhPJyv9hmCG0swqlgjAsHTBdqXDbQoO4wqblF69rJIfIuZL0Noalbef4wF12C5p5Cbq9Ct7ZsG X-Received: by 2002:a63:451b:: with SMTP id s27mr1681962pga.39.1642738733845; Thu, 20 Jan 2022 20:18:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642738733; cv=none; d=google.com; s=arc-20160816; b=EOrVugB+XgOFdjGV4X8yfUUIqzt6H6iknP/5bjZMFtnc18z2N315gOUeJeP5s+rJhL ElbrCmuMkKCHhnW3qrQ/M6kfKZJRPaMWSmqIf6bxGJIlMOnqi5OhF8cTZ0j/U60uMhyb GxJCXQHxQgc2DZq7OMk548dsoY6P9Rd8wJiYgML5TXJYDr8vWHnT/1sORBmjjkmzJ+Bz D3QCt5NbI4rRMMGjL7RqYRGS/h//SxP0I2dyGt2E+QRF+WmXGpQ8kxZpct/miImC5LbE P3hF/U8i4q5CsnJxFLnkxJnA9WEdBYQ5seLW6LcDO3G9emOcOqw4Iz37hniyPrMWRCh2 COXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=eUehlIGYjbqyukigrQOPwzYrtPbBtBt59jPaRfknhSo=; b=IO8ZJ2thY0L6FR7sOUZvz+4nNqOo353y2+62oV8T96DYhrk3Ir0TeqYiA9XeKLrKMU 3rL25CAuTELRlvPGDWlOmPJAnjOvC/OKlGQTHZrPnffz6C7QaLOdCb9JjFxwT4HKfUTC kdJZcpVyEigYKob/IrKcaVvv/e6fWRktLs3QAzE46kXh3wKCpn2r0gcyxWr7zKKwVb9k DbqkWj6suOP7V8yr+UMS0rKBqZ2dewJrg0HE0cDMmctN7RD4jBASgF6S49pv/tka7xuC Ymfhk7XHAswqqqYsKNvCaLIsgW7yUjMkUyQaWzEkPv1p8pzWdMZ1G4ArvERUh0PRfhaC YrDA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p1si11403415pjn.18.2022.01.20.20.18.41; Thu, 20 Jan 2022 20:18:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348682AbiARTUv (ORCPT + 99 others); Tue, 18 Jan 2022 14:20:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1348346AbiARTUq (ORCPT ); Tue, 18 Jan 2022 14:20:46 -0500 Received: from zeniv-ca.linux.org.uk (zeniv-ca.linux.org.uk [IPv6:2607:5300:60:148a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7ACDAC061574; Tue, 18 Jan 2022 11:20:46 -0800 (PST) Received: from viro by zeniv-ca.linux.org.uk with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1n9u1z-002tBd-OQ; Tue, 18 Jan 2022 19:20:35 +0000 Date: Tue, 18 Jan 2022 19:20:35 +0000 From: Al Viro To: Brian Foster Cc: Ian Kent , "Darrick J. Wong" , Christoph Hellwig , Miklos Szeredi , David Howells , Kernel Mailing List , linux-fsdevel , xfs , Linus Torvalds Subject: Re: [PATCH] vfs: check dentry is still valid in get_link() Message-ID: References: <164180589176.86426.501271559065590169.stgit@mickey.themaw.net> <275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote: > If I go back to the inactive -> reclaimable grace period variant and > also insert a start_poll_synchronize_rcu() and > poll_state_synchronize_rcu() pair across the inactive processing > sequence, I start seeing numbers closer to ~36k cycles. IOW, the > xfs_inodegc_inactivate() helper looks something like this: > > if (poll_state_synchronize_rcu(ip->i_destroy_gp)) > xfs_inodegc_set_reclaimable(ip); > else > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback); > > ... to skip the rcu grace period if one had already passed while the > inode sat on the inactivation queue and was processed. > > However my box went haywire shortly after with rcu stall reports or > something if I let that continue to run longer, so I'll probably have to > look into that lockdep splat (complaining about &pag->pag_ici_lock in > rcu context, perhaps needs to become irq safe?) or see if something else > is busted.. Umm... Could you (or Dave) describe where does the mainline do the RCU delay mentioned several times in these threads, in case of * unlink() * overwriting rename() * open() + unlink() + close() (that one, obviously, for regular files) The thing is, if we already do have an RCU delay in there, there might be a better solution - making sure it happens downstream of d_drop() (in case when dentry has other references) or dentry going negative (in case we are holding the sole reference to it). If we can do that, RCU'd dcache lookups won't run into inode reuse at all. Sure, right now d_delete() comes last *and* we want the target inode to stay around past the call of ->unlink(). But if you look at do_unlinkat() you'll see an interesting-looking wart with ihold/iput around vfs_unlink(). Not sure if you remember the story on that one; basically, it's "we'd rather have possible IO on inode freeing to happen outside of the lock on parent". nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to force the "make victim go unhashed, sole reference or not". ecryptfs definitely does that forcing (deliberately so). That area could use some rethinking, and if we can deal with the inode reuse delay while we are at it... Overwriting rename() is also interesting in that respect, of course. I can go and try to RTFS my way through xfs iget-related code, but I'd obviously prefer to do so with at least some overview of that thing from the folks familiar with it. Seeing that it's a lockless search structure, missing something subtle there is all too easy, especially with the lookup-by-fhandle stuff in the mix...