Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp682370pxb; Fri, 21 Jan 2022 00:20:34 -0800 (PST) X-Google-Smtp-Source: ABdhPJyJsoIj9EgYkddM1L6aBW73bXBYci/Ofq1oU6LAc6Kcvp5HnaFbji1ufDeGere8H8v3ulgh X-Received: by 2002:a17:90a:f184:: with SMTP id bv4mr12525699pjb.13.1642753234484; Fri, 21 Jan 2022 00:20:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642753234; cv=none; d=google.com; s=arc-20160816; b=kmqLfd45qkvOrZ5AZ685MWzUD0WMiRQRzPyFiVJ9yUcsS5PxA2oKjt8QnAPrm/x58E BTJciwnd+Q4UMQ3pZ0/Zof/WwLxLh4i4gT+MNC8DDutNbjog+Hl+OzYh5ouIavr1REA/ V307Zf7vSDWO/71uYOHcTn2ojVNOsczN4DxfPqAoLaxvBvuPy9ZgLdkwfIJXqIx4iMHM cfPgFMx2EXar1mxcKC8jGN9oRWNHfHAavX1nm7oYHIkvvCE2hHpwkq+zY6fzK4wGfjR7 FPKCyp0J6XmpApPVBh/7as21JCBSTCdt57ThF3/ScVTtbgRiZpA/G1H/N9cedwl95UNo 9/3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=MIjNKlQ6Z0EMOHJ3dDyUpUZ1/WGWTXcDYWmpM6q0VwKIyLJZsb1oTTj1aJBgnYxrjl bmbGJ0qTYV06Pu806TqNFocQMSKUwylZiGbA+j9FjXqFPANbMdmP7QHJdwSDOjBZop4p /M8WfQwI24cxNFDiKAXOTwxlwqf6TSBrBxjWh+kRJ9svAvO+OQw5j1wU8v8rbQWAQIk1 cNLKg5p36gZ2j3r7Q5qEB3jTfoyl5Uk6KArjtn68qe1wZUm9ww9Wo3pvfWLlHkUAcqrB b5sKH6+oIlLjXaIBp2knNFfFK/bBvgUhQcoM5GRkmosYdZjn5WrzJ09QWT6+KRenhqs5 8TUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PwBnb0Eg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id nn12si6169699pjb.65.2022.01.21.00.20.22; Fri, 21 Jan 2022 00:20:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PwBnb0Eg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349435AbiARU6L (ORCPT + 99 others); Tue, 18 Jan 2022 15:58:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:49536 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349276AbiARU6J (ORCPT ); Tue, 18 Jan 2022 15:58:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1642539489; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=PwBnb0Egc7pTUBTaL2QM9J7a1MnoZd9pL1LL8FwGUs0+Pcq/Rq70Nklro/A9Xgb/mkxsHz XEEwgvzjC0k01D+INW1le0D4oreCsh/LsQCoN/jh6HVYZag3+BOGHAcgPA7AmfaR1GQFNW Kh2ewv3iMss/QxPy1IJ8Qg15fIJuVD8= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-270-9gCoO90WMWmLdkxF8K4O0A-1; Tue, 18 Jan 2022 15:58:07 -0500 X-MC-Unique: 9gCoO90WMWmLdkxF8K4O0A-1 Received: by mail-qv1-f71.google.com with SMTP id a10-20020a056214062a00b0041f0242eb11so522543qvx.4 for ; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=b9WqescKjIAzil7l6cr9HCVfXjlyRssVLY+HORpld6I=; b=HoSY2Ge0j978YDI+JWAhUZ70fsUJ+CafkE+Yc14hpF3BzqUw0Z+TTSOGcKW1oXxH2U PL5MtF560ffshU9cyYT1DtiMevLZt6UCLbtDGLIpcUCoc5jnsoSbpooQYnYpLLNqzILz KrEE0ozc5Iit6R02OKvIAtbSE3QqUtDPRxryFWxTqGVDnPOTOpRdgE2dvA4WeBOttkXS cEhJJsJU4/b6QS5s8wvRyvtLwQDZAbn1g1/j42HJVPDqodkgwyZ2lRqC/3Bdoj7GWXaa nT+bbabtqRtvE8Ks1mt/jUiaCb7P/8LUpMyBzbn+y7Z/3V3QAnukhmDTF51ffLt3RMMr Rj4A== X-Gm-Message-State: AOAM533bS3YO9hHApeVF8scfXcnGVTKVU4+M5VBNt2Jtrc3EnN2RSwI7 is3KgT0rIjA5XIiodt6q0KAvxo2Nj+WAAekdfoVrhwOu9IaX/J+V8abmLOQjTAYTvWO2PuXv96Z Jg09mbirLnjUmCnxGIHfI2Rna X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958217qkn.231.1642539487281; Tue, 18 Jan 2022 12:58:07 -0800 (PST) X-Received: by 2002:a05:620a:2448:: with SMTP id h8mr4958206qkn.231.1642539487045; Tue, 18 Jan 2022 12:58:07 -0800 (PST) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id az15sm5128276qkb.124.2022.01.18.12.58.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Jan 2022 12:58:06 -0800 (PST) Date: Tue, 18 Jan 2022 15:58:04 -0500 From: Brian Foster To: Al Viro Cc: Ian Kent , "Darrick J. Wong" , Christoph Hellwig , Miklos Szeredi , David Howells , Kernel Mailing List , linux-fsdevel , xfs , Linus Torvalds Subject: Re: [PATCH] vfs: check dentry is still valid in get_link() Message-ID: References: <275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 18, 2022 at 07:20:35PM +0000, Al Viro wrote: > On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote: > > > If I go back to the inactive -> reclaimable grace period variant and > > also insert a start_poll_synchronize_rcu() and > > poll_state_synchronize_rcu() pair across the inactive processing > > sequence, I start seeing numbers closer to ~36k cycles. IOW, the > > xfs_inodegc_inactivate() helper looks something like this: > > > > if (poll_state_synchronize_rcu(ip->i_destroy_gp)) > > xfs_inodegc_set_reclaimable(ip); > > else > > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback); > > > > ... to skip the rcu grace period if one had already passed while the > > inode sat on the inactivation queue and was processed. > > > > However my box went haywire shortly after with rcu stall reports or > > something if I let that continue to run longer, so I'll probably have to > > look into that lockdep splat (complaining about &pag->pag_ici_lock in > > rcu context, perhaps needs to become irq safe?) or see if something else > > is busted.. > > Umm... Could you (or Dave) describe where does the mainline do the > RCU delay mentioned several times in these threads, in case of > * unlink() > * overwriting rename() > * open() + unlink() + close() (that one, obviously, for regular files) > If you're referring to the existing rcu delay in XFS, I suspect that refers to xfs_reclaim_inode(). The last bits of that function remove the inode from the perag tree and then calls __xfs_inode_free(), which frees the inode via rcu callback. BTW, I think the experiment above is either going to require an audit to make the various _set_reclaimable() locks irq safe or something a bit more ugly like putting the inode back on a workqueue after the rcu delay to make the state transition. Given the incremental improvement from using start_poll_synchronize_rcu(), I replaced the above destroy side code with a cond_synchronize_rcu(ip->i_destroy_gp) call on the iget/recycle side and see similar results (~36k cycles per 60s, but so far without any explosions). Brian > The thing is, if we already do have an RCU delay in there, there might be > a better solution - making sure it happens downstream of d_drop() (in case > when dentry has other references) or dentry going negative (in case we are > holding the sole reference to it). > > If we can do that, RCU'd dcache lookups won't run into inode reuse at all. > Sure, right now d_delete() comes last *and* we want the target inode to stay > around past the call of ->unlink(). But if you look at do_unlinkat() you'll > see an interesting-looking wart with ihold/iput around vfs_unlink(). Not > sure if you remember the story on that one; basically, it's "we'd rather > have possible IO on inode freeing to happen outside of the lock on parent". > > nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to > force the "make victim go unhashed, sole reference or not". ecryptfs > definitely does that forcing (deliberately so). > > That area could use some rethinking, and if we can deal with the inode reuse > delay while we are at it... > > Overwriting rename() is also interesting in that respect, of course. > > I can go and try to RTFS my way through xfs iget-related code, but I'd > obviously prefer to do so with at least some overview of that thing > from the folks familiar with it. Seeing that it's a lockless search > structure, missing something subtle there is all too easy, especially > with the lookup-by-fhandle stuff in the mix... >