Date: Tue, 18 Jan 2022 14:00:41 +1100
From: Dave Chinner
To: Al Viro
Cc: Brian Foster, Ian Kent, "Darrick J. Wong", Christoph Hellwig,
	Miklos Szeredi, David Howells, Kernel Mailing List,
	linux-fsdevel, xfs
Subject: Re: [PATCH] vfs: check dentry is still valid in get_link()
Message-ID: <20220118030041.GB59729@dread.disaster.area>
References: <164180589176.86426.501271559065590169.stgit@mickey.themaw.net>
	<275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net>

On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote:
> On Mon, Jan 17, 2022 at 09:35:58AM -0500, Brian Foster wrote:
>
> > To Al's question: at the end of the day there is no rcu delay involved
> > with inode reuse in XFS. We do use call_rcu() for the eventual freeing
> > of inodes (see __xfs_inode_free()), but inode reuse occurs for inodes
> > that have been put into a "reclaim" state before getting to the point
> > of freeing the struct inode memory. This led to the long discussion [1]
> > Ian references around ways to potentially deal with that. I think the
> > TLDR of that thread is that there are various potential options for
> > improvement, such as an rcu wait on inode creation/reuse (either
> > explicitly or via more open-coded grace period cookie tracking), or an
> > rcu wait somewhere in the destroy sequence before inodes become reuse
> > candidates, etc., but none of them seemed agreeable for varying
> > reasons (IIRC mostly stemming from either performance or complexity) [2].
> >
> > The change that has been made so far in XFS is to turn rcuwalk for
> > symlinks off once again, which looks like it landed in Linus' tree as
> > commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata
> > buffers to the vfs"). The hope is that between that patch and this
> > prospective vfs tweak, we can have a couple of incremental fixes that
> > at least address the practical problem users have been running into
> > (which is a crash due to a NULL ->get_link() callback pointer caused
> > by inode reuse). The inode reuse vs. rcu thing might still be a broader
> > problem, but AFAIA that mechanism has been in place in XFS on Linux
> > pretty much forever.
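(As a minimal, hypothetical sketch of the lifecycle being described here --
the example_* names are invented, and the real XFS code differs in detail --
the free path is RCU-deferred via call_rcu(), but the reuse path recycles
the inode with no grace period at all:)

/*
 * Hypothetical illustration, not the real XFS code: the free side is
 * RCU-deferred, but the reuse side pulls an inode straight out of the
 * "reclaim" state, so nothing separates its old and new identities.
 */
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/bitops.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

#define EXAMPLE_IRECLAIMABLE	0	/* invented flag bit */

struct example_inode {
	u64		ino;
	unsigned long	flags;
	struct rcu_head	rcu;
};

/* Free path: only reached when the inode is not picked for reuse. */
static void example_inode_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct example_inode, rcu));
}

static void example_inode_free(struct example_inode *ip)
{
	call_rcu(&ip->rcu, example_inode_free_rcu);	/* grace period */
}

/* Reuse path: no call_rcu(), no synchronize_rcu() -- no delay at all. */
static struct example_inode *example_inode_reuse(struct example_inode *ip)
{
	clear_bit(EXAMPLE_IRECLAIMABLE, &ip->flags);
	/* reinitialise and hand the same memory out as a "new" inode */
	return ip;
}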
Wong" , Christoph Hellwig , Miklos Szeredi , David Howells , Kernel Mailing List , linux-fsdevel , xfs Subject: Re: [PATCH] vfs: check dentry is still valid in get_link() Message-ID: <20220118030041.GB59729@dread.disaster.area> References: <164180589176.86426.501271559065590169.stgit@mickey.themaw.net> <275358741c4ee64b5e4e008d514876ed4ec1071c.camel@themaw.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=deDjYVbe c=1 sm=1 tr=0 ts=61e62d5f a=Eslsx4mF8WGvnV49LKizaA==:117 a=Eslsx4mF8WGvnV49LKizaA==:17 a=kj9zAlcOel0A:10 a=DghFqjY3_ZEA:10 a=7-415B0cAAAA:8 a=ZbRbMrvnt3dbDbI5YVEA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote: > On Mon, Jan 17, 2022 at 09:35:58AM -0500, Brian Foster wrote: > > > To Al's question, at the end of the day there is no rcu delay involved > > with inode reuse in XFS. We do use call_rcu() for eventual freeing of > > inodes (see __xfs_inode_free()), but inode reuse occurs for inodes that > > have been put into a "reclaim" state before getting to the point of > > freeing the struct inode memory. This lead to the long discussion [1] > > Ian references around ways to potentially deal with that. I think the > > TLDR of that thread is there are various potential options for > > improvement, such as to rcu wait on inode creation/reuse (either > > explicitly or via more open coded grace period cookie tracking), to rcu > > wait somewhere in the destroy sequence before inodes become reuse > > candidates, etc., but none of them seemingly agreeable for varying > > reasons (IIRC mostly stemming from either performance or compexity) [2]. > > > > The change that has been made so far in XFS is to turn rcuwalk for > > symlinks off once again, which looks like landed in Linus' tree as > > commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata > > buffers to the vfs"). The hope is that between that patch and this > > prospective vfs tweak, we can have a couple incremental fixes that at > > least address the practical problem users have been running into (which > > is a crash due to a NULL ->get_link() callback pointer due to inode > > reuse). The inode reuse vs. rcu thing might still be a broader problem, > > but AFAIA that mechanism has been in place in XFS on Linux pretty much > > forever. > > My problem with that is that pathname resolution very much relies upon > the assumption that any inode it observes will *not* change its nature > until the final rcu_read_unlock(). Papering over ->i_op->get_link reads > in symlink case might be sufficient at the moment (I'm still not certain > about that, though), but that's rather brittle. E.g. if some XFS change > down the road adds ->permission() on some inodes, you'll get the same > problem in do_inode_permission(). We also have places where we rely upon > sample ->d_seq > fetch ->d_flags > fetch ->d_inode > validate ->d_seq > ... > assume that inode type matches the information in flags > > How painful would it be to make xfs_destroy_inode() a ->free_inode() instance? > IOW, how far is xfs_inode_mark_reclaimable() from being callable in RCU > callback context? AIUI, not very close at all, I'm pretty sure we can't put it under RCU callback context at all because xfs_fs_destroy_inode() can take sleeping locks, perform transactions, do IO, run rcu_read_lock() critical sections, etc. 
AIUI, not very close at all. I'm pretty sure we can't put it in RCU
callback context at all, because xfs_fs_destroy_inode() can take
sleeping locks, perform transactions, do IO, run rcu_read_lock()
critical sections, etc. This means it needs to run in full task
context, and so it can't be run from RCU callback context at all.

> IOW, ->free_inode() is the RCU-delayed part of ->destroy_inode(). If
> both are present, ->destroy_inode() will be called synchronously,
> followed by ->free_inode() from an RCU callback, so you can have both -
> moving just the "finally mark for reuse" part into ->free_inode() would
> be OK. Any blocking stuff (if any) can be left in ->destroy_inode()...

Yup, that's pretty much what we already do, except that we run the
RCU-delayed part of freeing the inode once XFS has finished with the
inode internally and the background inode GC reclaims it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
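(For reference, the shape of the ->destroy_inode()/->free_inode() split
described above, as a hypothetical filesystem might wire it up -- the
example_* helpers are invented, only the super_operations fields are real:)

/*
 * Hypothetical sketch: ->destroy_inode() runs synchronously and may
 * block; ->free_inode() runs from the RCU callback and must not.
 */
#include <linux/fs.h>

/* invented stand-ins for a filesystem's real teardown steps */
static void example_teardown_blocking(struct inode *inode) { }
static void example_mark_reusable(struct inode *inode) { }

static void example_destroy_inode(struct inode *inode)
{
	/* synchronous context: transactions, IO, sleeping locks are OK */
	example_teardown_blocking(inode);
}

static void example_free_inode(struct inode *inode)
{
	/* RCU callback context: only the final "mark for reuse" step */
	example_mark_reusable(inode);
}

static const struct super_operations example_sops = {
	.destroy_inode	= example_destroy_inode, /* called first, task context */
	.free_inode	= example_free_inode,	 /* then RCU-delayed */
};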