Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp1577338yba; Tue, 2 Apr 2019 11:30:18 -0700 (PDT) X-Google-Smtp-Source: APXvYqz1cKm06FlIKrcx5nNWoRzfStTvoo/3dU0jX2w+A+nQltAp1mGU6v5V1OxtjX+L+YMCJ/yQ X-Received: by 2002:a17:902:f089:: with SMTP id go9mr31731361plb.309.1554229818671; Tue, 02 Apr 2019 11:30:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554229818; cv=none; d=google.com; s=arc-20160816; b=cBOY6gZJbr2mbaGZNMNOgJb6cKXNzzmW6OyTlkc1dW6bBio6PDVVAoakAox6+NPRq8 xpZLBa65oYbTuWcRH2IbJUbShL3Mew3y7ada616okyhFRR8kZkfnSxGwhxbMZnxpzGoT W/ov1Pkzc/2Af70MUrZqR87A1UNA8WgQ1fJHhE4mz07L6vdEqhTAamz6Z1crsa8FBsjB MsjN3jwxMXAY4z2hmhWPtuPRiqqslZvKE04lpc5Hhw3of/OO3/xoMwjDdvpw4sqOteqi qWQBx9EjC3OfCSuSOz/+HAWmLhBZq0IwCn8XbUCrESfTWZYT+jLen7ZG1HDa/mu4ZKi2 PXvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=8Ro6fE+VI8CAxLnmca5cIZn7EGo6t9Fpr1DW9wPJjkk=; b=tgteVubStfOwG6BcDBGDB+wOjY9nzdQR624HINF9F56ASLjJMSSwE9LSVF+yztTC07 qa4ikwW/WSz8mngcZ6a/STaYncz5ipwXEjRk5KS+gC2S1dAulMYXILgaTIj0i0F8jg+m Jts19QSpA+PknXjR/UOBlVccYWeR0ydUPPF5r2I0BwBH6VVPw3wsNM370e/Jnk9Ls6DL lWY66qUqCU9WX5np0iikLwOOvZMaLeuZJbz3dVARel1jiey/bsnihEgpOVfV+xTSSu3r pjcnx1JQvPw+Rni6xYvlQhGtPF7iY3YOXc2CFBSZEcBxUkL1USaomLIXBzQTkF2g/Q9C vIKA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z70si11811249pgd.86.2019.04.02.11.30.03; Tue, 02 Apr 2019 11:30:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731547AbfDBRyK (ORCPT + 99 others); Tue, 2 Apr 2019 13:54:10 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:46054 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730220AbfDBRyJ (ORCPT ); Tue, 2 Apr 2019 13:54:09 -0400 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92 #3 (Red Hat Linux)) id 1hBNbl-0004hR-OS; Tue, 02 Apr 2019 17:54:01 +0000 Date: Tue, 2 Apr 2019 18:54:01 +0100 From: Al Viro To: Jonathan Corbet Cc: "Tobin C. Harding" , Mauro Carvalho Chehab , Neil Brown , Randy Dunlap , linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Ian Kent Subject: Re: [PATCH v3 00/24] Convert vfs.txt to vfs.rst Message-ID: <20190402175401.GL2217@ZenIV.linux.org.uk> References: <20190327051717.23225-1-tobin@kernel.org> <20190402094934.5b242dc0@lwn.net> <20190402164824.GK2217@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190402164824.GK2217@ZenIV.linux.org.uk> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 02, 2019 at 05:48:24PM +0100, Al Viro wrote: > ->d_prune() must not drop/regain any of the locks held by caller. > It must _not_ free anything attached to dentry - that belongs > later in the shutdown sequence. If anything, I'm tempted to > make it take const struct dentry * as argument, just to make > that clear. > > No new (counted) references can be acquired by that point; > lockless dcache lookup might find our dentry a match, but > result of such lookup is not going to be legitimized - it's > doomed to be thrown out as stale. > > It really makes more sense as part of struct dentry lifecycle > description... > > [1] in theory, ->d_time might be changed by overlapping lockless > call of ->d_revalidate(). Up to filesystem - VFS doesn't touch > that field (and AFAICS only NFS uses it these days). One addition: ->d_prune() can overlap only with * lockless ->d_hash()/->d_compare() * lockless ->d_revalidate() * lockless ->d_manage() So it must not destroy anything used by those without an RCU delay. The same goes for ->d_release() - both the list of the things it can overlap with and requirements re RCU delays. In-tree ->d_prune() instances are fine and so's the majority of ->d_release(). However, autofs ->d_release() has something that looks like an RCU use-after-free waiting to happen: static void autofs_dentry_release(struct dentry *de) { struct autofs_info *ino = autofs_dentry_ino(de); struct autofs_sb_info *sbi = autofs_sbi(de->d_sb); pr_debug("releasing %p\n", de); if (!ino) return; ... autofs_free_ino(ino); } with autofs_free_ino() being straight kfree(). Which means that the lockless case of autofs_d_manage() can run into autofs_dentry_ino(dentry) getting freed right under it. And there we do have this reachable: int autofs_expire_wait(const struct path *path, int rcu_walk) { struct dentry *dentry = path->dentry; struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); int status; int state; /* Block on any pending expire */ if (!(ino->flags & AUTOFS_INF_WANT_EXPIRE)) return 0; if (rcu_walk) return -ECHILD; the second check buggers off in lockless mode; the first one can be done in lockless mode just fine, so AFAICS we do have a problem there. Smells like we ought to make that kfree in autofs_free_ino() RCU-delayed... Ian, have you, by any chance, run into reports like that? Use-after-free or oopsen in autofs_expire_wait() and friends, that is... AFAICS, everything else is safe; however, looking through those has turned up a fishy spot in ceph_d_revalidate(): } else if (d_really_is_positive(dentry) && ceph_snap(d_inode(dentry)) == CEPH_SNAPDIR) { valid = 1; Again, lockless ->d_revalidate() is called without anything that would hold ->d_inode stable; the first part of the condition does not guarantee that we won't run into ceph_snap(NULL) and oops. Sure, compiler is almost certainly not going to reload here, but we ought to use d_inode_rcu(dentry) there. Sigh...