Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1820539ybl; Thu, 30 Jan 2020 06:36:55 -0800 (PST) X-Google-Smtp-Source: APXvYqz5uiD2DDRPB706GE8av1DVzk1uEp+7MlZ6NYgxPZs3ORfgtFHN0zRj/td5FJgABEe60tWP X-Received: by 2002:a9d:7f16:: with SMTP id j22mr3732753otq.256.1580395015737; Thu, 30 Jan 2020 06:36:55 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1580395015; cv=none; d=google.com; s=arc-20160816; b=RWvuEdztvE+gCvzbwz3B7t6cv+moi1jhl69+nubnScDqvUhkDIayeRRJtJ5wWSzhj2 rdZAnzlPJw6wsrnq2EVpfFCQZEuxjUE4V7Fx0NphkPNSKMyzJdqgE7a2L9NVOnKVXMbc ZHvFyMGgIgbdu26Y/fSy0Pog0wgrgfqcykS2RHeA6t5tTOUJy+X5KtEzrrL86n3vBy8d GS9HzlJUuJAd6WdVfGQw0iPytYUpVFZPdSPNVv52movTSMPOr67sw2hI+9+ASgxSfl6h Vq2V/KiNz/x6MsPFDnMnI2PG1YTIEJjNqdZP+inl+9i48us1FnYPxK5F2gykiQ2J44hz 9ZDA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=8iqHh35LrZlNjPkx168eZG0erPApFil8kVHZ52YW4tk=; b=TWh4pwOQRaAVlzI3AQQbQkeJivmDdHdaXA9VNQmwdOkNoMmOxhfXYL7ekWB0YDqPWv w6qFw4ylMzElzfNu8uFmPJIHnj12eEd4z7DlqSApZnn0CQp8v9ozp575iig6I40LU5Hz fcEwKWNp89K/6BWEJw9Q0dRQVN29YEycHLrbZbe9lMOPH1Nm3y4Oi/zcW0XDkGdXUbOh xTVeHWG556jq0kFqklG3d1hmvX0UadGi1FZFwHvCMApiWdPqnJjP6EEreN6J2VRS/KxS a8MmkEv9uh7vfWSnjAXC8KEkFv6TJB3+GKPxFPqLotYrb2MCReSwf+j8JIed6o0Cm09B Jduw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 128si2809469oih.78.2020.01.30.06.36.42; Thu, 30 Jan 2020 06:36:55 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727341AbgA3OeU (ORCPT + 99 others); Thu, 30 Jan 2020 09:34:20 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:60349 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726980AbgA3OeT (ORCPT ); Thu, 30 Jan 2020 09:34:19 -0500 Received: from [109.134.33.162] (helo=wittgenstein) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1ixAtc-0001Hg-Pm; Thu, 30 Jan 2020 14:34:16 +0000 Date: Thu, 30 Jan 2020 15:34:16 +0100 From: Christian Brauner To: Al Viro Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds , linux-kernel@vger.kernel.org, Aleksa Sarai , David Howells , Eric Biederman Subject: Re: [PATCH 02/17] fix automount/automount race properly Message-ID: <20200130143416.zr2od5u42fpdl7n3@wittgenstein> References: <20200119031423.GV8904@ZenIV.linux.org.uk> <20200119031738.2681033-1-viro@ZenIV.linux.org.uk> <20200119031738.2681033-2-viro@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200119031738.2681033-2-viro@ZenIV.linux.org.uk> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jan 19, 2020 at 03:17:14AM +0000, Al Viro wrote: > From: Al Viro > > Protection against automount/automount races (two threads hitting the same > referral point at the same time) is based upon do_add_mount() prevention of > identical overmounts - trying to overmount the root of mounted tree with > the same tree fails with -EBUSY. It's unreliable (the other thread might've > mounted something on top of the automount it has triggered) *and* causes > no end of headache for follow_automount() and its caller, since > finish_automount() behaves like do_new_mount() - if the mountpoint to be is > overmounted, it mounts on top what's overmounting it. It's not only wrong > (we want to go into what's overmounting the automount point and quietly > discard what we planned to mount there), it introduces the possibility of > original parent mount getting dropped. That's what 8aef18845266 (VFS: Fix > vfsmount overput on simultaneous automount) deals with, but it can't do > anything about the reliability of conflict detection - if something had > been overmounted the other thread's automount (e.g. that other thread > having stepped into automount in mount(2)), we don't get that -EBUSY and > the result is > referral point under automounted NFS under explicit overmount > under another copy of automounted NFS > > What we need is finish_automount() *NOT* digging into overmounts - if it > finds one, it should just quietly discard the thing it was asked to mount. > And don't bother with actually crossing into the results of finish_automount() - > the same loop that calls follow_automount() will do that just fine on the > next iteration. > > IOW, instead of calling lock_mount() have finish_automount() do it manually, > _without_ the "move into overmount and retry" part. And leave crossing into > the results to the caller of follow_automount(), which simplifies it a lot. > > Moral: if you end up with a lot of glue working around the calling conventions > of something, perhaps these calling conventions are simply wrong... > > Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount) > Signed-off-by: Al Viro I mean, just reading this is awefully complicated but the code seems fine. Acked-by: Christian Brauner > --- > fs/namei.c | 29 ++++------------------------- > fs/namespace.c | 41 ++++++++++++++++++++++++++++++++++------- > 2 files changed, 38 insertions(+), 32 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index d2720dc71d0e..bd036dfdb0d9 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1133,11 +1133,9 @@ EXPORT_SYMBOL(follow_up); > * - return -EISDIR to tell follow_managed() to stop and return the path we > * were called with. > */ > -static int follow_automount(struct path *path, struct nameidata *nd, > - bool *need_mntput) > +static int follow_automount(struct path *path, struct nameidata *nd) > { > struct vfsmount *mnt; > - int err; > > if (!path->dentry->d_op || !path->dentry->d_op->d_automount) > return -EREMOTE; > @@ -1178,29 +1176,10 @@ static int follow_automount(struct path *path, struct nameidata *nd, > return PTR_ERR(mnt); > } > > - if (!mnt) /* mount collision */ > - return 0; > - > - if (!*need_mntput) { > - /* lock_mount() may release path->mnt on error */ > - mntget(path->mnt); > - *need_mntput = true; > - } > - err = finish_automount(mnt, path); > - > - switch (err) { > - case -EBUSY: > - /* Someone else made a mount here whilst we were busy */ > + if (!mnt) > return 0; > - case 0: > - path_put(path); > - path->mnt = mnt; > - path->dentry = dget(mnt->mnt_root); > - return 0; > - default: > - return err; > - } > > + return finish_automount(mnt, path); > } > > /* > @@ -1258,7 +1237,7 @@ static int follow_managed(struct path *path, struct nameidata *nd) > > /* Handle an automount point */ > if (flags & DCACHE_NEED_AUTOMOUNT) { > - ret = follow_automount(path, nd, &need_mntput); > + ret = follow_automount(path, nd); > if (ret < 0) > break; > continue; > diff --git a/fs/namespace.c b/fs/namespace.c > index 5f0a80f17651..f1817eb5f87d 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2823,6 +2823,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags, > > int finish_automount(struct vfsmount *m, struct path *path) > { > + struct dentry *dentry = path->dentry; > struct mount *mnt = real_mount(m); > struct mountpoint *mp; > int err; > @@ -2832,21 +2833,47 @@ int finish_automount(struct vfsmount *m, struct path *path) > BUG_ON(mnt_get_count(mnt) < 2); > > if (m->mnt_sb == path->mnt->mnt_sb && > - m->mnt_root == path->dentry) { > + m->mnt_root == dentry) { > err = -ELOOP; > - goto fail; > + goto discard; > } > > - mp = lock_mount(path); > + /* > + * we don't want to use lock_mount() - in this case finding something > + * that overmounts our mountpoint to be means "quitely drop what we've > + * got", not "try to mount it on top". > + */ > + inode_lock(dentry->d_inode); > + if (unlikely(cant_mount(dentry))) { > + err = -ENOENT; > + goto discard1; > + } > + namespace_lock(); > + rcu_read_lock(); > + if (unlikely(__lookup_mnt(path->mnt, dentry))) { That means someone has already performed that mount in the meantime, I take it. > + rcu_read_unlock(); > + err = 0; > + goto discard2; > + } > + rcu_read_unlock(); > + mp = get_mountpoint(dentry); > if (IS_ERR(mp)) { > err = PTR_ERR(mp); > - goto fail; > + goto discard2; > } > + > err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE); > unlock_mount(mp); > - if (!err) > - return 0; > -fail: > + if (unlikely(err)) > + goto discard; > + mntput(m); Probably being dense here but better safe than sorry: this mntput() corresponds to the get_mountpoint() above, right?