From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Jann Horn, Al Viro
Subject: [PATCH 4.4 11/43] fix mntput/mntput race
Date: Tue, 14 Aug 2018 19:17:47 +0200
Message-Id: <20180814171517.835654211@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro

commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.
mntput_no_expire() calculates the total refcount under mount_lock; unfortunately, the decrement (as well as all increments) is done outside of it, leading to false positives in the "are we dropping the last reference" test.  Consider the following situation:

* mnt is a lazy-umounted mount, kept alive by two opened files.  One of those files gets closed.  Total refcount of mnt is 2.  On CPU 42, mntput(mnt) (called from __fput()) drops one reference by decrementing its per-CPU component of the refcount, then (->mnt_ns being NULL for a lazy-umounted mount) takes mount_lock and starts summing the components in mnt_get_count().
* After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again, this time on CPU 69.  There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock.
* On CPU 42, our first mntput() finishes counting.  It observes the decrement of component #69, but not the increment of component #0.  As a result, the total it gets is not 1, as it should have been, but 0.  At that point we decide that the vfsmount needs to be killed and proceed to free it and shut the filesystem down.  However, there's still another opened file on that filesystem, with a reference to the (now freed) vfsmount, etc., and we are screwed.

It's not a wide race, but it can be reproduced with an artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

The fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want to (and can) keep the fast case (i.e. a mount that still has a non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero mnt->mnt_ns are dropping some reference to mnt, and they call synchronize_rcu() before that mntput().  In other words, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped.
Reported-by: Jann Horn
Tested-by: Jann Horn
Fixes: 48a066e72d97 ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

---
 fs/namespace.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1124,12 +1124,22 @@ static DECLARE_DELAYED_WORK(delayed_mntp
 static void mntput_no_expire(struct mount *mnt)
 {
 	rcu_read_lock();
-	mnt_add_count(mnt, -1);
-	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
+	if (likely(READ_ONCE(mnt->mnt_ns))) {
+		/*
+		 * Since we don't do lock_mount_hash() here,
+		 * ->mnt_ns can change under us. However, if it's
+		 * non-NULL, then there's a reference that won't
+		 * be dropped until after an RCU delay done after
+		 * turning ->mnt_ns NULL. So if we observe it
+		 * non-NULL under rcu_read_lock(), the reference
+		 * we are dropping is not the final one.
+		 */
+		mnt_add_count(mnt, -1);
 		rcu_read_unlock();
 		return;
 	}
 	lock_mount_hash();
+	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();
 		unlock_mount_hash();