Date: Thu, 19 Apr 2018 17:44:24 +0100
From: Al Viro <viro@zeniv.linux.org.uk>
To: Kirill Tkhai
Cc: Alexander Aring, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Jamal Hadi Salim
Subject: Re: [bisected] Stack overflow after fs: "switch the IO-triggering parts of umount to fs_pin" (was net namespaces kernel stack overflow)
Message-ID: <20180419164424.GI30522@ZenIV.linux.org.uk>
References: <20180419153447.GH30522@ZenIV.linux.org.uk>
In-Reply-To: <20180419153447.GH30522@ZenIV.linux.org.uk>

On Thu, Apr 19, 2018 at 04:34:48PM +0100, Al Viro wrote:
> IOW, we only get there if our vfsmount was an MNT_INTERNAL one.
> So we have mnt->mnt_umount of some MNT_INTERNAL mount found in
> ->mnt_pins of some other mount.  Which, AFAICS, means that
> it used to be mounted on that other mount.  How the hell can
> that happen?
>
> It looks like you somehow get a long chain of MNT_INTERNAL mounts
> stacked on top of each other, which ought to be prevented by
> 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
> in do_add_mount().  Nuts...

Arrrrrgh...  Nuts is right - clone_mnt() preserves the sodding
MNT_INTERNAL, with obvious results.  netns is related to the problem,
by exposing MNT_INTERNAL mounts (in /proc/*/ns/*) for mount --bind to
copy and attach to the tree.  AFAICS, the minimal reproducer is

touch /tmp/a
unshare -m sh -c 'for i in `seq 10000`; do mount --bind /proc/1/ns/net /tmp/a; done'

(and it can be anything in /proc/*/ns/*, really)

I think the fix should be along the lines of the following:

Don't leak MNT_INTERNAL away from internal mounts

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namespace.c b/fs/namespace.c
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1089,7 +1089,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 			goto out_free;
 	}
 
-	mnt->mnt.mnt_flags = old->mnt.mnt_flags & ~(MNT_WRITE_HOLD|MNT_MARKED);
+	mnt->mnt.mnt_flags = old->mnt.mnt_flags;
+	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
 	/* Don't allow unprivileged users to change mount flags */
 	if (flag & CL_UNPRIVILEGED) {
 		mnt->mnt.mnt_flags |= MNT_LOCK_ATIME;
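
FWIW, the shape of the blowup, as a userspace toy (all names below are
made up, this is emphatically not the kernel code; in the real thing
the cycle is mntput() -> cleanup_mnt() -> mnt_pin_kill() ->
drop_mountpoint() -> mntput() -> ...).  Each mount stacked by the
reproducer costs one call frame on teardown, and a kernel stack is a
few kilobytes, not the megabytes this toy gets to play with:

#include <stdio.h>
#include <stdlib.h>

struct toy_mount {
	struct toy_mount *stacked_above;	/* mount whose umount pin we hold */
};

/* one call frame per mount in the chain, a la the pin_kill() cycle */
static void toy_cleanup_mnt(struct toy_mount *m)
{
	struct toy_mount *above = m->stacked_above;

	free(m);
	if (above)
		toy_cleanup_mnt(above);
}

int main(void)
{
	struct toy_mount *bottom = NULL, *prev = NULL;
	int i;

	/* mimics: for i in `seq 10000`; do mount --bind ...; done */
	for (i = 0; i < 10000; i++) {
		struct toy_mount *m = calloc(1, sizeof(*m));

		if (!m)
			abort();
		if (prev)
			prev->stacked_above = m;
		else
			bottom = m;
		prev = m;
	}

	toy_cleanup_mnt(bottom);	/* ~10000 nested frames */
	puts("torn down");
	return 0;
}

AFAICS, with the patch above the copies lose MNT_INTERNAL, so their
final mntput() goes through the delayed (task_work) path instead of
recursing synchronously, and the depth stays bounded no matter how
many of them you stack.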