Date: Tue, 7 Apr 2020 04:05:04 +0100
From: Al Viro
To: Levi
Cc: davem@davemloft.net, kuba@kernel.org, gnault@redhat.com,
    nicolas.dichtel@6wind.com, edumazet@google.com, lirongqing@baidu.com,
    tglx@linutronix.de, johannes.berg@intel.com, dhowells@redhat.com,
    daniel@iogearbox.net, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] netns: dangling pointer on netns bind mount point.
Message-ID: <20200407030504.GX23230@ZenIV.linux.org.uk>
References: <20200407023512.GA25005@ubuntu>
In-Reply-To: <20200407023512.GA25005@ubuntu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 07, 2020 at 11:35:46AM +0900, Levi wrote:
> When we try to bind mount on network namespace (ex) /proc/{pid}/ns/net,
> inode's private data can have dangling pointer to net_namespace that was
> already freed in below case.
>
>     1. Forking the process.
>     2. [PARENT] Waiting the Child till the end.
>     3. [CHILD] call unshare for creating new network namespace
>     4. [CHILD] Bind mount with /proc/self/ns/net to some mount point.
>     5. [CHILD] Exit child.
>     6. [PARENT] Try to setns with binded mount point
>
> In step 5, net_namespace made by child process'll be freed,
> But in bind mount point, it still held the pointer to net_namespace made
> by child process.
> In this situation, when parent try to call "setns" systemcall with the
> bind mount point, parent process try to access to freed memory, That
> makes memory corruption.
>
> This patch fix the above scenario by increaseing reference count.

This can't be the right fix.

> +#ifdef CONFIG_NET_NS
> +	if (!(flag & CL_COPY_MNT_NS_FILE) && is_net_ns_file(root)) {
> +		ns = get_proc_ns(d_inode(root));
> +		if (ns == NULL || ns->ops->type != CLONE_NEWNET) {
> +			err = -EINVAL;
> +
> +			goto out_free;
> +		}
> +
> +		net = to_net_ns(ns);
> +		net = get_net(net);

No.  This is completely wrong.  You have each struct mount pointing to
that sucker to grab an extra reference on an object; you do *NOT* drop
said reference when struct mount is destroyed.  You are papering over a
dangling pointer of some sort by introducing a trivially exploitable
leak that happens to hit your scenario.
Hell, do (your step 4 + umount your mountpoint) in a loop, then watch
what happens to refcounts with that patch.

This is bollocks; the reference is *NOT* in struct mount.  At all.  It's
not even in struct dentry.  What it's attached to is struct inode and it
should be pinned as long as that inode is alive - it's dropped in
nsfs_evict().  And if _that_ gets called while dentry is still pinned
(as ->mnt_root of something), you have much worse problems.

Could you post a reproducer, preferably one that would trigger an oops
on mainline?
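
The "step 4 + umount in a loop" argument can be sketched as a userspace toy model. This is not kernel code: every toy_* name below is hypothetical and only loosely mirrors struct mount, struct inode and the namespace refcount. It assumes, as described above, that the inode holds exactly one reference that is dropped on eviction, while the patched clone path takes one more reference per mount and nothing drops it when the mount goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real structures. */
struct toy_net   { int refcount; };
struct toy_inode { struct toy_net *ns; };	/* holds one reference */
struct toy_mount { struct toy_inode *root_inode; };

static void toy_get_net(struct toy_net *net) { net->refcount++; }
static void toy_put_net(struct toy_net *net) { net->refcount--; }

/* Models the patched clone path: one extra reference per bind mount. */
static void toy_bind_mount(struct toy_mount *m, struct toy_inode *inode)
{
	m->root_inode = inode;
	toy_get_net(inode->ns);
}

/* Models mount teardown under the patch: the extra reference is never dropped. */
static void toy_umount(struct toy_mount *m)
{
	m->root_inode = NULL;
}

/* Models inode eviction: drops the single reference the inode owns. */
static void toy_evict(struct toy_inode *inode)
{
	toy_put_net(inode->ns);
	inode->ns = NULL;
}

/*
 * Bind-mount and unmount `iterations` times, evict the inode, and return
 * the refcount that is left.  Zero would mean the namespace can be freed;
 * anything above zero is leaked.
 */
int toy_leaked_refs(int iterations)
{
	struct toy_net net = { .refcount = 1 };	/* the inode's reference */
	struct toy_inode inode = { .ns = &net };
	struct toy_mount m = { .root_inode = NULL };
	int i;

	assert(iterations >= 0);
	for (i = 0; i < iterations; i++) {
		toy_bind_mount(&m, &inode);
		toy_umount(&m);
	}
	toy_evict(&inode);
	return net.refcount;
}
```

In this model toy_leaked_refs(n) comes back as n: each mount/umount cycle strands one reference, so the count never reaches zero and the object is never released.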