Received: by 10.213.65.68 with SMTP id h4csp276851imn; Mon, 12 Mar 2018 13:30:29 -0700 (PDT) X-Google-Smtp-Source: AG47ELsJPnmDuxL3bNV6uqupGaqVpF8uISpXTEuXOC+ywT1oIj0bhhh3X9uFDhz7kk0M+BjjgwY5 X-Received: by 10.98.89.85 with SMTP id n82mr9084442pfb.233.1520886629315; Mon, 12 Mar 2018 13:30:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1520886629; cv=none; d=google.com; s=arc-20160816; b=wWTVse7Uh9SFKCvlyApNFwCaeuvtruMI8ipKWobKq6SUKFVA3a2LfJYjmhXukRbWv9 1+Nc2fNQ/zYDTv0b2WORTm9Di+P6IPCv61rL5cRckT/gljYE7lBrdgezx2gwjsUslnAH DBEHlPxgnNBY5uvJJNVzO9EgB+aXNJWZIKWZDLv4pIfCJTtDHbBXX2qT9w6/VCpuKWS4 al4fvfO3AP5mcDGMDZh+HNLXEDkNXqAw8m27JJtnSD0vu1jEOirg2Yl3ilIXExo82gVw lwO/f9DfHQRRi1uyKP7XgoWEGV101GHLwbYEDib0+Vqz/3yxqTm7D7As/PKMricXFa+t myvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:mime-version:user-agent :message-id:in-reply-to:date:references:cc:to:from :arc-authentication-results; bh=3G/QKyAL+37VFXD3EakVmCbO04L9edwPxQcy9tYf0kI=; b=F+r56BdyfYSifGMCB3LnHshYxsQyaueoHBlb1IasGZzdBx+VvbWxY0BvTz7G0BRVpV wDKP//UMS3eukQBPTPTxBmNDfv52wHE9WkunXzo2sPdP/CTUBMraQfPyy9ssX54ng2s0 sVLXgRShV5ANGiI+jevIW+Lud8eTGVIZ7wLvdI4DgsfL76TG5yjtehSEGm/PKKvlAVF1 1LjjMw3iMI2/LUw/t9+yyjXkJecnE1DIj+v9SXL4LZh+uOhuT5dj9i+9/QO3i8YPWcIo Ldj2nd4G7Sul0Pxri6Jt4LWlP8fOtLuN7kp9D4hWX29SqOLkdpL6a1EqiBICOU0ji8aR HKmA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x70si5528655pgd.724.2018.03.12.13.30.13; Mon, 12 Mar 2018 13:30:29 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932294AbeCLU3T (ORCPT + 99 others); Mon, 12 Mar 2018 16:29:19 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:59134 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932171AbeCLU3S (ORCPT ); Mon, 12 Mar 2018 16:29:18 -0400 Received: from in01.mta.xmission.com ([166.70.13.51]) by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1evU4L-0005w5-3y; Mon, 12 Mar 2018 14:29:17 -0600 Received: from 174-19-85-160.omah.qwest.net ([174.19.85.160] helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1evU4J-0001k2-Mx; Mon, 12 Mar 2018 14:29:16 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Christian Brauner Cc: Christian Brauner , viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org References: <20180312171330.32054-1-christian.brauner@ubuntu.com> <20180312171330.32054-3-christian.brauner@ubuntu.com> <87a7vdf4ve.fsf@xmission.com> <20180312201022.GA16689@gmail.com> Date: Mon, 12 Mar 2018 15:28:30 -0500 In-Reply-To: <20180312201022.GA16689@gmail.com> (Christian Brauner's message of "Mon, 12 Mar 2018 21:10:23 +0100") Message-ID: <87y3ixavip.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1evU4J-0001k2-Mx;;;mid=<87y3ixavip.fsf@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=174.19.85.160;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19jBjbdZUO/3lSBQqYShN5qVHHG/+JXZ2E= X-SA-Exim-Connect-IP: 174.19.85.160 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on sa06.xmission.com X-Spam-Level: * X-Spam-Status: No, score=1.5 required=8.0 tests=ALL_TRUSTED,BAYES_50, DCC_CHECK_NEGATIVE,TVD_RCVD_IP,T_TM2_M_HEADER_IN_MSG,T_TooManySym_01, T_XMDrugObfuBody_14,XMNoVowels autolearn=disabled version=3.4.1 X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa06 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.2 T_XMDrugObfuBody_14 obfuscated drug references X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Christian Brauner X-Spam-Relay-Country: X-Spam-Timing: total 545 ms - load_scoreonly_sql: 0.03 (0.0%), signal_user_changed: 2.9 (0.5%), b_tie_ro: 1.97 (0.4%), parse: 1.23 (0.2%), extract_message_metadata: 24 (4.5%), get_uri_detail_list: 7 (1.3%), tests_pri_-1000: 12 (2.2%), tests_pri_-950: 1.27 (0.2%), tests_pri_-900: 0.97 (0.2%), tests_pri_-400: 40 (7.3%), check_bayes: 39 (7.1%), b_tokenize: 16 (3.0%), b_tok_get_all: 12 (2.2%), b_comp_prob: 4.1 (0.8%), b_tok_touch_all: 4.2 (0.8%), b_finish: 0.65 (0.1%), tests_pri_0: 455 (83.5%), check_dkim_signature: 0.75 (0.1%), check_dkim_adsp: 2.8 (0.5%), tests_pri_500: 4.1 (0.8%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Christian Brauner writes: > On Mon, Mar 12, 2018 at 02:52:53PM -0500, Eric W. Biederman wrote: >> Christian Brauner writes: >> >> > Most libcs will still look at /dev/ptmx when opening the master fd of a pty >> > device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER >> > ioctl() is used to safely retrieve a file descriptor for the slave side of >> > the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will >> > point to /. >> > >> > When the kernel tries to look up the root mount of the dentry for the slave >> > file descriptor it will detect that the dentry is escaping its bind-mount >> > since the root mount of the dentry is /dev/pts where the devpts is mounted >> > but the root mount of /dev/ptmx is /dev. >> > >> > Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a >> > regression. In addition, it is also a fairly common scenario in containers >> > employing user namespaces. >> > >> > To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to >> > walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the >> > contents of /proc//fd/ symlinks attached to the slave side of a >> > file descriptor will always point to a path under the devpts mount we need >> > to try and ensure that the kernel doesn't falsely get the impression that a >> > pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master >> > file descriptor opened via a bind-mount of the ptmx device escapes its >> > bind-mount. To clarify in pseudo code: >> > >> > * bind-mount /dev/pts/ptmx to /dev/ptmx >> > * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC); >> > * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC); >> > >> > would cause the kernel to think that slave is escaping its bind-mount. The >> > reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at >> > /dev as its parent mount: >> > >> > 21 -- -- / /dev >> > -- 21 -- / /dev/pts >> > >> > they are on different devices >> > >> > -- -- 0:6 / /dev >> > -- -- 0:20 / /dev/pts >> > >> > which has the consequence that the pathname of the directory which forms >> > the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to >> > /dev/ptmx we will end up on the same device as the devtmpfs mount at >> > /dev/pts >> > >> > -- -- 0:20 /ptmx /dev/ptmx >> > >> > Without the bind-mount resolution patch here the kernel will now perform >> > the bind-mount escape check directly on /dev/ptmx. When it hits >> > devpts_ptmx_path() calls pts_path() which in turn calls >> > path_parent_directory(). While one would expect that >> > path_parent_directory() *should* yield /dev it will yield / since >> > /dev and /dev/pts are on different devices. This will cause path_pts() to >> > fail finding a "pts" directory since there is none under /. Thus, the >> > kernel detects that /dev/ptmx is escaping its bind-mount and will set >> > /proc//fd/ to /. >> > >> > This patch changes the logic to do bind-mount resolution and after the >> > bind-mount has been resolved (i.e. we have traced it back to the devpts >> > mount) we can safely perform devpts_ptmx_path() and check whether we find a >> > "pts" directory in the parent directory of the devpts mount. Since >> > path_parent_directory() will now correctly yield /dev as parent directory >> > for the devpts mount at /dev/pts. >> > >> > However, we can only perform devpts_ptmx_path() devpts_mntget() if we >> > either did resolve a bind-mount or did not find a suitable devpts >> > filesystem. The reason is that we want and need to support non-standard >> > mountpoints for the devpts filesystem. If we call devpts_ptmx_path() >> > although we did already find a devpts filesystem and did not resolve >> > bind-mounts we will fail on devpts mounts such as: >> > >> > mount -t devpts devpts /mnt >> > >> > where no "pts" directory will be under /. So change the logic to account >> > for this. >> > >> > Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in >> > its openpty() implementation: >> > >> > unshare --mount >> > mount --bind /dev/pts/ptmx /dev/ptmx >> > chmod 666 /dev/ptmx >> > script >> > ls -al /proc/self/fd/0 >> > >> > with output: >> > >> > lrwx------ 1 chb chb 64 Mar 7 16:41 /proc/self/fd/0 -> / >> > >> > Signed-off-by: Christian Brauner >> > Suggested-by: Eric Biederman >> > Suggested-by: Linus Torvalds >> > --- >> > ChangeLog v2->v3: >> > * rework logic to account for non-standard devpts mounts such as >> > mount -t devpts devpts /mnt >> > ChangeLog v1->v2: >> > * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) >> > condition to separate patch with non-functional changes >> > ChangeLog v0->v1: >> > * remove >> > /* Has the devpts filesystem already been found? */ >> > if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) >> > return 0 >> > from devpts_ptmx_path() >> > * check superblock after devpts_ptmx_path() returned >> > --- >> > fs/devpts/inode.c | 24 ++++++++++++++++-------- >> > 1 file changed, 16 insertions(+), 8 deletions(-) >> > >> > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c >> > index d3ddbb888874..b31362c6c548 100644 >> > --- a/fs/devpts/inode.c >> > +++ b/fs/devpts/inode.c >> > @@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path) >> >> There is a question I asked in my original version and I haven't >> seen it answered. >> >> What do we want to do in the case where TIOCGPTPEER is called and >> the ptmx file descriptor is not from a mount that has pty's under it. >> >> a) We can continue the existing problematic behavior and return >> a pty fd whose proc path is '/' >> >> b) We can return an error which changes the observable behavior. >> >> My comments below are presuming we change the kernel to error. >> >> From my experience introducing the path following version of /dev/ptmx >> no one in practice has a /dev/ptmx dentry without an accompoanying >> /dev/pts/ptmx. Therefore I do not expect changing the behavior to >> error will introduce a regression in userspace. >> >> So let's just change the behavior of devpts_mntget error if a mount with >> the pty underneath it can not be found. > > I'm sort of torn but I think I'd prefer changing the behavior too. The > /dev/ptmx, /dev/pts/ptmx schisma is really a special case and I would > like it to be the only one. I really don't want to support bind-mounting > the ptmx character device under the devpts mounts across the system. > That seems like unnecessary complexity that is also unused. So if Linus > is fine with this change as well I don't mind sending a v4. > > There's only one thing here I'd like to throw into the ring. This error > condition only works with TIOCGPTPEER, right? If userspace does a > path-based retrieveal of the pty file descriptor by doing e.g. a > readlink on the slave file descriptor and calls open() on it they would > still hit the /proc//fd/ -> / problem? It doesn't hit the same > codepath, does it? Or am I missing someting? The error is preventing TIOCGPTPEER from returning problem file descriptors to userspace. It is uniquely the source of such file descriptors so correct the behavior when we can and returning an error when we can't should prevent their existence. Eric >> > struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi) >> > { >> > + bool unwind; >> > struct path path; >> > + int err = 0; >> >> The only use I see for unwind is to ensure we don't change path >> when we would want to return the mount from filp->f_path even it >> is not connected to it's mount. > > Yes. > >> >> > >> > path = filp->f_path; >> > path_get(&path); >> >> > - /* Has the devpts filesystem already been found? */ >> > - if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) { >> > - int err; >> > + unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) && >> > + (path.mnt->mnt_root == fsi->ptmx_dentry); >> > + /* Walk upward while the start point is a bind mount of >> > + * a single file. >> > + */ >> > + while (path.mnt->mnt_root == path.dentry && unwind) >> > + if (follow_up(&path) == 0) >> > + break; >> This look can become simply: >> while (path.mnt->mnt_root == path.dentry) >> if (follow_up(&path) == 0) >> break; >> >> > + if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind) >> This test can become simply: >> >> if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || >> (DEVPTS_SB(path->mnt.mnt_sb) != fsi)) >> > err = devpts_ptmx_path(&path); >> > - dput(path.dentry); >> > - if (err) { >> > - mntput(path.mnt); >> > - return ERR_PTR(err); >> > - } >> > + dput(path.dentry); >> > + if (err) { >> > + mntput(path.mnt); >> > + return ERR_PTR(err); >> > } >> > >> > if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) { >> > mntput(path.mnt); >> > return ERR_PTR(-ENODEV); >> > } >> > + >> > return path.mnt; >> > } >> >> Eric > > Christian