Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756799AbcDEAOJ (ORCPT ); Mon, 4 Apr 2016 20:14:09 -0400 Received: from out02.mta.xmission.com ([166.70.13.232]:59503 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752175AbcDEAOH (ORCPT ); Mon, 4 Apr 2016 20:14:07 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Linus Torvalds Cc: "H. Peter Anvin" , Peter Hurley , Greg KH , Jiri Slaby , Aurelien Jarno , Andy Lutomirski , Florian Weimer , Al Viro , Serge Hallyn , Jann Horn , "security\@kernel.org" , , security@debian.org, Willy Tarreau , Linux Kernel Mailing List References: <43AD2BA7-B594-4299-95F3-D86FD38AF21B@zytor.com> <87egexpf4o.fsf@x220.int.ebiederm.org> <1CB621EF-1647-463B-A144-D611DB150E15@zytor.com> <20151208223135.GA8352@kroah.com> <87oae0h2bo.fsf@x220.int.ebiederm.org> <56677DE3.5040705@zytor.com> <20151209012311.GA11794@kroah.com> <84B136DF-55E4-476A-9CB2-062B15677EE5@zytor.com> <20151209013859.GA12442@kroah.com> <20151209083225.GA30452@1wt.eu> <87wpskyds7.fsf_-_@x220.int.ebiederm.org> <566F1CD7.20502@hurleysoftware.com> <87y4cq6t1j.fsf@x220.int.ebiederm.org> <87a8p54v3t.fsf@x220.int.ebiederm.org> <5AFEA8FF-BDEA-40AF-8C45-19F3E9CC36D3@zytor.com> <87r3ih1mnp.fsf@x220.int.ebiederm.org> <87zix3xxvl.fsf@x220.int.ebiederm.org> Date: Mon, 04 Apr 2016 19:03:30 -0500 In-Reply-To: (Linus Torvalds's message of "Mon, 21 Dec 2015 14:23:04 -0800") Message-ID: <878u0s3orx.fsf_-_@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX18wyqOL/88zXcyrUGAV91TfHURHcvr73qY= X-SA-Exim-Connect-IP: 67.3.249.252 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa05 1397; Body=1 Fuz1=1 Fuz2=1] X-Spam-DCC: XMission; sa05 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Linus Torvalds X-Spam-Relay-Country: X-Spam-Timing: total 1037 ms - load_scoreonly_sql: 0.06 (0.0%), signal_user_changed: 4.7 (0.4%), b_tie_ro: 3.3 (0.3%), parse: 1.60 (0.2%), extract_message_metadata: 11 (1.0%), get_uri_detail_list: 8 (0.8%), tests_pri_-1000: 4.1 (0.4%), tests_pri_-950: 1.15 (0.1%), tests_pri_-900: 0.94 (0.1%), tests_pri_-400: 44 (4.3%), check_bayes: 43 (4.1%), b_tokenize: 14 (1.3%), b_tok_get_all: 15 (1.4%), b_comp_prob: 7 (0.6%), b_tok_touch_all: 4.5 (0.4%), b_finish: 0.97 (0.1%), tests_pri_0: 955 (92.1%), check_dkim_signature: 0.79 (0.1%), check_dkim_adsp: 3.4 (0.3%), tests_pri_500: 4.6 (0.4%), rewrite_mail: 0.00 (0.0%) Subject: [PATCH 00/13] devpts: New instances for every mount X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7646 Lines: 152 To recap the situation for those who have not been following closely. There are programs such as xen-create-image that run as root and setup a chroot environment with: "mknod dev/ptmx c 5 2" "mkdir dev/pts" "mount -t devpts none dev/pts" Which mostly works but stomps the mount options of the system /dev/pts. In particular the options of "gid=5,mode=620" are lost resulting in a situation where creating a new pty by opening /dev/ptmx results in that pty having the wrong permissions. Some distributions have been working around this problem by continuing to install a setuid root pt_chown binary that will be called by glibc to fix the permissions. This solution isn't too scary as long as there is only one instance of devpts but add a second instance of devpts and it becomes possible to trick the setuid root pt_chown binary into operating on the wrong files and directories. The following patchset attempts to dig use out of this mess by carefully chaning devpts in a way that does not induce any userspace regressions. My expectation is that userspace developers will love us as it makes their problems go away, and kernel developers will not be happy with the changes because what is required to preserve backwards compatibility is not what anyone would have designed with a clean sheet implementation. To dig our selves out of this mess it has been generally agreed that it makes sense for each mount of devpts to result in a separate instance of devpts. That works for ensuring that a new mount of devpts does not stomp the permissions of another mount of devpts but then programs such as xen-create-image break as they expect opening of /dev/ptmx to create new ptys. The problem of /dev/ptmx needing to create ptys on different instances of devpts can be resolved by performing a path based look up from the "/dev/ptmx" node to find the devpts filesystem. How we get each mount of devpts to result in a distinct instance of devpts matters. The kernel treats the system instance of devpts specially and to maintain backwards compatibility that needs to be preserved. As there must be a system instance of devpts the code continues in it's technique of mounting the system instance of devpts internally and then exporting it userspace when devpts is mounted under the right circumstances. There is one pattern in userspace where an intial ramdisk mounts devpts then unmounts devpts and the usual system startup scripts then mount devpts on /dev/pts. I believe Centos5 and Openwrt use this pattern. There is another pattern of userspace where devpts is mounted in the initial ramdisk and then "mount --move" is used to move it onto the final /dev/pts. However devpts remains listed in /etc/fstab and later in the boot a "mount -a" honors that listing starts mounting devpts on top of itself. Fails but only after succeeding in changing the mount options. I believe this is the pattenr that Centos6 uses. Then there is the question of which permissions checks should apply when /dev/ptmx is opened. The way it works in v4.6-rc1 is that a pty on the system instance of devtps can be created either by opening /dev/ptmx or /dev/pts/ptmx (which happens to reside on the system instance. The only restriction are the normal unix permission of opening those files. A pty on a non-system instance of devpts must be created by opening the ptmx file on the non-systme devpts instance. The default permisions for /dev/ptmx are 0666 and for /dev/pts/ptmx 0000. Today for non-system instances of devpts the permission on the ptmx file are always changed to something else (typically 0666), and typically on the system instance the permission of ptmx on devpts are ignoreed and left at 0000. Which leaves the question of what to do about cases such as xen-create-image where the new /dev/ptmx path based lookup is exercised to find the mount of devpts. In that case out of an abundance of caution I require the code to verify that the opener also has permission to open the ptmx file on the non-system instance of devpts. To make that work I have changed the default permissions on the ptmx file for all non-system instance of devpts to 0666. Where the permission check comes in as useful is on any system where a non-system instance of devpts and has permissions on it's ptmx file that are not 0666. If that was not done a simple bind mount of the /dev/ptmx device node would allow overriding the policy of who is allowed to create ptys in an instance of devpts. As you can see above my guiding principle this round has been be very careful to keep existing userspace working. I have tested this code on as wide a range of distributions as I could to look for intesting behavior. Those I managed to get setup and running in a vm for testing are: on openwrt-15.05, centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. As I could not find an image of Android that was easy to get running in a VM, so I audited the code to see what Android does. Unlike reports earlier in this conversation Android does not use a shell script. Android has a daemon that listens on netlink for device events, consults it's policy data base and creates the device node in a tmpfs instance mounted on /dev according to policy (assuming the policy allows that device node). Furthermore at the start of that daemon devpts is mounted exactly once. Which thankfully means Android poses no special problems for this patchset. I have also run xen-create-image on debian 8.2 (where it was easily installed with apt-get) and confirmed that without these changes it stomps the mount options of devpts and with these changes it only uses atypical mount options on a separate instance of devpts. The first change in this series adds magic to /dev/ptmx. The rest of the changes deal with all of the little issues needed to ensure that every mount of devpts is a distinct instance. Eric W. Biederman (13): devpts: Teach /dev/ptmx to find the associated devpts via path lookup devpts: More obvious check for the system devpts in pty allocation devpts: Cleanup newinstance parsing devpts: Stop rolling devpts_remount by hand in devpts_mount devpts: Fail early (if appropriate) on overmount devpts: Use the same default mode for both /dev/ptmx and dev/pts/ptmx devpts: Move parse_mount_options into fill_super devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Simplify devpts_mount by using mount_nodev vfs: Implement mount_super_once devpts: Always return a distinct instance when mounting devpts: Kill the DEVPTS_MULTIPLE_INSTANCE config option Documentation/filesystems/devpts.txt | 122 +++++++---------- drivers/tty/Kconfig | 11 -- drivers/tty/pty.c | 4 + drivers/tty/tty_io.c | 5 +- fs/devpts/inode.c | 250 ++++++++++++++++++++--------------- fs/namei.c | 64 +++++++-- fs/namespace.c | 8 ++ fs/open.c | 19 +++ fs/super.c | 12 ++ include/linux/devpts_fs.h | 5 + include/linux/fs.h | 5 + include/linux/namei.h | 3 + 12 files changed, 302 insertions(+), 206 deletions(-) This code is also available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git devpts-for-testing Eric