From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds
Cc: "H. Peter Anvin", Andy Lutomirski, security@debian.org,
	"security@kernel.org", Al Viro, "security@ubuntu.com >> security",
	Peter Hurley, Serge Hallyn, Willy Tarreau, Aurelien Jarno,
	One Thousand Gnomes, Jann Horn, Greg KH, Linux Kernel Mailing List,
	Jiri Slaby, Florian Weimer
Subject: [PATCH 01/16] devpts: Attempting to get it right
Date: Fri, 15 Apr 2016 10:34:00 -0500
Message-ID: <877ffyzy1j.fsf_-_@x220.int.ebiederm.org>
In-Reply-To: <570D4781.3070600@zytor.com> (H. Peter Anvin's message of
	"Tue, 12 Apr 2016 12:07:45 -0700")
References: <878u0s3orx.fsf_-_@x220.int.ebiederm.org>
	<1459819769-30387-1-git-send-email-ebiederm@xmission.com>
	<87twjcorwg.fsf@x220.int.ebiederm.org>
	<20160409140909.42315e6d@lxorguk.ukuu.org.uk>
	<83FE8CD2-C0A2-4ADB-AEBD-8DD89AD4F88A@zytor.com>
	<87bn5ij0x1.fsf@x220.int.ebiederm.org>
	<78205895-E11D-417F-91DC-4BCA0B61A122@zytor.com>
	<570D4781.3070600@zytor.com>

To recap the situation for those who have not been following closely.
There are programs such as xen-create-image that run as root and set up
a chroot environment with:

	"mknod dev/ptmx c 5 2"
	"mkdir dev/pts"
	"mount -t devpts none dev/pts"

This mostly works but stomps the mount options of the system /dev/pts.
In particular the "gid=5,mode=620" options are lost, so that creating
a new pty by opening /dev/ptmx results in that pty having the wrong
permissions.

Some distributions have been working around this problem by continuing
to install a setuid root pt_chown binary that will be called by glibc
to fix the permissions.

Maintaining a setuid root pt_chown binary is not too scary until
multiple instances of devpts are considered, at which point it becomes
possible to trick the setuid root pt_chown binary into operating on the
wrong files and directories, leading to all of the things one might
fear when a setuid root binary goes wrong.

The following patchset digs us out of that hole by carefully changing
devpts and /dev/ptmx in a way that does not introduce any userspace
regressions, while making each mount of devpts distinct (so pt_chown is
unnecessary) and arranging things so that enough information is
available that a secure pt_chown binary is possible to write if that
is ever needed.

The approach I have chosen to take is to first enhance the /dev/ptmx
device node to automount /dev/pts/ptmx on top of it. This leads to a
simple high performance solution that allows applications such as
xen-create-image (that call "mknod ptmx c 5 2" and mount devpts) to
continue to run as before even when they are given a non-system
instance of devpts.

Using automatic bind mounts of /dev/pts/ptmx results in no new security
cases to consider, as this can already be done, and actually results in
a simplification of the analysis of the code, as now all opens of ptmx
are opens of /dev/pts/ptmx. /dev/ptmx is now just a magic mountpoint
that does the right thing.
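
As an illustration only (this is not part of the patchset), the
symptom from the recap can be observed from userspace by allocating a
pty and looking at the group and mode of the slave node. The sketch
below assumes the conventional "gid=5,mode=620" options quoted above;
the helper names pty_permissions_ok and check_new_pty are made up for
this example, and a distribution may legitimately use other values.

```python
import os
import stat

# Assumed expectations, taken from the "gid=5,mode=620" mount options
# quoted in the recap. Other systems may use different values.
EXPECTED_GID = 5
EXPECTED_MODE = 0o620


def pty_permissions_ok(st_mode, st_gid,
                       want_mode=EXPECTED_MODE, want_gid=EXPECTED_GID):
    """Return True if the permission bits and group match the expected options."""
    return stat.S_IMODE(st_mode) == want_mode and st_gid == want_gid


def check_new_pty():
    """Allocate a pty and return (path, mode, gid, ok) for its slave side."""
    master, slave = os.openpty()
    try:
        st = os.fstat(slave)
        mode = stat.S_IMODE(st.st_mode)
        ok = pty_permissions_ok(st.st_mode, st.st_gid)
        return os.ttyname(slave), mode, st.st_gid, ok
    finally:
        os.close(slave)
        os.close(master)


if __name__ == "__main__":
    try:
        path, mode, gid, ok = check_new_pty()
    except OSError as err:
        # For example, no usable devpts instance in this environment.
        print(f"could not allocate a pty: {err}")
    else:
        verdict = "as expected" if ok else "unexpected"
        print(f"{path}: mode={mode:o} gid={gid} ({verdict})")
```

Running this before and after the chroot setup commands shown earlier
would be one way to see whether the options of the system /dev/pts were
stomped. Note that glibc or a pt_chown helper may already have adjusted
the node by the time it is inspected.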

Allowing each mount of devpts to be distinct is also a bit interesting,
as there is a concept in the code of the primary system devpts
instance. /dev/ptmx automounts the primary system instance of devpts if
it cannot find an appropriate devpts instance by path lookup. The
sysctl sys.kernel.pty.max is a global maximum of the number of ptys in
the system, with sys.kernel.pty.reserve the number of those ptys
reserved exclusively for the system instance of devpts. Overmounting
the system instance of devpts with itself is expected to fail but
update the devpts mount options anyway. In my testing I have found
pieces of code that depend, or at least appear to depend, on all of
these properties.

The particular challenge in all of this has been distros that mount
devpts in initial ram disks, and then mount devpts again during regular
system startup. It took a little bit of careful arranging to ensure
that it is the system instance of devpts that always winds up on
/dev/pts for all distros. CentOS5 and CentOS6 were particularly
challenging examples.

To look for surprising userspace behavior I have attempted to test this
patchset on a representative sample of Linux distributions. The
distributions I managed to set up and test in VMs are: openwrt-15.05,
centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2,
ubuntu-14.04.3, ubuntu-15.10, fedora23, mageia-5, mint-17.3,
opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?),
archlinux-2015-12-01.

I wanted to test Android (as it is one of the most unique Linux
distributions) but I could not find a freely available image that was
easy to get going in a VM, so I audited the Android code instead.
Android has a daemon that is responsible for everything under /dev. It
listens for netlink device events, consults its policy database and,
if the Android policy allows, creates the device node in a tmpfs
instance mounted on /dev with the attributes specified by policy.
Furthermore at system startup this daemon mounts devpts exactly once,
which thankfully presents no interesting challenges.

I have also run xen-create-image on debian 8.2 (where it was easily
installed with apt-get) and confirmed that without these changes it
stomps the mount options of devpts and with these changes it only uses
atypical mount options on a separate instance of devpts.

The current technique of automounting /dev/pts/ptmx onto /dev/ptmx
results in the best userspace semantics and the easiest to understand
and maintain kernel code that I have seen implemented or heard proposed
in this discussion, as semantically and in the implementation each
piece is tasked with doing one thing.

Eric W. Biederman (16):
      devpts: Use the same default mode for both /dev/ptmx and dev/pts/ptmx
      devpts: Set the proper fops for /dev/pts/ptmx
      vfs: Implement vfs_loopback_mount
      devpts: Teach /dev/ptmx to automount the appropriate devpts via path lookup
      vfs: Allow unlink, and rename on expirable file mounts
      devpts: More obvious check for the system devpts in pty allocation
      devpts: Cleanup newinstance parsing
      devpts: Stop rolling devpts_remount by hand in devpts_mount
      devpts: Fail early (if appropriate) on overmount
      devpts: Move parse_mount_options into fill_super
      devpts: Make devpts_kill_sb safe if fsi is NULL
      devpts: Move the creation of /dev/pts/ptmx into fill_super
      devpts: Simplify devpts_mount by using mount_nodev
      vfs: Implement mount_super_once
      devpts: Always return a distinct instance when mounting
      devpts: Kill the DEVPTS_MULTIPLE_INSTANCE config option

 Documentation/filesystems/devpts.txt | 122 +++++++-----------
 drivers/tty/Kconfig                  |  11 --
 drivers/tty/pty.c                    |   2 +-
 drivers/tty/tty_io.c                 |   5 +-
 fs/devpts/inode.c                    | 234 +++++++++++++++++++----------------
 fs/inode.c                           |   3 +
 fs/namei.c                           |  83 +++++++++++--
 fs/namespace.c                       |  25 +++-
 fs/super.c                           |  34 +++++
 include/linux/devpts_fs.h            |  18 +++
 include/linux/fs.h                   |   3 +
 include/linux/mount.h                |   1 +
 include/linux/namei.h                |   2 +
 13 files changed, 330 insertions(+), 213 deletions(-)

This code is also available at:
    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git devpts-for-testing

Eric