From: Andy Lutomirski
Date: Fri, 11 Mar 2016 09:30:04 -0800
Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
To: Linus Torvalds
Cc: One Thousand Gnomes, Ric Wheeler, "Theodore Ts'o", Gregory Farnum,
    Dave Chinner, "Martin K. Petersen", Christoph Hellwig,
    "Darrick J. Wong", Jens Axboe, Andrew Morton, Linux API,
    Linux Kernel Mailing List, shane.seymour@hpe.com, Bruce Fields,
    linux-fsdevel, Jeff Layton, Eric Sandeen
References: <20160302040947.16685.42926.stgit@birch.djwong.org>
    <20160302225601.GB21890@birch.djwong.org>
    <20160303180924.GA4116@infradead.org>
    <20160303223952.GE24012@thunk.org>
    <20160303231050.GU29057@dastard>
    <20160309230819.GB3949@thunk.org>
    <56E18B9B.5070503@gmail.com>
    <56E24CA5.3030702@redhat.com>
    <20160311135952.57a44931@lxorguk.ukuu.org.uk>

On Fri, Mar 11, 2016 at 9:23 AM, Linus Torvalds wrote:
> On Fri, Mar 11, 2016 at 5:59 AM, One Thousand Gnomes wrote:
>>
>> > > We can do the security check at the filesystem level, because we have
>> > > sb->s_bdev->bd_inode, and if you have read and write permissions to
>> > > that inode, you might as well have permission to create a unsafe hole.
>>
>> Not if you don't have access to a block device node to open it, or there
>> are SELinux rules that control the access. There are cases it isn't
>> entirely the same thing as far as I can see. Consider within a container
>> for example.
>
> I agree that it's not the same thing, but I don't think it really ends
> up mattering.
>
> Either the container is properly separated and set up - in which case
> the uid mapping is what protects you - or it isn't - in which case the
> container could just mknod whatever hell node it wants anyway.
>
> So we do pretty much have the permission model.

This makes me nervous.  Suppose I unshare my user namespace, set up
very restrictive mounts, drop caps, seccomp the hell out of myself
(but allow literally only read, write, and ioctl, and keep only a
single fd to a file on an ordinary filesystem, which should be safe),
and run untrusted code.  Now that code can do this unsafe ioctl simply
because its uid or gid happens to have read access to a device node
that isn't even present in the sandbox.  Ick.

What if we had an ioctl to do these data-leaking operations that took,
as an extra parameter, an fd to the block device node?  Then allow
access if the fd points to the right inode and has FMODE_READ (and LSM
checks say it's okay).  Sure, it's awkward, but it's much safer.

--Andy
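
For illustration only, here is a rough userspace analogue of the check
proposed above.  It is not kernel code and not part of any patch in this
thread.  The function name bdev_fd_authorizes() is hypothetical.  It
assumes an ordinary block-backed filesystem where the file's st_dev
equals the device node's st_rdev (this does not hold for filesystems
that report an anonymous device number), it uses the fd's access mode as
a stand-in for FMODE_READ, and it does not model the LSM check at all.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for O_PATH on glibc, if not already set */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: decide whether bdev_fd proves that the caller may
 * perform a data-leaking operation on the filesystem holding file_fd.
 * Returns true only if every check passes.
 */
bool bdev_fd_authorizes(int file_fd, int bdev_fd)
{
    struct stat file_st, bdev_st;
    int flags, accmode;

    if (fstat(file_fd, &file_st) != 0 || fstat(bdev_fd, &bdev_st) != 0)
        return false;

    /* The extra fd must refer to a block device node. */
    if (!S_ISBLK(bdev_st.st_mode))
        return false;

    /*
     * "Points to the right inode": assumed here to mean that the device
     * node is the one backing the file's filesystem.
     */
    if (bdev_st.st_rdev != file_st.st_dev)
        return false;

    flags = fcntl(bdev_fd, F_GETFL);
    if (flags < 0)
        return false;

#ifdef O_PATH
    /* An O_PATH fd names the node but grants no read access. */
    if (flags & O_PATH)
        return false;
#endif

    /* Stand-in for FMODE_READ: the fd was opened O_RDONLY or O_RDWR. */
    accmode = flags & O_ACCMODE;
    return accmode == O_RDONLY || accmode == O_RDWR;
}
```

The point of the design is that authority comes from an fd the process
actually holds, so a sandbox that was never handed the device node fails
the check regardless of what its uid or gid could open elsewhere.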