Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932743AbcCJVr0 (ORCPT ); Thu, 10 Mar 2016 16:47:26 -0500 Received: from imap.thunk.org ([74.207.234.97]:57976 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932180AbcCJVrW (ORCPT ); Thu, 10 Mar 2016 16:47:22 -0500 Date: Thu, 10 Mar 2016 16:47:04 -0500 From: "Theodore Ts'o" To: Linus Torvalds Cc: Ric Wheeler , Gregory Farnum , Dave Chinner , "Martin K. Petersen" , Christoph Hellwig , "Darrick J. Wong" , Jens Axboe , Andrew Morton , Linux API , Linux Kernel Mailing List , shane.seymour@hpe.com, Bruce Fields , linux-fsdevel , Jeff Layton , Eric Sandeen Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks Message-ID: <20160310214704.GA8890@thunk.org> Mail-Followup-To: Theodore Ts'o , Linus Torvalds , Ric Wheeler , Gregory Farnum , Dave Chinner , "Martin K. Petersen" , Christoph Hellwig , "Darrick J. Wong" , Jens Axboe , Andrew Morton , Linux API , Linux Kernel Mailing List , shane.seymour@hpe.com, Bruce Fields , linux-fsdevel , Jeff Layton , Eric Sandeen References: <20160303180924.GA4116@infradead.org> <20160303223952.GE24012@thunk.org> <20160303231050.GU29057@dastard> <20160309230819.GB3949@thunk.org> <56E18B9B.5070503@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: tytso@thunk.org X-SA-Exim-Scanned: No (on imap.thunk.org); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2811 Lines: 56 On Thu, Mar 10, 2016 at 10:33:49AM -0800, Linus Torvalds wrote: > On Thu, Mar 10, 2016 at 6:58 AM, Ric Wheeler wrote: > > > > What was objectionable at the time this patch was raised years back (not > > just to me, but to pretty much every fs developer at LSF/MM that year) > > centered on the concern that this would be viewed as a "performance" mode > > and we get pressure to support this for non-priveleged users. It gives any > > user effectively the ability to read the block device content for previously > > allocated data without restriction. Sure, but it was never "any user". We always had group-based permissions from the beginning. Sure, we passed it in via a mount option which was a bit hacky, but we never got to the point of discussing the way that we would modulate the access --- the complaint seemed to be that if it was a non-root user, it was an unacceptable security hole. And the pushback I got was more in the way of a religious objection more than anything else. Heck, even reserving a code point for the out-of-tree patch received a huge amount of pushback. > The sane way to do it would be to just check permissions of the > underlying block device. > > That way, people can just set the permissions for that to whatever > they want. If google right now uses some magical group for this, they > could make the underlying block device be writable for that group. I'd suggest making it be if you had *read* access to the block device. After all, the risk that everyone was all excited about was the risk of being able to read stale (deleted) data from old files. And there's no point giving the userspace cluster file system daemon the ability to corrupt the file system or set the setuid bit on some arbitrary executable. And if we are going to go this far, then I'd also suggest using this permission check to the user the ability to issue BLKDISCARD on a file. Allowing BLKDISCARD on files is one that should have been even more of a no-brainer, since it could never reveal stale data, but simply wasn't guaranteed to have reliable results because it was a hint to the underlying storage device. But this has also received a huge amount of religious pushback, which is why this is also an out-of-tree patch in the Google kernel. (If that means that our competitors have a higher flash TCO than us, again, no skin off my nose. I tried to get it upstream, and cost of forward porting the patch each time we rebase the kernel isn't _that_ annoying.) - Ted > We can do the security check at the filesystem level, because we have > sb->s_bdev->bd_inode, and if you have read and write permissions to > that inode, you might as well have permission to create a unsafe hole. > > That doesn't sound very hacky to me. > > Linus