Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp200359imu; Wed, 19 Dec 2018 16:40:12 -0800 (PST) X-Google-Smtp-Source: AFSGD/V66uAE6Zm/l6OFHH2pDjkUzZsHt3KLmDkDSIhU1V/OGykljYSNoY5e2a01C+WOI9NzTExK X-Received: by 2002:a62:9fd9:: with SMTP id v86mr22265636pfk.191.1545266412587; Wed, 19 Dec 2018 16:40:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1545266412; cv=none; d=google.com; s=arc-20160816; b=LCb6muUNUjhtZeAS5iIP99gU02x90m2bj2jHU8Ofm1lUS2wTqBrwQmKXfACePlC8qG aMxEUe7WDQaqFA7fjNO+KMA9Rk+DvKI4796WghpAMPK84Ne4JMCAWo5ffiQPICFwbWE9 rhHPU3h2AoiYdkvDgCutVezQJtOWL2wLroA8MHF9xZ5DOtbvhnRL6nraM9wpTEQKb8Vo DVTWc8zz6OKANo0jMpbJVHOMwH4/9se1J3cWz9S9yuqLiDOxqC7OlyQzKVXQBid8rJQQ vOakeVgVq+GVqJaUJ0AM2g0BoCvzcKPS2zkyj3A5dnKiXZWBvSCrxfzANgy3N7mbrPIt v5fA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:to :from:date; bh=rC08PWQXxyMYsrmc087Q8PG8uIH9O5is0jSs4uH8b/8=; b=aSp8p5eVStSt6jk3K8f+dY320f1L5efFz8h8p5zGmpFGhd47rfcbA13BP3Z0Ep5Zt1 T1RCz1MV2m38vgljnoXN4ndLp/ZY+X8PwWUsv5iPf9cLqiVAMyfgY7VfS40ZVP8kPutx wx3m4Zp8PIZDwYUAfAoUDi+/+ngEl+Dk7dvntvHcKw8k37d3lLkZqjs1ecvm64cav9pQ ST1208nuSlnZHln/+NO3dF4eLClPUbb0oAwewGBM/CwXHTrqGYZDoROuiADLo8ZiHPsq wlLgBy1VAqdxDWxAjJureWTTJE7NF/PKW/8xp13bFrxZ+WqHB6AnJKij8Lhq6U7e/DvI NLOg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v4si17527505pfm.71.2018.12.19.16.39.43; Wed, 19 Dec 2018 16:40:12 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729036AbeLSVgA (ORCPT + 99 others); Wed, 19 Dec 2018 16:36:00 -0500 Received: from ipmail07.adl2.internode.on.net ([150.101.137.131]:58806 "EHLO ipmail07.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727387AbeLSVf7 (ORCPT ); Wed, 19 Dec 2018 16:35:59 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail07.adl2.internode.on.net with ESMTP; 20 Dec 2018 08:05:55 +1030 Received: from dave by dastard with local (Exim 4.80) (envelope-from ) id 1gZjVQ-0002EB-WF; Thu, 20 Dec 2018 08:35:53 +1100 Date: Thu, 20 Dec 2018 08:35:52 +1100 From: Dave Chinner To: "Theodore Y. Ts'o" , Christoph Hellwig , "Darrick J. Wong" , Eric Biggers , linux-fscrypt@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org, Jaegeuk Kim , Victor Hsieh , Chandan Rajendra , Linus Torvalds Subject: Re: [PATCH v2 01/12] fs-verity: add a documentation file Message-ID: <20181219213552.GO6311@dastard> References: <20181219071420.GC2628@infradead.org> <20181219021953.GD31274@dastard> <20181219193005.GB6889@mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181219193005.GB6889@mit.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 19, 2018 at 02:30:05PM -0500, Theodore Y. Ts'o wrote: > On Wed, Dec 19, 2018 at 01:19:53PM +1100, Dave Chinner wrote: > > Putting metadata in user files beyond EOF doesn't work with XFS's > > post-EOF speculative allocation algorithms. > > > > i.e. Filesystem design/algorithms often assume that the region > > beyond EOF in user files is a write-only region. e.g. We can allow > > extents beyond EOF to be uninitialised because they are in a write > > only region of the file and so there's no possibility of stale data > > exposure. Unfortunately, putting filesystem/security metadata beyond > > EOF breaks these assumptions - it's no longer a write-only region. > > On Tue, Dec 18, 2018 at 11:14:20PM -0800, Christoph Hellwig wrote: > > Filesystems already use blocks beyond EOF for preallocation, either > > speculative by the file system itself, or explicitly by the user with > > fallocate. I bet you will run into bugs with your creative abuse > > sooner or later. Indepnd of that the interface simply is gross, which > > is enough of a reason not to merge it. > > Both of these concerns aren't applicable for fs-verity because the > entire file will be read-only. So there will be no preallocation or > fallocation going on --- or allowed --- for a file which is protected > by fs-verity. Since no writes are allowed at all, it won't break any > file systems' assumptions about "write-only regions". The file has to be written before it has been protected, which means it may very well have user space allocated beyond EOF before the merkle tree needs to be written. And, well, the fact that it is all read only after creation is a feature implementation detail that allows fsverity to "get away with" storing it's metadata in file data space. But whether or not fsverity is enabled on the filesystem, the fact is that the kernel code now has to support storing and reading data from beyond EOF. Every user, whether they are using fsverity or not, is now exposed to that code and a filesystem that no longer considers the user data region beyond EOF as write only. i.e. it doesn't matter if fsverity is in use, then ext4/f2fs code now allows reading of information beyond EOF from user data files i.e. you've completely changed the way files appear to /everyone/, not just the users of fsverity. Not only that, you now have file data that has a specific metadata on-disk format encoded into file data space. That greatly complicates filesystem checking and scrubbing which typically /doesn't even look at the contents of user data/. So yeah, this hack might make the merkle tree verification "simple" but it greatly complicates everything else the filesystem has to do. That's the problem here - fsverity completely redefines the layout of user data files for everyone, not just fsverity, and not just the filesystems that implement fsverity. You've taken an ext4 fsverity implementation feature and promoted it to being a linux-wide file data layout standard that is encoded into the kernel/user ABI/API forever more. And you're trying to force this into the tree on everyone without adequate review because "a product is already shipping with this code in it". Didn't we learn the lessons of failing to "upstream first" new features more than 15 years ago? > As far as whether it's "gross" --- that's a taste question, and I > happen to think it's more "clever" than "gross". You think it's clever because it's a neat hack that makes what you need simple to implement, and so you can ship it in phones quickly and without needing to involve upstream in more complex design discussions. I think it's gross because it bleeds implementation details all over the API and globally redefines the user data file layout for everyone, kernel wide in a manner that is incompatible with existing filesystem implementations. > It allows for a very > simple implementation, *leveraging* the fact that the file will never > change --- and especially, grow in length. So why not use the space > after EOF? There have been many technical reasons given for why it's a bad interfaces, yet you only address entirely subjective arguments and claim that you have "good taste" in APIs because it is "clever". > The alternative requires adding Solaris-style alternate data streams > support. No, it does not. It simply requires a different userspace API to move the merkle tree data into the kernel, and a different implemetnation abstraction that allows filesystems to provide the merkle tree data pages on request. Darrick and Christoph have already suggested alternative user APIs that would work just fine, and they don't ahve a requirement that the merkle tree is held in the user data space beyond EOF. How filesystems store and retrieve merkle tree data should be a filesystem internal detail. If how metadata is stored in th e filesystem is defined by the userspace API or the kernel library code that implements the verification feature, then it lacks the necessary abstraction to be a generic Linux filesystem feature. IOWs, it needs to be redesigned and reworked before we should consider it for merging. Cheers, Dave. -- Dave Chinner david@fromorbit.com