Date: Fri, 30 Dec 2022 13:13:49 -0800
From: Eric Biggers
To: linux-fscrypt@vger.kernel.org, linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jingbo Xu, Joseph Qi, Liu Jiang, Zefan Li, Xin Yin, Liu Bo, Gao Xiang
Subject: Re: [RFC] fs-verity and encryption for EROFS

Hi Gao,

On Thu, Dec 22, 2022 at 11:24:34AM +0800, Gao Xiang wrote:
> ( + more lists )
>
> On Wed, Dec 21, 2022 at 02:41:40PM +0800, Gao Xiang wrote:
> > Hi folks,
> >
> > (As Eric suggested, I'm posting this on the list now.)
> >
> > In order to outline what we could do next to benefit various image-based
> > distribution use cases (especially signed+verified images and
> > confidential computing), I'd like to discuss two potential new
> > features for EROFS: verification and encryption.
> >
> > - Verification
> >
> > As we know, dm-verity is currently the main way to keep the integrity
> > of read-only devices. However, consider an image-based system with
> > lots of shared blobs (whether device-based or file-based). IMHO, it
> > would be better to have an in-band (rather than an out-of-band
> > device-mapper) approach to verify such blobs.
> >
> > In particular, in current container image use cases, an EROFS image
> > can consist of
> >
> >  - one meta blob for the metadata and filesystem tree;
> >
> >  - several data-shared blobs with chunk-based de-duplicated data (in
> >    layers to allow incremental updates, or in other ways such as one
> >    blob per file).
> >
> > The number of data blobs can currently range from (typically) a dozen
> > to (in principle) 2^16 - 1. dm-verity can hardly cover such usage, yet
> > that distribution form is becoming more and more common with the rise
> > of containerization.
> >
> > Also, since we have EROFS over the fscache infrastructure, file-based
> > distribution makes dm-verity almost impossible as well. Generally we
> > could enable fs-verity on the underlying filesystem, I think, but with
> > on-demand lazy pulling from remote, such data may be incomplete until
> > it is fully downloaded. (I think this is also similar to what Google
> > did with fs-verity for incfs.) In addition, IMO it's not good to rely
> > on features of an arbitrary underlying fs, with a tree generated by an
> > arbitrary hashing algorithm and no original signature (by the image
> > creator).
>
> That is: an arbitrary hashing algorithm, underlying block sizes, (maybe)
> a new underlying layout, and no original signature, which impacts
> reproducibility.
>
> >
> > My preliminary thought for EROFS verification is to have blob-based
> > (or device-based) Merkle trees, but to make such image integrity
> > self-contained so that Android, embedded, system rootfs, and container
> > use cases can all benefit from it.
> >
> > Also, as a self-contained verification approach like those of other
> > Linux filesystems, it would make it easy for bootloaders and
> > individual EROFS image unpackers to support and check image integrity
> > and signatures...
> >
> > It seems the current fs-verity codebase can fit this quite well with
> > some minor modifications. If possible, we could go further in this
> > direction.

More details and background information would be really appreciated here.
I thought that EROFS is a simple block-device based filesystem. It sounds
like that's fundamentally changed. How does it work now?

Part of the issue is that crazy proposals involving fsverity are a dime a
dozen; recent examples are:

    https://lore.kernel.org/r/20211112124411.1948809-6-roberto.sassu@huawei.com
    https://lore.kernel.org/r/D3AF9D1E-12E1-434F-AEA4-5892E8BC66AB@gmail.com
    https://lore.kernel.org/r/cover.1669631086.git.alexl@redhat.com

It's hard to know which ones to pay attention to, and they tend to just go
away on their own anyway.

You haven't provided enough details for me to properly understand your
proposal, but to me it sounds similar to the Composefs proposal
(https://lore.kernel.org/r/cover.1669631086.git.alexl@redhat.com). That
proposal made some amount of sense, and it came with documentation and code.

IIUC, in Composefs (a) all filesystem metadata is trusted and provided at
mount time, and (b) all file contents are untrusted and are retrieved from
external backing files. So to authenticate a file's contents, the
filesystem metadata just needs to include a cryptographic hash of that
file's contents, and the filesystem just needs to compare the actual hash
to that expected hash. Of course, one way to implement that is to use
fsverity file digests and to enforce that the backing file has fsverity
enabled with the correct digest.

It sounds like what you are proposing in EROFS is similar, but differs in
that you want block-level data deduplication as well. That presumably
means that EROFS will represent file contents as a list of deduplicated
data blocks, each of which is fairly small and not randomly accessible. In
that case a Merkle tree over each block would not make sense. There should
just be a standard cryptographic hash for each block. So I don't see how
fsverity would be relevant at all. Does that sound right to you?
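
A minimal sketch of the per-block scheme described above, in Python for brevity and assuming the Composefs-like trust model (metadata trusted, data untrusted): the trusted metadata carries one SHA-256 digest per deduplicated block, blocks live in an untrusted content-addressed store, and every read is checked against the expected digest, with no Merkle tree involved. The block size, the names `build_image`, `read_block`, `read_file` and `IntegrityError`, and the in-memory dicts standing in for the metadata blob and data blobs are all hypothetical; this is not the EROFS on-disk format or any kernel API.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # hypothetical deduplication block size


class IntegrityError(Exception):
    """Raised when a stored block does not match its trusted digest."""


def build_image(files: Dict[str, bytes]) -> Tuple[Dict[str, List[bytes]], Dict[bytes, bytes]]:
    """Split each file into blocks; return (trusted metadata, untrusted block store).

    metadata maps file name -> list of SHA-256 digests of its blocks.
    store maps digest -> block contents, so identical blocks are stored once.
    """
    metadata: Dict[str, List[bytes]] = {}
    store: Dict[bytes, bytes] = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            store.setdefault(digest, block)  # deduplicate identical blocks
            digests.append(digest)
        metadata[name] = digests
    return metadata, store


def read_block(metadata: Dict[str, List[bytes]], store: Dict[bytes, bytes],
               name: str, index: int) -> bytes:
    """Fetch one block from the untrusted store and check it against the trusted digest."""
    expected = metadata[name][index]
    block = store[expected]
    if hashlib.sha256(block).digest() != expected:
        raise IntegrityError(f"{name}: block {index} failed verification")
    return block


def read_file(metadata: Dict[str, List[bytes]], store: Dict[bytes, bytes], name: str) -> bytes:
    """Reassemble a file, verifying every block on the way."""
    return b"".join(read_block(metadata, store, name, i)
                    for i in range(len(metadata[name])))
```

For example, with `files = {"a": b"x" * 8192, "b": b"x" * 4096 + b"y"}` the store ends up holding only two blocks, and tampering with the stored `b"y"` block makes `read_file(..., "b")` raise `IntegrityError` while `"a"` still reads correctly. Composefs's whole-file digest is roughly the degenerate case of one "block" per file.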

> >
> > - Encryption
> >
> > I also have some rough preliminary thoughts on EROFS encryption
> > (although they are not as detailed as those on verification).
> > Currently we have full-disk encryption and file-based encryption.
> > However, in order to do finer-grained sharing of encrypted data (it
> > seems hard to do finer-grained data de-duplication with file-based
> > encryption), we could also consider modified convergent encryption,
> > especially for image-based offline data.
> >
> > In order to prevent dictionary attacks, the key itself may not be
> > derived directly from the data hash; instead, we could assign a random
> > key to a specific piece of data as an encrypted chunk, and find a way
> > to share these keys and data within a trusted domain.
> >
> > A similar idea was also shown in the presentation of the AWS Lambda
> > sparse filesystem, although they didn't show many internal details:
> > https://youtu.be/FTwsMYXWGB0
> >
> > Anyway, for encryption it's just a preliminary thought, but we'd be
> > happy to have a better encryption solution for data sharing in
> > confidential container images...

How would this compare to the old-school approach (commonly used by backup
software) of just encrypting the deduplicated data blocks with the user's
key? That leaks information about the plaintext, but it's usually
considered an acceptable tradeoff.

- Eric
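
To make the comparison in the question above concrete, here is a toy Python sketch of the "old-school" approach: split the image into chunks, deduplicate them under a keyed chunk ID, and encrypt each unique chunk once with the user's key and a fresh random nonce. The HMAC-SHA256 keystream is only a stand-in so the example runs with the standard library; it is NOT a vetted cipher, and a real design would use an AEAD such as AES-GCM. The chunk size, the subkey labels, and the names `subkey`, `toy_keystream_xor`, `store_image` and `load_image` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple

CHUNK_SIZE = 4096  # hypothetical chunk size

Store = Dict[bytes, Tuple[bytes, bytes]]  # chunk ID -> (nonce, ciphertext)


def subkey(user_key: bytes, label: bytes) -> bytes:
    """Derive a purpose-specific key from the user's key (hypothetical labels)."""
    return hmac.new(user_key, label, hashlib.sha256).digest()


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher, illustration only: XOR with an HMAC-SHA256 counter keystream.

    Encryption and decryption are the same operation.
    """
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hmac.new(key, nonce + counter.to_bytes(8, "little"), hashlib.sha256).digest()
        out += bytes(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)


def store_image(user_key: bytes, data: bytes) -> Tuple[List[bytes], Store]:
    """Deduplicate data into chunks, then encrypt each unique chunk with the user's key.

    Returns (recipe, store): the recipe lists chunk IDs in file order. IDs are
    keyed hashes, so without user_key nobody can confirm a guessed chunk.
    """
    id_key = subkey(user_key, b"chunk-id")
    enc_key = subkey(user_key, b"encrypt")
    recipe: List[bytes] = []
    store: Store = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hmac.new(id_key, chunk, hashlib.sha256).digest()
        if chunk_id not in store:  # each unique chunk is encrypted and stored once
            nonce = os.urandom(16)
            store[chunk_id] = (nonce, toy_keystream_xor(enc_key, nonce, chunk))
        recipe.append(chunk_id)
    return recipe, store


def load_image(user_key: bytes, recipe: List[bytes], store: Store) -> bytes:
    """Decrypt the chunks named by the recipe and reassemble the image."""
    enc_key = subkey(user_key, b"encrypt")
    parts = []
    for chunk_id in recipe:
        nonce, ciphertext = store[chunk_id]
        parts.append(toy_keystream_xor(enc_key, nonce, ciphertext))
    return b"".join(parts)
```

Identical chunks map to the same chunk ID, so the recipe reveals which parts of the plaintext repeat; that is one concrete form of the leak mentioned above. Deduplication also only happens among data encrypted under the same user key, which appears to be the limitation that the modified convergent-encryption idea is aimed at.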