Message-ID: <7cc65bbd-e323-eabb-c576-b5656a3355ac@kernel.org>
Date: Fri, 4 Mar 2022 11:24:30 -0800
Subject: Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE
To: Steven Price, Chao Peng
Cc: kvm list, Linux Kernel Mailing List, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org, Linux API, Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang, "Kirill A.
Shutemov", "Nakajima, Jun", Dave Hansen, Andi Kleen, David Hildenbrand
References: <20220118132121.31388-1-chao.p.peng@linux.intel.com>
 <20220118132121.31388-2-chao.p.peng@linux.intel.com>
 <619547ad-de96-1be9-036b-a7b4e99b09a6@kernel.org>
 <20220217130631.GB32679@chaop.bj.intel.com>
 <2ca78dcb-61d9-4c9d-baa9-955b6f4298bb@www.fastmail.com>
 <20220223114935.GA53733@chaop.bj.intel.com>
 <71a06402-6743-bfd2-bbd4-997f8e256554@arm.com>
From: Andy Lutomirski
In-Reply-To: <71a06402-6743-bfd2-bbd4-997f8e256554@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/23/22 04:05, Steven Price wrote:
> On 23/02/2022 11:49, Chao Peng wrote:
>> On Thu, Feb 17, 2022 at 11:09:35AM -0800, Andy Lutomirski wrote:
>>> On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote:
>>>> On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote:
>>>>> On 1/18/22 05:21, Chao Peng wrote:
>>>>>> From: "Kirill A. Shutemov"
>>>>>>
>>>>>> Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
>>>>>> the file is inaccessible from userspace through ordinary MMU access
>>>>>> (e.g., read/write/mmap). However, the file content can be accessed
>>>>>> via a different mechanism (e.g. KVM MMU) indirectly.
>>>>>>
>>>>>> It provides semantics required for KVM guest private memory support
>>>>>> that a file descriptor with this seal set is going to be used as the
>>>>>> source of guest memory in confidential computing environments such
>>>>>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>>>>>>
>>>>>> At this time only shmem implements this seal.
>>>>>>
>>>>>
>>>>> I don't dislike this *that* much, but I do dislike this. F_SEAL_INACCESSIBLE
>>>>> essentially transmutes a memfd into a different type of object. While this
>>>>> can apparently be done successfully and without races (as in this code),
>>>>> it's at least awkward. I think that either creating a special inaccessible
>>>>> memfd should be a single operation that creates the correct type of object or
>>>>> there should be a clear justification for why it's a two-step process.
>>>>
>>>> Now one justification may be from Steven's comment on patch-00: for ARM
>>>> usage it can be used by creating a normal memfd, (partially) populating
>>>> it with initial guest memory content (e.g. firmware), and then setting
>>>> F_SEAL_INACCESSIBLE on it just before the first launch of the guest in
>>>> KVM (definitely the current code needs to be changed to support that).
>>>
>>> Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right? So this won't work.
>>
>> Hmm, right, if we set F_SEAL_INACCESSIBLE on a non-empty file, we will
>> need to make sure access to existing mmap-ed areas is prevented,
>> but that is hard.
>>
>>>
>>> In any case, the whole confidential VM initialization story is a bit muddy. From the earlier emails, it sounds like ARM expects the host to fill in guest memory and measure it. From my recollection of Intel's scheme (which may well be wrong, and I could easily be confusing it with SGX), TDX instead measures what is essentially a transcript of the series of operations that initializes the VM. These are fundamentally not the same thing even if they accomplish the same end goal. For TDX, we unavoidably need an operation (ioctl or similar) that initializes things according to the VM's instructions, and ARM ought to be able to use roughly the same mechanism.
>>
>> Yes, TDX requires an ioctl. Steven may comment on the ARM part.
>
> The Arm story is evolving so I can't give a definite answer yet. Our
> current prototyping works by creating the initial VM content in a
> memslot as with a normal VM and then calling an ioctl which throws the
> big switch and converts all the (populated) pages to be protected. At
> this point the RMM performs a measurement of the data that the VM is
> being populated with.
>
> The above (in our prototype) suffers from all the expected problems with
> a malicious VMM being able to trick the host kernel into accessing those
> pages after they have been protected (causing a fault detected by the
> hardware).
>
> The ideal (from our perspective) approach would be to follow the same
> flow but where the VMM populates a memfd rather than normal anonymous
> pages. The memfd could then be sealed and the pages converted to
> protected ones (with the RMM measuring them in the process).
>
> The question becomes how is that memfd populated? It would be nice if
> that could be done using normal operations on a memfd (i.e. using
> mmap()) and therefore this code could be (relatively) portable. This
> would mean that any pages mapped from the memfd would either need to
> block the sealing or be revoked at the time of sealing.
>
> The other approach is we could of course implement a special ioctl which
> effectively does a memcpy into the (created empty and sealed) memfd and
> does the necessary dance with the RMM to measure the contents. This
> would match the "transcript of the series of operations" described above
> - but seems much less ideal from the viewpoint of the VMM.

A VMM that supports Other Vendors will need to understand this sort of
model regardless.

I don't particularly mind the idea of having the kernel consume a normal
memfd and spit out a new object, but I find the concept of changing the
type of the object in place, even if it has other references, and trying
to control all the resulting races to be somewhat alarming.

In pseudo-Rust, this is the difference between:

fn convert_to_private(in: &mut Memfd)

and

fn convert_to_private(in: Memfd) -> PrivateMemoryFd

This doesn't map particularly nicely to the kernel, though.

--Andy
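
For illustration only, here is a compilable sketch of the two signatures
above. Every name in it (Memfd, PrivateMemoryFd, seal_in_place) is a
hypothetical stand-in, not a kernel or library API, and the parameter is
called memfd rather than in because in is a reserved word in real Rust.
It assumes a toy model in which a memfd is just a byte buffer. The point
it tries to show is that the in-place variant has to guard every accessor
with a run-time check, while the by-value variant leaves no accessor to
guard.

```rust
/// Toy model of an ordinary memfd (hypothetical; not a kernel API).
struct Memfd {
    pages: Vec<u8>,
    sealed: bool, // only needed by the in-place variant
}

impl Memfd {
    fn new(len: usize) -> Self {
        Memfd { pages: vec![0; len], sealed: false }
    }

    /// Userspace write. Because the object can change type in place,
    /// every access path must re-check the seal at run time.
    fn write(&mut self, offset: usize, data: &[u8]) -> Result<(), &'static str> {
        if self.sealed {
            return Err("inaccessible");
        }
        if offset + data.len() > self.pages.len() {
            return Err("out of range");
        }
        self.pages[offset..offset + data.len()].copy_from_slice(data);
        Ok(())
    }
}

/// Variant 1: fn convert_to_private(in: &mut Memfd).
/// The same object keeps existing; only its behaviour changes.
fn seal_in_place(memfd: &mut Memfd) {
    memfd.sealed = true;
}

/// Toy model of the new object type. It exposes no way to read or
/// write the contents, only their size.
struct PrivateMemoryFd {
    pages: Vec<u8>,
}

impl PrivateMemoryFd {
    fn len(&self) -> usize {
        self.pages.len()
    }
}

/// Variant 2: fn convert_to_private(in: Memfd) -> PrivateMemoryFd.
/// The Memfd is consumed, so no caller can still hold it afterwards.
fn convert_to_private(memfd: Memfd) -> PrivateMemoryFd {
    PrivateMemoryFd { pages: memfd.pages }
}

fn main() {
    // In place: the holder's object silently changes behaviour.
    let mut a = Memfd::new(4);
    assert!(a.write(0, &[1, 2]).is_ok());
    seal_in_place(&mut a);
    assert!(a.write(0, &[3]).is_err());

    // By value: the old handle is gone once the conversion returns.
    let mut b = Memfd::new(4);
    b.write(0, &[1, 2]).unwrap();
    let private = convert_to_private(b);
    // b.write(0, &[3]).unwrap(); // would not compile: `b` was moved
    assert_eq!(private.len(), 4);
}
```

In this toy model the compiler also refuses to move b while a borrow of
it is still live, which loosely resembles "block the sealing while
mappings exist". Real file descriptors can be duplicated and mapped
without any such compile-time tracking, so this is an analogy rather
than a design.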