Date: Wed, 23 Feb 2022 19:49:35 +0800
From: Chao Peng
To: Andy Lutomirski
Cc: kvm list, Linux Kernel Mailing List, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org, Linux API,
    Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
    Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
    Hugh Dickins, Jeff Layton, "J. Bruce Fields", Andrew Morton, Yu Zhang,
    "Kirill A. Shutemov", "Nakajima, Jun", Dave Hansen, Andi Kleen,
    David Hildenbrand, Steven Price
Subject: Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE
Message-ID: <20220223114935.GA53733@chaop.bj.intel.com>
In-Reply-To: <2ca78dcb-61d9-4c9d-baa9-955b6f4298bb@www.fastmail.com>

On Thu, Feb 17, 2022 at 11:09:35AM -0800, Andy Lutomirski wrote:
> On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote:
> > On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote:
> >> On 1/18/22 05:21, Chao Peng wrote:
> >> > From: "Kirill A. Shutemov"
> >> >
> >> > Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
> >> > the file is inaccessible from userspace through ordinary MMU access
> >> > (e.g., read/write/mmap). However, the file content can be accessed
> >> > via a different mechanism (e.g. KVM MMU) indirectly.
> >> >
> >> > It provides semantics required for KVM guest private memory support
> >> > that a file descriptor with this seal set is going to be used as the
> >> > source of guest memory in confidential computing environments such
> >> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >> >
> >> > At this time only shmem implements this seal.
> >> >
> >>
> >> I don't dislike this *that* much, but I do dislike this. F_SEAL_INACCESSIBLE
> >> essentially transmutes a memfd into a different type of object. While this
> >> can apparently be done successfully and without races (as in this code),
> >> it's at least awkward. I think that either creating a special inaccessible
> >> memfd should be a single operation that creates the correct type of object or
> >> there should be a clear justification for why it's a two-step process.
> >
> > Now one justification may be from Steven's comment to patch-00: for ARM
> > usage it can be used by creating a normal memfd, (partially) populating
> > it with initial guest memory content (e.g. firmware), and then setting
> > F_SEAL_INACCESSIBLE on it just before the first launch of the guest in
> > KVM (definitely the current code needs to be changed to support that).
>
> Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right?
> So this won't work.

Hmm, right. If we set F_SEAL_INACCESSIBLE on a non-empty file, we will
need to make sure that access to any existing mmap-ed area is prevented,
but that is hard.
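
For illustration only, here is how the existing F_SEAL_WRITE deals with
a similar problem: rather than revoking mappings that already exist,
F_ADD_SEALS is simply refused with EBUSY while a shared writable mapping
is present. The sketch below uses only memfd_create() and fcntl() as
they exist today. It does not use F_SEAL_INACCESSIBLE, which is only
proposed in this series, and the helper names (make_sealable_memfd,
try_seal_write, seal_blocked_by_mapping) are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a memfd of `size` bytes that accepts seals; returns fd or -1. */
static int make_sealable_memfd(off_t size)
{
    assert(size > 0);

    int fd = memfd_create("guest-mem-demo", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Try to add F_SEAL_WRITE; returns 0 on success, else the errno value. */
static int try_seal_write(int fd)
{
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0)
        return 0;
    return errno;
}

/*
 * Returns 1 if sealing was refused with EBUSY while a shared writable
 * mapping existed and then succeeded once it was unmapped; 0 otherwise.
 */
static int seal_blocked_by_mapping(void)
{
    const size_t len = 4096;

    int fd = make_sealable_memfd((off_t)len);
    if (fd < 0)
        return 0;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return 0;
    }

    int while_mapped = try_seal_write(fd);  /* mapping still live */
    munmap(p, len);
    int after_unmap = try_seal_write(fd);   /* mapping gone */

    close(fd);
    return while_mapped == EBUSY && after_unmap == 0;
}
```

Calling seal_blocked_by_mapping() from a small main() is enough to try
this out. Whether a "refuse while mapped" rule would be acceptable for
an inaccessible seal on a populated file is exactly the open question
here, so treat this as an analogy and not as a proposal.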
>
> In any case, the whole confidential VM initialization story is a bit
> muddy. From the earlier emails, it sounds like ARM expects the host to
> fill in guest memory and measure it. From my recollection of Intel's
> scheme (which may well be wrong, and I could easily be confusing it
> with SGX), TDX instead measures what is essentially a transcript of
> the series of operations that initializes the VM. These are
> fundamentally not the same thing even if they accomplish the same end
> goal. For TDX, we unavoidably need an operation (ioctl or similar)
> that initializes things according to the VM's instructions, and ARM
> ought to be able to use roughly the same mechanism.

Yes, TDX requires an ioctl. Steven may comment on the ARM part.

Chao

>
> Also, if we ever get fancy and teach the page allocator about memory
> with reduced directmap permissions, it may well be more efficient for
> userspace to shove data into a memfd via ioctl than it is to mmap it
> and write the data.