Message-ID: <77b79116-951a-7ff9-c19b-73af2af98ce9@intel.com>
Date: Mon, 7 Nov 2022 15:30:36 -0800
Subject: Re: [PATCH 2/2] x86/tdx: Do not allow #VE due to EPT violation on
 the private memory
From: Dave Hansen
To: Erdem Aktas
Cc: "Nakajima, Jun", Guorui Yu, kirill.shutemov@linux.intel.com,
 ak@linux.intel.com, bp@alien8.de, dan.j.williams@intel.com,
 david@redhat.com, elena.reshetova@intel.com, hpa@zytor.com,
 linux-kernel@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
 peterz@infradead.org, sathyanarayanan.kuppuswamy@linux.intel.com,
 seanjc@google.com, tglx@linutronix.de, thomas.lendacky@amd.com,
 x86@kernel.org
References: <20221028141220.29217-3-kirill.shutemov@linux.intel.com>
 <4bfcd256-b926-9b1c-601c-efcff0d16605@intel.com>

On 11/7/22 14:53, Erdem Aktas wrote:
> On Fri, Nov 4, 2022 at 3:50 PM Dave Hansen wrote:
>> Could you please elaborate a bit on what you think the distinction is
>> between:
>>
>> * Accept on first use
>>
>> and
>>
>> * Accept on allocation
>>
>> Surely, for the vast majority of memory, it's allocated and then used
>> pretty quickly.  As in, most allocations are __GFP_ZERO so they're
>> allocated and "used" before they even leave the allocator.  So, in
>> practice, they're *VERY* close to equivalent.
>>
>> Where do you see them diverging?  Why does it matter?
>
> For a VM with a very large memory size, let's say close to 800G of
> memory, it might take a really long time to finish the initialization.
> If all allocations are __GFP_ZERO, then I agree it would not matter
> but -- I need to run some benchmarks to validate -- what I remember
> was, that was not what we were observing.  Let me run a few tests to
> provide more input on this, but meanwhile if you have already run some
> benchmarks, that would be great.
>
> What I see in the code is that the "accept_page" function will zero
> all the unaccepted pages even if the __GFP_ZERO flag is not set, and if
> __GFP_ZERO is set, we will again zero all those pages.  I see a lot of
> concerning comments like "Page acceptance can be very slow.".

I'm not following you at all here.

Yeah, page acceptance is very slow.  But, the slowest part is probably
the cache coherency dance that the TDX module has to do, flushing and
zeroing all the memory to initialize the new integrity metadata.
Second to that is the cost of the TDCALL.  Third is the cost of the
#VE.

Here's what Kirill is proposing, in some pseudocode:

	alloc_page(order=0, __GFP_ZERO) {
		TD.accept(size=4M) {
			// TDX module clflushes/zeroes 4M of memory
		}
		memset(4k);
		// leave 1023 accepted 4k pages in the allocator
	}

To accept 4M of memory, you do one TDCALL.  You do zero #VE's.

Using the #VE handler, you do:

	alloc_page(order=0, __GFP_ZERO) {
		memset(4k) {
			-> #VE handler
				TD.accept(size=4k); // flush/zero 4k
		}
		// only 4k was accepted
	}
	... take 1023 more #VE's later on for each 4k page

You do 1024 #VE's and 1024 TDCALLs.

So, let's summarize.
To do 4M worth of 4k pages, here's how the two approaches break down
if __GFP_ZERO is in play:

	            #VE          Accept-in-allocator
	#VE's:      1024         0
	TDCALLS:    1024         1
	clflushes:  4k x 1024    4k x 1024
	memset()s:  4k x 1024    4k x 1024

The *ONLY* downside of accept-at-allocate as implemented is that it
does 4M at a time, so the TDCALL is long compared to a 4k one.  But,
this is a classic bandwidth versus latency compromise.  In this case,
we choose bandwidth.

*Both* cases need to memset() the same amount of memory.  Both cases
only memset() 4k at a time.

The *ONLY* way the #VE approach is better is if you allocate 4k and
then never touch the rest of the 4M page.  That might happen, maybe
*ONE* time per zone.  But the rest of the time, the amortization of
the TDCALL cost is going to win.

I'll be shocked if any benchmarking turns up another result.