Date: Wed, 17 May 2023 00:32:45 +0300
From: "Kirill A. Shutemov"
To: Tom Lendacky
Cc: "Kirill A.
Shutemov", Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Sean Christopherson, Andrew Morton, Joerg Roedel, Ard Biesheuvel,
	Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes,
	Vlastimil Babka, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini,
	Ingo Molnar, Dario Faggioli, Mike Rapoport, David Hildenbrand,
	Mel Gorman, marcelo.cerri@canonical.com, tim.gardner@canonical.com,
	khalid.elmously@canonical.com, philip.cox@canonical.com,
	aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
	linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mike Rapoport
Subject: Re: [PATCHv11 1/9] mm: Add support for unaccepted memory
Message-ID: <20230516213245.oruzw2kinbfqcwwl@box.shutemov.name>
References: <20230513220418.19357-1-kirill.shutemov@linux.intel.com>
	<20230513220418.19357-2-kirill.shutemov@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 16, 2023 at 02:44:00PM -0500, Tom Lendacky wrote:
> On 5/13/23 17:04, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
> > SEV-SNP, require memory to be accepted before it can be used by the
> > guest. Accepting happens via a protocol specific to the Virtual Machine
> > platform.
> >
> > There are several ways kernel can deal with unaccepted memory:
> >
> >  1. Accept all the memory during the boot. It is easy to implement and
> >     it doesn't have runtime cost once the system is booted.
> >     The downside is very long boot time.
> >
> >     Accept can be parallelized to multiple CPUs to keep it manageable
> >     (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
> >     memory bandwidth and does not scale beyond the point.
> >
> >  2. Accept a block of memory on the first use. It requires more
> >     infrastructure and changes in page allocator to make it work, but
> >     it provides good boot time.
> >
> >     On-demand memory accept means latency spikes every time kernel steps
> >     onto a new memory block. The spikes will go away once workload data
> >     set size gets stabilized or all memory gets accepted.
> >
> >  3. Accept all memory in background. Introduce a thread (or multiple)
> >     that gets memory accepted proactively. It will minimize time the
> >     system experience latency spikes on memory allocation while keeping
> >     low boot time.
> >
> >     This approach cannot function on its own. It is an extension of #2:
> >     background memory acceptance requires functional scheduler, but the
> >     page allocator may need to tap into unaccepted memory before that.
> >
> >     The downside of the approach is that these threads also steal CPU
> >     cycles and memory bandwidth from the user's workload and may hurt
> >     user experience.
> >
> > The patch implements #1 and #2 for now. #2 is the default. Some
> > workloads may want to use #1 with accept_memory=eager in kernel
> > command line. #3 can be implemented later based on user's demands.
> >
> > Support of unaccepted memory requires a few changes in core-mm code:
> >
> >   - memblock has to accept memory on allocation;
> >
> >   - page allocator has to accept memory on the first allocation of the
> >     page;
> >
> > Memblock change is trivial.
> >
> > The page allocator is modified to accept pages. New memory gets accepted
> > before putting pages on free lists. It is done lazily: only accept new
> > pages when we run out of already accepted memory. The memory gets
> > accepted until the high watermark is reached.
> >
> > EFI code will provide two helpers if the platform supports unaccepted
> > memory:
> >
> >  - accept_memory() makes a range of physical addresses accepted.
> >
> >  - range_contains_unaccepted_memory() checks anything within the range
> >    of physical addresses requires acceptance.
> >
> > Signed-off-by: Kirill A. Shutemov
> > Acked-by: Mike Rapoport # memblock
> > Reviewed-by: Vlastimil Babka
> > ---
> >  drivers/base/node.c    |   7 ++
> >  fs/proc/meminfo.c      |   5 ++
> >  include/linux/mm.h     |  19 +++++
> >  include/linux/mmzone.h |   8 ++
> >  mm/internal.h          |   1 +
> >  mm/memblock.c          |   9 +++
> >  mm/mm_init.c           |   7 ++
> >  mm/page_alloc.c        | 173 +++++++++++++++++++++++++++++++++++++++++
> >  mm/vmstat.c            |   3 +
> >  9 files changed, 232 insertions(+)
> >
>
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 68410c6d97ac..b1db7ba5f57d 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1099,4 +1099,5 @@ struct vma_prepare {
> >       struct vm_area_struct *remove;
> >       struct vm_area_struct *remove2;
> >   };
> > +
>
> Looks like an unintentional change.

Yep, will fix.

> >   #endif /* __MM_INTERNAL_H */
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 3feafea06ab2..50b921119600 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1436,6 +1436,15 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
> >            */
> >           kmemleak_alloc_phys(found, size, 0);
> > +     /*
> > +      * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
> > +      * require memory to be accepted before it can be used by the
> > +      * guest.
> > +      *
> > +      * Accept the memory of the allocated buffer.
> > +      */
> > +     accept_memory(found, found + size);
>
> I'm not an mm or memblock expert, but do we need to worry about freed memory
> from memblock_phys_free() being possibly doubly accepted? A double
> acceptance will trigger a guest termination on SNP.

There will be no double acceptance. accept_memory() will consult the bitmap
before accepting any memory.
For already accepted memory it is a nop.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
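
To illustrate the point made in the reply above (the bitmap is consulted first, so a second accept of the same range is a nop), here is a minimal userspace sketch. It is not the kernel implementation. Every name in it (`sketch_accept_memory()`, `sketch_range_contains_unaccepted()`, `platform_accept()`, `UNIT_SIZE`, `NR_UNITS`, `unaccepted_bitmap`) is hypothetical, and the 2 MiB unit and 64-bit bitmap are assumptions chosen only to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only, not taken from the patch. */
#define UNIT_SIZE (2u * 1024 * 1024)   /* one bitmap bit covers 2 MiB */
#define NR_UNITS  64                   /* bitmap fits in one uint64_t */

static uint64_t unaccepted_bitmap;          /* bit set = unit not yet accepted */
static unsigned int platform_accept_calls;  /* counts simulated platform requests */

/* Stand-in for the platform-specific accept protocol (TDX, SEV-SNP). */
static void platform_accept(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	platform_accept_calls++;
}

/* Accept [start, end), skipping every unit the bitmap marks as accepted. */
static void sketch_accept_memory(uint64_t start, uint64_t end)
{
	assert(start <= end && end <= (uint64_t)NR_UNITS * UNIT_SIZE);

	uint64_t first = start / UNIT_SIZE;
	uint64_t last = (end + UNIT_SIZE - 1) / UNIT_SIZE;

	for (uint64_t i = first; i < last; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(unaccepted_bitmap & bit))
			continue;	/* already accepted: nothing to do */

		platform_accept(i * UNIT_SIZE, (i + 1) * UNIT_SIZE);
		unaccepted_bitmap &= ~bit;
	}
}

/* Report whether any unit overlapping [start, end) is still unaccepted. */
static bool sketch_range_contains_unaccepted(uint64_t start, uint64_t end)
{
	assert(start <= end && end <= (uint64_t)NR_UNITS * UNIT_SIZE);

	uint64_t last = (end + UNIT_SIZE - 1) / UNIT_SIZE;

	for (uint64_t i = start / UNIT_SIZE; i < last; i++)
		if (unaccepted_bitmap & (UINT64_C(1) << i))
			return true;

	return false;
}
```

In this sketch, calling `sketch_accept_memory()` twice on the same range issues platform requests only on the first call, because the first call clears the bits that the second call checks. That is the property the free-then-reallocate memblock case relies on.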