Date: Wed, 10 Aug 2022 15:19:59 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka
Cc: "Kirill A. Shutemov",
    Borislav Petkov, Andy Lutomirski, Sean Christopherson,
    Andrew Morton, Joerg Roedel, Ard Biesheuvel, Andi Kleen,
    Kuppuswamy Sathyanarayanan, David Rientjes, Tom Lendacky,
    Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar,
    Dario Faggioli, Dave Hansen, Mike Rapoport, David Hildenbrand,
    marcelo.cerri@canonical.com, tim.gardner@canonical.com,
    khalid.elmously@canonical.com, philip.cox@canonical.com,
    x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev,
    linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
    Mike Rapoport
Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory
Message-ID: <20220810141959.ictqchz7josyd7pt@techsingularity.net>
References: <20220614120231.48165-1-kirill.shutemov@linux.intel.com>
    <20220614120231.48165-3-kirill.shutemov@linux.intel.com>
    <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz>
In-Reply-To: <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz>

On Fri, Aug 05, 2022 at 01:49:41PM +0200, Vlastimil Babka wrote:
> On 6/14/22 14:02, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance. Some Virtual Machine platforms, such as Intel TDX or
> > AMD SEV-SNP, require memory to be accepted before it can be used by
> > the guest. Accepting happens via a protocol specific to the Virtual
> > Machine platform.
> >
> > There are several ways the kernel can deal with unaccepted memory:
> >
> > 1. Accept all the memory during boot. It is easy to implement and
> >    it has no runtime cost once the system is booted. The downside
> >    is a very long boot time.
> >
> >    Acceptance can be parallelized across multiple CPUs to keep it
> >    manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to
> >    saturate memory bandwidth and does not scale beyond that point.
> >
> > 2. Accept a block of memory on first use. It requires more
> >    infrastructure and changes in the page allocator to make it
> >    work, but it provides a good boot time.
> >
> >    On-demand memory acceptance means latency spikes every time the
> >    kernel steps onto a new memory block. The spikes will go away
> >    once the workload's data set size stabilizes or all memory gets
> >    accepted.
> >
> > 3. Accept all memory in the background. Introduce a thread (or
> >    several) that accepts memory proactively. It will minimize the
> >    time the system experiences latency spikes on memory allocation
> >    while keeping boot time low.
> >
> >    This approach cannot function on its own. It is an extension of
> >    #2: background memory acceptance requires a functional
> >    scheduler, but the page allocator may need to tap into
> >    unaccepted memory before that.
> >
> >    The downside of the approach is that these threads also steal
> >    CPU cycles and memory bandwidth from the user's workload and may
> >    hurt the user experience.
> >
> > Implement #2 for now. It is a reasonable default. Some workloads
> > may want to use #1 or #3 and they can be implemented later based on
> > users' demands.
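To make #2 concrete: on the first allocation of a page, its backing
physical range gets accepted. A minimal sketch, assuming only the
accept_memory() helper described further down (an illustration, not
the patch's actual code):

	static void accept_page(struct page *page, unsigned int order)
	{
		phys_addr_t start = page_to_phys(page);

		/* Accept the whole physical range backing this page. */
		accept_memory(start, start + (PAGE_SIZE << order));
	}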
> >
> > Support of unaccepted memory requires a few changes in core-mm
> > code:
> >
> >  - memblock has to accept memory on allocation;
> >
> >  - the page allocator has to accept memory on the first allocation
> >    of a page;
> >
> > The memblock change is trivial.
> >
> > The page allocator is modified to accept pages on the first
> > allocation. The new page type (encoded in the _mapcount) --
> > PageUnaccepted() -- is used to indicate that the page requires
> > acceptance.
> >
> > An architecture has to provide two helpers if it wants to support
> > unaccepted memory:
> >
> >  - accept_memory() makes a range of physical addresses accepted.
> >
> >  - range_contains_unaccepted_memory() checks whether anything
> >    within the range of physical addresses requires acceptance.
> >
> > Signed-off-by: Kirill A. Shutemov
> > Acked-by: Mike Rapoport   # memblock
> > Reviewed-by: David Hildenbrand
>
> Hmm, I realize it's not ideal to raise this at v7, and maybe it was
> discussed before, but it's really not great how this affects the core
> page allocator paths. Wouldn't it be possible to only release pages
> to the page allocator when accepted, and otherwise use some new
> per-zone variables together with the bitmap to track exactly how much
> there is to accept and where? Then it could be hooked into
> get_page_from_freelist() similarly to CONFIG_DEFERRED_STRUCT_PAGE_INIT
> - if we fail zone_watermark_fast() and there are unaccepted pages in
> the zone, accept them and continue. With a static key to flip once we
> eventually accept everything. Because this is a really similar
> scenario to the deferred init, and that one was solved in a way that
> adds minimal overhead.
>

I think it might be more straightforward to always accept pages in
pageblock-sized units. Smaller ranges should not matter because they
will already have been accepted in deferred_free_range(). In expand(),
if PageUnaccepted is set on a pageblock-sized page, take it off the
list, drop the zone->lock leaving IRQs disabled, accept the memory and
reacquire the lock to split the page into the required order (see the
sketch at the end of this mail). IRQs being left disabled is
unfortunate, but even if the acceptance is slow, it's presumably not so
slow as to cause major problems. This would reduce, and probably
eliminate, the need for the assert check in accept_page. It might also
simplify __free_one_page if it's known that a pageblock range of pages
is either all accepted or all unaccepted.

Lastly, the default behaviour should probably be "accept all memory at
boot" and use Kconfig to allow acceptance to be deferred or overridden
by command line. There are at least two reasons for this. Even though
this is a virtual machine, there may still be latency-sensitive
applications running early in boot with pinned vcpu->pcpu bindings and
no memory overcommit. The unpredictable performance of such an
application early in boot may be unacceptable and, without an override,
unavoidable. It might take a long time, but it could eventually
generate bug reports about "unpredictable performance early in boot"
that will be hard to track down unless accept_memory is observed using
perf at the right time. Even when that does happen, there will need to
be an option to turn it off if the unpredictable performance cannot be
tolerated. Second, any benchmarking done early in boot is likely to be
disrupted, making the series a potential bisection magnet that masks a
performance bug elsewhere in the merge window.
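Going back to the expand() suggestion, the flow I have in mind is
roughly the following. It is a sketch only: the helper name is invented
for illustration, it assumes the caller has already taken the page off
the free list with the zone lock held and IRQs disabled, and it uses
the PageUnaccepted page-type helpers as this series defines them.

	/* Sketch: accept a pageblock-sized page before it is split. */
	static void accept_pageblock_locked(struct zone *zone,
					    struct page *page)
	{
		phys_addr_t start = page_to_phys(page);

		/* Drop the zone lock but leave IRQs disabled. */
		spin_unlock(&zone->lock);
		accept_memory(start, start + (PAGE_SIZE << pageblock_order));
		__ClearPageUnaccepted(page);
		spin_lock(&zone->lock);
	}

expand() would then split the page into the required order as usual
once the lock has been reacquired.

-- 
Mel Gorman
SUSE Labs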