Received: by 2002:a05:6358:e9c4:b0:b2:91dc:71ab with SMTP id hc4csp1841806rwb; Fri, 5 Aug 2022 08:39:21 -0700 (PDT) X-Google-Smtp-Source: AA6agR7ZJFtK58Ya9LwFeRAvauDvVc8pVWaHugXoQGlUnP/LgYV0b6nMr5QWFrx27q5mhJJGAHwa X-Received: by 2002:a17:902:76c8:b0:16c:1c13:9eea with SMTP id j8-20020a17090276c800b0016c1c139eeamr7298503plt.83.1659713961333; Fri, 05 Aug 2022 08:39:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659713961; cv=none; d=google.com; s=arc-20160816; b=eBc/VRUbAptgwy/o+CkcxS91ZbF2apQdo63marQGYjG6IcUfKO31g/oXjBZTCYPGOR 1oVE9esorWZK6Lg0dX6vn7YrODxkqWCtTWp1CggRJWqofga/10YSfpaj0h4EqO9jTdgg /IFIJZW8ljHUip/xb3IuNm3pvTQcKkmW7Iqqa/XNspTaLuhdd5M50aHETQO0owuWlljD VNSTj7C52aKrxr6y/kCamR6S6UQRP+9/uYmrk5V4rFMdbF/EFg07F+WW8ZaffRKAKHt6 iuxsT79Dc5fbZ3EvmKUyrlB+Dk0ZTtkhWluYUCM1sltBaHAf/ii7P1cv1BYBk9lMlB/E wwuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :organization:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:dkim-signature; bh=nKJNisKgyypX3Bk0DMUA5psdWAWJAJ4t4udQLs3H4mQ=; b=cvesPHYwQSrxRy4XfqjLQdJGPow9WoZj7ChCS+VFRkqCFVqX8eaj6cWpaU1rTwJwkX SeG7nQSGB0hL+mQuETKBBil2OnMLMSBQMPD8RTVM+BNf1EBVia+tOxhFRuq8FMO7DVzZ 5bSotw/9Efa2TDXzEciX76l1f9MjFzKjfNTWnIL59Cr0fvwAW9hQAfFO8DMWNtu5hSG7 Ir2xm9V8SRyql62JP77kiOyUv/7/uVezEBqXYpDSi+o5Sfs0bqVXzOU0pzXZxZVn9x9B tOAtVwJ7pGZVrcaPfqBvGmnc+jvR+8hdSwn39Zhj46ib6RmhkPaKTdbFPYTGc2dh8obB RfQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OTpkd2e5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w22-20020a634756000000b00419cdf7b71bsi3232649pgk.312.2022.08.05.08.39.06; Fri, 05 Aug 2022 08:39:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OTpkd2e5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240907AbiHEOWy (ORCPT + 99 others); Fri, 5 Aug 2022 10:22:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230095AbiHEOWw (ORCPT ); Fri, 5 Aug 2022 10:22:52 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DA86F5A8A3 for ; Fri, 5 Aug 2022 07:22:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1659709369; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nKJNisKgyypX3Bk0DMUA5psdWAWJAJ4t4udQLs3H4mQ=; b=OTpkd2e5A8D9xI1neUEE+p88mf6mYW+5MG5iz08Z6BNySbzfe3RYK7N2JtgaSpszq64Sfr n3ewQSugAyLwBb1UvQTV88BkqseKT3exWy4BQ+4pBy/QHXrq8FH6zjr2CTBFuT9LqCDzVt a1PGm4k9/vjN7/HrWmvjH1fXt9dydQM= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-317-zXRUFUwdOnSBOCuokhjhRg-1; Fri, 05 Aug 2022 10:22:48 -0400 X-MC-Unique: zXRUFUwdOnSBOCuokhjhRg-1 Received: by mail-wr1-f71.google.com with SMTP id i15-20020adfa50f000000b002207a6887b3so525679wrb.15 for ; Fri, 05 Aug 2022 07:22:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc; bh=nKJNisKgyypX3Bk0DMUA5psdWAWJAJ4t4udQLs3H4mQ=; b=7qTA4Yzv9YbCnTT2oy0RKUXdK0JH4a63OsO0SAsEd5BF169X1v6rYWoBzKpS/932vA szvDyxt5GVDOsEtp7UNLtV8d2gXe6sGdpU1q70jyjcixJB3WFAv86tgdNhqni8TZPsI1 a39KTEStUVlaBa2xjo6ob6yFuA9HDT3m92oLvlEJM270VOzyL/DaVANfdhW41ayCsG2X UigHdA+XMNU6rpTwiwb8LVErGPC9HZrH1Jn1L+XreolLRIZesB8m61jEzBd0fe9Dilzz Xpo1v2lQlqn9WeRsVNvj8g4l4//STSWxXJ/AN5xoCe5BK81scQKSunFGnx2Y7TfU8Rcf Xn1g== X-Gm-Message-State: ACgBeo3FTLid5V4R436oof9sNlLSrz71ohq5v0CpEMPZlLNxzf03L587 ILPv7OhpuzPp5yrBAcLUeo0N9AnjE/WKGcfIT7RXjrMWFiS7E5kTMV8KVPUXIowYkWVjMcfN2Dj 1k0/KLX4atywMYhVBp3L5yK64 X-Received: by 2002:a05:6000:1684:b0:220:6e60:768d with SMTP id y4-20020a056000168400b002206e60768dmr4252365wrd.121.1659709367590; Fri, 05 Aug 2022 07:22:47 -0700 (PDT) X-Received: by 2002:a05:6000:1684:b0:220:6e60:768d with SMTP id y4-20020a056000168400b002206e60768dmr4252330wrd.121.1659709367252; Fri, 05 Aug 2022 07:22:47 -0700 (PDT) Received: from ?IPV6:2003:cb:c706:fb00:f5c3:24b2:3d03:9d52? (p200300cbc706fb00f5c324b23d039d52.dip0.t-ipconnect.de. [2003:cb:c706:fb00:f5c3:24b2:3d03:9d52]) by smtp.gmail.com with ESMTPSA id j37-20020a05600c1c2500b003a2c67aa6c0sm5760334wms.23.2022.08.05.07.22.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 05 Aug 2022 07:22:46 -0700 (PDT) Message-ID: Date: Fri, 5 Aug 2022 16:22:45 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory Content-Language: en-US To: Vlastimil Babka , "Kirill A. Shutemov" , Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Dario Faggioli , Dave Hansen , Mike Rapoport , marcelo.cerri@canonical.com, tim.gardner@canonical.com, khalid.elmously@canonical.com, philip.cox@canonical.com, x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, Mike Rapoport , Mel Gorman References: <20220614120231.48165-1-kirill.shutemov@linux.intel.com> <20220614120231.48165-3-kirill.shutemov@linux.intel.com> <8cf143e7-2b62-1a1e-de84-e3dcc6c027a4@suse.cz> From: David Hildenbrand Organization: Red Hat In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05.08.22 15:38, Vlastimil Babka wrote: > On 8/5/22 14:09, David Hildenbrand wrote: >> On 05.08.22 13:49, Vlastimil Babka wrote: >>> On 6/14/22 14:02, Kirill A. Shutemov wrote: >>>> UEFI Specification version 2.9 introduces the concept of memory >>>> acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD >>>> SEV-SNP, require memory to be accepted before it can be used by the >>>> guest. Accepting happens via a protocol specific to the Virtual Machine >>>> platform. >>>> >>>> There are several ways kernel can deal with unaccepted memory: >>>> >>>> 1. Accept all the memory during the boot. It is easy to implement and >>>> it doesn't have runtime cost once the system is booted. The downside >>>> is very long boot time. >>>> >>>> Accept can be parallelized to multiple CPUs to keep it manageable >>>> (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate >>>> memory bandwidth and does not scale beyond the point. >>>> >>>> 2. Accept a block of memory on the first use. It requires more >>>> infrastructure and changes in page allocator to make it work, but >>>> it provides good boot time. >>>> >>>> On-demand memory accept means latency spikes every time kernel steps >>>> onto a new memory block. The spikes will go away once workload data >>>> set size gets stabilized or all memory gets accepted. >>>> >>>> 3. Accept all memory in background. Introduce a thread (or multiple) >>>> that gets memory accepted proactively. It will minimize time the >>>> system experience latency spikes on memory allocation while keeping >>>> low boot time. >>>> >>>> This approach cannot function on its own. It is an extension of #2: >>>> background memory acceptance requires functional scheduler, but the >>>> page allocator may need to tap into unaccepted memory before that. >>>> >>>> The downside of the approach is that these threads also steal CPU >>>> cycles and memory bandwidth from the user's workload and may hurt >>>> user experience. >>>> >>>> Implement #2 for now. It is a reasonable default. Some workloads may >>>> want to use #1 or #3 and they can be implemented later based on user's >>>> demands. >>>> >>>> Support of unaccepted memory requires a few changes in core-mm code: >>>> >>>> - memblock has to accept memory on allocation; >>>> >>>> - page allocator has to accept memory on the first allocation of the >>>> page; >>>> >>>> Memblock change is trivial. >>>> >>>> The page allocator is modified to accept pages on the first allocation. >>>> The new page type (encoded in the _mapcount) -- PageUnaccepted() -- is >>>> used to indicate that the page requires acceptance. >>>> >>>> Architecture has to provide two helpers if it wants to support >>>> unaccepted memory: >>>> >>>> - accept_memory() makes a range of physical addresses accepted. >>>> >>>> - range_contains_unaccepted_memory() checks anything within the range >>>> of physical addresses requires acceptance. >>>> >>>> Signed-off-by: Kirill A. Shutemov >>>> Acked-by: Mike Rapoport # memblock >>>> Reviewed-by: David Hildenbrand >>> >>> Hmm I realize it's not ideal to raise this at v7, and maybe it was discussed >>> before, but it's really not great how this affects the core page allocator >>> paths. Wouldn't it be possible to only release pages to page allocator when >>> accepted, and otherwise use some new per-zone variables together with the >>> bitmap to track how much exactly is where to accept? Then it could be hooked >>> in get_page_from_freelist() similarly to CONFIG_DEFERRED_STRUCT_PAGE_INIT - >>> if we fail zone_watermark_fast() and there are unaccepted pages in the zone, >>> accept them and continue. With a static key to flip in case we eventually >>> accept everything. Because this is really similar scenario to the deferred >>> init and that one was solved in a way that adds minimal overhead. >> >> I kind of like just having the memory stats being correct (e.g., free >> memory) and acceptance being an internal detail to be triggered when >> allocating pages -- just like the arch_alloc_page() callback. > > Hm, good point about the stats. Could be tweaked perhaps so it appears > correct on the outside, but might be tricky. > >> I'm sure we could optimize for the !unaccepted memory via static keys >> also in this version with some checks at the right places if we find >> this to hurt performance? > > It would be great if we would at least somehow hit the necessary code only > when dealing with a >=pageblock size block. The bitmap approach and > accepting everything smaller uprofront actually seems rather compatible. Yet > in the current patch we e.g. check PageUnaccepted(buddy) on every buddy size > while merging. > > A list that sits besides the existing free_area, contains only >=pageblock > order sizes of unaccepted pages (no migratetype distinguished) and we tap > into it approximately before __rmqueue_fallback()? There would be some > trickery around releasing zone-lock for doing accept_memory(), but should be > manageable. > Just curious, do we have a microbenchmark that is able to reveal the impact of such code changes before we start worrying? -- Thanks, David / dhildenb