Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp38060143rwd; Wed, 12 Jul 2023 02:22:18 -0700 (PDT) X-Google-Smtp-Source: APBJJlF300Vg712GsJzxpLZVbuFVEJ+VNLW57phjT8q9LfO0fSRh5xFCX3SBLelgQXfeNqffuRqL X-Received: by 2002:a05:6a20:2584:b0:132:c7de:7e6c with SMTP id k4-20020a056a20258400b00132c7de7e6cmr875880pzd.15.1689153738631; Wed, 12 Jul 2023 02:22:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689153738; cv=none; d=google.com; s=arc-20160816; b=SSd4QbyNlX35TaHEvOFSlq1ZySHcF6H19dYz2VVjXlN+AE+Lht14jMGoAXr68VhATh jv5GZmOQNoCzcj57MBu3TR1++qU8QokAARqvNSpVMFFrIJUNSmUiuu9cWfMTHvMDEFTP r0aHIKepKSWmpJHE6hnleNwUWHvk8RATS5x+m4bDzOPAfjG884jozt0dUelyHLX2Q82i HFirXPHaQnEwR7KzxyNT2LZPnZUgn3AI57P4s069GHKD02l180GHlRYCN7Puabck3On5 xlH1eT/dOclLI+c61aV/nD3ouhVJmZxPVg1PUDWTEEo1LoJa8mbVnPkYAjOHyLGZcVOe YLnQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=vI9yvNrSHfYU4paf9/mQdvt6O/OotNrMm4HcTqJdzMg=; fh=1/UUJ7VFgh3haS+1D2u8/yoMspdDUx5GjhXfGKODrLE=; b=mvVRcZYhn9yrzvl0u9aRNRkrslxK2L5TJpO+F9VgDoU9uypPqKlRXfp4xJb8fCHt5W SDwL7qvb5ob8HCWwOWs4PIrle0VcyLjc21IdwlCqHE3fdjaOdCm55bpWm/C9TaPzKv0G fwthZoDJiCn9pQK3t+dJ927w3mRwrQWYDTNlo0K8Awq3ucW5BmVF6Fenrgyb6bGN0/MA 7RDlZoytkf4CelBHlIsxYMWT9sep5ADeIpx/sEJi1j9IDVv/s6So5ezIyqxLxS1AmhPa 5REFXwwr+Y2FjUoIwxN8+3wJEFxw783vukOZnGa4JtyEKlY7teMN6DCcQ3UuwS8kSFbe dIGw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t14-20020a170902e84e00b001b7f9517794si2992304plg.384.2023.07.12.02.22.06; Wed, 12 Jul 2023 02:22:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232130AbjGLJTF (ORCPT + 99 others); Wed, 12 Jul 2023 05:19:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232583AbjGLJTA (ORCPT ); Wed, 12 Jul 2023 05:19:00 -0400 Received: from outbound-smtp19.blacknight.com (outbound-smtp19.blacknight.com [46.22.139.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45F4F1986 for ; Wed, 12 Jul 2023 02:18:49 -0700 (PDT) Received: from mail.blacknight.com (pemlinmail01.blacknight.ie [81.17.254.10]) by outbound-smtp19.blacknight.com (Postfix) with ESMTPS id BA66D1C3872 for ; Wed, 12 Jul 2023 10:18:47 +0100 (IST) Received: (qmail 30399 invoked from network); 12 Jul 2023 09:18:47 -0000 Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.21.103]) by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 12 Jul 2023 09:18:47 -0000 Date: Wed, 12 Jul 2023 10:18:45 +0100 From: Mel Gorman To: "Kirill A. Shutemov" Cc: Borislav Petkov , Andy Lutomirski , Dave Hansen , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel , Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Dario Faggioli , Mike Rapoport , David Hildenbrand , marcelo.cerri@canonical.com, tim.gardner@canonical.com, khalid.elmously@canonical.com, philip.cox@canonical.com, aarcange@redhat.com, peterx@redhat.com, x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCHv14 5/9] efi: Add unaccepted memory support Message-ID: <20230712091845.hahda3xgvegv5hgf@techsingularity.net> References: <20230606142637.5171-1-kirill.shutemov@linux.intel.com> <20230606142637.5171-6-kirill.shutemov@linux.intel.com> <20230703132518.3ukqyolnes47i5r3@techsingularity.net> <20230704143740.bgimyg3bqsgxbm47@box.shutemov.name> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20230704143740.bgimyg3bqsgxbm47@box.shutemov.name> X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 04, 2023 at 05:37:40PM +0300, Kirill A. Shutemov wrote: > On Mon, Jul 03, 2023 at 02:25:18PM +0100, Mel Gorman wrote: > > On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote: > > > efi_config_parse_tables() reserves memory that holds unaccepted memory > > > configuration table so it won't be reused by page allocator. > > > > > > Core-mm requires few helpers to support unaccepted memory: > > > > > > - accept_memory() checks the range of addresses against the bitmap and > > > accept memory if needed. > > > > > > - range_contains_unaccepted_memory() checks if anything within the > > > range requires acceptance. > > > > > > Architectural code has to provide efi_get_unaccepted_table() that > > > returns pointer to the unaccepted memory configuration table. > > > > > > arch_accept_memory() handles arch-specific part of memory acceptance. > > > > > > Signed-off-by: Kirill A. Shutemov > > > Reviewed-by: Ard Biesheuvel > > > Reviewed-by: Tom Lendacky > > > > By and large, this looks ok from the page allocator perspective as the > > checks for unaccepted are mostly after watermark checks. However, if you > > look in the initial fast path, you'll see this > > > > /* > > * Forbid the first pass from falling back to types that fragment > > * memory until all local zones are considered. > > */ > > alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, gfp); > > > > While checking watermarks should be fine from a functional perspective and > > the fast paths are unaffected, there is a risk of premature fragmentation > > until all memory has been accepted. Meeting watermarks does not necessarily > > mean that fragmentation is avoided as pageblocks can get mixed while still > > meeting watermarks. > > Could you elaborate on this scenario? > > Current code checks the watermark, if it is met, try rmqueue(). > > If rmqueue() fails anyway, try to accept more pages and retry the zone if > it is successful. > > I'm not sure how we can get to the 'if (no_fallback) {' case with any > unaccepted memory in the allowed zones. > Lets take an extreme example and assume that the low watermark is lower than 2MB (one pageblock). Just before the watermark is reached (free count between 1MB and 2MB), it is unlikely that all free pages are within pageblocks of the same migratetype (e.g. MIGRATE_MOVABLE). If there is an allocation near the watermark of a different type (e.g. MIGRATE_UNMOVABLE) then the page allocation could fallback to a different pageblock and now it is mixed. It's a condition that is only obvious if you are explicitly checking for it via tracepoints. This can happen in the normal case, but unaccepted memory makes it worse because the "pageblock mixing" could have been avoided if the "no_fallback" case accepted at least one new pageblock instead of mixing pageblocks. That is an extreme example but the same logic applies when the free count is at or near MIGRATE_TYPES*pageblock_nr_pages as it is not guaranteed that the pageblocks with free pages are a migratetype that matches the allocation request. Hence, it may be more robust from a fragmentation perspective if ALLOC_NOFRAGMENT requests accept memory if it is available and retries before clearing ALLOC_NOFRAGMENT and mixing pageblocks before the watermarks are reached. > I see that there's preferred_zoneref and spread_dirty_pages cases, but > unaccepted memory seems change nothing for them. > preferred_zoneref is about premature zone exhaustion and spread_dirty_pages is about avoiding premature stalls on a node/zone due to an imbalance in the number of pages waiting for writeback to complete. There is an arguement to be made that they also should accept memory but it's less clear how much of a problem this is. Both are very obvious when they "fail" and likely are covered by the existing watermark checks. Premature pageblock mixing is more subtle as the final impact (root cause of a premature THP allocation failure) is harder to detect. -- Mel Gorman SUSE Labs