Date: Tue, 6 Sep 2022 13:22:26 +0100
From: Mel Gorman
To: mawupeng
Cc: akpm@linux-foundation.org, david@redhat.com, ying.huang@intel.com, hannes@cmpxchg.org, corbet@lwn.net, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, songmuchun@bytedance.com, mike.kravetz@oracle.com, osalvador@suse.de, surenb@google.com, rppt@kernel.org, charante@codeaurora.org, jsavitz@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH -next v3 1/2] mm: Cap zone movable's min wmark to small value
Message-ID: <20220906122226.ro7coxxiatvctyth@suse.de>
References: <20220905032858.1462927-1-mawupeng1@huawei.com> <20220905032858.1462927-2-mawupeng1@huawei.com> <20220905092619.2533krnnx632hswc@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 06, 2022 at 06:12:23PM +0800, mawupeng wrote:
> > I think there is a misunderstanding why the higher zones have a watermark
> > and why it might be large.
> >
> > It's not about a __GFP_HIGH or PF_MEMALLOC allocations because it's known
> > that few of those allocations may be movable. It's because high memory
> > allocations indirectly pin pages in lower zones. User-mapped memory allocated
> > from ZONE_MOVABLE still needs page table pages allocated from a lower zone
> > so there is a ratio between the size of ZONE_MOVABLE and lower zones
> > that limits the total amount of memory that can be allocated. Similarly,
> > file backed pages that may be allocated from ZONE_MOVABLE still requires
> > pages from lower memory for the inode and other associated kernel
> > objects that are allocated from lower zones.
> >
> > The intent behind the higher zones having a large min watermark is so
> > that kswapd reclaims pages from there first to *potentially* release
> > pages from lower memory. By capping pages_min for zone_movable, there is
> > the potential for lower memory pressure to be higher and to reach a point
> > where a ZONE_MOVABLE page cannot be allocated simply because there isn't
> > enough low memory available. Once the lower zones are all unreclaimable
> > (e.g. page table pages or the movable pages are not been reclaimed to free
> > the associated kernel structures), the system goes OOM.
>
> This i do agree with you, lower zone is actually "more important" than the
> higher one.
>

Very often yes.

> But higher min watermark for zone movable will not work since no memory
> allocation can use this reserve memory below min. Memory allocation
> with specify watermark modifier(__GFP_ATOMIC ,__GFP_HIGH ...) can use this
> in slowpath, however the standard movable memory allocation
> (gfp flag: GFP_HIGHUSER_MOVABLE) does not contain this.
>

Then a more appropriate solution may be to alter how the gap between min
and low is calculated. That gap determines when kswapd is active but
allocations are still allowed.

> Second, lowmem_reserve_ratio is used to "reserve" memory for lower zone.
> And the second patch introduce per zone watermark_scale_factor to boost
> normal/movable zone's watermark which can trigger early kswapd for zone
> movable.
>

The problem with the tunable is that this patch introduces a potentially
serious problem that must then be corrected by a system administrator, and
it will not be obvious what the root cause of the problem is or what the
solution is. Some users will only be able to determine that OOM triggers
when there is plenty of free memory, or that kswapd is consuming a lot more
CPU than expected. They will not necessarily be able to determine that
watermark_scale_factor is the solution.

> >
> > It's possible that there are safe adjustments that could be made that
> > would detect when there is no choice except to reclaim zone reclaimable
> > but it would be tricky and it's not this patch. This patch changelog states
> >
> >     However zone movable will get its min share in
> >     __setup_per_zone_wmarks() which does not make any sense.
> >
> > It makes sense, higher zones allocations indirectly pin pages in lower
> > zones and there is a bias in reclaim to free the higher zone pages first
> > on the *possibility* that lower zone pages get indirectly released later.
> >
>
> In our Test vm with 16G of mirrored memory(normal zone) and 256 of normal
> momory(Movable zone), the min share for normal zone is too few since the
> size of min watermark is calc by zone dma/normal while this will be shared
> by zones(include zone movable) based on managed pages.
>
> Node 0, zone      DMA
>         min      39
>         low      743
>         high     1447
> Node 0, zone   Normal
>         min      180
>         low      3372
>         high     6564
> Node 1, zone  Movable
>         min      3728
>         low      69788
>         high     135848

The gap between min and low is massive, so either adjust how that gap is
calculated or, to avoid side-effects for other users, consider
special-casing the gap for ZONE_MOVABLE with a comment explaining why it
is treated differently.
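A rough way to see what the min-to-low gap controls is to model it. The C sketch below is a simplified userspace model, loosely based on __setup_per_zone_wmarks() and __zone_watermark_ok() in kernels of this era. The struct zone_model type, the setup_wmarks() and classify() helpers, and every size used with them are hypothetical; lowmem_reserve, CMA, watermark boosting and high-order checks are deliberately ignored. It has no main(), so call it from your own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of per-zone watermarks; not kernel code. */
struct zone_model {
	const char *name;
	uint64_t managed_pages; /* pages the buddy allocator manages */
	int is_highmem;         /* 0 on 64-bit, even for ZONE_MOVABLE */
	uint64_t wmark_min;
	uint64_t wmark_low;
	uint64_t wmark_high;
};

enum wmark_state {
	WMARK_ABOVE_LOW, /* fast path succeeds, kswapd not woken */
	WMARK_KSWAPD,    /* kswapd woken, allocation still succeeds */
	WMARK_BELOW_MIN, /* watermark check fails: direct reclaim territory */
};

static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Assumed model: pages_min is split across non-highmem zones in
 * proportion to managed pages; highmem zones get a small capped min.
 * low and high sit one and two "gaps" above min, where
 * gap = max(share / 4, managed * watermark_scale_factor / 10000).
 */
static void setup_wmarks(struct zone_model *zones, int nr,
			 uint64_t min_free_kbytes, uint64_t page_kb,
			 uint64_t watermark_scale_factor)
{
	uint64_t pages_min = min_free_kbytes / page_kb;
	uint64_t lowmem_pages = 0;

	for (int i = 0; i < nr; i++)
		if (!zones[i].is_highmem)
			lowmem_pages += zones[i].managed_pages;
	assert(lowmem_pages > 0); /* avoid dividing by zero below */

	for (int i = 0; i < nr; i++) {
		struct zone_model *z = &zones[i];
		uint64_t share = pages_min * z->managed_pages / lowmem_pages;
		uint64_t gap = z->managed_pages * watermark_scale_factor / 10000;

		if (z->is_highmem) /* cap: managed / 1024, clamped to [32, 128] */
			z->wmark_min = clamp_u64(z->managed_pages / 1024, 32, 128);
		else
			z->wmark_min = share;

		if (share / 4 > gap)
			gap = share / 4;
		z->wmark_low = z->wmark_min + gap;
		z->wmark_high = z->wmark_min + 2 * gap;
	}
}

/*
 * Simplified order-0 check: the fast path tests against low; the slow
 * path (after waking kswapd) tests against min, which an ALLOC_HIGH
 * request (from __GFP_HIGH) is assumed to halve.
 */
static enum wmark_state classify(const struct zone_model *z,
				 uint64_t free_pages, int alloc_high)
{
	uint64_t mark = z->wmark_min;

	if (free_pages > z->wmark_low)
		return WMARK_ABOVE_LOW;
	if (alloc_high)
		mark -= mark / 2;
	return free_pages > mark ? WMARK_KSWAPD : WMARK_BELOW_MIN;
}
```

With the default watermark_scale_factor of 10, the gap in this model is about 0.1% of a zone's managed pages, so it grows linearly with zone size: for example inputs of 1,000,000 Normal pages and 9,000,000 Movable pages (min_free_kbytes = 40960, 4 KiB pages), setup_wmarks() gives gaps of 1000 and 9000 pages. The zoneinfo quoted above has the same shape, with low - min equal to high - low in every zone (704, 3192 and 66060 pages). In the model, a request without ALLOC_HIGH, such as a GFP_HIGHUSER_MOVABLE allocation, gets WMARK_BELOW_MIN as soon as free pages reach min.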
To mitigate the risk further, the special case could be restricted to
apply only when there is a massive imbalance in the
ALL_ZONES_EXCEPT_MOVABLE:ZONE_MOVABLE ratio. Document in the changelog the
potential downside: more lowmem may get pinned by MOVABLE allocations,
leading to excessive kswapd activity or premature OOM.

-- 
Mel Gorman
SUSE Labs