Date: Wed, 12 Jul 2023 10:05:26 +0100
From: Mel Gorman
To: Michal Hocko
Cc: Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Arjan Van De Ven, Andrew Morton, Vlastimil Babka, David Hildenbrand,
 Johannes Weiner, Dave Hansen, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 2/2] mm: alloc/free depth based PCP high auto-tuning
Message-ID: <20230712090526.thk2l7sbdcdsllfi@techsingularity.net>
References:
 <20230710065325.290366-1-ying.huang@intel.com>
 <20230710065325.290366-3-ying.huang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 11, 2023 at 01:19:46PM +0200, Michal Hocko wrote:
> On Mon 10-07-23 14:53:25, Huang Ying wrote:
> > To auto-tune PCP high for each CPU automatically, an
> > allocation/freeing depth based PCP high auto-tuning algorithm is
> > implemented in this patch.
> >
> > The basic idea behind the algorithm is to detect the repetitive
> > allocation and freeing pattern with short enough period (about 1
> > second). The period needs to be short to respond to allocation and
> > freeing pattern changes quickly and control the memory wasted by
> > unnecessary caching.
>
> 1s is an ethernity from the allocation POV. Is a time based sampling
> really a good choice? I would have expected a natural allocation/freeing
> feedback mechanism. I.e. double the batch size when the batch is
> consumed and it requires to be refilled and shrink it under memory
> pressure (GFP_NOWAIT allocation fails) or when the surplus grows too
> high over batch (e.g. twice as much). Have you considered something as
> simple as that?
> Quite honestly I am not sure time based approach is a good choice
> because memory consumptions tends to be quite bulky (e.g. application
> starts or workload transitions based on requests).
>

I tend to agree. Tuning based on the recent allocation pattern without
frees would make more sense and also be symmetric with how free_factor
works. I suspect that time-based may be heavily orientated around the
will-it-scale benchmark.

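The feedback scheme suggested in the quoted text could be sketched
roughly as below. This is only an illustration under stated
assumptions: struct pcp_sketch, its fields and the pcp_sketch_*()
helpers are hypothetical names and are not the kernel's
struct per_cpu_pages or any existing mm/ function. The batch doubles on
each refill event, and halves on memory pressure or once the cached
surplus exceeds twice the batch.

```c
#include <assert.h>

/* Hypothetical, simplified per-CPU page cache state (not kernel code). */
struct pcp_sketch {
	int count;	/* pages currently cached */
	int batch;	/* refill size, adapted by the events below */
	int batch_min;	/* lower bound for batch */
	int batch_max;	/* upper bound for batch */
};

/* Event: the cache ran empty and is refilled. Double the batch, capped. */
static void pcp_sketch_on_refill(struct pcp_sketch *pcp)
{
	int next = pcp->batch * 2;

	assert(pcp->batch_min > 0 && pcp->batch_min <= pcp->batch_max);
	pcp->batch = next > pcp->batch_max ? pcp->batch_max : next;
	pcp->count += pcp->batch;
}

/* Event: memory pressure, e.g. a GFP_NOWAIT allocation failed. Halve. */
static void pcp_sketch_on_pressure(struct pcp_sketch *pcp)
{
	int next = pcp->batch / 2;

	pcp->batch = next < pcp->batch_min ? pcp->batch_min : next;
}

/*
 * Event: one page is freed into the cache. If the surplus grows past
 * twice the batch, drain down to one batch and shrink the batch.
 * Returns the number of pages handed back to the shared allocator.
 */
static int pcp_sketch_on_free(struct pcp_sketch *pcp)
{
	int excess = 0;

	pcp->count++;
	if (pcp->count > 2 * pcp->batch) {
		excess = pcp->count - pcp->batch;
		pcp->count = pcp->batch;
		pcp_sketch_on_pressure(pcp);
	}
	return excess;
}
```

Nothing here samples time; every size change is tied to a refill,
a free that overflows the surplus threshold, or a pressure event.
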
While I only glanced at this, a few things jumped out

1. Time-based heuristics are not ideal. congestion_wait() and friends
   were an obvious case where time-based heuristics fell apart even
   before the event they waited on was removed. For congestion, it
   happened to work for slow storage for a while but that was about it.
   Allocation stream detection has a similar problem. If a process is
   allocating heavily, then fine. If it allocates in bursts of less
   than a second that are more than one second apart, then it will not
   adapt. While I do not think it is explicitly mentioned anywhere, my
   understanding was that heuristics like this within mm/ should be
   driven by explicit events as much as possible and not time.

2. If time was to be used, it would be cheaper to have the simplest
   possible state tracking in the fast paths and decay any resizing of
   the PCP within the vmstat updates (reuse pcp->expire except it
   applies to local pcps). Even this is less than ideal as the PCP may
   be too large for short periods of time but it may also act as a
   backstop for worst-case behaviour.

3. free_factor is an existing mechanism for detecting recent patterns
   and adapting the PCP sizes. The allocation side should be symmetric
   and the events that should drive it are "refills" on the alloc side
   and "drains" on the free side. Initially it might be easier to have
   a single parameter that scales batch and high up to a limit.

4. The amount of state tracked seems excessive and increases the size
   of the per-cpu structure by more than 1 cache line. That in itself
   may not be a problem but the state is tracked on every page
   alloc/free that goes through the fast path and it's relatively
   complex to track. That is a constant penalty in fast paths that may
   or may not be relevant to the workload, and only sustained bursty
   allocation streams may offset the cost.

5. Memory pressure and reclaim activity do not appear to be accounted
   for and it's not clear if pcp->high is bounded or if it's possible
   for a single PCP to hide a large number of pages from other CPUs
   sharing the same node. The max size of the PCP should probably be
   explicitly clamped.

-- 
Mel Gorman
SUSE Labs
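
Points 3 and 5 above could be read as something like the following
sketch. It is one possible interpretation and not code from the patch
or from mm/: struct pcp_scale_sketch, its fields and the pcp_scale_*()
helpers are hypothetical. A single shift scales both batch and high,
"refill" events grow it, "drain" events shrink it, pressure resets it,
and high is explicitly clamped.

```c
#include <assert.h>

/* Hypothetical state: one parameter scales both batch and high. */
struct pcp_scale_sketch {
	int base_batch;	/* unscaled batch */
	int base_high;	/* unscaled high watermark */
	int shift;	/* current scale, batch and high are base << shift */
	int max_shift;	/* upper limit for the scale */
	int high_limit;	/* explicit clamp on the effective high */
};

static int pcp_scale_batch(const struct pcp_scale_sketch *s)
{
	return s->base_batch << s->shift;
}

/* Effective high, clamped so one PCP cannot hide unbounded pages. */
static int pcp_scale_high(const struct pcp_scale_sketch *s)
{
	int high = s->base_high << s->shift;

	return high > s->high_limit ? s->high_limit : high;
}

/* Alloc side event: the list was empty and had to be refilled. */
static void pcp_scale_on_refill(struct pcp_scale_sketch *s)
{
	assert(s->shift >= 0 && s->max_shift >= 0);
	if (s->shift < s->max_shift)
		s->shift++;
}

/* Free side event: the list exceeded high and was drained. */
static void pcp_scale_on_drain(struct pcp_scale_sketch *s)
{
	if (s->shift > 0)
		s->shift--;
}

/* Memory pressure or reclaim activity: fall back to the base sizes. */
static void pcp_scale_on_pressure(struct pcp_scale_sketch *s)
{
	s->shift = 0;
}
```

The only fast-path state in this sketch is one small integer, which is
the kind of cost that point 4 argues for keeping low.
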