Date: Fri, 14 Jul 2023 10:59:19 +0200
From: Michal Hocko
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven,
    Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand,
    Johannes Weiner, Dave Hansen, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 1/2] mm: add framework for PCP high auto-tuning
References: <20230710065325.290366-1-ying.huang@intel.com>
    <20230710065325.290366-2-ying.huang@intel.com>
    <87edldefnt.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87edldefnt.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed 12-07-23 15:45:58, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Mon 10-07-23 14:53:24, Huang Ying wrote:
> >> The page allocation performance requirements of different workloads
> >> are usually different. So, we often need to tune PCP (per-CPU
> >> pageset) high to optimize the workload page allocation performance.
> >> Now, we have a system wide sysctl knob (percpu_pagelist_high_fraction)
> >> to tune PCP high by hand. But, it's hard to find out the best value
> >> by hand. And one global configuration may not work best for the
> >> different workloads that run on the same system. One solution to
> >> these issues is to tune PCP high of each CPU automatically.
> >>
> >> This patch adds the framework for PCP high auto-tuning. With it,
> >> pcp->high will be changed automatically by tuning algorithm at
> >> runtime. Its default value (pcp->high_def) is the original PCP high
> >> value calculated based on low watermark pages or
> >> percpu_pagelist_high_fraction sysctl knob. To avoid putting too many
> >> pages in PCP, the original limit of percpu_pagelist_high_fraction
> >> sysctl knob, MIN_PERCPU_PAGELIST_HIGH_FRACTION, is used to calculate
> >> the max PCP high value (pcp->high_max).
> >
> > It would have been very helpful to describe the basic entry points to
> > the auto-tuning. AFAICS the central place of the tuning is tune_pcp_high
> > which is called from the freeing path. Why? Is this really a good place
> > considering this is a hot path? What about the allocation path? Isn't
> > that a good spot to watch for the allocation demand?
>
> Yes. The main entry point to the auto-tuning is tune_pcp_high(). Which
> is called from the freeing path because pcp->high is only used by page
> freeing. It's possible to call it in allocation path instead. The
> drawback is that the pcp->high may be updated a little later in some
> situations. For example, if there are many page freeing but no page
> allocation for quite long time. But I don't think this is a serious
> problem.

I consider it a serious flaw in the framework as it cannot cope with the
transition of the allocation pattern (e.g. increasing the allocation
pressure).

> > Also this framework seems to be enabled by default. Is this really
> > desirable? What about workloads tuning the pcp batch size manually?
> > Shouldn't they override any auto-tuning?
>
> In the current implementation, the pcp->high will be tuned between
> original pcp high (default or tuned manually) and the max pcp high (via
> MIN_PERCPU_PAGELIST_HIGH_FRACTION). So the high value tuned manually is
> respected at some degree.
>
> So you think that it's better to disable auto-tuning if PCP high is
> tuned manually?

Yes, I think this is a much safer option. For two reasons: 1) it is less
surprising to setups which know what they are doing by configuring the
batching and 2) the auto-tuning needs a way to get disabled in case
there are pathological patterns in behavior.

-- 
Michal Hocko
SUSE Labs
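
To make the policy discussed in this thread concrete, here is a minimal
userspace sketch, not the patch's code. It models two things from the
discussion: the tuner keeps pcp->high between pcp->high_def and
pcp->high_max, and auto-tuning is switched off once the limit has been
configured by hand. The struct, the `manual` flag, the `delta` parameter
and the function names are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is NOT the kernel's per-CPU pageset. */
struct pcp_model {
	int high;     /* current limit, adjusted by the tuner */
	int high_def; /* default or manually configured limit */
	int high_max; /* upper bound allowed for auto-tuning */
	bool manual;  /* assumption: set when the admin tuned the limit by hand */
};

/* Restrict v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * One tuning step. 'delta' stands in for whatever change a tuning
 * algorithm would request; how it is computed is out of scope here.
 */
static void model_tune_pcp_high(struct pcp_model *pcp, int delta)
{
	assert(pcp->high_def <= pcp->high_max);

	/* A manually configured limit overrides auto-tuning entirely. */
	if (pcp->manual) {
		pcp->high = pcp->high_def;
		return;
	}

	/* Otherwise stay within [high_def, high_max]. */
	pcp->high = clamp_int(pcp->high + delta, pcp->high_def, pcp->high_max);
}
```

With high_def = 100 and high_max = 400, a request to grow by 500 stops
at 400, and a later request to shrink by 1000 stops at 100. Once
`manual` is set, every call leaves the limit at high_def.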