From: "Huang, Ying"
To: Johannes Weiner
Cc: Michal Hocko, Gregory Price, linux-kernel@vger.kernel.org,
    linux-cxl@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
    weixugc@google.com, apopple@nvidia.com, tim.c.chen@intel.com,
    dave.hansen@intel.com, shy828301@gmail.com,
    gregkh@linuxfoundation.org, rafael@kernel.org, Gregory Price
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
In-Reply-To: <20231031162216.GB3029315@cmpxchg.org> (Johannes Weiner's message of "Tue, 31 Oct 2023 12:22:16 -0400")
References: <20231031003810.4532-1-gregory.price@memverge.com> <20231031152142.GA3029315@cmpxchg.org> <20231031162216.GB3029315@cmpxchg.org>
Date: Wed, 01 Nov 2023 10:34:12 +0800
Message-ID: <87il6m6w2j.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Johannes Weiner writes:

> On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
>> On Tue 31-10-23 11:21:42, Johannes Weiner wrote:
>> > On Tue, Oct 31, 2023 at 10:53:41AM +0100, Michal Hocko wrote:
>> > > On Mon 30-10-23 20:38:06, Gregory Price wrote:

[snip]

>>
>> > This hopefully also explains why it's a global setting. The usecase is
>> > different from conventional NUMA interleaving, which is used as a
>> > locality measure: spread shared data evenly between compute
>> > nodes. This one isn't about locality - the CXL tier doesn't have local
>> > compute. Instead, the optimal spread is based on hardware parameters,
>> > which is a global property rather than a per-workload one.
>>
>> Well, I am not convinced about that TBH. Sure it is probably a good fit
>> for this specific CXL usecase but it just doesn't fit into many others I
>> can think of - e.g. proportional use of those tiers based on the
>> workload - you get what you pay for.
>>
>> Is there any specific reason for not having a new interleave interface
>> which defines weights for the nodemask? Is this because the policy
>> itself is very dynamic or is this more driven by simplicity of use?
>
> A downside of *requiring* weights to be paired with the mempolicy is
> that it's then the application that would have to figure out the
> weights dynamically, instead of having a static host configuration. A
> policy of "I want to be spread for optimal bus bandwidth" translates
> between different hardware configurations, but optimal weights will
> vary depending on the type of machine a job runs on.
>
> That doesn't mean there couldn't be usecases for having weights as
> policy as well in other scenarios, like you allude to above. It's just
> so far such usecases haven't really materialized or spelled out
> concretely.
> Maybe we just want both - a global default, and the
> ability to override it locally.

I think this is a good idea.  A system-wide configuration with a
reasonable default makes life much easier for applications.  If more
control is needed, some kind of workload-specific configuration can be
added.  And, instead of adding another memory policy, a per-cgroup
configuration may be easier to use.  The per-workload weights may need
to be adjusted when we deploy different combinations of workloads on
the system.

Another question is whether the weight should be per-memory-tier or
per-node.  In this patchset, the weight is per source-target node
combination.  That is, the weights become a matrix instead of a
vector.  IIUC, this is used to control cross-socket memory access in
addition to per-memory-type memory access.  Do you think the added
complexity is necessary?

> Could you elaborate on the 'get what you pay for' usecase you
> mentioned?

--
Best Regards,
Huang, Ying
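
To make the vector-versus-matrix question above concrete, here is a minimal Python sketch.  It is not taken from the patchset: the node numbering, the weight values, and the names weighted_interleave() and first_pages() are all made up for illustration, and the kernel's actual allocation path is not modelled.  It only shows that a per-node vector gives every source node the same spread, while a per-source-target matrix lets each source node use its own row.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield target node IDs forever, `weight` times per node per round.

    `weights` maps a target node ID to a non-negative integer weight.
    This layout is a hypothetical illustration, not a kernel structure.
    """
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one node needs a positive weight")
    while True:
        for node, weight in sorted(weights.items()):
            for _ in range(weight):
                yield node


def first_pages(weights, count):
    """Return the target nodes chosen for the first `count` allocations."""
    return list(islice(weighted_interleave(weights), count))


# Assumed topology: node 0 = DRAM on socket 0, node 1 = DRAM on socket 1,
# node 2 = CXL memory attached to socket 0.  All weights are invented.

# Vector form: one weight per target node, shared by every source node.
vector = {0: 4, 1: 1, 2: 2}

# Matrix form: one weight vector per source node, so tasks on socket 1
# can avoid the cross-socket hop to node 0 and node 2.
matrix = {
    0: {0: 4, 1: 1, 2: 2},
    1: {0: 1, 1: 4, 2: 1},
}

print(first_pages(vector, 7))     # same spread for any source node
print(first_pages(matrix[1], 6))  # spread seen by a task on socket 1
```

With a vector, the only per-workload knob left is the nodemask; the matrix adds one row per source node, which is the extra complexity the question above is about.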