From: Ravi Jonnalagadda
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Date: Wed, 1 Nov 2023 14:59:23 +0530
Message-ID: <20231101092923.283-1-ravis.opensrc@micron.com>
In-Reply-To: <87il6m6w2j.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <87il6m6w2j.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

>> On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
>>> On Tue 31-10-23 11:21:42, Johannes Weiner wrote:
>>> > On Tue, Oct 31, 2023 at 10:53:41AM +0100, Michal Hocko wrote:
>>> > > On Mon 30-10-23 20:38:06, Gregory Price wrote:
>
>[snip]
>
>>>
>>> > This hopefully also explains why it's a global setting. The usecase is
>>> > different from conventional NUMA interleaving, which is used as a
>>> > locality measure: spread shared data evenly between compute
>>> > nodes. This one isn't about locality - the CXL tier doesn't have local
>>> > compute. Instead, the optimal spread is based on hardware parameters,
>>> > which is a global property rather than a per-workload one.
>>>
>>> Well, I am not convinced about that TBH. Sure it is probably a good fit
>>> for this specific CXL usecase but it just doesn't fit into many others I
>>> can think of - e.g. proportional use of those tiers based on the
>>> workload - you get what you pay for.
>>>
>>> Is there any specific reason for not having a new interleave interface
>>> which defines weights for the nodemask? Is this because the policy
>>> itself is very dynamic or is this more driven by simplicity of use?
>>
>> A downside of *requiring* weights to be paired with the mempolicy is
>> that it's then the application that would have to figure out the
>> weights dynamically, instead of having a static host configuration. A
>> policy of "I want to be spread for optimal bus bandwidth" translates
>> between different hardware configurations, but optimal weights will
>> vary depending on the type of machine a job runs on.
>>
>> That doesn't mean there couldn't be usecases for having weights as
>> policy as well in other scenarios, like you allude to above. It's just
>> so far such usecases haven't really materialized or spelled out
>> concretely. Maybe we just want both - a global default, and the
>> ability to override it locally.
>
>I think that this is a good idea. The system-wise configuration with
>reasonable default makes applications life much easier. If more control
>is needed, some kind of workload specific configuration can be added.

Glad that we are in agreement here. For the bandwidth expansion use cases
that this interleave patchset is trying to cater to, most applications
would follow the "reasonable defaults" for weights.
Applications would probably need to choose different weights while
interleaving only for capacity expansion, which the default memory
tiering implementation already supports, and with better latency.

>And, instead of adding another memory policy, a cgroup-wise
>configuration may be easier to be used. The per-workload weight may
>need to be adjusted when we deploying different combination of workloads
>in the system.
>
>Another question is that should the weight be per-memory-tier or
>per-node? In this patchset, the weight is per-source-target-node
>combination. That is, the weight becomes a matrix instead of a vector.
>IIUC, this is used to control cross-socket memory access in addition to
>per-memory-type memory access. Do you think the added complexity is
>necessary?

Pros and Cons of Node based interleave:
Pros:
1. Weights can be defined individually for devices with different
bandwidth and latency characteristics, irrespective of which tier they
fall into.
2. Defining the weight per-source-target-node would be necessary for
multi-socket systems where a few devices may be closer to one socket
than to the other.
Cons:
1. Weights need to be programmed for all the nodes, which can be tedious
for systems with many NUMA nodes.

Pros and Cons of Memory Tier based interleave:
Pros:
1. A weight programmed per initiator would apply to all the nodes in the
tier.
2. Weights can be calculated from the cumulative bandwidth of all the
nodes in the tier and need to be programmed only once for all the nodes
in a given tier.
3. It may be useful as the number of NUMA nodes with similar latency and
bandwidth characteristics grows, possibly with pooling use cases.
Cons:
1. If nodes with different bandwidth and latency characteristics are
placed in the same tier, as seen in the current mainline kernel, it will
be difficult to apply a correct interleave weight policy.
2. There will be a need for functionality to move nodes between tiers or
to create new tiers for such nodes in order to program correct
interleave weights. We are currently working on a patch to support this.
3. For systems where each NUMA node has different characteristics, each
node might end up alone in its own memory tier, which would be
equivalent to node based interleaving. This scenario might apply on newer
systems where all CXL memory from different devices under a port is
combined to form a single NUMA node.
4. Users may need to keep track of the different memory tiers and which
nodes are present in each tier in order to invoke the interleave policy.

>
>> Could you elaborate on the 'get what you pay for' usecase you
>> mentioned?
>
>--
>Best Regards,
>Huang, Ying
--
Best Regards,
Ravi Jonnalagadda
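
To make the node based versus memory tier based comparison above more
concrete, here is a minimal Python sketch. It is only an illustration
under stated assumptions, not the patchset's implementation: the
bandwidth figures, the tier layout, and the helper names
(reduce_weights, node_weights, tier_weights, weighted_interleave) are
all hypothetical, and deriving weights from bandwidth ratios is just one
plausible way to choose them.

```python
from functools import reduce
from math import gcd

# Hypothetical bandwidth (GB/s) from each source (initiator) node to each
# target node. Nodes 0 and 1 are DRAM on sockets 0 and 1; node 2 is a CXL
# device attached to socket 0. All figures are made up for illustration.
BANDWIDTH = {
    0: {0: 300, 1: 100, 2: 60},
    1: {0: 100, 1: 300, 2: 30},
}

# Hypothetical tier layout: both DRAM nodes share a tier, CXL has its own.
TIERS = {"dram": [0, 1], "cxl": [2]}


def reduce_weights(bandwidths):
    """Scale bandwidths down to the smallest integers with the same ratio."""
    divisor = reduce(gcd, bandwidths.values())
    return {key: bw // divisor for key, bw in bandwidths.items()}


def node_weights(source):
    """Node based: one weight per (source, target) pair, i.e. a matrix row."""
    return reduce_weights(BANDWIDTH[source])


def tier_weights(source):
    """Tier based: one weight per tier, from the tier's cumulative bandwidth."""
    totals = {
        tier: sum(BANDWIDTH[source][node] for node in nodes)
        for tier, nodes in TIERS.items()
    }
    return reduce_weights(totals)


def weighted_interleave(weights, npages):
    """Pick a target for each page, visiting targets in proportion to weight."""
    order = [target for target, w in sorted(weights.items()) for _ in range(w)]
    return [order[i % len(order)] for i in range(npages)]


if __name__ == "__main__":
    for src in BANDWIDTH:
        print(src, node_weights(src), tier_weights(src))
    print(weighted_interleave({0: 3, 2: 1}, 8))
```

With these made-up numbers, the node based row for source 0 gives the
local and remote DRAM nodes different weights, while the tier based view
folds both into a single DRAM weight. That is the tradeoff the lists
above describe: fewer values to program, at the cost of losing
per-node distinctions within a tier.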