Subject: Re: [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes
From: Ying Huang
To: Johannes Weiner, linux-mm@kvack.org
Cc: Hao Wang, Abhishek Dhanotia, Dave Hansen, Yang Shi, Tim Chen,
    Davidlohr Bueso, Adam Manzanares, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, Hasan Al Maruf, Wei Xu, "Aneesh Kumar K.V", Yang Shi
Date: Wed, 08 Jun 2022 12:19:52 +0800
In-Reply-To: <20220607171949.85796-1-hannes@cmpxchg.org>
References: <20220607171949.85796-1-hannes@cmpxchg.org>

On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> From: Hasan Al Maruf
>
> Existing interleave policy spreads out pages evenly across a set of
> specified nodes, i.e. 1:1 interleave. Upcoming tiered memory systems
> have CPU-less memory nodes with different peak bandwidth and
> latency-bandwidth characteristics.
> In such systems, we will want to
> use the additional bandwidth provided by lowtier memory for
> bandwidth-intensive applications. However, the default 1:1 interleave
> can lead to suboptimal bandwidth distribution.
>
> Introduce an N:M interleave policy, where N pages allocated to the
> top-tier nodes are followed by M pages allocated to lowtier nodes.
> This provides the capability to steer the fraction of memory traffic
> that goes to toptier vs. lowtier nodes. For example, 4:1 interleave
> leads to an 80%/20% traffic breakdown between toptier and lowtier.
>
> The ratios are configured through a new sysctl:
>
>   vm.numa_tier_interleave = toptier lowtier
>
> We have run experiments on bandwidth-intensive production services on
> CXL-based tiered memory systems, where lowtier CXL memory has, when
> compared to the toptier memory directly connected to the CPU:
>
>   - ~half of the peak bandwidth
>   - ~80ns higher idle latency
>   - steeper latency vs. bandwidth curve
>
> Results show that regular interleaving leads to a ~40% performance
> regression over baseline; 5:1 interleaving shows an ~8% improvement
> over baseline. We have found the optimal distribution changes based on
> hardware characteristics: slower CXL memory will shift the optimal
> breakdown from 5:1 to (e.g.) 8:1.
>
> The sysctl only applies to processes and vmas with an "interleave"
> policy and has no bearing on contexts using prefer or bind policies.
>
> It defaults to a setting of "1 1", which represents even interleaving,
> and so is backward compatible with existing setups.
>
> Signed-off-by: Hasan Al Maruf
> Signed-off-by: Hao Wang
> Signed-off-by: Johannes Weiner

In general, I think the use case is valid.
But we are changing memory tiering now, including

- make memory tiering explicit
- support more than 2 tiers
- expose memory tiering via sysfs

Details can be found in the following threads,

https://lore.kernel.org/lkml/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@mail.gmail.com/
https://lore.kernel.org/lkml/20220603134237.131362-1-aneesh.kumar@linux.ibm.com/

With these changes, we may need to revise your implementation. For
example, put the interleave knobs in the memory tier sysfs interface,
support more than 2 tiers, etc.

Best Regards,
Huang, Ying

[snip]
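
For readers following the thread, the N:M behaviour described in the quoted
patch ("N pages allocated to the top-tier nodes are followed by M pages
allocated to lowtier nodes") can be illustrated with a small userspace
simulation. This is only a sketch and not the kernel code from the patch: the
function names `nm_interleave` and `toptier_fraction`, the node numbers, and
the round-robin order within each tier are all assumptions made for
illustration.

```python
from collections import Counter
from itertools import cycle


def nm_interleave(toptier_nodes, lowtier_nodes, n, m, num_pages):
    """Simulate N:M interleave: N pages on toptier nodes, then M on lowtier.

    Hypothetical model only. Round-robin within each tier is an assumption;
    the real patch may rotate through nodes differently.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must both be at least 1")
    if not toptier_nodes or not lowtier_nodes:
        raise ValueError("each tier needs at least one node")

    top = cycle(toptier_nodes)
    low = cycle(lowtier_nodes)
    placement = []
    for page in range(num_pages):
        # Position of this page inside the repeating window of n + m pages.
        slot = page % (n + m)
        placement.append(next(top) if slot < n else next(low))
    return placement


def toptier_fraction(n, m):
    """Fraction of pages that land on toptier nodes for an N:M ratio."""
    return n / (n + m)


# 4:1 interleave with two toptier nodes (0, 1) and one lowtier node (2).
placement = nm_interleave([0, 1], [2], n=4, m=1, num_pages=10)
counts = Counter(placement)
print(placement)
print(dict(counts), toptier_fraction(4, 1))
```

With a 4:1 ratio, 8 of the 10 simulated pages land on the toptier nodes and 2
on the lowtier node, matching the 80%/20% breakdown mentioned in the patch
description. Setting both values to 1 gives an even split between the two
tiers in this model.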