From: Vikas Shivappa
To: vikas.shivappa@intel.com, tony.luck@intel.com, ravi.v.shankar@intel.com,
 fenghua.yu@intel.com, sai.praneeth.prakhya@intel.com, x86@kernel.org,
 tglx@linutronix.de, hpa@zytor.com
Cc: linux-kernel@vger.kernel.org, ak@linux.intel.com,
 vikas.shivappa@linux.intel.com
Subject: [PATCH 1/6] x86/intel_rdt/mba_sc: Add documentation for MBA software controller
Date: Thu, 29 Mar 2018 15:26:11 -0700
Message-Id: <1522362376-3505-2-git-send-email-vikas.shivappa@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1522362376-3505-1-git-send-email-vikas.shivappa@linux.intel.com>
References: <1522362376-3505-1-git-send-email-vikas.shivappa@linux.intel.com>
Sender:
 linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Add documentation about usage which includes the "schemata" format and
use case for MBA software controller.

Signed-off-by: Vikas Shivappa
---
 Documentation/x86/intel_rdt_ui.txt | 63 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 71c3098..3b9634e 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -315,6 +315,60 @@ Memory b/w domain is L3 cache.
 
 	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
 
+Memory bandwidth(b/w) in MegaBytes
+----------------------------------
+
+Memory bandwidth is a core specific mechanism which means that when the
+Memory b/w percentage is specified in the schemata per package it
+actually is applied on a per core basis via IA32_MBA_THRTL_MSR
+interface. This may lead to confusion in scenarios below:
+
+1. User may not see increase in actual b/w when percentage values are
+   increased:
+
+This can occur when aggregate L2 external b/w is more than L3 external
+b/w. Consider an SKL SKU with 24 cores on a package and where L2
+external b/w is 10GBps (hence aggregate L2 external b/w is 240GBps) and
+L3 external b/w is 100GBps. Now a workload with '20 threads, having 50%
+b/w, each consuming 5GBps' consumes the max L3 b/w of 100GBps although
+the percentage value specified is only 50% << 100%. Hence increasing
+the b/w percentage will not yield any more b/w. This is because
+although the L2 external b/w still has capacity, the L3 external b/w
+is fully used. Also note that this would be dependent on number of
+cores the benchmark is run on.
+
+2. Same b/w percentage may mean different actual b/w depending on # of
+   threads:
+
+For the same SKU in #1, a 'single thread, with 10% b/w' and '4 threads,
+with 10% b/w' can consume up to 10GBps and 40GBps although they have same
+percentage b/w of 10%. This is simply because as threads start using
+more cores in an rdtgroup, the actual b/w may increase or vary although
+user specified b/w percentage is same.
+
+In order to mitigate this and make the interface more user friendly, we
+can let the user specify the max bandwidth per rdtgroup in bytes(or mega
+bytes). The kernel underneath would use a software feedback mechanism or
+a "Software Controller" which reads the actual b/w using MBM counters
+and adjusts the memory bandwidth percentages to ensure the "actual b/w
+< user b/w".
+
+The legacy behaviour is default and user can switch to the "MBA software
+controller" mode using a mount option 'mba_MB'.
+
+To use the feature mount the file system using mba_MB option:
+
+# mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MB]] /sys/fs/resctrl
+
+The schemata format is below:
+
+Memory b/w Allocation in Megabytes
+----------------------------------
+
+Memory b/w domain is L3 cache.
+
+	MB:<cache_id0>=bw_MB0;<cache_id1>=bw_MB1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -358,6 +412,15 @@ allocations can overlap or not. The allocations specifies the maximum
 b/w that the group may be able to use and the system admin can configure
 the b/w accordingly.
 
+If the MBA is specified in MB(megabytes) then user can enter the max b/w in MB
+rather than the percentage values.
+
+# echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
+# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
+
+In the above example the tasks in "p1" and "p0" on socket 0 would use a max b/w
+of 1024MB whereas on socket 1 they would use 500MB.
+
 Example 2
 ---------
 Again two sockets, but this time with a more realistic 20-bit mask.
-- 
1.9.1
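
Illustration only, not part of the patch: the documentation above describes
a feedback loop that reads the actual b/w and adjusts the b/w percentage so
that "actual b/w < user b/w". The Python sketch below is one way to picture
that loop. It is not the kernel implementation from this series. The
function names, the 10% step, the 10..100% range and the linear bandwidth
model are all assumptions made for the example; only the 10GBps per-core
and 100GBps L3 figures are taken from scenario 1 in the text.

```python
# Illustrative only: NOT the kernel implementation from this series.
# Every name below is hypothetical; the step size and range are assumptions.

MIN_PCT, MAX_PCT, STEP_PCT = 10, 100, 10  # assumed throttle range and granularity


def measured_bw_mbps(percent, threads, per_core_max_mbps, l3_max_mbps):
    """Toy stand-in for an MBM counter reading.

    Assumption: bandwidth scales linearly with the throttle percentage and
    the number of threads, and is capped by the L3 external bandwidth.
    """
    return min(threads * per_core_max_mbps * percent / 100.0, l3_max_mbps)


def next_percent(percent, actual_mbps, limit_mbps):
    """One feedback step: keep the measured b/w strictly below the user limit."""
    if actual_mbps >= limit_mbps:
        # At or over the limit: throttle harder, but not below the minimum.
        return max(MIN_PCT, percent - STEP_PCT)
    # Under the limit: relax only if a linear extrapolation of one more
    # step would still stay below the limit.
    projected = actual_mbps * (percent + STEP_PCT) / percent
    if projected < limit_mbps:
        return min(MAX_PCT, percent + STEP_PCT)
    return percent


def simulate(limit_mbps, threads, per_core_max_mbps=10000,
             l3_max_mbps=100000, rounds=20):
    """Run the loop for a fixed number of rounds, starting unthrottled."""
    percent = MAX_PCT
    for _ in range(rounds):
        actual = measured_bw_mbps(percent, threads,
                                  per_core_max_mbps, l3_max_mbps)
        percent = next_percent(percent, actual, limit_mbps)
    final_bw = measured_bw_mbps(percent, threads,
                                per_core_max_mbps, l3_max_mbps)
    return percent, final_bw


if __name__ == "__main__":
    # Scenario 1 from the text: 20 threads already saturate L3 at 50%,
    # so raising the percentage to 100% reports the same bandwidth.
    print(measured_bw_mbps(50, 20, 10000, 100000))
    print(measured_bw_mbps(100, 20, 10000, 100000))
    # A 4-thread group limited to 12000 MBps under the toy model.
    print(simulate(limit_mbps=12000, threads=4))
```

Under this toy model the user states a limit in MBps and the loop settles on
whatever percentage keeps the group under it, so the same limit maps to
different percentages as the thread count changes. The relax step
extrapolates before raising the percentage so that the loop does not bounce
between two settings around the limit.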