From: Xiaochen Shen
To: bp@alien8.de
Cc: linux-tip-commits@vger.kernel.org, bp@suse.de, tony.luck@intel.com,
 x86@kernel.org, linux-kernel@vger.kernel.org, xiaochen.shen@intel.com
Subject: [tip: x86/cache v2] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
Date: Thu, 10 Dec 2020 13:49:17 +0800
Message-Id: <1607579357-15897-1-git-send-email-xiaochen.shen@intel.com>
X-Mailer:
 git-send-email 1.8.3.1
In-Reply-To: <20201209222328.GA20710@zn.tnic>
References: <20201209222328.GA20710@zn.tnic>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The MBA software controller (mba_sc) is a feedback loop which
periodically reads the MBM counters and tries to keep the bandwidth
below a user-specified limit. It piggybacks on the MBM counter overflow
handler, which runs at a 1s interval, to do its updates in mbm_update()
and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters so
that the hardware counter doesn't wrap around more than once between
user samplings. mbm_update() calls __mon_event_count() to update the
local bandwidth when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. In that case __mon_event_count() is not
called for the local bandwidth in the MBM counter overflow handler, but
it is still called when the MBM local bandwidth counter file
'mbm_local_bytes' is read. The call path is:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by the delta chunks
calculated from the previous MSR value (m->prev_msr) and the current
MSR value. When mba_sc is enabled, m->chunks is also updated by mistake
from mbm_update(), through mbm_bw_count(), by the delta chunks
calculated from m->prev_bw_msr instead of m->prev_msr. m->chunks is not
used by update_mba_bw() in the mba_sc feedback loop, so this extra
update serves no purpose there.

When the MBM local bandwidth counter file is read, m->chunks has already
been changed unexpectedly by mbm_bw_count(). As a result, an incorrect
local bandwidth counter, calculated from the incorrect m->chunks, is
reported to the user.

Fix this by removing the incorrect m->chunks update in mbm_bw_count() in
the MBM counter overflow handler, and by always calling
__mon_event_count() in mbm_update() so that the hardware local bandwidth
counter doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read the local bytes counter twice, 1s apart. Before the fix, the
  # calculated local bandwidth is not the expected value (approaching
  # 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen
Reviewed-by: Tony Luck
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..7ac3121 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -343,7 +343,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (get_corrected_mbm_count(rmid, chunks) * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }
-- 
1.8.3.1