From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
To: vikas.shivappa@intel.com, vikas.shivappa@linux.intel.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de, peterz@infradead.org, ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com, andi.kleen@intel.com, davidcc@google.com, eranian@google.com, hpa@zytor.com
Subject: [PATCH V4 00/14] Cqm2: Intel Cache Monitoring fixes and enhancements
Date: Fri, 16 Dec 2016 15:12:54 -0800
Message-Id: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com>

Another attempt at the cqm2 series.

The current upstream cqm (Cache Monitoring) code has major issues that make the feature almost unusable. This series tries to fix them, and also addresses Thomas's comments on previous versions of the cqm2 patch series by better documenting what we are trying to fix. The patches are based on tip/x86/cache. This is a continuation of the patch series David (davidcc@google.com) previously posted, and hence it tries to fix the same issues.

Below are the issues and the fixes/enhancements we attempt:

- Issue: RMID recycling leads to inaccurate data, complicates the code and increases its footprint. Currently it makes the feature almost *unusable*, as we see only zeroes and inconsistent data once we run out of RMIDs during the lifetime of a system boot. The only way to get correct numbers again is to reboot the system.
  Root cause: Recycling steals an RMID from an existing event x and gives it to another event y. However, due to the nature of monitoring llc_occupancy, we may miss tracking an unknown (possibly large) part of the cache fills during the time an event has no RMID. Hence the user ends up with inaccurate data for both events x and y, and the inaccuracy is arbitrary and cannot be measured. Even if event x gets another RMID very soon after losing the previous one, we still miss all the occupancy data that was tied to the previous RMID, which means we cannot get accurate data even when the event has an RMID most of the time. There is no way to guarantee accurate results with recycling, and the data is inaccurate to an arbitrary degree. The fact that an event can lose an RMID at any time complicates a lot of code in sched_in, init, count and read. It also complicates MBM, as we may lose the RMID at any time and hence need to keep a history of all the old counts.

  Fix: Recycling is removed, based originally on Tony's observation that it introduces a lot of code, fails to provide accurate data, and hence has questionable benefits. Recycling was introduced to deal with scarce RMIDs. We instead support the following to mitigate the RMID scarcity and provide reliable output to the user:

  - We ran out of RMIDs quickly because only global RMIDs were supported. This series supports per-package RMIDs, with RMIDs assigned dynamically only when tasks are actually scheduled on a socket, to mitigate the scarcity. Since this also multiplies the available RMIDs by x, where x is the number of packages, the issue is greatly reduced, given that we have 2-4 RMIDs per logical processor on each package.

  - The user chooses the packages he wants to monitor, and we simply return an error if that many RMIDs are not available. This can be used when guaranteed monitoring is wanted.

  - The user can also choose lazy RMID allocation, in which case an error is returned at read time.
  This may be better, as the user then does not have events which he thinks are being monitored but which are actually not monitored 100% of the time.

- Issue: Inaccurate per-package and system-wide data; it just prints zeros or arbitrary numbers.

  Fix: The patches fix this by returning an error if the mode is not supported. The supported modes are task monitoring and cgroup monitoring. The per-package data for, say, socket x is returned with the -C -G cgrpy option. System-wide data can be obtained by monitoring the root cgroup.

- Support per-package RMIDs, hence scale better with more packages, and get more RMIDs to use, using them only when needed (i.e. when tasks are actually scheduled on the package).

- Issue: Cgroup monitoring is incomplete. There is no hierarchical monitoring support, and inconsistent or wrong data is seen when monitoring a cgroup.

  Fix: Full cgroup monitoring support is added. Different cgroups in the same hierarchy can be monitored together and separately, and a task can be monitored together with the cgroup it belongs to.

- Issue: A lot of inconsistent data is seen currently when we monitor different kinds of events, like cgroup and task events, *together*.

  Fix: The patches add support to monitor a cgroup x and a task p1 within cgroup x, and also to monitor different cgroups and tasks together.

- Monitoring a task for its lifetime is not supported. The patches add support to continuously monitor a cgroup even when perf is not running. This provides lightweight long-term/~lifetime monitoring.

- Issue: CAT and cqm write the same PQR_ASSOC MSR separately.

  Fix: Integrate the sched-in code and write the PQR MSR only once per switch_to.

What's working now (unit tested): task monitoring, cgroup hierarchical monitoring, monitoring multiple cgroups, cgroup and task in the same cgroup, continuous cgroup monitoring, per-package RMIDs, error on read, error on open.
TBD/Known issues:

- Most of MBM is working, but it will need updates for hierarchical monitoring and the other new features we introduce.

[PATCH 02/14] x86/cqm: Remove cqm recycling/conflict handling
  Before the patch: the user sees only zeros or wrong data once we run out of RMIDs.
  After: the user sees either correct data or an error that we ran out of RMIDs.

[PATCH 03/14] x86/rdt: Add rdt common/cqm compile option
[PATCH 04/14] x86/cqm: Add Per pkg rmid support
  Before the patches: RMIDs are global.
  After: available RMIDs increase by x times, where x is the number of packages. Adds LAZY RMID allocation - RMIDs are allocated during first sched-in.

[PATCH 05/14] x86/cqm,perf/core: Cgroup support prepare
[PATCH 06/14] x86/cqm: Add cgroup hierarchical monitoring support
[PATCH 07/14] x86/rdt,cqm: Scheduling support update
  Before the patches: cgroup monitoring is not fully supported.
  After: cgroup monitoring is fully supported, including hierarchical monitoring.

[PATCH 08/14] x86/cqm: Add support for monitoring task and cgroup
[PATCH 09/14] x86/cqm: Add Continuous cgroup monitoring
  Adds new features.

[PATCH 10/14] x86/cqm: Add RMID reuse
[PATCH 11/14] x86/cqm: Add failure on open and read
  Before the patches: once an RMID is used, it is never used again.
  After: we reuse the RMIDs which are freed. The user can specify NOLAZY RMID allocation, and open fails if we fail to get all RMIDs at open time.

[PATCH 12/14] perf/core,x86/cqm: Add read for Cgroup events,per pkg
[PATCH 13/14] perf/stat: fix bug in handling events in error state
[PATCH 14/14] perf/stat: revamp read error handling, snapshot and

Patches 01/14 - 10/14 add all the features, but the data is visible neither to perf/core nor to perf user mode. Patches 11/14 - 14/14 fix this and make the data available to perf user mode.