From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
To: vikas.shivappa@intel.com, vikas.shivappa@linux.intel.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de, peterz@infradead.org, ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com, andi.kleen@intel.com, davidcc@google.com, eranian@google.com, hpa@zytor.com
Subject: [PATCH V4 00/14] Cqm2: Intel Cache Monitoring fixes and enhancements
Date: Fri, 16 Dec 2016 15:12:54 -0800
Message-Id: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com>

Another attempt at the cqm2 series.

The current upstream cqm (Cache Monitoring) code has major issues that make the feature almost unusable. This series tries to fix them, and also addresses Thomas's comments on previous versions of the cqm2 patch series by better documenting what we are trying to fix. The patches are based on tip/x86/cache. This is a continuation of the patch series David (davidcc@google.com) previously posted, and hence it tries to fix the same issues.

Below are the issues and the fixes/enhancements we attempt:

- Issue: RMID recycling leads to inaccurate data, complicates the code and increases its footprint. Currently it makes the feature almost *unusable*, as we see only zeroes and inconsistent data once we run out of RMIDs during the lifetime of a system boot. The only way to get correct numbers again is to reboot the system.
  Root cause: Recycling steals an RMID from an existing event x and gives it to another event y. However, due to the nature of monitoring llc_occupancy, we may miss tracking an unknown (possibly large) part of the cache fills during the time an event has no RMID. Hence the user ends up with inaccurate data for both events x and y, and the inaccuracy is arbitrary and cannot be measured. Even if event x gets another RMID very soon after losing the previous one, we still miss all the occupancy data that was tied to the previous RMID, which means we cannot get accurate data even when the event has an RMID most of the time. There is no way to guarantee accurate results with recycling, and the data is inaccurate to an arbitrary degree. The fact that an event can lose an RMID at any time complicates a lot of code in sched_in, init, count and read. It also complicates MBM, as we may lose the RMID at any time and hence need to keep a history of all the old counts.

  Fix: Recycling is removed, based originally on Tony's observation that it introduces a lot of code, fails to provide accurate data, and hence has questionable benefits. Recycling was introduced to deal with scarce RMIDs. We instead support the following to mitigate the RMID scarcity and provide reliable output to the user:

  - We ran out of RMIDs quickly because only global RMIDs were supported. This series supports per-package RMIDs, with RMIDs assigned dynamically only when tasks are actually scheduled on a socket, to mitigate the scarcity. Since this also multiplies the available RMIDs by x, where x is the number of packages, the issue is greatly reduced, given that we have 2-4 RMIDs per logical processor on each package.

  - The user chooses the packages he wants to monitor, and we simply return an error if that many RMIDs are not available. This can be used when guaranteed monitoring is wanted.

  - The user can also choose lazy RMID allocation, in which case an error is returned at read time.
  This may be better, as the user then does not have events which he thinks are being monitored but which are actually not monitored 100% of the time.

- Issue: Inaccurate per-package and system-wide data; it just prints zeros or arbitrary numbers.

  Fix: The patches fix this by returning an error if the mode is not supported. The supported modes are task monitoring and cgroup monitoring. The per-package data for, say, socket x is returned with the -C -G cgrpy option. System-wide data can be obtained by monitoring the root cgroup.

- Support per-package RMIDs, hence scale better with more packages, and get more RMIDs to use, using them only when needed (i.e. when tasks are actually scheduled on the package).

- Issue: Cgroup monitoring is incomplete. There is no hierarchical monitoring support, and inconsistent or wrong data is seen when monitoring a cgroup.

  Fix: Full cgroup monitoring support is added. Different cgroups in the same hierarchy can be monitored together and separately, and a task can be monitored together with the cgroup it belongs to.

- Issue: A lot of inconsistent data is seen currently when we monitor different kinds of events, like cgroup and task events, *together*.

  Fix: The patches add support to monitor a cgroup x and a task p1 within cgroup x, and also to monitor different cgroups and tasks together.

- Monitoring a task for its lifetime is not supported. The patches add support to continuously monitor a cgroup even when perf is not running. This provides lightweight long-term/~lifetime monitoring.

- Issue: CAT and cqm write the same PQR_ASSOC MSR separately.

  Fix: Integrate the sched-in code and write the PQR MSR only once per switch_to.

What's working now (unit tested): task monitoring, cgroup hierarchical monitoring, monitoring multiple cgroups, cgroup and task in the same cgroup, continuous cgroup monitoring, per-package RMIDs, error on read, error on open.
TBD/Known issues:

- Most of MBM is working, but it will need updates for hierarchical monitoring and the other new features we introduce.

[PATCH 02/14] x86/cqm: Remove cqm recycling/conflict handling
  Before the patch: the user sees only zeros or wrong data once we run out of RMIDs.
  After: the user sees either correct data or an error that we ran out of RMIDs.

[PATCH 03/14] x86/rdt: Add rdt common/cqm compile option
[PATCH 04/14] x86/cqm: Add Per pkg rmid support
  Before the patches: RMIDs are global.
  After: available RMIDs increase by x times, where x is the number of packages. Adds LAZY RMID allocation - RMIDs are allocated during first sched-in.

[PATCH 05/14] x86/cqm,perf/core: Cgroup support prepare
[PATCH 06/14] x86/cqm: Add cgroup hierarchical monitoring support
[PATCH 07/14] x86/rdt,cqm: Scheduling support update
  Before the patches: cgroup monitoring is not fully supported.
  After: cgroup monitoring is fully supported, including hierarchical monitoring.

[PATCH 08/14] x86/cqm: Add support for monitoring task and cgroup
[PATCH 09/14] x86/cqm: Add Continuous cgroup monitoring
  Adds new features.

[PATCH 10/14] x86/cqm: Add RMID reuse
[PATCH 11/14] x86/cqm: Add failure on open and read
  Before the patches: once an RMID is used, it is never used again.
  After: we reuse the RMIDs which are freed. The user can specify NOLAZY RMID allocation, and open fails if we fail to get all RMIDs at open time.

[PATCH 12/14] perf/core,x86/cqm: Add read for Cgroup events,per pkg
[PATCH 13/14] perf/stat: fix bug in handling events in error state
[PATCH 14/14] perf/stat: revamp read error handling, snapshot and

Patches 01/14 - 10/14 add all the features, but the data is visible neither to perf/core nor to perf user mode. Patches 11/14 - 14/14 fix this and make the data available to perf user mode.