From: David Carrillo-Cisneros
To: Peter Zijlstra, Alexander Shishkin, Arnaldo Carvalho de Melo,
    Ingo Molnar
Cc: Vikas Shivappa, Matt Fleming, Tony Luck, Stephane Eranian,
    Paul Turner, David Carrillo-Cisneros, x86@kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 00/32] 2nd Iteration of Cache QoS Monitoring support.
Date: Thu, 28 Apr 2016 21:43:06 -0700
Message-Id: <1461905018-86355-1-git-send-email-davidcc@google.com>

This series introduces the next iteration of kernel support for the
Cache QoS Monitoring (CQM) technology available in Intel Xeon processors.

One of the main limitations of the previous version is the inability
to simultaneously monitor:
  1) a cpu event and any other event on that cpu.
  2) cgroup events for cgroups in the same descendancy line.
  3) cgroup events and any thread event of a cgroup in the same
     descendancy line.

Another limitation is that monitoring for a cgroup was enabled/disabled
by the existence of a perf event for that cgroup. Since the event
llc_occupancy measures changes in occupancy rather than total occupancy,
an event should be enabled for a long enough period of time in order to
read meaningful llc_occupancy values. The context switch overhead caused
by the perf events is undesirable in some sensitive scenarios.

This series of patches addresses the shortcomings mentioned above and
adds some other improvements. The main changes are:

- No more potential conflicts between different events.
  The new version builds a hierarchy of RMIDs that captures the
  dependency between monitored cgroups. llc_occupancy for a cgroup is
  the sum of the llc_occupancy values for that cgroup's RMID and all
  other RMIDs in the cgroup's subtree (both monitored cgroups and
  threads).

- A cgroup integration that allows monitoring a cgroup without
  creating a perf event, decreasing the context switch overhead.
  Monitoring is controlled by a boolean cgroup subsystem attribute
  in each perf cgroup, that is:

	echo 1 > cgroup_path/perf_event.cqm_cont_monitoring

  starts CQM monitoring whether or not there is a perf_event
  attached to the cgroup. Setting the attribute to 0 makes
  monitoring dependent on the existence of a perf_event.
  A perf_event is always required in order to read llc_occupancy.
  This cgroup integration uses Intel's PQR code and is intended to
  be used by upcoming versions of Intel's CAT.

- A more stable rotation algorithm: The new algorithm uses SLOs that
  guarantee:
    - A minimum of enabled time for monitored cgroups and threads.
    - A maximum time disabled before error is introduced by reusing
      dirty RMIDs.
    - A minimum rate at which RMID recycling must progress.

- Reduced impact of stealing/rotation of RMIDs: The new algorithm
  attributes the residual occupancy held by limbo RMIDs to the former
  owner of the limbo RMID, decreasing the error introduced by RMID
  rotation.
  It also allows a limbo RMID to be reused by its former owner when
  appropriate, decreasing the potential error of reusing dirty RMIDs
  and allowing progress to be made even if most limbo RMIDs do not
  drop occupancy fast enough.

- Elimination of pmu::count: perf's generic perf_event_count()
  performs a quick add of atomic types. The introduction of
  pmu::count in the previous CQM series to read occupancy for thread
  events changed the behavior of perf_event_count() by performing a
  potentially slow IPI and write/read to MSR. It also made pmu::read
  behave differently depending on whether the event was a cpu/cgroup
  event or a thread event.
  This patch series removes the custom pmu::count from CQM and
  provides consistent behavior for all calls of perf_event_read.

- Added error return for pmu::read: Reads of CQM events may fail
  due to stealing of RMIDs, even after successfully adding an event
  to a PMU. This patch series extends pmu::read with an int return
  value and propagates the error to callers that can fail
  (i.e. perf_read).
  The ability of pmu::read to fail is consistent with the recent
  changes that allow perf_event_read to fail for transactional
  reading of event groups.

- Introduces the field pmu_event_flags, which contains flags set by
  the PMU to signal variations on the default behavior to perf's
  generic code. In this series, three flags are introduced:
    - PERF_CGROUP_NO_RECURSION: Signals generic code not to add
      events of the cgroup ancestors of a cgroup.
    - PERF_INACTIVE_CPU_READ_PKG: Signals generic code that this CPU
      event can be read in any CPU in its event::cpu's package,
      even if the event is not active.
    - PERF_INACTIVE_EV_READ_ANY_CPU: Signals generic code that this
      event can be read in any CPU in any package in the system even
      if the event is not active.
  Using the above flags takes advantage of the CQM hardware's ability
  to read llc_occupancy even when the associated perf event is not
  running in a CPU.

This patch series also updates the perf tool to fix error handling and
to better handle the idiosyncrasies of snapshot and per-pkg events.
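
As an illustration of the RMID hierarchy change described in the list
above, the subtree sum can be sketched in plain user-space C. This is
only a minimal sketch under stated assumptions: struct rmid_node,
rmid_node_add_child() and rmid_subtree_occupancy() are hypothetical
names made up for this example and are not taken from the series, and
the occupancy values stand in for what would be read from hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical node of the RMID hierarchy (illustration only, not the
 * data structure used by the patches). Each node stands for one
 * monitored cgroup or thread that owns an RMID.
 */
struct rmid_node {
	uint64_t own_occupancy;         /* occupancy of this node's RMID */
	struct rmid_node *first_child;  /* first monitored descendant */
	struct rmid_node *next_sibling; /* next node with the same parent */
};

/* Link child under parent by pushing it at the head of the child list. */
void rmid_node_add_child(struct rmid_node *parent, struct rmid_node *child)
{
	assert(parent != NULL && child != NULL);
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/*
 * llc_occupancy reported for a node: its own RMID's occupancy plus the
 * occupancy of every RMID in its subtree.
 */
uint64_t rmid_subtree_occupancy(const struct rmid_node *node)
{
	const struct rmid_node *child;
	uint64_t total;

	assert(node != NULL);
	total = node->own_occupancy;
	for (child = node->first_child; child != NULL;
	     child = child->next_sibling)
		total += rmid_subtree_occupancy(child);
	return total;
}
```

With a parent cgroup whose RMID holds 100 units, two child cgroups
holding 20 and 30, and a monitored thread holding 5 under the first
child, the parent reads 155 and the first child reads 25. The
recursion is used here only for brevity.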

David Carrillo-Cisneros (31):
  perf/x86/intel/cqm: temporarily remove MBM from CQM and cleanup
  perf/x86/intel/cqm: remove check for conflicting events
  perf/x86/intel/cqm: remove all code for rotation of RMIDs
  perf/x86/intel/cqm: make read of RMIDs per package (Temporal)
  perf/core: remove unused pmu->count
  x86/intel,cqm: add CONFIG_INTEL_RDT configuration flag and refactor PQR
  perf/x86/intel/cqm: separate CQM PMU's attributes from x86 PMU
  perf/x86/intel/cqm: prepare for next patches
  perf/x86/intel/cqm: add per-package RMIDs, data and locks
  perf/x86/intel/cqm: basic RMID hierarchy with per package rmids
  perf/x86/intel/cqm: (I)state and limbo prmids
  perf/x86/intel/cqm: add per-package RMID rotation
  perf/x86/intel/cqm: add polled update of RMID's llc_occupancy
  perf/x86/intel/cqm: add preallocation of anodes
  perf/core: add hooks to expose architecture specific features in perf_cgroup
  perf/x86/intel/cqm: add cgroup support
  perf/core: adding pmu::event_terminate
  perf/x86/intel/cqm: use pmu::event_terminate
  perf/core: introduce PMU event flag PERF_CGROUP_NO_RECURSION
  x86/intel/cqm: use PERF_CGROUP_NO_RECURSION in CQM
  perf/x86/intel/cqm: handle inherit event and inherit_stat flag
  perf/x86/intel/cqm: introduce read_subtree
  perf/core: introduce PERF_INACTIVE_*_READ_* flags
  perf/x86/intel/cqm: use PERF_INACTIVE_*_READ_* flags in CQM
  sched: introduce the finish_arch_pre_lock_switch() scheduler hook
  perf/x86/intel/cqm: integrate CQM cgroups with scheduler
  perf/core: add perf_event cgroup hooks for subsystem attributes
  perf/x86/intel/cqm: add CQM attributes to perf_event cgroup
  perf,perf/x86,perf/powerpc,perf/arm,perf/*: add int error return to pmu::read
  perf,perf/x86: add hook perf_event_arch_exec
  perf/stat: revamp error handling for snapshot and per_pkg events

Stephane Eranian (1):
  perf/stat: fix bug in handling events in error state

 arch/alpha/kernel/perf_event.c           |    3 +-
 arch/arc/kernel/perf_event.c             |    3 +-
 arch/arm64/include/asm/hw_breakpoint.h   |    2 +-
 arch/arm64/kernel/hw_breakpoint.c        |    3 +-
 arch/metag/kernel/perf/perf_event.c      |    5 +-
 arch/mips/kernel/perf_event_mipsxx.c     |    3 +-
 arch/powerpc/include/asm/hw_breakpoint.h |    2 +-
 arch/powerpc/kernel/hw_breakpoint.c      |    3 +-
 arch/powerpc/perf/core-book3s.c          |   11 +-
 arch/powerpc/perf/core-fsl-emb.c         |    5 +-
 arch/powerpc/perf/hv-24x7.c              |    5 +-
 arch/powerpc/perf/hv-gpci.c              |    3 +-
 arch/s390/kernel/perf_cpum_cf.c          |    5 +-
 arch/s390/kernel/perf_cpum_sf.c          |    3 +-
 arch/sh/include/asm/hw_breakpoint.h      |    2 +-
 arch/sh/kernel/hw_breakpoint.c           |    3 +-
 arch/sparc/kernel/perf_event.c           |    2 +-
 arch/tile/kernel/perf_event.c            |    3 +-
 arch/x86/Kconfig                         |    6 +
 arch/x86/events/amd/ibs.c                |    2 +-
 arch/x86/events/amd/iommu.c              |    5 +-
 arch/x86/events/amd/uncore.c             |    3 +-
 arch/x86/events/core.c                   |    3 +-
 arch/x86/events/intel/Makefile           |    3 +-
 arch/x86/events/intel/bts.c              |    3 +-
 arch/x86/events/intel/cqm.c              | 3847 +++++++++++++++++++++---------
 arch/x86/events/intel/cqm.h              |  519 ++++
 arch/x86/events/intel/cstate.c           |    3 +-
 arch/x86/events/intel/pt.c               |    3 +-
 arch/x86/events/intel/rapl.c             |    3 +-
 arch/x86/events/intel/uncore.c           |    3 +-
 arch/x86/events/intel/uncore.h           |    2 +-
 arch/x86/events/msr.c                    |    3 +-
 arch/x86/include/asm/hw_breakpoint.h     |    2 +-
 arch/x86/include/asm/perf_event.h        |   41 +
 arch/x86/include/asm/pqr_common.h        |   74 +
 arch/x86/include/asm/processor.h         |    4 +
 arch/x86/kernel/cpu/Makefile             |    4 +
 arch/x86/kernel/cpu/pqr_common.c         |   43 +
 arch/x86/kernel/hw_breakpoint.c          |    3 +-
 arch/x86/kvm/pmu.h                       |   10 +-
 drivers/bus/arm-cci.c                    |    3 +-
 drivers/bus/arm-ccn.c                    |    3 +-
 drivers/perf/arm_pmu.c                   |    3 +-
 include/linux/perf_event.h               |   91 +-
 kernel/events/core.c                     |  170 +-
 kernel/sched/core.c                      |    1 +
 kernel/sched/sched.h                     |    3 +
 kernel/trace/bpf_trace.c                 |    5 +-
 tools/perf/builtin-stat.c                |   43 +-
 tools/perf/util/counts.h                 |   19 +
 tools/perf/util/evsel.c                  |   44 +-
 tools/perf/util/evsel.h                  |    8 +-
 tools/perf/util/stat.c                   |   35 +-
 54 files changed, 3746 insertions(+), 1337 deletions(-)
 create mode 100644 arch/x86/events/intel/cqm.h
 create mode 100644 arch/x86/include/asm/pqr_common.h
 create mode 100644 arch/x86/kernel/cpu/pqr_common.c

--
2.8.0.rc3.226.g39d4020