Date: Fri, 29 Apr 2016 14:06:24 -0700 (PDT)
From: Vikas Shivappa
To: David Carrillo-Cisneros
cc: Peter Zijlstra, Alexander Shishkin, Arnaldo Carvalho de Melo,
    Ingo Molnar, Vikas Shivappa, Matt Fleming, Tony Luck,
    Stephane Eranian, Paul Turner, x86@kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/32] 2nd Iteration of Cache QoS Monitoring support.
In-Reply-To: <1461905018-86355-1-git-send-email-davidcc@google.com>

On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:

> This series introduces the next iteration of kernel support for the
> Cache QoS Monitoring (CQM) technology available in Intel Xeon processors.

Wondering what kernel version this compiles on?

Thanks,
Vikas

>
> One of the main limitations of the previous version is the inability
> to simultaneously monitor:
> 1) a CPU event and any other event on that CPU.
> 2) cgroup events for cgroups in the same descendancy line.
> 3) cgroup events and any thread event of a cgroup in the same
>    descendancy line.
>
> Another limitation is that monitoring for a cgroup was enabled/disabled by
> the existence of a perf event for that cgroup.
> Since the event llc_occupancy measures changes in occupancy rather than
> total occupancy, an event should be enabled for a long enough period of
> time in order to read meaningful llc_occupancy values. The overhead in
> context switches caused by the perf events is undesirable in some
> sensitive scenarios.
>
> This series of patches addresses the shortcomings mentioned above and
> adds some other improvements. The main changes are:
> - No more potential conflicts between different events. The new
>   version builds a hierarchy of RMIDs that captures the dependency
>   between monitored cgroups. llc_occupancy for a cgroup is the sum of
>   the llc_occupancy of that cgroup's RMID and of all other RMIDs in the
>   cgroup's subtree (both monitored cgroups and threads).
>
> - A cgroup integration that allows monitoring a cgroup without
>   creating a perf event, decreasing the context switch overhead.
>   Monitoring is controlled by a boolean cgroup subsystem attribute
>   in each perf cgroup; that is:
>
>     echo 1 > cgroup_path/perf_event.cqm_cont_monitoring
>
>   starts CQM monitoring whether or not there is a perf_event
>   attached to the cgroup. Setting the attribute to 0 makes
>   monitoring dependent on the existence of a perf_event.
>   A perf_event is always required in order to read llc_occupancy.
>   This cgroup integration uses Intel's PQR code and is intended to
>   be used by upcoming versions of Intel's CAT.
>
> - A more stable rotation algorithm: the new algorithm uses SLOs that
>   guarantee:
>     - A minimum enabled time for monitored cgroups and threads.
>     - A maximum time disabled before error is introduced by
>       reusing dirty RMIDs.
>     - A minimum rate at which RMID recycling must progress.
>
> - Reduced impact of stealing/rotation of RMIDs: the new algorithm
>   credits the residual occupancy held by limbo RMIDs to the
>   former owner of the limbo RMID, decreasing the error introduced
>   by RMID rotation.
>   It also allows a limbo RMID to be reused by its former owner when
>   appropriate, decreasing the potential error of reusing dirty RMIDs
>   and allowing progress to be made even if most limbo RMIDs do not
>   drop occupancy fast enough.
>
> - Elimination of pmu::count: perf's generic perf_event_count()
>   performs a quick add of atomic types. The introduction of
>   pmu::count in the previous CQM series to read occupancy for thread
>   events changed the behavior of perf_event_count() by performing a
>   potentially slow IPI and MSR write/read. It also made pmu::read
>   behave differently depending on whether the event was a
>   cpu/cgroup event or a thread event. This patch series removes the
>   custom pmu::count from CQM and provides consistent behavior for all
>   calls of perf_event_read.
>
> - Added error return for pmu::read: reads of CQM events may fail
>   due to stealing of RMIDs, even after an event has been successfully
>   added to a PMU. This patch series expands pmu::read with an int
>   return value and propagates the error to callers that can fail
>   (i.e. perf_read).
>   Allowing pmu::read to fail is consistent with the recent
>   changes that allow perf_event_read to fail for transactional
>   reading of event groups.
>
> - Introduction of the field pmu_event_flags, which contains flags set
>   by the PMU to signal variations on the default behavior to perf's
>   generic code. In this series, three flags are introduced:
>     - PERF_CGROUP_NO_RECURSION: Signals generic code not to add
>       events of the cgroup ancestors of a cgroup.
>     - PERF_INACTIVE_CPU_READ_PKG: Signals generic code that
>       this CPU event can be read on any CPU in its event::cpu's
>       package, even if the event is not active.
>     - PERF_INACTIVE_EV_READ_ANY_CPU: Signals generic code that
>       this event can be read on any CPU in any package in the
>       system, even if the event is not active.
>   These flags take advantage of the CQM hardware's ability to
>   read llc_occupancy even when the associated perf event is not
>   running on a CPU.
>
> This patch series also updates the perf tool to fix error handling and to
> better handle the idiosyncrasies of snapshot and per-pkg events.
>
> David Carrillo-Cisneros (31):
>   perf/x86/intel/cqm: temporarily remove MBM from CQM and cleanup
>   perf/x86/intel/cqm: remove check for conflicting events
>   perf/x86/intel/cqm: remove all code for rotation of RMIDs
>   perf/x86/intel/cqm: make read of RMIDs per package (Temporal)
>   perf/core: remove unused pmu->count
>   x86/intel,cqm: add CONFIG_INTEL_RDT configuration flag and refactor
>     PQR
>   perf/x86/intel/cqm: separate CQM PMU's attributes from x86 PMU
>   perf/x86/intel/cqm: prepare for next patches
>   perf/x86/intel/cqm: add per-package RMIDs, data and locks
>   perf/x86/intel/cqm: basic RMID hierarchy with per package rmids
>   perf/x86/intel/cqm: (I)state and limbo prmids
>   perf/x86/intel/cqm: add per-package RMID rotation
>   perf/x86/intel/cqm: add polled update of RMID's llc_occupancy
>   perf/x86/intel/cqm: add preallocation of anodes
>   perf/core: add hooks to expose architecture specific features in
>     perf_cgroup
>   perf/x86/intel/cqm: add cgroup support
>   perf/core: adding pmu::event_terminate
>   perf/x86/intel/cqm: use pmu::event_terminate
>   perf/core: introduce PMU event flag PERF_CGROUP_NO_RECURSION
>   x86/intel/cqm: use PERF_CGROUP_NO_RECURSION in CQM
>   perf/x86/intel/cqm: handle inherit event and inherit_stat flag
>   perf/x86/intel/cqm: introduce read_subtree
>   perf/core: introduce PERF_INACTIVE_*_READ_* flags
>   perf/x86/intel/cqm: use PERF_INACTIVE_*_READ_* flags in CQM
>   sched: introduce the finish_arch_pre_lock_switch() scheduler hook
>   perf/x86/intel/cqm: integrate CQM cgroups with scheduler
>   perf/core: add perf_event cgroup hooks for subsystem attributes
>   perf/x86/intel/cqm: add CQM attributes to perf_event cgroup
>   perf,perf/x86,perf/powerpc,perf/arm,perf/*: add int error return to
>     pmu::read
>   perf,perf/x86: add hook perf_event_arch_exec
>   perf/stat: revamp error handling for snapshot and per_pkg events
>
> Stephane Eranian (1):
>   perf/stat: fix bug in handling events in error state
>
>  arch/alpha/kernel/perf_event.c           |    3 +-
>  arch/arc/kernel/perf_event.c             |    3 +-
>  arch/arm64/include/asm/hw_breakpoint.h   |    2 +-
>  arch/arm64/kernel/hw_breakpoint.c        |    3 +-
>  arch/metag/kernel/perf/perf_event.c      |    5 +-
>  arch/mips/kernel/perf_event_mipsxx.c     |    3 +-
>  arch/powerpc/include/asm/hw_breakpoint.h |    2 +-
>  arch/powerpc/kernel/hw_breakpoint.c      |    3 +-
>  arch/powerpc/perf/core-book3s.c          |   11 +-
>  arch/powerpc/perf/core-fsl-emb.c         |    5 +-
>  arch/powerpc/perf/hv-24x7.c              |    5 +-
>  arch/powerpc/perf/hv-gpci.c              |    3 +-
>  arch/s390/kernel/perf_cpum_cf.c          |    5 +-
>  arch/s390/kernel/perf_cpum_sf.c          |    3 +-
>  arch/sh/include/asm/hw_breakpoint.h      |    2 +-
>  arch/sh/kernel/hw_breakpoint.c           |    3 +-
>  arch/sparc/kernel/perf_event.c           |    2 +-
>  arch/tile/kernel/perf_event.c            |    3 +-
>  arch/x86/Kconfig                         |    6 +
>  arch/x86/events/amd/ibs.c                |    2 +-
>  arch/x86/events/amd/iommu.c              |    5 +-
>  arch/x86/events/amd/uncore.c             |    3 +-
>  arch/x86/events/core.c                   |    3 +-
>  arch/x86/events/intel/Makefile           |    3 +-
>  arch/x86/events/intel/bts.c              |    3 +-
>  arch/x86/events/intel/cqm.c              | 3847 +++++++++++++++++++++---------
>  arch/x86/events/intel/cqm.h              |  519 ++++
>  arch/x86/events/intel/cstate.c           |    3 +-
>  arch/x86/events/intel/pt.c               |    3 +-
>  arch/x86/events/intel/rapl.c             |    3 +-
>  arch/x86/events/intel/uncore.c           |    3 +-
>  arch/x86/events/intel/uncore.h           |    2 +-
>  arch/x86/events/msr.c                    |    3 +-
>  arch/x86/include/asm/hw_breakpoint.h     |    2 +-
>  arch/x86/include/asm/perf_event.h        |   41 +
>  arch/x86/include/asm/pqr_common.h        |   74 +
>  arch/x86/include/asm/processor.h         |    4 +
>  arch/x86/kernel/cpu/Makefile             |    4 +
>  arch/x86/kernel/cpu/pqr_common.c         |   43 +
>  arch/x86/kernel/hw_breakpoint.c          |    3 +-
>  arch/x86/kvm/pmu.h                       |   10 +-
>  drivers/bus/arm-cci.c                    |    3 +-
>  drivers/bus/arm-ccn.c                    |    3 +-
>  drivers/perf/arm_pmu.c                   |    3 +-
>  include/linux/perf_event.h               |   91 +-
>  kernel/events/core.c                     |  170 +-
>  kernel/sched/core.c                      |    1 +
>  kernel/sched/sched.h                     |    3 +
>  kernel/trace/bpf_trace.c                 |    5 +-
>  tools/perf/builtin-stat.c                |   43 +-
>  tools/perf/util/counts.h                 |   19 +
>  tools/perf/util/evsel.c                  |   44 +-
>  tools/perf/util/evsel.h                  |    8 +-
>  tools/perf/util/stat.c                   |   35 +-
>  54 files changed, 3746 insertions(+), 1337 deletions(-)
>  create mode 100644 arch/x86/events/intel/cqm.h
>  create mode 100644 arch/x86/include/asm/pqr_common.h
>  create mode 100644 arch/x86/kernel/cpu/pqr_common.c
>
> --
> 2.8.0.rc3.226.g39d4020