Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933989AbbLPA4g (ORCPT ); Tue, 15 Dec 2015 19:56:36 -0500 Received: from mga11.intel.com ([192.55.52.93]:28558 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932875AbbLPAyj (ORCPT ); Tue, 15 Dec 2015 19:54:39 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,434,1444719600"; d="scan'208";a="872326672" From: Andi Kleen To: acme@kernel.org, peterz@infradead.org Cc: jolsa@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, eranian@google.com, namhyung@kernel.org, Andi Kleen Subject: [PATCH 09/10] x86, perf: Add Top Down events to Intel Core Date: Tue, 15 Dec 2015 16:54:25 -0800 Message-Id: <1450227266-2501-10-git-send-email-andi@firstfloor.org> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1450227266-2501-1-git-send-email-andi@firstfloor.org> References: <1450227266-2501-1-git-send-email-andi@firstfloor.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6426 Lines: 172 From: Andi Kleen Add declarations for the events needed for TopDown to the Intel big core CPUs starting with Sandy Bridge. We need to report different values if HyperThreading is on or off. The only thing this patch does is to export some events in sysfs. TopDown level 1 uses a set of abstracted metrics which are generic to out of order CPU cores (although some CPUs may not implement all of them): topdown-total-slots Available slots in the pipeline topdown-slots-issued Slots issued into the pipeline topdown-slots-retired Slots successfully retired topdown-fetch-bubbles Pipeline gaps in the frontend topdown-recovery-bubbles Pipeline gaps during recovery from misspeculation A slot is a single operation in the CPU pipe line. These metrics then allow to compute four useful metrics: FrontendBound, BackendBound, Retiring, BadSpeculation. The formulas to compute the metrics are generic, they only change based on the availability on the abstracted input values. The kernel declares the events supported by the current CPU and their scaling factors (such as the pipeline width) and perf stat then computes the formulas based on the available metrics. This is similar how existing perf metrics, such as TSC metrics or IPC, are implemented. This abstracts all CPU pipe line specific knowledge in the kernel driver, but still avoids the need for larger scale perf interface changes. For HyperThreading the any bit is needed to get accurate values when both threads are executing. This implies that the events can only be collected as root or with perf_event_paranoid=-1 for now. Hyper Threading also requires averaging events from both threads together (the CPU cannot measure them independently). In perf stat this is already done by the per core mode. The new .agg-per-core attribute is added to the events, which then forces perf stat to enable --per-core. The basic scheme is based on the following paper: Yasin, A Top Down Method for Performance analysis and Counter architecture ISPASS14 (pdf available via google) v2: Rework scaling. Fix formulas for HyperThreading. Signed-off-by: Andi Kleen --- arch/x86/kernel/cpu/perf_event_intel.c | 73 ++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 33b4b67..2eb50f7 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -222,9 +222,64 @@ struct attribute *nhm_events_attrs[] = { NULL, }; +/* + * TopDown events for Core. + * + * The events are all in slots, which is a free slot in a 4 wide + * pipeline. Some events are already reported in slots, for cycle + * events we multiply by the pipeline width (4). + * + * With Hyper Threading on, TopDown metrics are either summed or averaged + * between the threads of a core: (count_t0 + count_t1). + * + * For the average case the metric is always scaled to pipeline width, + * so we use factor 2 ((count_t0 + count_t1) / 2 * 4) + * + * We tell perf to aggregate per core by setting the .agg-per-core + * attribute for the alias to 1. + */ + +EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots, + "event=0x3c,umask=0x0", /* cpu_clk_unhalted.thread */ + "event=0x3c,umask=0x0,any=1"); /* cpu_clk_unhalted.thread_any */ +EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale, "4", "2"); +EVENT_ATTR_STR_HT(topdown-total-slots.agg-per-core, td_total_slots_pc, + "0", "1"); +EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued, + "event=0xe,umask=0x1"); /* uops_issued.any */ +EVENT_ATTR_STR_HT(topdown-slots-issued.agg-per-core, td_slots_issued_pc, + "0", "1"); +EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired, + "event=0xc2,umask=0x2"); /* uops_retired.retire_slots */ +EVENT_ATTR_STR_HT(topdown-slots-retired.agg-per-core, td_slots_retired_pc, + "0", "1"); +EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles, + "event=0x9c,umask=0x1"); /* idq_uops_not_delivered_core */ +EVENT_ATTR_STR_HT(topdown-fetch-bubbles.agg-per-core, td_fetch_bubbles_pc, + "0", "1"); +EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles, + "event=0xd,umask=0x3,cmask=1", /* int_misc.recovery_cycles */ + "event=0xd,umask=0x3,cmask=1,any=1"); /* int_misc.recovery_cycles_any */ +EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale, + "4", "2"); +EVENT_ATTR_STR_HT(topdown-recovery-bubbles.agg-per-core, td_recovery_bubbles_pc, + "0", "1"); + struct attribute *snb_events_attrs[] = { EVENT_PTR(mem_ld_snb), EVENT_PTR(mem_st_snb), + EVENT_PTR(td_slots_issued), + EVENT_PTR(td_slots_issued_pc), + EVENT_PTR(td_slots_retired), + EVENT_PTR(td_slots_retired_pc), + EVENT_PTR(td_fetch_bubbles), + EVENT_PTR(td_fetch_bubbles_pc), + EVENT_PTR(td_total_slots), + EVENT_PTR(td_total_slots_scale), + EVENT_PTR(td_total_slots_pc), + EVENT_PTR(td_recovery_bubbles), + EVENT_PTR(td_recovery_bubbles_scale), + EVENT_PTR(td_recovery_bubbles_pc), NULL, }; @@ -3201,6 +3256,18 @@ static struct attribute *hsw_events_attrs[] = { EVENT_PTR(cycles_ct), EVENT_PTR(mem_ld_hsw), EVENT_PTR(mem_st_hsw), + EVENT_PTR(td_slots_issued), + EVENT_PTR(td_slots_issued_pc), + EVENT_PTR(td_slots_retired), + EVENT_PTR(td_slots_retired_pc), + EVENT_PTR(td_fetch_bubbles), + EVENT_PTR(td_fetch_bubbles_pc), + EVENT_PTR(td_total_slots), + EVENT_PTR(td_total_slots_scale), + EVENT_PTR(td_total_slots_pc), + EVENT_PTR(td_recovery_bubbles), + EVENT_PTR(td_recovery_bubbles_scale), + EVENT_PTR(td_recovery_bubbles_pc), NULL }; @@ -3518,6 +3585,12 @@ __init int intel_pmu_init(void) memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); intel_pmu_lbr_init_skl(); + /* INT_MISC.RECOVERY_CYCLES has umask 1 in Skylake */ + event_attr_td_recovery_bubbles.event_str_noht = + "event=0xd,umask=0x1,cmask=1"; + event_attr_td_recovery_bubbles.event_str_ht = + "event=0xd,umask=0x1,cmask=1,any=1"; + x86_pmu.event_constraints = intel_skl_event_constraints; x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints; x86_pmu.extra_regs = intel_skl_extra_regs; -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/