Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp4302503pxb; Tue, 10 Nov 2020 12:57:05 -0800 (PST) X-Google-Smtp-Source: ABdhPJynvewwiH3bOiBp0KGc/kcFunuB/wSB/9g16rPFbu/lGk1XijxhT9KicK3dFr2Lvpe9zeOH X-Received: by 2002:a17:906:76d3:: with SMTP id q19mr21470040ejn.162.1605041825072; Tue, 10 Nov 2020 12:57:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1605041825; cv=none; d=google.com; s=arc-20160816; b=so5gfCF3l6sWmy6VEIm//3uFjhsyEiamTctL3QItEJvGpUxcPb+IgNLALM2ohiPB5f kIT7y33e1xMv495mH+sEZiICRgyDGaM7Tz9QT1pzYqqy5IBHHx//tt1IJNXuRMzcAay2 YK0zGP4fzC50BFK+z+z9ELwAA3YoHtMIZdpnuaX+ahaW9AWL0vsqeENpwvPwSnwplP6a e7qRUApellQECuq/jyGVQOAwZqfNgnj7KL+dNBvZlnVzDytmGaA8rZHpNovsNDPRqYMJ T+AHwgTiggarStIxyBssUX+1wXgbBg8Js6PdXcTYjii1kPPxjOuWLAkr2mTmm3AJuDc6 7znA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=Ac0VM0dcPTqx37kRHqfKzIT7iqCUTygIk3BYJ8irkmI=; b=Jwr8Es2ulD67Hi/qy0dGJQJj7ZZ52/E4gJ4b2laClr4dOMIPFsC46L/fMfovHW3vs+ yENzB1Bmi/m7QX/QshJnTj7unRVjgaWzQqYqpvy6xov7h8BHAF5rnj6T7W/jYteeKz2x eEnWEmmvTNIQIlDhZ5euqwiba6/97Eswwv/92PMizkKyC8ljl8+wZ6qfj4KFbo9tSvMw dg7bEOSVy7kzg8lh5ED+hrAyFh4D7b17pGUso+m0x915Q80CQguF/R7KVUFrk+cL9ncs sD4gXZ1bdC8L2AaxZ0AtG5+zmm3r9+vrMm/YtlGJydqOWl7GwMUA4yiWVkkQxERogab+ BQ4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=JTyj4rto; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bu6si10465315edb.94.2020.11.10.12.56.41; Tue, 10 Nov 2020 12:57:05 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=JTyj4rto; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730979AbgKJUwT (ORCPT + 99 others); Tue, 10 Nov 2020 15:52:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726179AbgKJUwS (ORCPT ); Tue, 10 Nov 2020 15:52:18 -0500 Received: from mail-yb1-xb43.google.com (mail-yb1-xb43.google.com [IPv6:2607:f8b0:4864:20::b43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBE60C0613D3 for ; Tue, 10 Nov 2020 12:52:16 -0800 (PST) Received: by mail-yb1-xb43.google.com with SMTP id f140so12975463ybg.3 for ; Tue, 10 Nov 2020 12:52:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Ac0VM0dcPTqx37kRHqfKzIT7iqCUTygIk3BYJ8irkmI=; b=JTyj4rtordrueuHgkRvdME7gUYeUTCkqogHaxc6bBhVlibTpEnwnZBjuQcZtyD79Z5 CLc7fA0/gYX9LlScauGCt1LDeUYhs0MkaXLD1XZLy/rwpjLvBn4FPZIt5l5gaGanH7pM SD6RWa43QvncETp12N7ewpeQlLU+t0cx9nGG7V/kSTJkGt7jFuO55Y1W1c1yOrFhw/ZF KeQiLShdIZgjNxLbheUKxXTNRP8X8rEfYenc9tfXgXP1Sr80PcMArkm8WySoPg2hol2p mjp0Y2ctrE/2HVl4rh0pQd6avSY3nYFrJAcijlztRRZ5qmt4xrIHK9BbQw4sRRXLqzte Tlrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Ac0VM0dcPTqx37kRHqfKzIT7iqCUTygIk3BYJ8irkmI=; b=rkSCSjS8tRsvcxtkyWKtcPmqfsdFlyyDee0UEJX41RLheq0c4QddWOZeCvg0DrKbzK 22D6xkVNFpON19rr1DQwP67XlZY/jucH+VDuzN06mofJQbGYGx9aYSVZuDPt0mqtU7/t abkkGlB+A8iwSB0upCDESxX3QlSMEu+iElLfR0wB7WnyBs457cjEbnMI+Uud2ibArF1H aLESkeT1d/9xCjLJJMMCe4soKWqFpkf3+U+gJwH+KWE/6W2Xa1YSkQtU9DE8zv53z3OP owzmCCkQWopmJM873zCF4WtDQXkjJCpqnRlKtl1rE0nKdK/zkgm2bLAY+QZk/uyEwK0x Vukw== X-Gm-Message-State: AOAM531zI76X8abFvxTsOxu5d6Leka5GgKjvazzsjtUI4Nnpf+o6GhhB q54oF6nQPXdCTEwsY80E7g7pd7dXDdaK7Vz0VZSfjA== X-Received: by 2002:a25:a567:: with SMTP id h94mr27245703ybi.211.1605041535584; Tue, 10 Nov 2020 12:52:15 -0800 (PST) MIME-Version: 1.0 References: <20201109021254.79755-1-like.xu@linux.intel.com> <20201110151257.GP2611@hirez.programming.kicks-ass.net> <20201110153721.GQ2651@hirez.programming.kicks-ass.net> In-Reply-To: <20201110153721.GQ2651@hirez.programming.kicks-ass.net> From: Stephane Eranian Date: Tue, 10 Nov 2020 12:52:04 -0800 Message-ID: Subject: Re: [PATCH] perf/intel: Remove Perfmon-v4 counter_freezing support To: Peter Zijlstra Cc: Like Xu , Paolo Bonzini , kvm@vger.kernel.org, Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Kan Liang , luwei.kang@intel.com, Thomas Gleixner , "Wang, Wei W" , Tony Luck , Mark Gross , Srinivas Pandruvada , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 10, 2020 at 7:37 AM Peter Zijlstra wrote: > > On Tue, Nov 10, 2020 at 04:12:57PM +0100, Peter Zijlstra wrote: > > On Mon, Nov 09, 2020 at 10:12:37AM +0800, Like Xu wrote: > > > The Precise Event Based Sampling(PEBS) supported on Intel Ice Lake server > > > platforms can provide an architectural state of the instruction executed > > > after the instruction that caused the event. This patch set enables the > > > the PEBS via DS feature for KVM (also non) Linux guest on the Ice Lake. > > > The Linux guest can use PEBS feature like native: > > > > > > # perf record -e instructions:ppp ./br_instr a > > > # perf record -c 100000 -e instructions:pp ./br_instr a > > > > > > If the counter_freezing is not enabled on the host, the guest PEBS will > > > be disabled on purpose when host is using PEBS facility. By default, > > > KVM disables the co-existence of guest PEBS and host PEBS. > > > > Uuhh, what?!? counter_freezing should never be enabled, its broken. Let > > me go delete all that code. > > --- > Subject: perf/intel: Remove Perfmon-v4 counter_freezing support > > Perfmon-v4 counter freezing is fundamentally broken; remove this default > disabled code to make sure nobody uses it. > > The feature is called Freeze-on-PMI in the SDM, and if it would do that, > there wouldn't actually be a problem, *however* it does something subtly > different. It globally disables the whole PMU when it raises the PMI, > not when the PMI hits. > > This means there's a window between the PMI getting raised and the PMI > actually getting served where we loose events and this violates the > perf counter independence. That is, a counting event should not result > in a different event count when there is a sampling event co-scheduled. > What is implemented is Freeze-on-Overflow, yet it is described as Freeze-on-PMI. That, in itself, is a problem. I agree with you on that point. However, there are use cases for both modes. I can sample on event A and count on B, C and when A overflows, I want to snapshot B, C. For that I want B, C at the moment of the overflow, not at the moment the PMI is delivered. Thus, youd would want the Freeze-on-overflow behavior. You can collect in this mode with the perf tool, IIRC: perf record -e '{cycles,instructions,branches:S}' .... The other usage model is that of the replay-debugger (rr) which you are alluding to, which needs precise count of an event including during the skid window. For that, you need Freeze-on-PMI (delivered). Note that this tool likely only cares about user level occurrences of events. As for counter independence, I am not sure it holds in all cases. If the events are setup for user+kernel then, as soon as you co-schedule a sampling event, you will likely get more counts on the counting event due to the additional kernel entries/exits caused by interrupt-based profiling. Even if you were to restrict to user level only, I would expect to see a few more counts. > This is known to break existing software. > > Signed-off-by: Peter Zijlstra (Intel) > --- > arch/x86/events/intel/core.c | 152 ------------------------------------------- > arch/x86/events/perf_event.h | 3 +- > 2 files changed, 1 insertion(+), 154 deletions(-) > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > index c79748f6921d..9909dfa6fb12 100644 > --- a/arch/x86/events/intel/core.c > +++ b/arch/x86/events/intel/core.c > @@ -2121,18 +2121,6 @@ static void intel_tfa_pmu_enable_all(int added) > intel_pmu_enable_all(added); > } > > -static void enable_counter_freeze(void) > -{ > - update_debugctlmsr(get_debugctlmsr() | > - DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI); > -} > - > -static void disable_counter_freeze(void) > -{ > - update_debugctlmsr(get_debugctlmsr() & > - ~DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI); > -} > - > static inline u64 intel_pmu_get_status(void) > { > u64 status; > @@ -2696,95 +2684,6 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status) > return handled; > } > > -static bool disable_counter_freezing = true; > -static int __init intel_perf_counter_freezing_setup(char *s) > -{ > - bool res; > - > - if (kstrtobool(s, &res)) > - return -EINVAL; > - > - disable_counter_freezing = !res; > - return 1; > -} > -__setup("perf_v4_pmi=", intel_perf_counter_freezing_setup); > - > -/* > - * Simplified handler for Arch Perfmon v4: > - * - We rely on counter freezing/unfreezing to enable/disable the PMU. > - * This is done automatically on PMU ack. > - * - Ack the PMU only after the APIC. > - */ > - > -static int intel_pmu_handle_irq_v4(struct pt_regs *regs) > -{ > - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > - int handled = 0; > - bool bts = false; > - u64 status; > - int pmu_enabled = cpuc->enabled; > - int loops = 0; > - > - /* PMU has been disabled because of counter freezing */ > - cpuc->enabled = 0; > - if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) { > - bts = true; > - intel_bts_disable_local(); > - handled = intel_pmu_drain_bts_buffer(); > - handled += intel_bts_interrupt(); > - } > - status = intel_pmu_get_status(); > - if (!status) > - goto done; > -again: > - intel_pmu_lbr_read(); > - if (++loops > 100) { > - static bool warned; > - > - if (!warned) { > - WARN(1, "perfevents: irq loop stuck!\n"); > - perf_event_print_debug(); > - warned = true; > - } > - intel_pmu_reset(); > - goto done; > - } > - > - > - handled += handle_pmi_common(regs, status); > -done: > - /* Ack the PMI in the APIC */ > - apic_write(APIC_LVTPC, APIC_DM_NMI); > - > - /* > - * The counters start counting immediately while ack the status. > - * Make it as close as possible to IRET. This avoids bogus > - * freezing on Skylake CPUs. > - */ > - if (status) { > - intel_pmu_ack_status(status); > - } else { > - /* > - * CPU may issues two PMIs very close to each other. > - * When the PMI handler services the first one, the > - * GLOBAL_STATUS is already updated to reflect both. > - * When it IRETs, the second PMI is immediately > - * handled and it sees clear status. At the meantime, > - * there may be a third PMI, because the freezing bit > - * isn't set since the ack in first PMI handlers. > - * Double check if there is more work to be done. > - */ > - status = intel_pmu_get_status(); > - if (status) > - goto again; > - } > - > - if (bts) > - intel_bts_enable_local(); > - cpuc->enabled = pmu_enabled; > - return handled; > -} > - > /* > * This handler is triggered by the local APIC, so the APIC IRQ handling > * rules apply: > @@ -4081,9 +3980,6 @@ static void intel_pmu_cpu_starting(int cpu) > if (x86_pmu.version > 1) > flip_smm_bit(&x86_pmu.attr_freeze_on_smi); > > - if (x86_pmu.counter_freezing) > - enable_counter_freeze(); > - > /* Disable perf metrics if any added CPU doesn't support it. */ > if (x86_pmu.intel_cap.perf_metrics) { > union perf_capabilities perf_cap; > @@ -4154,9 +4050,6 @@ static void free_excl_cntrs(struct cpu_hw_events *cpuc) > static void intel_pmu_cpu_dying(int cpu) > { > fini_debug_store_on_cpu(cpu); > - > - if (x86_pmu.counter_freezing) > - disable_counter_freeze(); > } > > void intel_cpuc_finish(struct cpu_hw_events *cpuc) > @@ -4548,39 +4441,6 @@ static __init void intel_nehalem_quirk(void) > } > } > > -static const struct x86_cpu_desc counter_freezing_ucodes[] = { > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT, 2, 0x0000000e), > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT, 9, 0x0000002e), > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT, 10, 0x00000008), > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT_D, 1, 0x00000028), > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT_PLUS, 1, 0x00000028), > - INTEL_CPU_DESC(INTEL_FAM6_ATOM_GOLDMONT_PLUS, 8, 0x00000006), > - {} > -}; > - > -static bool intel_counter_freezing_broken(void) > -{ > - return !x86_cpu_has_min_microcode_rev(counter_freezing_ucodes); > -} > - > -static __init void intel_counter_freezing_quirk(void) > -{ > - /* Check if it's already disabled */ > - if (disable_counter_freezing) > - return; > - > - /* > - * If the system starts with the wrong ucode, leave the > - * counter-freezing feature permanently disabled. > - */ > - if (intel_counter_freezing_broken()) { > - pr_info("PMU counter freezing disabled due to CPU errata," > - "please upgrade microcode\n"); > - x86_pmu.counter_freezing = false; > - x86_pmu.handle_irq = intel_pmu_handle_irq; > - } > -} > - > /* > * enable software workaround for errata: > * SNB: BJ122 > @@ -4966,9 +4826,6 @@ __init int intel_pmu_init(void) > max((int)edx.split.num_counters_fixed, assume); > } > > - if (version >= 4) > - x86_pmu.counter_freezing = !disable_counter_freezing; > - > if (boot_cpu_has(X86_FEATURE_PDCM)) { > u64 capabilities; > > @@ -5090,7 +4947,6 @@ __init int intel_pmu_init(void) > > case INTEL_FAM6_ATOM_GOLDMONT: > case INTEL_FAM6_ATOM_GOLDMONT_D: > - x86_add_quirk(intel_counter_freezing_quirk); > memcpy(hw_cache_event_ids, glm_hw_cache_event_ids, > sizeof(hw_cache_event_ids)); > memcpy(hw_cache_extra_regs, glm_hw_cache_extra_regs, > @@ -5117,7 +4973,6 @@ __init int intel_pmu_init(void) > break; > > case INTEL_FAM6_ATOM_GOLDMONT_PLUS: > - x86_add_quirk(intel_counter_freezing_quirk); > memcpy(hw_cache_event_ids, glp_hw_cache_event_ids, > sizeof(hw_cache_event_ids)); > memcpy(hw_cache_extra_regs, glp_hw_cache_extra_regs, > @@ -5577,13 +5432,6 @@ __init int intel_pmu_init(void) > pr_cont("full-width counters, "); > } > > - /* > - * For arch perfmon 4 use counter freezing to avoid > - * several MSR accesses in the PMI. > - */ > - if (x86_pmu.counter_freezing) > - x86_pmu.handle_irq = intel_pmu_handle_irq_v4; > - > if (x86_pmu.intel_cap.perf_metrics) > x86_pmu.intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS; > > diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h > index 10032f023fcc..4084fa31cc21 100644 > --- a/arch/x86/events/perf_event.h > +++ b/arch/x86/events/perf_event.h > @@ -681,8 +681,7 @@ struct x86_pmu { > > /* PMI handler bits */ > unsigned int late_ack :1, > - enabled_ack :1, > - counter_freezing :1; > + enabled_ack :1; > /* > * sysfs attrs > */