Date: Wed, 1 Mar 2017 18:10:33 +0000
From: Mark Rutland
To: Neil Leeder
Cc: Catalin Marinas, Will Deacon, Peter Zijlstra, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    Mark Langsdorf, Mark Salter, Jon Masters, Timur Tabi,
    Jeremy Linton, marc.zyngier@arm.com
Subject: Re: [PATCH/RFC] arm64: pmu: add Qualcomm Technologies extensions
Message-ID: <20170301181032.GL28874@leverpostej>
In-Reply-To: <1488385085-19238-1-git-send-email-nleeder@codeaurora.org>
References: <1488385085-19238-1-git-send-email-nleeder@codeaurora.org>

Hi Neil,

On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
> Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
>
> The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
> extensions to the architected PMU events.

Is this a strict superset of PMUv3 (that could validly be treated as
just PMUv3), or do those IMP DEF parts need to be poked to use this at
all?

What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?

Quite frankly, I'm less than thrilled about the prospect of
IMPLEMENTATION DEFINED CPU PMUs that fall outside of the architected
PMU model, especially for ACPI systems where the raison d'ĂȘtre is
standards and uniformity, and where we have no sensible mechanism to
provide information regarding IMPLEMENTATION DEFINED functionality.
This has knock-on effects for other things, like userspace PMU access
and/or virtualization, and this multiplies the support effort. KVM
already has (architected) PMU support, and without a corresponding KVM
patch this is at best insufficient. I don't imagine the KVM folk will
be too thrilled about the prospect of emulating an IMPLEMENTATION
DEFINED CPU feature like this.

Note that system PMUs are a very different case, since these are not
(implicitly) passed through to guests, nor do they have significant
overlap with architected functionality or each other. It's fine to add
support for those to the host kernel.

> The extended events are implemented by a set of registers which
> are programmed by shifting an event code into a group field.
> The PMNx event then points to that region/group combo.
>
> Restrictions that limit only one concurrent region/group combination
> are also enforced.
>
> Signed-off-by: Neil Leeder
> ---
> The Qualcomm Technologies CPU PMU extensions have an additional set
> of registers which need to be programmed when configuring an event.
> These are the PMRESRs, which are similar to the Krait and Scorpion
> registers in armv7, and the L2 variants in the Qualcomm Technologies
> L2 PMU driver.

If these *must* be programmed, it sounds like this isn't PMUv3
compatible.

[...]

> There are additional constraints on qc events. The armv7
> implementation checks these in get_event_idx, but during L2 PMU
> reviews it was thought better to do these during init processing
> where possible. I added these in the map_event callback because it's
> the only callback from within armpmu_event_init(). I'm not sure if
> that would be considered stretching the acceptable use of that
> interface, so I'm open to other suggestions.

As a general rule for PMU drivers:

* pmu::event_init() should check whether the entire group the event is
  in (i.e. the parent and all siblings) can potentially be allocated
  into counters simultaneously.
  If it is always impossible for the whole group to go on,
  pmu::event_init should fail, since the group is impossible.

* pmu::add() needs to handle inter-group and intra-group conflicts
  that can arise dynamically dependent on how (other) events are
  scheduled. This also needs to fail in the event of a conflict.

> The qc driver also needs to check conflicts between events, using a
> bitmap. This has similar use to the hw_events->used_mask. I added the
> event_conflicts bitmap into hw_events, although I'm not sure if
> that's the best place for an implementation-specific field. An
> alternative would be a static DEFINE_PER_CPU bitmap, although that
> didn't seem as clean, but may be more acceptable.
>
> qc_max_resr is a variable, rather than a constant, to accommodate
> future processors with different numbers of RESRs.
>
> This requires Jeremy Linton's patch sequence to add arm64 CPU PMU
> ACPI support:
> https://patchwork.kernel.org/patch/9533677/

As a heads-up, I'm currently working on an alternative approach that
you can find at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm/perf/acpi

That's a work-in-progress, and there are a few issues (notably IRQ
management) that I'm currently fixing up. I also intend to add a PMUv3
check to the PMUv3 probe.

[...]

> +static bool qc_pmu;

Sorry, but a global boolean to describe whether a (single?) PMU
instance is a particular implementation is not going to fly.

> +static void qc_pmu_enable_event(struct perf_event *event,
> +				struct hw_perf_event *hwc, int idx);
> +static void qc_pmu_disable_event(struct perf_event *event,
> +				struct hw_perf_event *hwc);
> +
>  /*
>   * Perf Events' indices
>   */
> @@ -704,10 +730,13 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  	 */
>  	armv8pmu_disable_counter(idx);
>
> -	/*
> -	 * Set event (if destined for PMNx counters).
> -	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	if (qc_pmu)
> +		qc_pmu_enable_event(event, hwc, idx);
> +	else
> +		/*
> +		 * Set event (if destined for PMNx counters).
> +		 */
> +		armv8pmu_write_evtype(idx, hwc->config_base);

Similarly, this is not a workable structure for these functions.

[...]

> +static void qc_pmu_reset(void *info)
> +{
> +	qc_clear_resrs();
> +	armv8pmu_reset(info);
> +}

Is the QC-specific reset required only if we use the QC-specific
events, or is that required for the standard PMUv3 features to be
usable? Are standard PMUv3 events guaranteed to work if we didn't call
qc_clear_resrs()?

If this is not required for the standard PMUv3 features, and/or any
IMP DEF reset is performed by FW, it looks like we *may* be able to
treat this as PMUv3.

> +static void qc_pmu_enable_event(struct perf_event *event,
> +				struct hw_perf_event *hwc, int idx)
> +{
> +	unsigned int reg, code, group;
> +
> +	if (QC_EVT_PFX(hwc->config_base) != QC_EVT_PREFIX) {
> +		armv8pmu_write_evtype(idx, hwc->config_base);
> +		return;
> +	}

This really shows that this is not a workable structure. It's hideous
to hook the PMUv3 code to call this, then try to duplicate what the
PMUv3 code would have done in this case.

[...]
> +static int armv8_falkor_pmu_init(struct arm_pmu *cpu_pmu)
> +{
> +	armv8_pmu_init(cpu_pmu);
> +	cpu_pmu->name = "qcom_pmuv3";
> +	cpu_pmu->map_event = armv8_qc_map_event;
> +	cpu_pmu->reset = qc_pmu_reset;
> +	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] =
> +		&armv8_pmuv3_events_attr_group;
> +	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] =
> +		&qc_pmu_format_attr_group;
> +	cpu_pmu->get_event_idx = qc_get_event_idx;
> +	cpu_pmu->clear_event_idx = qc_clear_event_idx;
> +
> +	qc_max_resr = QC_FALKOR_MAX_RESR;
> +	qc_clear_resrs();
> +	qc_pmu = true;
> +
> +	if (qc_max_resr > QC_MAX_RESRS) {
> +		/* Sanity check */
> +		pr_err("qcom_pmuv3: max number of RESRs exceeded\n");
> +		return -EINVAL;
> +	}
> +
> +	return armv8pmu_probe_pmu(cpu_pmu);
> +}
> +
>  static const struct of_device_id armv8_pmu_of_device_ids[] = {
>  	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
>  	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},

> @@ -1087,6 +1421,8 @@ static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
>   * aren't supported by the current PMU are disabled.
>   */
>  static const struct pmu_probe_info armv8_pmu_probe_table[] = {
> +	PMU_PROBE(QCOM_CPU_PART_FALKOR_V1 << MIDR_PARTNUM_SHIFT,
> +		  MIDR_PARTNUM_MASK, armv8_falkor_pmu_init),
>  	PMU_PROBE(0, 0, armv8_pmuv3_init),	/* enable all defined counters */
>  	{ /* sentinel value */ }

We don't special-case other PMUs here, and the plan for ACPI was to
rely solely on the architectural information, i.e. the architected ID
registers associated with PMUv3. I don't think we should special-case
implementations like this. My series removes this MIDR matching.

Thanks,
Mark.