Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934094AbeAHP37 (ORCPT + 1 other); Mon, 8 Jan 2018 10:29:59 -0500 Received: from mga07.intel.com ([134.134.136.100]:48195 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933967AbeAHP3x (ORCPT ); Mon, 8 Jan 2018 10:29:53 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.46,330,1511856000"; d="scan'208";a="18400573" From: kan.liang@intel.com To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, jolsa@redhat.com, eranian@google.com, ak@linux.intel.com, Kan Liang Subject: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload Date: Mon, 8 Jan 2018 07:15:13 -0800 Message-Id: <1515424516-143728-2-git-send-email-kan.liang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1515424516-143728-1-git-send-email-kan.liang@intel.com> References: <1515424516-143728-1-git-send-email-kan.liang@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: From: Kan Liang There is bug when mmap read event->count with large PEBS enabled. Here is an example. #./read_count 0x71f0 0x122c0 0x1000000001c54 0x100000001257d 0x200000000bdc5 There is auto-reload mechanism enabled for PEBS events in fixed period mode. But the calculation of event->count does not take the auto-reload values into account. Anyone who read the event->count will get wrong result, e.g x86_pmu_read. Also, the calculation of hwc->period_left is wrong either. It impacts the accuracy of period for the first record in PEBS multiple records. The issue is introduced with the auto-reload mechanism enabled by commit 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible") Introduce intel_pmu_save_and_restart_reload only for auto-reload. The formula to calculate the event->count is as below. event->count = period left from last time + (reload_times - 1) * reload_val + latency of PMI handler prev_count is the last observed hardware counter value. Just the same as non-auto-reload, its absolute value is the period of the first record. It should not update with each reload. Because it doesn't 'observe' the hardware counter for each auto-reload. For the second and later records, the period is exactly the reload value. Just need to simply add (reload_times - 1) * reload_val to event->count. The calculation of the latency of PMI handler is a little bit different as non-auto-reload. Because the start point is -reload_value. It needs to be adjusted by adding reload_value. The period_left needs to do the same adjustment. There is nothing need to do in x86_perf_event_set_period(). Because it is fixed period. The period_left is already adjusted. Signed-off-by: Kan Liang --- arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 67 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 3674a4b..cc1f373 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1251,17 +1251,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit) return NULL; } +/* + * Specific intel_pmu_save_and_restart() for auto-reload. + */ +static int intel_pmu_save_and_restart_reload(struct perf_event *event, + u64 reload_val, + int reload_times) +{ + struct hw_perf_event *hwc = &event->hw; + int shift = 64 - x86_pmu.cntval_bits; + u64 prev_raw_count, new_raw_count; + u64 delta; + + if ((reload_times == 0) || (reload_val == 0)) + return intel_pmu_save_and_restart(event); + + /* + * Careful: an NMI might modify the previous event value. + * + * Our tactic to handle this is to first atomically read and + * exchange a new raw count - then add that new-prev delta + * count to the generic event atomically: + */ +again: + prev_raw_count = local64_read(&hwc->prev_count); + rdpmcl(hwc->event_base_rdpmc, new_raw_count); + + if (local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count) != prev_raw_count) + goto again; + + /* + * Now we have the new raw value and have updated the prev + * timestamp already. We can now calculate the elapsed delta + * (event-)time and add that to the generic event. + * + * Careful, not all hw sign-extends above the physical width + * of the count. + * + * event->count = period left from last time + + * (reload_times - 1) * reload_val + + * latency of PMI handler + * The period left from last time can be got from -prev_count. + * The start points of counting is always -reload_val. + * So the real latency of PMI handler is reload_val + new_raw_count. + */ + delta = (reload_val << shift) + (new_raw_count << shift) - + (prev_raw_count << shift); + delta >>= shift; + + local64_add(reload_val * (reload_times - 1), &event->count); + local64_add(delta, &event->count); + local64_sub(delta, &hwc->period_left); + + return x86_perf_event_set_period(event); +} + static void __intel_pmu_pebs_event(struct perf_event *event, struct pt_regs *iregs, void *base, void *top, int bit, int count) { + struct hw_perf_event *hwc = &event->hw; struct perf_sample_data data; struct pt_regs regs; void *at = get_next_pebs_record_by_bit(base, top, bit); - if (!intel_pmu_save_and_restart(event) && - !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)) + if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) { + /* + * Now, auto-reload is only enabled in fixed period mode. + * The reload value is always hwc->sample_period. + * May need to change it, if auto-reload is enabled in + * freq mode later. + */ + intel_pmu_save_and_restart_reload(event, hwc->sample_period, + count); + } else if (!intel_pmu_save_and_restart(event)) return; while (count > 1) { -- 2.7.4