Date: Tue, 6 Feb 2018 16:06:48 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: kan.liang@linux.intel.com
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org, acme@kernel.org,
        tglx@linutronix.de, jolsa@redhat.com, eranian@google.com,
        ak@linux.intel.com
Subject: Re: [PATCH V3 1/5] perf/x86/intel: fix event update for auto-reload
Message-ID: <20180206150648.GK2249@hirez.programming.kicks-ass.net>
References: <1517243373-355481-1-git-send-email-kan.liang@linux.intel.com>
 <1517243373-355481-2-git-send-email-kan.liang@linux.intel.com>
In-Reply-To: <1517243373-355481-2-git-send-email-kan.liang@linux.intel.com>

On Mon, Jan 29, 2018 at 08:29:29AM -0800, kan.liang@linux.intel.com wrote:
> +/*
> + * Specific intel_pmu_save_and_restart() for auto-reload.
> + * It is only called from drain_pebs().
> + */
> +static int intel_pmu_save_and_restart_reload(struct perf_event *event,
> +                                             int reload_times)
> +{
> +        struct hw_perf_event *hwc = &event->hw;
> +        int shift = 64 - x86_pmu.cntval_bits;
> +        u64 reload_val = hwc->sample_period;
> +        u64 prev_raw_count, new_raw_count;
> +        u64 delta;
> +
> +        WARN_ON((reload_times == 0) || (reload_val == 0));

I'm struggling to see why !count is a problem. You can call read()
twice without having extra PEBS samples, right?

Which also seems to suggest we have a problem in intel_pmu_drain_pebs*()
because those will simply not call us when there aren't any PEBS events
in the buffer, even though the count value can have changed. Your patch
doesn't appear to address that.

> +
> +        /*
> +         * drain_pebs() only happens when the PMU is disabled.
> +         * It does not need to specially handle the previous event value.
> +         * The hwc->prev_count will be updated in x86_perf_event_set_period().

That's completely buggered. You shouldn't need to call
x86_perf_event_set_period() _at_all_.

> +         */

        WARN_ON(this_cpu_read(cpu_hw_events.enabled));

> +        prev_raw_count = local64_read(&hwc->prev_count);
> +        rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +
> +        /*
> +         * Now we have the new raw value and have updated the prev
> +         * timestamp already. We can now calculate the elapsed delta
> +         * (event-)time and add that to the generic event.
> +         *
> +         * Careful, not all hw sign-extends above the physical width
> +         * of the count.
> +         *
> +         * There is a small gap between 'PEBS hardware is armed' and 'the NMI
> +         * is handled'. Because of the gap, the first record also needs to be
> +         * specially handled.

True, but not at all relevant.

> +         * The formula to calculate the increments of event->count is as below.
> +         * The increments = the period for first record +
> +         *                  (reload_times - 1) * reload_val +
> +         *                  the gap
> +         * 'The period for first record' can be got from -prev_raw_count.
> +         *
> +         * 'The gap' = new_raw_count + reload_val. Because the start point of
> +         * counting is always -reload_val for auto-reload.
That is still very much confused, what you seem to want is something
like:

  Since the counter increments a negative counter value and overflows
  on the sign switch, giving the interval:

    [-period, 0]

  the difference between two consecutive reads is:

  A) value2 - value1;
     when no overflows have happened in between,

  B) (0 - value1) + (value2 - (-period));
     when one overflow happened in between,

  C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
     when @n overflows happened in between.

  Here A) is the obvious difference, B) is the extension to the
  discrete interval, where the first term is to the top of the
  interval and the second term is from the bottom of the next
  interval, and C) the extension to multiple intervals, where the
  middle term is the whole intervals covered.

  An equivalent of C, by reduction, is:

    value2 - value1 + n * period

> +         *
> +         * The period_left needs to do the same adjustment by adding
> +         * reload_val.
> +         */
> +        delta = (reload_val << shift) + (new_raw_count << shift) -
> +                (prev_raw_count << shift);
> +        delta >>= shift;
> +
> +        local64_add(reload_val * (reload_times - 1), &event->count);
> +        local64_add(delta, &event->count);
> +        local64_sub(delta, &hwc->period_left);

Nobody cares about period_left, since AUTO_RELOAD already limits the
periods to what the hardware supports, right?

> +
> +        return x86_perf_event_set_period(event);
> +}

With the exception of handling 'empty' buffers, I ended up with the
below. Please try again.

---

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1156,16 +1156,13 @@ int x86_perf_event_set_period(struct per
 
 	per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;
 
-	if (!(hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) ||
-	    local64_read(&hwc->prev_count) != (u64)-left) {
-		/*
-		 * The hw event starts counting from this event offset,
-		 * mark it to be able to extra future deltas:
-		 */
-		local64_set(&hwc->prev_count, (u64)-left);
+	/*
+	 * The hw event starts counting from this event offset,
+	 * mark it to be able to extra future deltas:
+	 */
+	local64_set(&hwc->prev_count, (u64)-left);
 
-		wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
-	}
+	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
 
 	/*
 	 * Due to erratum on certan cpu we need
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1306,17 +1306,89 @@ get_next_pebs_record_by_bit(void *base, 
 	return NULL;
 }
 
+/*
+ * Special variant of intel_pmu_save_and_restart() for auto-reload.
+ */
+static int
+intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int shift = 64 - x86_pmu.cntval_bits;
+	u64 period = hwc->sample_period;
+	u64 prev_raw_count, new_raw_count;
+	u64 delta;
+
+	WARN_ON(!period);
+
+	/*
+	 * drain_pebs() only happens when the PMU is disabled.
+	 */
+	WARN_ON(this_cpu_read(cpu_hw_events.enabled));
+
+	prev_raw_count = local64_read(&hwc->prev_count);
+	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+	local64_set(&hwc->prev_count, new_raw_count);
+
+	/*
+	 * Careful, not all hw sign-extends above the physical width
+	 * of the counter.
+	 */
+	delta = (new_raw_count << shift) - (prev_raw_count << shift);
+	delta >>= shift;
+
+	/*
+	 * Since the counter increments a negative counter value and
+	 * overflows on the sign switch, giving the interval:
+	 *
+	 *   [-period, 0]
+	 *
+	 * the difference between two consecutive reads is:
+	 *
+	 * A) value2 - value1;
+	 *    when no overflows have happened in between,
+	 *
+	 * B) (0 - value1) + (value2 - (-period));
+	 *    when one overflow happened in between,
+	 *
+	 * C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
+	 *    when @n overflows happened in between.
+	 *
+	 * Here A) is the obvious difference, B) is the extension to the
+	 * discrete interval, where the first term is to the top of the
+	 * interval and the second term is from the bottom of the next
+	 * interval, and C) the extension to multiple intervals, where the
+	 * middle term is the whole intervals covered.
+	 *
+	 * An equivalent of C, by reduction, is:
+	 *
+	 *   value2 - value1 + n * period
+	 */
+	local64_add(delta + period * count, &event->count);
+
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
 static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs,
 				   void *base, void *top,
 				   int bit, int count)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct perf_sample_data data;
 	struct pt_regs regs;
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 
-	if (!intel_pmu_save_and_restart(event) &&
-	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
+	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+		/*
+		 * Now, auto-reload is only enabled in fixed period mode.
+		 * The reload value is always hwc->sample_period.
+		 * May need to change it, if auto-reload is enabled in
+		 * freq mode later.
+		 */
+		intel_pmu_save_and_restart_reload(event, count);
+	} else if (!intel_pmu_save_and_restart(event))
 		return;
 
 	while (count > 1) {
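
[Editor's note: to make the interval arithmetic above concrete, here is a
minimal standalone userspace sketch, not kernel code and not part of the
patch. CNTVAL_BITS, pebs_delta() and the sample values are made up for
illustration; only the sign-extension shift and the reduced formula
"value2 - value1 + n * period" mirror the code in the diff.]

#include <stdint.h>
#include <stdio.h>

#define CNTVAL_BITS	48	/* assumed physical counter width */
#define CNTVAL_MASK	((1ULL << CNTVAL_BITS) - 1)

static uint64_t pebs_delta(uint64_t prev_raw, uint64_t new_raw,
			   uint64_t period, unsigned int n)
{
	int shift = 64 - CNTVAL_BITS;
	uint64_t delta;

	/*
	 * Shift both raw values up to bit 63 and back down, so that
	 * hardware which does not sign-extend above CNTVAL_BITS still
	 * yields the correct wrapped difference (value2 - value1).
	 */
	delta = (new_raw << shift) - (prev_raw << shift);
	delta >>= shift;

	/* C) reduced: value2 - value1 + n * period */
	return delta + (uint64_t)n * period;
}

int main(void)
{
	uint64_t period = 1000;
	/* counter armed at -period ... */
	uint64_t value1 = (0 - period) & CNTVAL_MASK;
	/* ... read again two auto-reloads later, 250 counts in */
	uint64_t value2 = (0 - period + 250) & CNTVAL_MASK;

	/* prints 2250: two full periods plus 250 within the interval */
	printf("delta = %llu\n",
	       (unsigned long long)pebs_delta(value1, value2, period, 2));
	return 0;
}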