Date: Wed, 24 Jan 2018 13:26:18 +0100
From: Peter Zijlstra
To: kan.liang@intel.com
Cc: mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, jolsa@redhat.com, eranian@google.com,
	ak@linux.intel.com, Kan Liang
Subject: Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload
Message-ID: <20180124122618.GH2249@hirez.programming.kicks-ass.net>
References: <1515424516-143728-1-git-send-email-kan.liang@intel.com>
 <1515424516-143728-2-git-send-email-kan.liang@intel.com>
In-Reply-To: <1515424516-143728-2-git-send-email-kan.liang@intel.com>
User-Agent: Mutt/1.9.2 (2017-12-15)

On Mon, Jan 08, 2018 at 07:15:13AM -0800, kan.liang@intel.com wrote:
> The formula to calculate event->count is as below:
>
>   event->count = period left from last time +
>                  (reload_times - 1) * reload_val +
>                  latency of PMI handler
>
> prev_count is the last observed hardware counter value. Just as with
> non-auto-reload, its absolute value is the period of the first record.
> It should not be updated on each reload, because the hardware counter
> is not 'observed' on each auto-reload.
>
> For the second and later records, the period is exactly the reload
> value. We just need to add (reload_times - 1) * reload_val to
> event->count.
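To check my reading of that arithmetic, here is a stand-alone sketch
with made-up numbers (plain user-space C, not kernel code; the 48-bit
counter width and all values are assumptions for illustration only):

#include <stdio.h>
#include <stdint.h>

#define CNTVAL_BITS	48	/* assumed; x86_pmu.cntval_bits in the kernel */
#define CNTVAL_MASK	((1ULL << CNTVAL_BITS) - 1)

int main(void)
{
	int shift = 64 - CNTVAL_BITS;
	uint64_t reload_val = 0x10000;
	int reload_times = 3;	/* three PEBS records drained */

	/* 0x4000 was left of the previous period ... */
	uint64_t prev_raw_count = (0 - 0x4000ULL) & CNTVAL_MASK;
	/* ... and the counter is 0x123 into the current period. */
	uint64_t new_raw_count = (0 - reload_val + 0x123) & CNTVAL_MASK;
	uint64_t delta;

	/* The patch's computation: */
	delta = (reload_val << shift) + (new_raw_count << shift) -
		(prev_raw_count << shift);
	delta >>= shift;
	delta += reload_val * (reload_times - 1);

	/* Expect 0x4000 + 2 * 0x10000 + 0x123 = 0x24123 */
	printf("count += %#llx\n", (unsigned long long)delta);
	return 0;
}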
> The calculation of the latency of the PMI handler is a little
> different from non-auto-reload, because the start point is
> -reload_value. It needs to be adjusted by adding reload_value.
> The period_left needs the same adjustment.

What's this about the PMI latency? We don't care about that in any
other situation, right? Sure, the PMI takes a bit of time, but we're
not correcting for that anywhere else, so why start now?

> There is nothing to do in x86_perf_event_set_period(), because it is
> a fixed period and period_left is already adjusted.

A Fixes: tag is missing.

> Signed-off-by: Kan Liang
> ---
>  arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 3674a4b..cc1f373 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1251,17 +1251,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
>  	return NULL;
>  }
>
> +/*
> + * Specific intel_pmu_save_and_restart() for auto-reload.
> + */
> +static int intel_pmu_save_and_restart_reload(struct perf_event *event,
> +					     u64 reload_val,
> +					     int reload_times)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int shift = 64 - x86_pmu.cntval_bits;
> +	u64 prev_raw_count, new_raw_count;
> +	u64 delta;
> +
> +	if ((reload_times == 0) || (reload_val == 0))
> +		return intel_pmu_save_and_restart(event);

Like Jiri, I find this confusing at best. If we need to call that one,
you shouldn't have called this function to begin with. At best, have a
WARN here or something.

> +
> +	/*
> +	 * Careful: an NMI might modify the previous event value.
> +	 *
> +	 * Our tactic to handle this is to first atomically read and
> +	 * exchange a new raw count - then add that new-prev delta
> +	 * count to the generic event atomically:
> +	 */

For now this seems to only get called from *drain_pebs*, which afaict
only happens when we've disabled the PMU (either from sched_task or
PMI).

Now, you want to put this in the pmu::read() path, and that does not
disable the PMU; but I don't think we can drain the PEBS buffer while
it's active, that's too full of races, so even there you'll have to
disable stuff. So I don't think this is correct/desired for that case.

> +again:
> +	prev_raw_count = local64_read(&hwc->prev_count);
> +	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +
> +	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> +			    new_raw_count) != prev_raw_count)
> +		goto again;
> +
> +	/*
> +	 * Now we have the new raw value and have updated the prev
> +	 * timestamp already. We can now calculate the elapsed delta
> +	 * (event-)time and add that to the generic event.
> +	 *
> +	 * Careful, not all hw sign-extends above the physical width
> +	 * of the count.
> +	 *
> +	 * event->count = period left from last time +
> +	 *                (reload_times - 1) * reload_val +
> +	 *                latency of PMI handler
> +	 *
> +	 * The period left from last time can be got from -prev_count.
> +	 * The start points of counting is always -reload_val.
> +	 * So the real latency of PMI handler is reload_val + new_raw_count.
> +	 */

That is very confused; the PMI latency is utterly unrelated to anything
you do here.
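To spell that term out (illustrative numbers again, not from the
patch): the counter counts up from -reload_val towards 0, so at read
time reload_val + new_raw_count is simply how far the counter has
advanced into the current period:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	int64_t reload_val = 0x10000;			/* illustrative */
	int64_t new_raw_count = -reload_val + 0x123;	/* 0x123 events in */

	/*
	 * reload_val + new_raw_count is the progress into the current
	 * period -- it says nothing about how long the PMI handler took.
	 */
	assert(reload_val + new_raw_count == 0x123);
	return 0;
}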
> +	delta = (reload_val << shift) + (new_raw_count << shift) -
> +		(prev_raw_count << shift);
> +	delta >>= shift;
> +
> +	local64_add(reload_val * (reload_times - 1), &event->count);
> +	local64_add(delta, &event->count);

And this is still wrong, I think. Consider the case where
!reload_times: we can easily call pmu::read() twice in one period, and
in that case we should increment count with (new - prev). Only once we
get a new sample and are known to have wrapped do we need to consider
the wrap; see the sketch at the end of this mail.

> +	local64_sub(delta, &hwc->period_left);
> +
> +	return x86_perf_event_set_period(event);
> +}
> +
>  static void __intel_pmu_pebs_event(struct perf_event *event,
>  				   struct pt_regs *iregs,
>  				   void *base, void *top,
>  				   int bit, int count)
>  {
> +	struct hw_perf_event *hwc = &event->hw;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>  	void *at = get_next_pebs_record_by_bit(base, top, bit);
>
> -	if (!intel_pmu_save_and_restart(event) &&
> -	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
> +	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
> +		/*
> +		 * Now, auto-reload is only enabled in fixed period mode.
> +		 * The reload value is always hwc->sample_period.
> +		 * May need to change it, if auto-reload is enabled in
> +		 * freq mode later.
> +		 */
> +		intel_pmu_save_and_restart_reload(event, hwc->sample_period,
> +						  count);

Since you pass in @event, hwc->sample_period is already available to
it; no need to pass that in as well.

> +	} else if (!intel_pmu_save_and_restart(event))
>  		return;
>
>  	while (count > 1) {
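Roughly what I'd expect instead, as an untested user-space sketch
(again assuming 48-bit counters; 'wraps' is the number of observed
overflows, i.e. what the patch passes as reload_times, and 0 for a
plain pmu::read() in the middle of a period):

#include <stdint.h>

int64_t reload_delta(uint64_t prev_raw, uint64_t new_raw,
		     uint64_t reload_val, unsigned int wraps)
{
	int shift = 64 - 48;	/* 64 - x86_pmu.cntval_bits */

	/* Arithmetic right shift sign-extends the 48-bit raw values. */
	int64_t new = (int64_t)(new_raw << shift) >> shift;
	int64_t old = (int64_t)(prev_raw << shift) >> shift;

	/*
	 * Without a wrap this is plain forward progress (new - old);
	 * each observed wrap then adds exactly one full reload period.
	 * That covers both the !reload_times read case and the sample
	 * case in one expression.
	 */
	return new - old + (int64_t)(wraps * reload_val);
}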