Subject: Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload
To: Peter Zijlstra, kan.liang@intel.com
Cc: mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org,
    tglx@linutronix.de, jolsa@redhat.com, eranian@google.com,
    ak@linux.intel.com
From: "Liang, Kan"
Date: Wed, 24 Jan 2018 10:45:03 -0500
In-Reply-To: <20180124122618.GH2249@hirez.programming.kicks-ass.net>
References: <1515424516-143728-1-git-send-email-kan.liang@intel.com>
 <1515424516-143728-2-git-send-email-kan.liang@intel.com>
 <20180124122618.GH2249@hirez.programming.kicks-ass.net>

On 1/24/2018 7:26 AM, Peter Zijlstra wrote:
> On Mon, Jan 08, 2018 at 07:15:13AM -0800, kan.liang@intel.com wrote:
>
>> The formula to calculate event->count is:
>>
>>   event->count = period left from last time +
>>                  (reload_times - 1) * reload_val +
>>                  latency of the PMI handler
>>
>> prev_count is the last observed hardware counter value. As in the
>> non-auto-reload case, its absolute value is the period of the first
>> record. It should not be updated on each reload, because the hardware
>> counter is not 'observed' on each auto-reload.
>>
>> For the second and later records, the period is exactly the reload
>> value, so we only need to add (reload_times - 1) * reload_val to
>> event->count.
>>
>> The calculation of the PMI handler latency differs from the
>> non-auto-reload case: because the counting start point is
>> -reload_value, the raw count has to be adjusted by adding
>> reload_value. period_left needs the same adjustment.
>
> What's this about the PMI latency? We don't care about that in any
> other situation, right? Sure, the PMI takes a bit of time, but we're
> not correcting for that anywhere, so why start now?

The 'latency' here is the gap between the moment the PEBS hardware is
armed and the moment the NMI is handled. Sorry for the misleading
description. I will rewrite it in V3.
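To make the formula concrete, here is an example with made-up numbers:
suppose reload_val = 100000, the leftover period from the previous PMI
was 30000 (i.e. prev_count = -30000), four PEBS records are drained
(reload_times = 4), and the counter reads new_raw_count = -99508 when
the handler runs. Then

  event->count += 30000              /* period left from last time    */
                + (4 - 1) * 100000   /* full auto-reload periods      */
                + (100000 - 99508);  /* counted since the last reload */

i.e. the count advances by 330492. The last term, reload_val +
new_raw_count, is what the changelog loosely calls the 'latency of the
PMI handler': the counter restarts from -reload_val, so whatever it has
counted by the time we read it belongs to the current, unfinished
period.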
>> Nothing needs to be done in x86_perf_event_set_period(), because the
>> period is fixed and period_left has already been adjusted.
>
> Fixes tag is missing.

Will add it in V3.

>> Signed-off-by: Kan Liang
>> ---
>>  arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 67 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 3674a4b..cc1f373 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1251,17 +1251,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
>>  	return NULL;
>>  }
>>
>> +/*
>> + * Specific intel_pmu_save_and_restart() for auto-reload.
>> + */
>> +static int intel_pmu_save_and_restart_reload(struct perf_event *event,
>> +					     u64 reload_val,
>> +					     int reload_times)
>> +{
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	int shift = 64 - x86_pmu.cntval_bits;
>> +	u64 prev_raw_count, new_raw_count;
>> +	u64 delta;
>> +
>> +	if ((reload_times == 0) || (reload_val == 0))
>> +		return intel_pmu_save_and_restart(event);
>
> Like Jiri, I find this confusing at best. If we need to call that one,
> you shouldn't have called this function to begin with.
>
> At best, have a WARN here or something.

Will add a WARN in V3.

>> +
>> +	/*
>> +	 * Careful: an NMI might modify the previous event value.
>> +	 *
>> +	 * Our tactic to handle this is to first atomically read and
>> +	 * exchange a new raw count - then add that new-prev delta
>> +	 * count to the generic event atomically:
>> +	 */
>
> For now this seems to only get called from *drain_pebs* which afaict
> only happens when we've disabled the PMU (either from sched_task or
> PMI).
>
> Now, you want to put this in the pmu::read() path, and that does not
> disable the PMU, but I don't think we can drain the PEBS buffer while
> it's active; that's too full of races, so even there you'll have to
> disable stuff.
>
> So I don't think this is correct/desired for this case.

Could we do something like the below in the pmu::read() path?
(Not tested yet.)

	if (pebs_needs_sched_cb(cpuc)) {
		perf_pmu_disable(event->pmu);
		intel_pmu_drain_pebs_buffer();
		x86_perf_event_update(event);
		perf_pmu_enable(event->pmu);
		return;
	}

The x86_perf_event_update() handles the !reload_times case you mention
below. Because the PMU is disabled, nothing changes for the other
cases.

>> +again:
>> +	prev_raw_count = local64_read(&hwc->prev_count);
>> +	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
>> +
>> +	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
>> +			    new_raw_count) != prev_raw_count)
>> +		goto again;
>> +
>> +	/*
>> +	 * Now we have the new raw value and have updated the prev
>> +	 * timestamp already. We can now calculate the elapsed delta
>> +	 * (event-)time and add that to the generic event.
>> +	 *
>> +	 * Careful, not all hw sign-extends above the physical width
>> +	 * of the count.
>> +	 *
>> +	 * event->count = period left from last time +
>> +	 *                (reload_times - 1) * reload_val +
>> +	 *                latency of PMI handler
>> +	 *
>> +	 * The period left from last time can be obtained from
>> +	 * -prev_count. The start point of counting is always
>> +	 * -reload_val. So the real latency of the PMI handler is
>> +	 * reload_val + new_raw_count.
>> +	 */
>
> That is very confused; the PMI latency is utterly unrelated to anything
> you do here.

Will fix the confusing comments in V3.

>
>> +	delta = (reload_val << shift) + (new_raw_count << shift) -
>> +		(prev_raw_count << shift);
>> +	delta >>= shift;
>> +
>> +	local64_add(reload_val * (reload_times - 1), &event->count);
>> +	local64_add(delta, &event->count);
>
> And this is still wrong, I think. Consider the case where !reload_times.
>
> We can easily call pmu::read() twice in one period. In that case we
> should increment count with (new - prev).
>
> Only once we get a new sample and are known to have wrapped, do we need
> to consider that wrap.

The code above should fix this issue.

>
>> +	local64_sub(delta, &hwc->period_left);
>> +
>> +	return x86_perf_event_set_period(event);
>> +}
>> +
>>  static void __intel_pmu_pebs_event(struct perf_event *event,
>>  				   struct pt_regs *iregs,
>>  				   void *base, void *top,
>>  				   int bit, int count)
>>  {
>> +	struct hw_perf_event *hwc = &event->hw;
>>  	struct perf_sample_data data;
>>  	struct pt_regs regs;
>>  	void *at = get_next_pebs_record_by_bit(base, top, bit);
>>
>> -	if (!intel_pmu_save_and_restart(event) &&
>> -	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>> +	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
>> +		/*
>> +		 * Auto-reload is currently only enabled in fixed-period
>> +		 * mode, where the reload value is always
>> +		 * hwc->sample_period. This may need to change if
>> +		 * auto-reload is later enabled in freq mode.
>> +		 */
>> +		intel_pmu_save_and_restart_reload(event, hwc->sample_period,
>> +						  count);
>
> Since you pass in @event, hwc->sample_period is already available to it;
> no need to pass that in as well.

OK. I will change it.
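Something like this, I suppose (sketch only, untested):

	static int intel_pmu_save_and_restart_reload(struct perf_event *event,
						     int reload_times)
	{
		struct hw_perf_event *hwc = &event->hw;
		u64 reload_val = hwc->sample_period;	/* fixed period mode */

		/* ... rest of the function unchanged ... */
	}

with the call site reduced to
intel_pmu_save_and_restart_reload(event, count).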
>
>> +	} else if (!intel_pmu_save_and_restart(event))
>>  		return;
>>
>>  	while (count > 1) {

Thanks,
Kan
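P.S. The shift arithmetic above only matters when the counter is
narrower than 64 bits. A standalone demo of the sign-extension trick
(illustration only, with made-up values; not kernel code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* x86_pmu.cntval_bits is 48 on modern parts */
	int shift = 64 - 48;
	uint64_t reload_val = 100000;
	uint64_t prev_raw   = 0xffffffff8ad0ULL; /* -30000 in 48 bits */
	uint64_t new_raw    = 0xfffffffe7b4cULL; /* -99508 in 48 bits */

	/* Shift left, do the arithmetic, then arithmetic-shift right:
	 * this sign-extends the 48-bit raw counts into 64 bits. */
	uint64_t d = (reload_val << shift) + (new_raw << shift) -
		     (prev_raw << shift);
	int64_t delta = (int64_t)d >> shift;

	printf("delta = %lld\n", (long long)delta); /* prints 30492 */
	return 0;
}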