Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp2210056imm; Mon, 28 May 2018 04:01:34 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpEsejVQ2U+1gjEivTUhtEQfKU2XbgcDEUF4iiCaCVe24Lax/fWjLppd/sHAcUQ6HS1rNBm X-Received: by 2002:a65:4ecc:: with SMTP id w12-v6mr10200727pgq.214.1527505294279; Mon, 28 May 2018 04:01:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527505294; cv=none; d=google.com; s=arc-20160816; b=lGCFGV4bYKKsRQCVp7Iys3/NbNKsc1mjCgYrQH0Hr1fiWwCIVEhk07Mr8qMIfjK605 pQdBaeKLdxPOCLOU9rxs8M8DGKNBR8jCbbD4yt5hr2doYETs8HpADWjJmTSqVjcbbLem teWsXvVYQ/zL747TUAGXOl2GWLs0xG8LiggCnkI9bNDuxdOdRiQvLHbDuPNE1LTCObnd JhjnVrwai3BCKoRSYcrnhkLDc3eHZh2VEDuJ0sAOb8uMFuWFXdblBdQXXRQB9hQMnKUT Xqu8TcYrMviJnK1NN7ysJ2Ge1kICSgqHqbdfm73/yyMp8NbeRZ+6Uroyixzo56NVo3XW KkFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from:dkim-signature :arc-authentication-results; bh=vfVnDR5ef0MPgFXfxQ9ImhYn4UWrAryXswlzJF45iSg=; b=h3iAjkFgUqjJXQsqzp3HflGUaRD/F86UYFtVLv9CK6bbVIscdmfRjGk5wx62bFcXkg BWNLsZP+SoOoU6EGutMbFS1HgoUHlzrPtJRSatkowiQuLQSEUP7b17w68CgK1PGLS2y6 BMJ8SVY6P2YCUxbE3mF3vzv3ZEi7/450AyX5ABiD4b+P2qKqwOf9fZyzRgWPgjiyqTcE 2gDN45DuPpeWIHVmYNikCJ9kd/JgbHyw69CbEknAnYlNdAEePPBnv8xyi+zmHHCj0wHf ebDcUhP/TnlGrqrHdAwok9vSqWfiloml5b2mwgchi9hV6Ldh9Na0gFebT30H4vkw0TsA HX+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PnB1zdvn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b59-v6si29981289plc.335.2018.05.28.04.01.19; Mon, 28 May 2018 04:01:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PnB1zdvn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1164828AbeE1LBG (ORCPT + 99 others); Mon, 28 May 2018 07:01:06 -0400 Received: from mail.kernel.org ([198.145.29.99]:48558 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1164613AbeE1LA6 (ORCPT ); Mon, 28 May 2018 07:00:58 -0400 Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2587C20845; Mon, 28 May 2018 11:00:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1527505257; bh=Pw2b4h8wkK+rC7Bjm0QZI5h5yalT9YH7+cVLnMl2sp0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PnB1zdvnfvjTtoIhu4YKLftTIo/2Yj45r1ZAhbfH65ZvcM3P8H6e6dxGYWPMeoGGM 3LpyTl84Kc7krmYg+lVLqFAK14vfwJUi9CgUMiYjFWc9Gm85aP2aag5RMF1ma8IzOx bTThlgsBkFf+t5yMi3xIwIrhAI6lUvaXx0Io4CA8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Kan Liang , "Peter Zijlstra (Intel)" , Alexander Shishkin , Arnaldo Carvalho de Melo , Jiri Olsa , Linus Torvalds , Stephane Eranian , Thomas Gleixner , Vince Weaver , acme@kernel.org, Ingo Molnar , Sasha Levin Subject: [PATCH 4.14 433/496] perf/x86/intel: Fix event update for auto-reload Date: Mon, 28 May 2018 12:03:38 +0200 Message-Id: <20180528100337.981660770@linuxfoundation.org> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180528100319.498712256@linuxfoundation.org> References: <20180528100319.498712256@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.14-stable review patch. If anyone has any objections, please let me know. ------------------ From: Kan Liang [ Upstream commit d31fc13fdcb20e1c317f9a7dd6273c18fbd58308 ] There is a bug when reading event->count with large PEBS enabled. Here is an example: # ./read_count 0x71f0 0x122c0 0x1000000001c54 0x100000001257d 0x200000000bdc5 In fixed period mode, the auto-reload mechanism could be enabled for PEBS events, but the calculation of event->count does not take the auto-reload values into account. Anyone who reads event->count will get the wrong result, e.g x86_pmu_read(). This bug was introduced with the auto-reload mechanism enabled since commit: 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible") Introduce intel_pmu_save_and_restart_reload() to calculate the event->count only for auto-reload. Since the counter increments a negative counter value and overflows on the sign switch, giving the interval: [-period, 0] the difference between two consequtive reads is: A) value2 - value1; when no overflows have happened in between, B) (0 - value1) + (value2 - (-period)); when one overflow happened in between, C) (0 - value1) + (n - 1) * (period) + (value2 - (-period)); when @n overflows happened in between. Here A) is the obvious difference, B) is the extension to the discrete interval, where the first term is to the top of the interval and the second term is from the bottom of the next interval and C) the extension to multiple intervals, where the middle term is the whole intervals covered. The equation for all cases is: value2 - value1 + n * period Previously the event->count is updated right before the sample output. But for case A, there is no PEBS record ready. It needs to be specially handled. Remove the auto-reload code from x86_perf_event_set_period() since we'll not longer call that function in this case. Based-on-code-from: Peter Zijlstra (Intel) Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: acme@kernel.org Fixes: 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible") Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/core.c | 15 ++----- arch/x86/events/intel/ds.c | 92 +++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 94 insertions(+), 13 deletions(-) --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1162,16 +1162,13 @@ int x86_perf_event_set_period(struct per per_cpu(pmc_prev_left[idx], smp_processor_id()) = left; - if (!(hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) || - local64_read(&hwc->prev_count) != (u64)-left) { - /* - * The hw event starts counting from this event offset, - * mark it to be able to extra future deltas: - */ - local64_set(&hwc->prev_count, (u64)-left); + /* + * The hw event starts counting from this event offset, + * mark it to be able to extra future deltas: + */ + local64_set(&hwc->prev_count, (u64)-left); - wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask); - } + wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask); /* * Due to erratum on certan cpu we need --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1312,17 +1312,84 @@ get_next_pebs_record_by_bit(void *base, return NULL; } +/* + * Special variant of intel_pmu_save_and_restart() for auto-reload. + */ +static int +intel_pmu_save_and_restart_reload(struct perf_event *event, int count) +{ + struct hw_perf_event *hwc = &event->hw; + int shift = 64 - x86_pmu.cntval_bits; + u64 period = hwc->sample_period; + u64 prev_raw_count, new_raw_count; + s64 new, old; + + WARN_ON(!period); + + /* + * drain_pebs() only happens when the PMU is disabled. + */ + WARN_ON(this_cpu_read(cpu_hw_events.enabled)); + + prev_raw_count = local64_read(&hwc->prev_count); + rdpmcl(hwc->event_base_rdpmc, new_raw_count); + local64_set(&hwc->prev_count, new_raw_count); + + /* + * Since the counter increments a negative counter value and + * overflows on the sign switch, giving the interval: + * + * [-period, 0] + * + * the difference between two consequtive reads is: + * + * A) value2 - value1; + * when no overflows have happened in between, + * + * B) (0 - value1) + (value2 - (-period)); + * when one overflow happened in between, + * + * C) (0 - value1) + (n - 1) * (period) + (value2 - (-period)); + * when @n overflows happened in between. + * + * Here A) is the obvious difference, B) is the extension to the + * discrete interval, where the first term is to the top of the + * interval and the second term is from the bottom of the next + * interval and C) the extension to multiple intervals, where the + * middle term is the whole intervals covered. + * + * An equivalent of C, by reduction, is: + * + * value2 - value1 + n * period + */ + new = ((s64)(new_raw_count << shift) >> shift); + old = ((s64)(prev_raw_count << shift) >> shift); + local64_add(new - old + count * period, &event->count); + + perf_event_update_userpage(event); + + return 0; +} + static void __intel_pmu_pebs_event(struct perf_event *event, struct pt_regs *iregs, void *base, void *top, int bit, int count) { + struct hw_perf_event *hwc = &event->hw; struct perf_sample_data data; struct pt_regs regs; void *at = get_next_pebs_record_by_bit(base, top, bit); - if (!intel_pmu_save_and_restart(event) && - !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)) + if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) { + /* + * Now, auto-reload is only enabled in fixed period mode. + * The reload value is always hwc->sample_period. + * May need to change it, if auto-reload is enabled in + * freq mode later. + */ + intel_pmu_save_and_restart_reload(event, count); + } else if (!intel_pmu_save_and_restart(event)) return; while (count > 1) { @@ -1374,8 +1441,11 @@ static void intel_pmu_drain_pebs_core(st return; n = top - at; - if (n <= 0) + if (n <= 0) { + if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD) + intel_pmu_save_and_restart_reload(event, 0); return; + } __intel_pmu_pebs_event(event, iregs, at, top, 0, n); } @@ -1398,8 +1468,22 @@ static void intel_pmu_drain_pebs_nhm(str ds->pebs_index = ds->pebs_buffer_base; - if (unlikely(base >= top)) + if (unlikely(base >= top)) { + /* + * The drain_pebs() could be called twice in a short period + * for auto-reload event in pmu::read(). There are no + * overflows have happened in between. + * It needs to call intel_pmu_save_and_restart_reload() to + * update the event->count for this case. + */ + for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled, + x86_pmu.max_pebs_events) { + event = cpuc->events[bit]; + if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD) + intel_pmu_save_and_restart_reload(event, 0); + } return; + } for (at = base; at < top; at += x86_pmu.pebs_record_size) { struct pebs_record_nhm *p = at;