From: Megha Dey
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
    andriy.shevchenko@linux.intel.com, kstewart@linuxfoundation.org,
    yu-cheng.yu@intel.com, len.brown@intel.com, gregkh@linuxfoundation.org,
    peterz@infradead.org, acme@kernel.org, alexander.shishkin@linux.intel.com,
    jolsa@redhat.com, namhyung@kernel.org, vikas.shivappa@linux.intel.com,
    pombredanne@nexb.com, me@kylehuey.com, bp@suse.de,
    grzegorz.andrejczuk@intel.com, tony.luck@intel.com, corbet@lwn.net,
    ravi.v.shankar@intel.com, megha.dey@intel.com, Megha Dey
Subject: [PATCH V2 2/3] perf/x86/intel/bm.c: Add Intel Branch Monitoring support
Date: Fri, 17 Nov 2017 17:54:05 -0800
Message-Id: <1510970046-25387-3-git-send-email-megha.dey@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1510970046-25387-1-git-send-email-megha.dey@linux.intel.com>
References: <1510970046-25387-1-git-send-email-megha.dey@linux.intel.com>

Currently, the Cannonlake family of Intel processors supports the branch
monitoring feature. Intel's branch monitoring feature uses heuristics to
detect the occurrence of an ROP (Return Oriented Programming) attack.

A perf-based kernel driver is used to monitor the occurrence of one of
the 6 branch monitoring events. There are 2 counters, each of which can
select one of these events for evaluation over a specified instruction
window size (0 to 1023). For each counter, a threshold value (0 to 127)
can be configured to set a point at which the ROP detection event action
is taken (determined by user-space). Each task can monitor a maximum of
2 events at any given time.

Apart from window_size (global) and threshold (per-counter), various
sysfs entries are provided for the user to configure: guest_disable,
lbr_freeze, window_cnt_sel, cnt_and_mode (all global) and
mispred_evt_cnt (per-counter). For all events belonging to the same
task, the global parameters are shared.

Every time a task is scheduled out, we save the current window and count
associated with the event being monitored. When the task is scheduled in
next, we resume counting from the previous count associated with this
event.

To monitor a user-space application for ROP-related events, the perf
command line can be used as follows:

perf stat -e intel_bm/<event name>/ <application>

E.g., for the following test program (test.c) with threshold = 100
(echo 100 > /sys/devices/intel_bm/threshold):

void func(void)
{
	return;
}

void main(void)
{
	int i;

	for (i = 0; i < 128; i++) {
		func();
	}

	return;
}

perf stat -e intel_bm/rets/ ./test

 Performance counter stats for './test':

                 1      intel_bm/rets/

       0.104705937 seconds time elapsed

perf returns the number of branch monitoring interrupts that occurred
during the execution of the user-space application.

Signed-off-by: Yu-Cheng Yu
Signed-off-by: Megha Dey
---
 arch/x86/events/Kconfig          |  10 +
 arch/x86/events/intel/Makefile   |   2 +
 arch/x86/events/intel/bm.c       | 605 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/msr-index.h |   5 +
 arch/x86/include/asm/processor.h |   4 +
 include/linux/perf_event.h       |   9 +-
 kernel/events/core.c             |  16 ++
 7 files changed, 650 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/events/intel/bm.c

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 9a7a144..40903ca 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -9,6 +9,16 @@ config PERF_EVENTS_INTEL_UNCORE
 	  Include support for Intel uncore performance events. These are
 	  available on NehalemEX and more modern processors.
 
+config PERF_EVENTS_INTEL_BM
+	bool "Intel Branch Monitoring support"
+	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+	---help---
+	  Include support for Intel Branch monitoring. This feature utilizes
+	  heuristics for detecting ROP(Return oriented programming) like
+	  attacks. These heuristics are based off certain performance
+	  monitoring statistics, measured dynamically over a short
+	  configurable window period.
+
 config PERF_EVENTS_INTEL_RAPL
 	tristate "Intel rapl performance events"
 	depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index 3468b0c..14235ec 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -2,6 +2,8 @@ obj-$(CONFIG_CPU_SUP_INTEL)	+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)	+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)	+= lbr.o p4.o p6.o pt.o
+obj-$(CONFIG_PERF_EVENTS_INTEL_BM)	+= intel-bm-perf.o
+intel-bm-perf-objs		:= bm.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)	+= intel-rapl-perf.o
 intel-rapl-perf-objs		:= rapl.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= intel-uncore.o
diff --git a/arch/x86/events/intel/bm.c b/arch/x86/events/intel/bm.c
new file mode 100644
index 0000000..68d8f6d
--- /dev/null
+++ b/arch/x86/events/intel/bm.c
@@ -0,0 +1,605 @@
+/*
+ * Support for Intel branch monitoring counters
+ *
+ * Intel branch monitoring MSRs are specified in the Intel® 64 and IA-32
+ * Software Developer’s Manual Volume 4 section 2.16.2 (October 2017)
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ *
+ * Contact Information:
+ * Megha Dey
+ * Yu-Cheng Yu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../perf_event.h"
+
+/* Branch Monitoring default and mask values */
+#define BM_MAX_WINDOW_SIZE		0x3ff
+#define BM_MAX_THRESHOLD		0x7f
+#define BM_MAX_EVENTS			6
+#define BM_WINDOW_SIZE_SHIFT		8
+#define BM_THRESHOLD_SHIFT		8
+#define BM_EVENT_TYPE_SHIFT		1
+#define BM_GUEST_DISABLE_SHIFT		3
+#define BM_LBR_FREEZE_SHIFT		2
+#define BM_WINDOW_CNT_SEL_SHIFT		24
+#define BM_CNT_AND_MODE_SHIFT		26
+#define BM_MISPRED_EVT_CNT_SHIFT	15
+#define BM_ENABLE			0x3
+#define BM_CNTR_ENABLE			1
+
+static unsigned int bm_window_size = BM_MAX_WINDOW_SIZE;
+static unsigned int bm_guest_disable;
+static unsigned int bm_lbr_freeze;
+static unsigned int bm_window_cnt_sel;
+static unsigned int bm_cnt_and_mode;
+
+static unsigned int bm_threshold = BM_MAX_THRESHOLD;
+static unsigned int bm_mispred_evt_cnt;
+
+/* Branch monitoring counter owners */
+static struct perf_event **bm_counter_owner;
+
+static struct pmu intel_bm_pmu;
+
+DEFINE_PER_CPU(int, bm_unmask_apic) = 0;
+
+union bm_detect_status {
+	struct {
+		__u8 event:		1;
+		__u8 lbrs_valid:	1;
+		__u8 reserved0:		6;
+		__u8 ctrl_hit:		4;
+		__u8 reserved1:		4;
+		__u16 count_window:	10;
+		__u8 reserved2:		6;
+		__u8 count[4];
+	} __packed;
+	uint64_t raw;
+};
+
+static int intel_bm_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
+{
+	struct perf_event *event;
+	union bm_detect_status stat;
+	int i;
+	unsigned long x;
+
+	rdmsrl(BR_DETECT_STATUS_MSR, stat.raw);
+
+	if (!stat.event)
+		return NMI_DONE;
+
+	wrmsrl(BR_DETECT_STATUS_MSR, 0);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	/*
+	 * Issue wake-up to corresponding polling event
+	 */
+	x = stat.ctrl_hit;
+	for_each_set_bit(i, &x, BM_MAX_COUNTERS) {
+		event = current->thread.bm_counter_owner[i];
+		local64_set(&event->hw.prev_count, 0);
+		local64_inc(&event->count);
+		atomic_set(&event->hw.bm_poll, POLLIN);
+		event->pending_wakeup = 1;
+		irq_work_queue(&event->pending);
+	}
+
+	return NMI_HANDLED;
+}
+
+static int intel_bm_event_add(struct perf_event *event, int mode)
+{
+	union bm_detect_status cur_stat, prev_stat;
+
+	WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
+
+	prev_stat.raw = local64_read(&event->hw.prev_count);
+
+	/*
+	 * Start counting from previous count associated with this event
+	 */
+	cur_stat.count[event->hw.id] = prev_stat.count[event->hw.id];
+	cur_stat.count_window = prev_stat.count_window;
+	wrmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
+
+	wrmsrl(BR_DETECT_CONTROL_MSR, event->hw.bm_ctrl);
+
+	/*
+	 * Unmask the NMI bit of the local APIC the first time task is
+	 * scheduled on a particular CPU.
+	 */
+	if (!(this_cpu_read(bm_unmask_apic))) {
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		this_cpu_inc(bm_unmask_apic);
+	}
+
+	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id,
+	       event->hw.bm_counter_conf);
+
+	return 0;
+}
+
+static void intel_bm_event_del(struct perf_event *event, int flags)
+{
+	union bm_detect_status cur_stat;
+
+	WARN_ON(event->hw.id >= BM_MAX_COUNTERS);
+
+	wrmsrl(BR_DETECT_COUNTER_CONFIG_BASE + event->hw.id, 0);
+
+	rdmsrl(BR_DETECT_STATUS_MSR, cur_stat.raw);
+	local64_set(&event->hw.prev_count, (uint64_t)cur_stat.raw);
+}
+
+static void intel_bm_event_destroy(struct perf_event *event)
+{
+	bm_counter_owner[event->hw.id] = NULL;
+}
+
+static DEFINE_MUTEX(bm_counter_mutex);
+
+static int intel_bm_event_init(struct perf_event *event)
+{
+	u64 cfg;
+	int counter_to_use = -1, i;
+
+	local64_set(&event->hw.prev_count, 0);
+
+	if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	/*
+	 * Type is assigned by kernel, see /sys/devices/intel_bm/type
+	 */
+	if (event->attr.type != intel_bm_pmu.type)
+		return -ENOENT;
+
+	/*
+	 * Only per tasks events are supported. It does not make sense to
+	 * monitor all tasks for an ROP attack. This could generate a lot
+	 * of false positives.
+	 */
+	if (event->hw.target == NULL)
+		return -EINVAL;
+
+	/* No sampling supported */
+	if (is_sampling_event(event))
+		return -EINVAL;
+
+	event->event_caps |= PERF_EV_CAP_BM;
+	/*
+	 * cfg contains one of the 6 possible Branch Monitoring events
+	 */
+	cfg = event->attr.config;
+	if (cfg < 0 || cfg > (BM_MAX_EVENTS - 1))
+		return -EINVAL;
+
+	/*
+	 * Find a hardware counter for the target task
+	 */
+	bm_counter_owner = event->hw.target->thread.bm_counter_owner;
+
+	mutex_lock(&bm_counter_mutex);
+	for (i = 0; i < BM_MAX_COUNTERS; i++) {
+		if (bm_counter_owner[i] == NULL) {
+			counter_to_use = i;
+			bm_counter_owner[i] = event;
+			break;
+		}
+	}
+	mutex_unlock(&bm_counter_mutex);
+
+	if (counter_to_use == -1)
+		return -EBUSY;
+
+	event->hw.bm_ctrl = (bm_window_size << BM_WINDOW_SIZE_SHIFT) |
+			    (bm_guest_disable << BM_GUEST_DISABLE_SHIFT) |
+			    (bm_lbr_freeze << BM_LBR_FREEZE_SHIFT) |
+			    (bm_window_cnt_sel << BM_WINDOW_CNT_SEL_SHIFT) |
+			    (bm_cnt_and_mode << BM_CNT_AND_MODE_SHIFT) |
+			    BM_ENABLE;
+	event->hw.bm_counter_conf = (bm_threshold << BM_THRESHOLD_SHIFT) |
+			(bm_mispred_evt_cnt << BM_MISPRED_EVT_CNT_SHIFT) |
+			(cfg << BM_EVENT_TYPE_SHIFT) | BM_CNTR_ENABLE;
+
+	event->hw.id = counter_to_use;
+
+	event->destroy = intel_bm_event_destroy;
+
+	return 0;
+}
+
+EVENT_ATTR_STR(rets, rets, "event=0x0");
+EVENT_ATTR_STR(call-ret, call_ret, "event=0x01");
+EVENT_ATTR_STR(ret-misp, ret_misp, "event=0x02");
+EVENT_ATTR_STR(branch-misp, branch_mispredict, "event=0x03");
+EVENT_ATTR_STR(indirect-branch-misp, indirect_branch_mispredict, "event=0x04");
+EVENT_ATTR_STR(far-branch, far_branch, "event=0x05");
+
+static struct attribute *intel_bm_events_attr[] = {
+	EVENT_PTR(rets),
+	EVENT_PTR(call_ret),
+	EVENT_PTR(ret_misp),
+	EVENT_PTR(branch_mispredict),
+	EVENT_PTR(indirect_branch_mispredict),
+	EVENT_PTR(far_branch),
+	NULL,
+};
+
+static struct attribute_group intel_bm_events_group = {
+	.name = "events",
+	.attrs = intel_bm_events_attr,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+static struct attribute *intel_bm_formats_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group intel_bm_format_group = {
+	.name = "format",
+	.attrs = intel_bm_formats_attr,
+};
+
+/*
+ * User can configure the BM MSRs using the corresponding sysfs entries
+ */
+
+static ssize_t
+threshold_show(struct device *dev, struct device_attribute *attr,
+	       char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_threshold);
+
+	return rv;
+}
+
+static ssize_t
+threshold_store(struct device *dev,
+		struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	unsigned int threshold;
+	int err;
+
+	err = kstrtouint(buf, 0, &threshold);
+	if (err)
+		return err;
+
+	if ((threshold > BM_MAX_THRESHOLD) || (threshold == 0)) {
+		pr_err("invalid threshold value\n");
+		return -EINVAL;
+	}
+
+	bm_threshold = threshold;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(threshold);
+
+static ssize_t
+window_size_show(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_window_size);
+
+	return rv;
+}
+
+static ssize_t
+window_size_store(struct device *dev,
+		  struct device_attribute *attr,
+		  const char *buf, size_t count)
+{
+	unsigned int window_size;
+	int err;
+
+	err = kstrtouint(buf, 0, &window_size);
+	if (err)
+		return err;
+
+	if (window_size > BM_MAX_WINDOW_SIZE) {
+		pr_err("illegal window size\n");
+		return -EINVAL;
+	}
+
+	bm_window_size = window_size;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(window_size);
+
+static ssize_t
+lbr_freeze_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_lbr_freeze);
+
+	return rv;
+}
+
+static ssize_t
+lbr_freeze_store(struct device *dev,
+		 struct device_attribute *attr,
+		 const char *buf, size_t count)
+{
+	unsigned int lbr_freeze;
+	int err;
+
+	err = kstrtouint(buf, 0, &lbr_freeze);
+	if (err)
+		return err;
+
+	if (lbr_freeze > 1) {
+		pr_err("lbr freeze can only be 0 or 1\n");
+		return -EINVAL;
+	}
+
+	bm_lbr_freeze = lbr_freeze;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(lbr_freeze);
+
+static ssize_t
+guest_disable_show(struct device *dev, struct device_attribute *attr,
+		   char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_guest_disable);
+
+	return rv;
+}
+
+static ssize_t
+guest_disable_store(struct device *dev,
+		    struct device_attribute *attr,
+		    const char *buf, size_t count)
+{
+	unsigned int guest_disable;
+	int err;
+
+	err = kstrtouint(buf, 0, &guest_disable);
+	if (err)
+		return err;
+
+	if (guest_disable > 1) {
+		pr_err("guest disable can only be 0 or 1\n");
+		return -EINVAL;
+	}
+
+	bm_guest_disable = guest_disable;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(guest_disable);
+
+static ssize_t
+window_cnt_sel_show(struct device *dev, struct device_attribute *attr,
+		    char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_window_cnt_sel);
+
+	return rv;
+}
+
+static ssize_t
+window_cnt_sel_store(struct device *dev,
+		     struct device_attribute *attr,
+		     const char *buf, size_t count)
+{
+	unsigned int window_cnt_sel;
+	int err;
+
+	err = kstrtouint(buf, 0, &window_cnt_sel);
+	if (err)
+		return err;
+
+	if (window_cnt_sel > 3) {
+		pr_err("invalid window_cnt_sel value\n");
+		return -EINVAL;
+	}
+
+	bm_window_cnt_sel = window_cnt_sel;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(window_cnt_sel);
+
+static ssize_t
+cnt_and_mode_show(struct device *dev, struct device_attribute *attr,
+		  char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_cnt_and_mode);
+
+	return rv;
+}
+
+static ssize_t
+cnt_and_mode_store(struct device *dev,
+		   struct device_attribute *attr,
+		   const char *buf, size_t count)
+{
+	unsigned int cnt_and_mode;
+	int err;
+
+	err = kstrtouint(buf, 0, &cnt_and_mode);
+	if (err)
+		return err;
+
+	if (cnt_and_mode > 1) {
+		pr_err("invalid cnt_and_mode value\n");
+		return -EINVAL;
+	}
+
+	bm_cnt_and_mode = cnt_and_mode;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(cnt_and_mode);
+
+static ssize_t
+mispred_evt_cnt_show(struct device *dev, struct device_attribute *attr,
+		     char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", bm_mispred_evt_cnt);
+
+	return rv;
+}
+
+static ssize_t
+mispred_evt_cnt_store(struct device *dev,
+		      struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	unsigned int mispred_evt_cnt;
+	int err;
+
+	err = kstrtouint(buf, 0, &mispred_evt_cnt);
+	if (err)
+		return err;
+
+	if (mispred_evt_cnt > 1) {
+		pr_err("invalid mispred_evt_cnt value\n");
+		return -EINVAL;
+	}
+
+	bm_mispred_evt_cnt = mispred_evt_cnt;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(mispred_evt_cnt);
+
+static ssize_t
+num_counters_show(struct device *dev, struct device_attribute *attr,
+		  char *buf)
+{
+	ssize_t rv;
+
+	rv = sprintf(buf, "%d\n", BM_MAX_COUNTERS);
+
+	return rv;
+}
+
+static DEVICE_ATTR_RO(num_counters);
+
+static struct attribute *intel_bm_attrs[] = {
+	&dev_attr_window_size.attr,
+	&dev_attr_threshold.attr,
+	&dev_attr_lbr_freeze.attr,
+	&dev_attr_guest_disable.attr,
+	&dev_attr_window_cnt_sel.attr,
+	&dev_attr_cnt_and_mode.attr,
+	&dev_attr_mispred_evt_cnt.attr,
+	&dev_attr_num_counters.attr,
+	NULL,
+};
+
+static const struct attribute_group intel_bm_group = {
+	.attrs = intel_bm_attrs,
+};
+
+static const struct attribute_group *intel_bm_attr_groups[] = {
+	&intel_bm_events_group,
+	&intel_bm_format_group,
+	&intel_bm_group,
+	NULL,
+};
+
+static struct pmu intel_bm_pmu = {
+	.task_ctx_nr	= perf_sw_context,
+	.attr_groups	= intel_bm_attr_groups,
+	.event_init	= intel_bm_event_init,
+	.add		= intel_bm_event_add,
+	.del		= intel_bm_event_del,
+};
+
+#define X86_BM_MODEL_MATCH(model) \
+	{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY }
+
+static const struct x86_cpu_id bm_cpu_match[] __initconst = {
+	X86_BM_MODEL_MATCH(INTEL_FAM6_CANNONLAKE_MOBILE),
+	{},
+};
+
+MODULE_DEVICE_TABLE(x86cpu, bm_cpu_match);
+
+static __init int intel_bm_init(void)
+{
+	int ret, err;
+
+	/*
+	 * Only CNL supports branch monitoring
+	 */
+	if (!(x86_match_cpu(bm_cpu_match)))
+		return -ENODEV;
+
+	err = register_nmi_handler(NMI_LOCAL, intel_bm_event_nmi_handler,
+				   0, "BM");
+
+	if (err)
+		goto fail_nmi;
+
+	ret = perf_pmu_register(&intel_bm_pmu, "intel_bm", -1);
+	if (ret) {
+		pr_err("Intel BM perf registration failed: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+
+fail_nmi:
+	unregister_nmi_handler(NMI_LOCAL, "BM");
+	return err;
+}
+module_init(intel_bm_init);
+
+static void __exit intel_bm_exit(void)
+{
+	perf_pmu_unregister(&intel_bm_pmu);
+	unregister_nmi_handler(NMI_LOCAL, "BM");
+}
+module_exit(intel_bm_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 34c4922..a311d30 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -294,6 +294,11 @@
 /* Alternative perfctr range with full access. */
 #define MSR_IA32_PMC0			0x000004c1
 
+/* Intel Branch Monitoring MSRs */
+#define BR_DETECT_CONTROL_MSR		0x00000350
+#define BR_DETECT_STATUS_MSR		0x00000351
+#define BR_DETECT_COUNTER_CONFIG_BASE	0x00000354
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
    complete list. */
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2db7cf7..6bdbe9e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -42,6 +42,8 @@
 #define NET_IP_ALIGN	0
 
 #define HBP_NUM 4
+
+#define BM_MAX_COUNTERS	2
 /*
  * Default implementation of macro that returns current
  * instruction pointer ("program counter").
@@ -460,6 +462,8 @@ struct thread_struct {
 	/* Save middle states of ptrace breakpoints */
 	struct perf_event	*ptrace_bps[HBP_NUM];
+	/* Branch Monitoring counter owners */
+	struct perf_event	*bm_counter_owner[BM_MAX_COUNTERS];
 	/* Debug status used for traps, single steps, etc... */
 	unsigned long		debugreg6;
 	/* Keep track of the exact dr7 value set by the user */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2c9c87d..bcd3826 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -168,6 +168,13 @@ struct hw_perf_event {
 			 */
 			struct task_struct	*target;
 
+			struct { /* intel_bm */
+				u64		bm_ctrl;
+				u64		bm_counter_conf;
+				atomic_t	bm_poll;
+				u64		id;
+			};
+
 			/*
 			 * PMU would store hardware filter configuration
 			 * here.
@@ -191,7 +198,6 @@ struct hw_perf_event {
 	 * local64_cmpxchg() such that pmu::read() can be called nested.
 	 */
 	local64_t			prev_count;
-
 	/*
 	 * The period to start the next sample with.
 	 */
@@ -512,6 +518,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_BM			BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9404c63..0e66d5e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4519,6 +4519,15 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &event->waitq, wait);
 
+	/*
+	 * Branch monitoring events do not support ring buffer.
+	 * For users polling on these events, return appropriate poll state.
+	 */
+	if (event->event_caps & PERF_EV_CAP_BM) {
+		events = atomic_xchg(&event->hw.bm_poll, 0);
+		return events;
+	}
+
 	if (is_event_hup(event))
 		return events;
 
@@ -5420,6 +5429,13 @@ void perf_event_wakeup(struct perf_event *event)
 {
 	ring_buffer_wakeup(event);
 
+	/*
+	 * Since branch monitoring events do not have ring buffer, they
+	 * have to be woken up separately
+	 */
+	if (event->event_caps & PERF_EV_CAP_BM)
+		wake_up_all(&event->waitq);
+
 	if (event->pending_kill) {
 		kill_fasync(perf_event_fasync(event), SIGIO, event->pending_kill);
 		event->pending_kill = 0;
-- 
1.9.1
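
For illustration only (not part of the patch): a minimal user-space sketch of how a monitoring tool might consume one of these events through the poll() support added to perf_poll() above. It assumes the kernel-assigned PMU type exported at /sys/devices/intel_bm/type and the rets event (config 0x0) listed in the events sysfs group; the pid argument and error handling are purely illustrative.

/* Hypothetical user-space consumer of an intel_bm event via poll(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
	struct perf_event_attr attr;
	struct pollfd pfd;
	unsigned int type;
	FILE *f;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	/* PMU type is assigned by the kernel, see /sys/devices/intel_bm/type */
	f = fopen("/sys/devices/intel_bm/type", "r");
	if (!f || fscanf(f, "%u", &type) != 1) {
		fprintf(stderr, "cannot read /sys/devices/intel_bm/type\n");
		return 1;
	}
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = 0x0;	/* intel_bm/rets/ from the events sysfs group */

	/* Branch monitoring events are per-task only: pid = target, cpu = -1 */
	fd = perf_event_open(&attr, (pid_t)atoi(argv[1]), -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	pfd.fd = fd;
	pfd.events = POLLIN;

	/* Returns POLLIN once the configured threshold is hit within a window */
	if (poll(&pfd, 1, -1) == 1 && (pfd.revents & POLLIN))
		printf("branch monitoring event fired for pid %s\n", argv[1]);

	close(fd);
	return 0;
}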