From: Kaixu Xia
Date: Thu, 23 Jul 2015 09:42:39 +0000
Subject: [PATCH v3 0/3] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Message-ID: <1437644562-84431-1-git-send-email-xiakaixu@huawei.com>

Previous patch v2 url: https://lkml.org/lkml/2015/7/22/104

This patchset allows users to read PMU events in the following way:

1. Open the PMU using perf_event_open() (for each CPU or each process to be watched);
2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
3. Insert the FDs into the map with some key-value mapping scheme (e.g. cpuid -> event on that CPU);
4. Load and attach eBPF programs as usual;
5. In the eBPF program, get the perf_event_map_fd and an index (e.g. the cpuid obtained from bpf_get_smp_processor_id()), then use bpf_perf_event_read() to read from it;
6. Use the value however needed.
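A minimal user-space sketch of steps 1-3, assuming a Linux system whose uapi headers define BPF_MAP_TYPE_PERF_EVENT_ARRAY and __NR_bpf, with CPU cycles as the example event. The helper names (init_cycles_attr, open_cycles_on_cpu, create_perf_event_array, insert_event_fd, setup_per_cpu_counters) are hypothetical and are not taken from the patchset or its tracex6 sample. The 4-byte key (index, e.g. CPU id) and 4-byte value (perf event FD) layout is an assumption consistent with the cpuid -> event scheme described above. Raw syscall() is used because glibc has no wrappers for perf_event_open or bpf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Step 1 setup: a counting (non-sampling) CPU-cycles hardware event. */
void init_cycles_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE; /* v2+ limits types to HARDWARE/RAW */
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    /* disabled stays 0, so the counter runs as soon as it is opened */
}

/* Step 1: open a counter on one CPU covering all processes (pid = -1). */
int open_cycles_on_cpu(int cpu)
{
    struct perf_event_attr attr;

    init_cycles_attr(&attr);
    return (int)syscall(__NR_perf_event_open, &attr, -1 /* pid */, cpu,
                        -1 /* group_fd */, 0UL /* flags */);
}

/* Step 2: create the BPF_MAP_TYPE_PERF_EVENT_ARRAY map. */
int create_perf_event_array(uint32_t max_entries)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
    attr.key_size = sizeof(uint32_t);   /* index, e.g. CPU id (assumed) */
    attr.value_size = sizeof(uint32_t); /* perf event FD (assumed) */
    attr.max_entries = max_entries;
    return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Step 3: store event_fd at slot `index` (cpuid -> event on that CPU). */
int insert_event_fd(int map_fd, uint32_t index, uint32_t event_fd)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)&index;
    attr.value = (uint64_t)(uintptr_t)&event_fd;
    attr.flags = BPF_ANY;
    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/* Steps 1-3 for CPUs 0..nr_cpus-1; returns 0 on success, -1 on failure.
 * The map must have been created with max_entries >= nr_cpus, and
 * event_fds must have room for nr_cpus entries. */
int setup_per_cpu_counters(int map_fd, int nr_cpus, int *event_fds)
{
    assert(nr_cpus >= 0 && event_fds != NULL);
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        event_fds[cpu] = open_cycles_on_cpu(cpu);
        if (event_fds[cpu] < 0)
            return -1;
        if (insert_event_fd(map_fd, (uint32_t)cpu,
                            (uint32_t)event_fds[cpu]) < 0)
            return -1;
    }
    return 0;
}
```

Usage would look roughly like `nr = sysconf(_SC_NPROCESSORS_CONF); map_fd = create_perf_event_array(nr); setup_per_cpu_counters(map_fd, nr, fds);`, typically run as root. Steps 4-5 are not shown here: per the description above, the loaded eBPF program passes the map and the current CPU id from bpf_get_smp_processor_id() to bpf_perf_event_read(). Leaving attr.disabled at 0 avoids a separate PERF_EVENT_IOC_ENABLE ioctl; opening the events disabled would require that ioctl before the counters start.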
changes in V3:
 - collapse V2 patches 1-3 into one;
 - drop the function map->ops->map_traverse_elem() and release the struct perf_event in map_free;
 - only allow access to bpf_perf_event_read() from programs;
 - update the perf_event_array_map element via xchg();
 - pass the index directly to bpf_perf_event_read() instead of MAP_KEY;

changes in V2:
 - put atomic_long_inc_not_zero() between fdget() and fdput();
 - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
 - only read the event counter on the current CPU or in the current process;
 - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the pointer to struct perf_event;
 - given the perf_event_map_fd and key, bpf_perf_event_read() can get the hardware PMU counter value;

Patch 3/3 is a simple example showing how to use this new eBPF program ability. The PMU counter data can be found in /sys/kernel/debug/tracing/trace (or trace_pipe). The output below shows the cycles PMU value read each time 'kprobe/sys_write' fires:

 $ cat /sys/kernel/debug/tracing/trace_pipe
 $ ./tracex6
 ...
 cat-674   [000] d..1   146.413405: : CPU-0 2558223
 <...>-699 [003] d..1   146.413441: : CPU-3 2663985
 cat-674   [000] d..1   146.413480: : CPU-0 2659705
 <...>-699 [003] d..1   146.413516: : CPU-3 2765199
 cat-674   [000] d..1   146.413555: : CPU-0 2761277
 <...>-699 [003] d..1   146.413600: : CPU-3 2877051
 cat-674   [000] d..1   146.413651: : CPU-0 2889668
 <...>-699 [003] d..1   146.413696: : CPU-3 3006447
 cat-674   [000] d..1   146.413749: : CPU-0 3021285
 <...>-699 [003] d..1   146.413787: : CPU-3 3131459
 cat-674   [000] d..1   146.413838: : CPU-0 3141959
 <...>-699 [003] d..1   146.413886: : CPU-3 3264188
 cat-674   [000] d..1   146.413930: : CPU-0 3266461
 <...>-699 [003] d..1   146.413973: : CPU-3 3381038
 cat-674   [000] d..1   146.414025: : CPU-0 3393730
 <...>-699 [003] d..1   146.414071: : CPU-3 3514676
 ...

The details of the patches are as follows:

Patch 1/3 introduces a new bpf map type.
This map only stores the pointer to struct perf_event.
Patch 2/3 implements the function bpf_perf_event_read(), which gets the selected hardware PMU counter.
Patch 3/3 gives a simple example.

Kaixu Xia (3):
  bpf: Add new bpf map type to store the pointer to struct perf_event
  bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter
  samples/bpf: example of get selected PMU counter value

 include/linux/bpf.h        |   3 ++
 include/linux/perf_event.h |   5 +-
 include/uapi/linux/bpf.h   |   2 +
 kernel/bpf/arraymap.c      | 113 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/helpers.c       |  36 +++++++++++++++
 kernel/bpf/verifier.c      |  15 ++++++
 kernel/events/core.c       |  27 ++++++++++-
 kernel/trace/bpf_trace.c   |   2 +
 samples/bpf/Makefile       |   4 ++
 samples/bpf/bpf_helpers.h  |   2 +
 samples/bpf/tracex6_kern.c |  26 +++++++++++
 samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
 12 files changed, 299 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/tracex6_kern.c
 create mode 100644 samples/bpf/tracex6_user.c

-- 
1.8.3.4