From: xiakaixu
To: Daniel Borkmann
Date: Sat, 25 Jul 2015 10:14:59 +0800
Subject: Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Message-ID: <55B2F123.4060908@huawei.com>
In-Reply-To: <55B179DB.4080308@iogearbox.net>
References: <1437552572-84748-1-git-send-email-xiakaixu@huawei.com> <55B179DB.4080308@iogearbox.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/7/24 7:33, Daniel Borkmann wrote:
> On 07/22/2015 10:09 AM, Kaixu Xia wrote:
>> Previous patch v1 url:
>> https://lkml.org/lkml/2015/7/17/287
>
> [ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
>   the core BPF changes. More below ... ]

Sorry about that, I will add you to the CC list. :)
Your comments are welcome.

>
>> This patchset allows users to read PMU events in the following way:
>>  1. Open the PMU using perf_event_open() (for each CPU or for
>>     each process they would like to watch);
>>  2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>>  3. Insert FDs into the map with some key-value mapping scheme
>>     (e.g. cpuid -> event on that CPU);
>>  4. Load and attach eBPF programs as usual;
>>  5. In the eBPF program, get the perf_event_map_fd and key (e.g.
>>     the cpuid obtained from bpf_get_smp_processor_id()), then use
>>     bpf_perf_event_read() to read from it.
>>  6. Do anything they want.
>>
>> changes in V2:
>>  - put atomic_long_inc_not_zero() between fdget() and fdput();
>>  - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>>  - Only read the event counter on current CPU or on current
>>    process;
>>  - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
>>    pointer to the struct perf_event;
>>  - according to the perf_event_map_fd and key, the function
>>    bpf_perf_event_read() can get the Hardware PMU counter value;
>>
>> Patch 5/5 is a simple example that shows how to use this new eBPF
>> program ability. The PMU counter data can be found in
>> /sys/kernel/debug/tracing/trace (or trace_pipe); the values shown are
>> the cycles PMU counter sampled at 'kprobe/sys_write'.
>>
>>   $ cat /sys/kernel/debug/tracing/trace_pipe
>>   $ ./tracex6
>>        ...
>>        cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
>>        cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
>>        cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
>>        cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
>>        cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
>>        cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
>>        cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
>>        cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
>>        cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
>>      <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
>>      <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
>>        cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
>>      <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
>>      <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
>>      <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
>>      <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
>>      <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
>>      <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
>>        ...
>>
>> The details of the patches are as follows:
>>
>> Patch 1/5 introduces a new bpf map type. This map only stores the
>> pointer to struct perf_event;
>>
>> Patch 2/5 introduces a map_traverse_elem() function for further use;
>>
>> Patch 3/5 converts event file descriptors into perf_event structures when
>> adding new elements to the map;
>
> So far all the map backends are of generic nature, knowing absolutely nothing
> about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
> tail call is a bit special, but nevertheless generic for each user and [very]
> useful, so it makes sense to inherit from the array map and move the code there.
>
> I don't really like that we start adding new _special_-cased maps here into the
> eBPF core code, it seems quite hacky. :( From your rather terse commit description
> where you introduce the maps, I failed to see a detailed elaboration on this i.e.
> why it cannot be abstracted any differently?

Giving eBPF programs the ability to access hardware PMU counters will be
very useful, as I mentioned in the V1 commit message. Of course, there is
some special-case code for creating the perf_event type map in V2, but you
will find less of it in the next version (V3). I have reused most of the
prog_array map implementation. We can make the perf_event array map more
generic in the future.

BR.

>
>> Patch 4/5 implements the function bpf_perf_event_read(), which gets the
>> selected hardware PMU counter;
>>
>> Patch 5/5 gives a simple example.
>>
>> Kaixu Xia (5):
>>   bpf: Add new bpf map type to store the pointer to struct perf_event
>>   bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>>   bpf: Save the pointer to struct perf_event to map
>>   bpf: Implement function bpf_perf_event_read() that get the selected
>>     hardware PMU conuter
>>   samples/bpf: example of get selected PMU counter value
>>
>>  include/linux/bpf.h        |   6 +++
>>  include/linux/perf_event.h |   5 ++-
>>  include/uapi/linux/bpf.h   |   3 ++
>>  kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/helpers.c       |  42 +++++++++++++++++
>>  kernel/bpf/syscall.c       |  26 +++++++++++
>>  kernel/events/core.c       |  30 ++++++++++++-
>>  kernel/trace/bpf_trace.c   |   2 +
>>  samples/bpf/Makefile       |   4 ++
>>  samples/bpf/bpf_helpers.h  |   2 +
>>  samples/bpf/tracex6_kern.c |  27 +++++++++++
>>  samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
>>  12 files changed, 321 insertions(+), 3 deletions(-)
>>  create mode 100644 samples/bpf/tracex6_kern.c
>>  create mode 100644 samples/bpf/tracex6_user.c
>>
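
For illustration only, steps 1-3 of the quoted cover letter (open one event per
CPU, create the map, insert the FDs keyed by cpuid) might look roughly like the
user-space sketch below. It is not taken from the series. It assumes the raw
bpf(2) and perf_event_open(2) syscall interfaces as exposed by current uapi
headers, int-sized keys and values, and contiguous CPU ids; all function names
are hypothetical, and the v2 patches under discussion may behave differently.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Describe a hardware "CPU cycles" counter (hypothetical helper). */
void fill_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/* Step 1: open the counter for all tasks (pid = -1) on one CPU. */
int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr;

	fill_cycles_attr(&attr);
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Step 2: create the perf event array map with one slot per CPU. */
int create_perf_event_array(unsigned int ncpus)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
	attr.key_size = sizeof(int);	/* assumption: key is the cpuid */
	attr.value_size = sizeof(int);	/* assumption: value is the event FD */
	attr.max_entries = ncpus;
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Step 3: store the event FD under key = cpu. */
int map_set_event(int map_fd, int cpu, int event_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(unsigned long)&cpu;
	attr.value = (uint64_t)(unsigned long)&event_fd;
	attr.flags = BPF_ANY;
	return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/*
 * Steps 1-3 for every online CPU. Returns the map FD, or -1 on failure.
 * Event FDs are deliberately left open; on failure they are not cleaned
 * up here, to keep the sketch short.
 */
int setup_perf_event_map(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	int map_fd, cpu;

	if (ncpus < 1)
		return -1;
	map_fd = create_perf_event_array((unsigned int)ncpus);
	if (map_fd < 0)
		return -1;
	for (cpu = 0; cpu < ncpus; cpu++) {
		int fd = open_cycles_on_cpu(cpu);

		if (fd < 0 || map_set_event(map_fd, cpu, fd) < 0) {
			close(map_fd);
			return -1;
		}
	}
	return map_fd;
}
```

A caller would invoke setup_perf_event_map() before loading the eBPF program
(step 4), which then reads the counter with bpf_perf_event_read() using the
current cpuid as the key (step 5). Both syscalls normally need elevated
privileges, so a return value of -1 is expected for an unprivileged user.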
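
As a small aid for reading the trace_pipe output quoted above, the sketch below
pulls the CPU number and counter value out of one "bpf count: CPU-N VALUE"
line, so that consecutive samples on the same CPU can be subtracted to get the
cycles elapsed between two sys_write hits. The function name is hypothetical,
and the line format is assumed to be exactly the one shown in the quoted
output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one trace_pipe line such as
 *   "cat-677 [002] d..1 210.299270: : bpf count: CPU-2 5316659"
 * Returns 1 and fills *cpu and *count on success, 0 if the line does
 * not contain a "bpf count" record.
 */
int parse_bpf_count(const char *line, int *cpu, unsigned long long *count)
{
	const char *p = strstr(line, "bpf count: CPU-");

	if (!p)
		return 0;
	return sscanf(p, "bpf count: CPU-%d %llu", cpu, count) == 2;
}
```

For the first two CPU-2 samples in the quoted output (5316659 and 5378639),
the difference is 61980 cycles.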