Message-ID: <55C47B38.60400@huawei.com>
Date: Fri, 7 Aug 2015 17:32:40 +0800
From: xiakaixu <xiakaixu@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: Alexei Starovoitov <ast@plumgrid.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>, <mingo@redhat.com>,
        <a.p.zijlstra@chello.nl>, <masami.hiramatsu.pt@hitachi.com>,
        <jolsa@kernel.org>
CC: "Wangnan (F)" <wangnan0@huawei.com>, <pi3orama@163.com>,
        <linux-kernel@vger.kernel.org>
Subject: [RFC] perf ebpf: An example of how to access hardware PMU counters
 in eBPF programs by using perf
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2786
Lines: 74

By combining PMU counters, kprobes and eBPF programs, many interesting
things can be done. For example, by probing at sched:sched_switch we can
measure how IPC changes between different processes by watching the
'cycles' PMU counter; by probing at the entry and exit points of a kernel
function we can compute the cache miss rate of that function by
collecting the 'cache-misses' counter at both points and taking the
difference. In summary, we can define the begin and end points of a
procedure, insert kprobes on them, attach two BPF programs and let them
collect specific PMU counters. Further, by reading those PMU counters a
BPF program can provide hints to resource schedulers.
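
The begin/end arithmetic described above can be sketched in plain C.
This is only an illustration, not code from the patch set: the struct
pmu_snapshot type, the helper names and the 'cache_references' field are
assumptions made for the example (the text above only names
'cache-misses'). The rate is computed with integers only, since BPF
programs cannot use floating point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of two counters taken at one probe point. */
struct pmu_snapshot {
        uint64_t cache_references;      /* assumed second counter */
        uint64_t cache_misses;
};

/* Counter delta between the begin probe and the end probe. */
static uint64_t pmu_delta(uint64_t begin, uint64_t end)
{
        /* Assumes both reads hit the same counter and it did not wrap. */
        assert(end >= begin);
        return end - begin;
}

/* Cache miss rate of the probed region, in misses per 1000 references. */
static uint64_t miss_rate_permille(struct pmu_snapshot begin,
                                   struct pmu_snapshot end)
{
        uint64_t refs = pmu_delta(begin.cache_references,
                                  end.cache_references);
        uint64_t misses = pmu_delta(begin.cache_misses, end.cache_misses);

        if (refs == 0)
                return 0;       /* nothing measured, avoid dividing by zero */
        return misses * 1000 / refs;
}
```

In a real setup the two snapshots would come from the BPF programs
attached at the entry and exit kprobes, and both reads would have to
happen on the same CPU for the difference to be meaningful.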

I am working on giving eBPF programs the ability to access hardware PMU
counters and on using that ability from perf. In recent weeks I have
submitted the kernel-space code first; the latest version, V7, is here
(www.spinics.net/lists/netdev/msg338468.html). According to the design
plan, we still need the perf-side code. I will base it on Wang Nan's
patches (perf tools: filtering events using eBPF programs).

Here is a simple example of an eBPF program that is loaded by using perf.
It only shows the basic design principle; if it looks OK, we will
implement the perf-side code based on it.

Waiting for your comments.
Thanks.

====================================================================

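/*
 * Headers as used by the kernel's samples/bpf programs. This is an
 * assumption; the perf loader may provide them differently.
 */
#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

/* Each slot holds a perf event fd; bpf_prog() indexes it by CPU id. */
struct bpf_map_def SEC("maps") my_cycles_map = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = 32,
};
struct bpf_map_def SEC("maps") my_exception_map = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = 32,
};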

/*
 * Proposed "perf_event_map" section: ties a map to the event that
 * perf should open for it, named by 'description'.
 */
struct perf_event_map {
        struct bpf_map_def *map_def;
        char description[64];
};

struct perf_event_map SEC("perf_event_map") cycles = {
        .map_def = &my_cycles_map,
        .description = "cycles",
};

struct perf_event_map SEC("perf_event_map") exception = {
        .map_def = &my_exception_map,
        .description = "exception",
};

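SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
        u64 count_cycles, count_exception;
        /* Index both maps by the CPU this probe fired on. */
        u32 key = bpf_get_smp_processor_id();
        char fmt[] = "CPU-%d   cyc:%llu  exc:%llu\n";

        /* Read the current value of each counter for this CPU. */
        count_cycles = bpf_perf_event_read(&my_cycles_map, key);
        count_exception = bpf_perf_event_read(&my_exception_map, key);
        bpf_trace_printk(fmt, sizeof(fmt), key, count_cycles, count_exception);

        return 0;
}

/* bpf_trace_printk() is a GPL-only helper, so declare a GPL license. */
char _license[] SEC("license") = "GPL";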
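
For discussion, here is one way the perf side might turn a
'description' string from the "perf_event_map" section into a
struct perf_event_attr. This is only a sketch under assumptions, not
the planned implementation: desc_to_attr() is a hypothetical helper,
and only the two event names mentioned in this mail ("cycles" and
"cache-misses") are handled. The constants come from
<linux/perf_event.h>.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: fill *attr for a description string.
 * Returns 0 on success, -1 if the description is not recognized.
 */
static int desc_to_attr(const char *desc, struct perf_event_attr *attr)
{
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_HARDWARE;

        if (strcmp(desc, "cycles") == 0)
                attr->config = PERF_COUNT_HW_CPU_CYCLES;
        else if (strcmp(desc, "cache-misses") == 0)
                attr->config = PERF_COUNT_HW_CACHE_MISSES;
        else
                return -1;      /* unknown event name */

        return 0;
}
```

Presumably perf would then open one event per CPU with the resulting
attr and store each fd in the matching map at the index of that CPU,
so that bpf_perf_event_read(&map, cpu) finds it. A real
implementation would more likely reuse perf's existing event parser
than a hand-written table like this one.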
