Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs
 to access hardware PMU counter
To: pi3orama <pi3orama@163.com>
References: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com>
 <55A98808.9010307@plumgrid.com>
 <C1242C27-229A-4B98-9C44-29CD20049779@163.com>
Cc: kaixu xia <xiakaixu@huawei.com>,
        "davem@davemloft.net" <davem@davemloft.net>,
        "acme@kernel.org" <acme@kernel.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
        "masami.hiramatsu.pt@hitachi.com" <masami.hiramatsu.pt@hitachi.com>,
        "jolsa@kernel.org" <jolsa@kernel.org>,
        "wangnan0@huawei.com" <wangnan0@huawei.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "hekuang@huawei.com" <hekuang@huawei.com>
From: Alexei Starovoitov <ast@plumgrid.com>
Message-ID: <55A9A0F5.10409@plumgrid.com>
Date: Fri, 17 Jul 2015 17:42:29 -0700
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0)
 Gecko/20100101 Thunderbird/38.0.1
MIME-Version: 1.0
In-Reply-To: <C1242C27-229A-4B98-9C44-29CD20049779@163.com>
Content-Type: text/plain; charset=gbk; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1666
Lines: 32

On 7/17/15 4:27 PM, pi3orama wrote:
> Then we also need another BPF_MAP_TYPE_PERF_EVENT_HASHMAP for events in task context.

hmm. why? don't see a use case yet.

> I choose current implementation because I think we may need perf event not wrapped in map in future (for example, tracepoints). With the design you suggested in this case we have to create a map with only 1 element in it.

what you had also needs a map of one element.
also I don't think perf_events can be 'detached'. User space will always
perf_event_open() one first, and only then will the program use it.
So passing the FD from user space to the program is inevitable.
Besides storing the FD in a map, the alternative is to use the ld_imm64
mechanism. Then the helper would take only one argument,
but you'd need to extend the 'used_maps' logic with 'used_fds'.
It's doable as well, but I think the use case of only one pmu counter
per cpu is artificial. You'll always have an array of events, one for
each cpu. So the perf_event_array mechanism fits best.
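
To make the array idea concrete, here is a toy user-space model of it.
This is only a sketch and not kernel code: every name in it
(struct event_array, event_array_store, read_counter, NR_CPUS) is
hypothetical, the 'count' field merely stands in for a hardware counter
value, and the real map type and helper may end up looking different.
It just shows user space filling one slot per cpu with an FD and the
program side reading through a helper that takes (map, index).

```c
/* Toy model of the perf_event_array idea. NOT kernel code: all names
 * below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS  4      /* assumed cpu count for the model */
#define NO_EVENT (-1)   /* marks a slot with no event stored */

struct event_slot {
	int fd;          /* stands in for the FD from perf_event_open() */
	uint64_t count;  /* stands in for the counter value */
};

struct event_array {
	struct event_slot slot[NR_CPUS];  /* one slot per cpu */
};

/* Start with every slot empty. */
void event_array_init(struct event_array *arr)
{
	assert(arr != NULL);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		arr->slot[cpu].fd = NO_EVENT;
		arr->slot[cpu].count = 0;
	}
}

/* User-space side: store the FD that was opened for 'cpu'.
 * Returns 0 on success, -1 on a bad cpu index or bad FD. */
int event_array_store(struct event_array *arr, int cpu, int fd)
{
	if (cpu < 0 || cpu >= NR_CPUS || fd < 0)
		return -1;
	arr->slot[cpu].fd = fd;
	return 0;
}

/* Program side: a helper that takes (map, index).
 * Returns 0 and fills *val, or -1 if the index is out of range
 * or no event was stored in that slot. */
int read_counter(const struct event_array *arr, int index, uint64_t *val)
{
	if (index < 0 || index >= NR_CPUS ||
	    arr->slot[index].fd == NO_EVENT)
		return -1;
	*val = arr->slot[index].count;
	return 0;
}
```

In this model a read from a slot that user space never filled fails
instead of returning garbage, which is the behaviour you'd want from
the real helper too.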

>> >btw, make sure you do your tests with lockdep and other debugs on.
>> >and for the sample code please use C for the bpf program. Not many
>> >people can read bpf asm ;)
>> >
> We still need some perf side code to make a c program work.

no, what I meant is to do the sample code like samples/bpf/tracex[1-5]*,
where there are distinct kernel and user space parts, both in C.
At this stage perf patches are way too early.