Content-Type: text/plain;
	charset=gb2312
Mime-Version: 1.0 (1.0)
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
From: pi3orama <pi3orama@163.com>
In-Reply-To: <55A9A0F5.10409@plumgrid.com>
Date: Sat, 18 Jul 2015 09:02:58 +0800
Cc: kaixu xia <xiakaixu@huawei.com>,
        "davem@davemloft.net" <davem@davemloft.net>,
        "acme@kernel.org" <acme@kernel.org>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
        "masami.hiramatsu.pt@hitachi.com" <masami.hiramatsu.pt@hitachi.com>,
        "jolsa@kernel.org" <jolsa@kernel.org>,
        "wangnan0@huawei.com" <wangnan0@huawei.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "hekuang@huawei.com" <hekuang@huawei.com>
Content-Transfer-Encoding: 8BIT
Message-Id: <940779A8-2D11-4D34-946F-17E32C9F8268@163.com>
References: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com> <55A98808.9010307@plumgrid.com> <C1242C27-229A-4B98-9C44-29CD20049779@163.com> <55A9A0F5.10409@plumgrid.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2507
Lines: 50


Sent from my iPhone

> On Jul 18, 2015, at 8:42 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> 
>> On 7/17/15 4:27 PM, pi3orama wrote:
>> Then we also need another BPF_MAP_TYPE_PERF_EVENT_HASHMAP for events in task context.
> 
> hmm. why? don't see a use case yet.

An example: we want to count the number of cycles between the entry and exit points of a particular library function (glibc write(), for example). A context switch may happen in between, but we don't care about cycles consumed by other tasks. So we need to create a perf event in task context using:

perf_event_open(&attr, pid, -1 /* cpu */, ...);

Since it is a library function, we have to choose the pids we are interested in. We should also probe sys_clone and create a new perf event whenever a thread is created; we haven't thought about how to do that yet.
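
A rough user-space sketch of the call above, only to make the task-context case concrete. The helper names init_cycles_attr() and open_task_cycles() are made up for illustration and are not part of the patch set; the attr fields and the raw syscall are the standard perf_event_open(2) interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a hardware CPU-cycles counter. */
void init_cycles_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/*
 * Hypothetical helper: open the counter in task context.
 * With a real pid and cpu == -1 the event follows that task across
 * CPUs, so cycles burned by other tasks are not counted.
 * Returns the event fd, or -1 with errno set.
 */
int open_task_cycles(pid_t pid)
{
	struct perf_event_attr attr;

	init_cycles_attr(&attr);
	/* glibc has no wrapper for perf_event_open, so use syscall(2) */
	return (int)syscall(__NR_perf_event_open, &attr, pid,
			    -1 /* cpu */, -1 /* group_fd */, 0UL /* flags */);
}
```

The resulting fd is per task, not per CPU, which is why indexing it by cpuid does not fit.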

Then, when inserting into the map, the key should no longer mean 'cpuid'. A pid would be much more reasonable.

Although we could use an auxiliary map that maps pid to array index...

Thank you.
> 
>> I choose current implementation because I think we may need perf event not wrapped in map in future (for example, tracepoints). With the design you suggested in this case we have to create a map with only 1 element in it.
> 
> what you had also needs a map of one element.
> also I don't think perf_events can be 'detached'. User space always will
> perf_event_open one first and only then program will use it.
> So passing FD from user space to the program is inevitable.
> Other than storing FD into map the other alternative is to use ld_imm64
> mechanism. Then the helper will only have one argument,
> but then you'd need to extend 'used_maps' logic with 'used_fds'.
> It's doable as well, but I think the use case of only one pmu counter
> per cpu is artificial. You'll always have an array of events. One for
> each cpu. So perf_event_array mechanism fits the best.
> 
>>> >btw, make sure you do your tests with lockdep and other debugs on.
>>> >and for the sample code please use C for the bpf program. Not many
>>> >people can read bpf asm ;)
>>> >
>> We still need some perf side code to make a c program work.
> 
> no, what I meant is to do sample code as tracex[1-5]*
> where there is distinct kernel and user space parts. Both in C.
> At this stage perf patches are way too early.
