Message-ID: <554859CD.4090206@plumgrid.com>
Date: Mon, 04 May 2015 22:49:01 -0700
From: Alexei Starovoitov
To: Wang Nan, davem@davemloft.net, acme@kernel.org, mingo@redhat.com, a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com, jolsa@kernel.org
CC: linux-kernel@vger.kernel.org, pi3orama@163.com, hekuang@huawei.com, bgregg@netflix.com
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
References: <1430391165-30267-1-git-send-email-wangnan0@huawei.com> <554302F0.3070101@plumgrid.com> <55447A7D.4000205@huawei.com> <554832AA.5050503@plumgrid.com> <55484A11.7070603@huawei.com>
In-Reply-To: <55484A11.7070603@huawei.com>

On 5/4/15 9:41 PM, Wang Nan wrote:
>
> That's great. Could you please add a description of 'llvm -s' to your README
> or comments? Dumping eBPF instructions has cost me a lot of time, so I decided
> to add it to perf...

sure. it's just the -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work as a normal 'clang -S file.c' once a few more
llvm commits are accepted upstream.

>>> My colleague He Kuang is working on accessing variables.
>>> Probing inside a function body and accessing its local variables
>>> will be supported like this:
>>>
>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>     // vara is the value of localvara of function func_name
>>> }
>>
>> that would be great. I'm not sure, though, how you can achieve that
>> without changing the C front-end?
>
> It's not very difficult. He is trying to generate the loader of vara
> as a prologue, then paste the prologue and the main eBPF program together.
> From the viewpoint of the kernel bpf verifier, there is only one param (ctx);
> the prologue program fetches the value of vara, puts it into the proper
> register, and then the main program runs.

got it. I think that's much cleaner than what I was proposing.
The only question then is that:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
since the prologue would need to assign into r2.
Otherwise I don't see how you find out about 'vara' inside the compiled bpf code.
It would be nice if this could be done without debug info.
Like in tracex2_kern.c I have:

SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
	long wr_size = ctx->dx; /* arg3 */
	...
}

with your prologue generator the above can be rewritten as:

SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
	/* use wr_size */
	...
}

that will improve ease of use a lot.

> Another possible solution is to change the protocol between kprobe and the
> eBPF program: make kprobes call the fetchers and pass the results to the eBPF
> program as a second param (grouping all varx together).
> A prologue may still be needed in this case to load each param into the
> correct register.

you mean grouping varx together in some other struct and embedding
it together with pt_regs into a new container struct?
doable, but your first approach is quite clean already. why bother.

> Could you please consider the following problem?
>
> We found that several __lock_page() calls take a very long time. We want to
> find the corresponding __unlock_page() so we can know what blocks them. We want
> to insert an eBPF program before io_schedule() in __lock_page(), and also attach
> an eBPF program to the entry of __unlock_page(), so we can compute the interval
> between page locking and unlocking. If the interval is longer than a threshold,
> __unlock_page() triggers a perf sample so we get its call stack. In this case,
> the eBPF program acts as a trace filter.

all makes sense, and your use case fits quite well into the existing
bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
The problem of how to display that call stack from perf?
I would say it fits better as a sample than a trace.
If you dump it as a trace, it won't be easy to decipher, whereas
if you treat it as a sampling event, the perf record/report facility
will pick it up and display it nicely. Meaning that one sample ==
one lock_page/unlock_page latency > N. Then the existing sample_callchain
flag should work.