From: Wang Nan
To: Alexei Starovoitov
Date: Tue, 5 May 2015 12:41:53 +0800
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do the following work in the next version (based on my experience and feedback):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add a subcommand space to 'perf bpf'. The current functionality should reside in 'perf bpf load';
>>
>> 3. Extract the eBPF ELF walking and collecting work into a separate library to help others.
>
> that's a good list.
> The feedback for the existing patches:
>
> patch 18 - since we're creating a generic library for bpf elf
> loading, it would be great to do the following:
> first try to load with
>   attr.log_buf = NULL;
>   attr.log_level = 0;
> then, only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default,
> without any verbosity emitted by the verifier.

Will do.

> patch 19 - I think it's unnecessary.
> verifier already dumps it, so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.

That's great. Could you please add a description of the llvm .s output to your README or comments? Dumping eBPF instructions cost me a lot of time, which is why I decided to add it to perf...

>> My colleague He Kuang is working on variable access. Probing inside a function body
>> and accessing its local variables will be supported like this:
>>
>>   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>
>>   int prog(struct pt_regs *ctx, unsigned long vara)
>>   {
>>       /* vara is the value of localvara in function func_name */
>>   }
>
> that would be great. I'm not sure though how you can achieve that
> without changing the C front-end?

It's not very difficult. He is trying to generate the loader of vara as a prologue, then paste the prologue and the main eBPF program together. From the viewpoint of the kernel bpf verifier there is only one param (ctx); the prologue fetches the value of vara and puts it into the proper register, then the main program runs.

Another possible solution is to change the protocol between kprobes and the eBPF program: make kprobes call the fetchers and pass the results to the eBPF program as a second param (grouping all varx together). A prologue may still be needed in this case, to load each param into the correct register.

> This type of feature is exactly the reason why we're trying to write
> our own front-end.
> In general there are two ways to achieve a 'restricted C' language:
> - start from clang and chop off all features that are not supported.
>   I believe Jovi already tried to do that and it became very difficult.
> - start from a simple front-end with minimal C and add things one by
>   one. That's what we're trying to do. So far we have most of the
>   normal syntax. The problem with our approach is that we cannot easily
>   do #include of existing .h files. We're working on that.
>   It's still too experimental. Maybe we will drop it and go back to
>   the first approach.
>
> The reason for extending the front-end is your example above, where
> the user would want to write:
>   int prog(struct pt_regs *ctx, unsigned long vara) {
>       // use 'vara'
> but the generated BPF should have only one 'ctx' pointer, since that's
> the only thing the verifier will accept. bpf/core and the JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in a register), or
>   bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on the stack), or it can also be done via
> store_trace_args(), but that would be slower and require hacking the
> kernel, whereas the ctx->... style is pure userspace.
> Lots of things to brainstorm. So please share your progress soon.

>> And I want to discuss with you and others:
>>
>> 1. How to make eBPF output its tracing and aggregation results to perf?
>
> well, the output of a bpf program is data stored in maps. Each program
> needs a corresponding user-space reader/printer/sorter of this data.
> tracex2 prints this data as a histogram and tracex3 prints it as a
> heatmap. We can standardize a few things like this, but ideally we
> keep it up to the user, so that the user can write a single file that
> consists of functions that are loaded as bpf into the kernel and other
> functions that are executed in user space. llvm can jit the first set
> to bpf and the second set to x86. That's the distant future though.
> So far the samples/bpf/ style of kern.c+user.c has worked quite well.

Well, it looks like in your design the output of BPF programs is aggregation results. On my side, I want them to also act as trace filters. Could you please consider the following problem?

We find that several __lock_page() calls last a very long time, and we want to find the corresponding __unlock_page() calls so we can learn what blocks them. We want to insert an eBPF program before the io_schedule() call in __lock_page(), and also attach an eBPF program at the entry of __unlock_page(), so we can compute the interval between page locking and unlocking. If the time is longer than a threshold, let the __unlock_page() program trigger a perf sample so we get its call stack.

In this case, the eBPF program acts as a trace filter.

Thank you.