Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751987AbcDRVoU (ORCPT ); Mon, 18 Apr 2016 17:44:20 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:63066 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751375AbcDRVoT (ORCPT ); Mon, 18 Apr 2016 17:44:19 -0400 Subject: Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints To: Steven Rostedt References: <1459831974-2891931-1-git-send-email-ast@fb.com> <1459831974-2891931-3-git-send-email-ast@fb.com> <20160418162905.220df2f4@gandalf.local.home> CC: Peter Zijlstra , "David S . Miller" , Ingo Molnar , Daniel Borkmann , Arnaldo Carvalho de Melo , Wang Nan , Josef Bacik , Brendan Gregg , , , From: Alexei Starovoitov Message-ID: <571554EB.9010702@fb.com> Date: Mon, 18 Apr 2016 14:43:07 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: <20160418162905.220df2f4@gandalf.local.home> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [192.168.52.123] X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-04-18_13:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3060 Lines: 69 On 4/18/16 1:29 PM, Steven Rostedt wrote: > On Mon, 4 Apr 2016 21:52:48 -0700 > Alexei Starovoitov wrote: > >> introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be >> attached to tracepoints. >> The tracepoint will copy the arguments in the per-cpu buffer and pass >> it to the bpf program as its first argument. >> The layout of the fields can be discovered by doing >> 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format' >> prior to the compilation of the program with exception that first 8 bytes >> are reserved and not accessible to the program. This area is used to store >> the pointer to 'struct pt_regs' which some of the bpf helpers will use: >> +---------+ >> | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program) >> +---------+ >> | N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly) >> +---------+ >> | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet) >> +---------+ >> >> Not that all of the fields are already dumped to user space via perf ring buffer >> and some application access it directly without consulting tracepoint/format. >> Same rule applies here: static tracepoint fields should only be accessed >> in a format defined in tracepoint/format. The order of fields and >> field sizes are not an ABI. >> >> Signed-off-by: Alexei Starovoitov >> --- ... >> - entry = perf_trace_buf_prepare(__entry_size, \ >> - event_call->event.type, &__regs, &rctx); \ >> + event_type = prog ? TRACE_EVENT_TYPE_MAX : event_call->event.type; \ > > Can you move this into perf_trace_entry_prepare? that's the old version. The last one are commits 1e1dcd93b46 and 98b5c2c65c295 in net-next. >> + if (prog) { \ >> + *(struct pt_regs **)entry = __regs; \ >> + if (!trace_call_bpf(prog, entry) || hlist_empty(head)) { \ >> + perf_swevent_put_recursion_context(rctx); \ >> + return; \ >> + } \ >> + memset(&entry->ent, 0, sizeof(entry->ent)); \ >> + } \ > > And perhaps this into perf_trace_buf_submit()? > > Tracepoints are a major cause of bloat, and the reasons for these > prepare and submit functions is to move code out of the macros. Every > tracepoint in the kernel (1000 and counting) will include this code. > I've already had complaints that each tracepoint can add up to 5k to > the core. I was worried about this too, but single 'if' and two calls (as in commit 98b5c2c65c295) is a better way, since it's faster, cleaner and doesn't need to refactor the whole perf_trace_buf_submit() to pass extra event_call argument to it. perf_trace_buf_submit() is already ugly with 8 arguments! Passing more args or creating a struct to pass args only going to hurt performance without much reduction in .text size. tinyfication folks will disable tracepoints anyway. Note that the most common case is bpf returning 0 and not even calling perf_trace_buf_submit() which is already slow due to so many args passed on stack. This stuff is called million times a second, so every instruction counts.