Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752814AbbGBDmX (ORCPT ); Wed, 1 Jul 2015 23:42:23 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:18203 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751378AbbGBDmN (ORCPT ); Wed, 1 Jul 2015 23:42:13 -0400 Subject: Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event To: Alexei Starovoitov , Peter Zijlstra , "Wangnan (F)" References: <1435719455-91155-1-git-send-email-hekuang@huawei.com> <20150701054458.GN19282@twins.programming.kicks-ass.net> <559386D7.1020208@huawei.com> <20150701115825.GV19282@twins.programming.kicks-ass.net> <5594A679.7010108@plumgrid.com> CC: , , , , , , , pi3orama From: He Kuang Message-ID: <5594B22F.8090500@huawei.com> Date: Thu, 2 Jul 2015 11:38:23 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.0 MIME-Version: 1.0 In-Reply-To: <5594A679.7010108@plumgrid.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.110.54.65] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020202.5594B248.0105,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 4ea9c609f8d12cbc23a57c4a4c1b63f3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2604 Lines: 71 On 2015/7/2 10:48, Alexei Starovoitov wrote: > On 7/1/15 4:58 AM, Peter Zijlstra wrote: >> >> But why create a separate trace buffer, it should go into the regular >> perf buffer. > > +1 > > I think > +static char __percpu *perf_extra_trace_buf[PERF_NR_CONTEXTS]; > is redundant. > It adds quite a bit of unnecessary complexity to the whole patch set. > > Also the call to bpf_output_sample() is not effective unless program > returns 1. It's a confusing user interface. > > Also you cannot ever do: > BPF_FUNC_probe_read, > + BPF_FUNC_output_sample, > BPF_FUNC_ktime_get_ns, > new functions must be added to the end. > > Why not just do: > perf_trace_buf_prepare() + perf_trace_buf_submit() from the helper? > No changes to current code. > No need to call __get_data_size() and other overhead. > The helper can be called multiple times from the same program. > imo much cleaner. > Invoke perf_trace_buf_submit() will generate a second perf event (header->type = PERF_RECORD_SAMPLE) entry which is different from the event entry outputed by the orignial kprobe. So the final result of the example in 00/00 patch may like this: sample entry 1(from bpf_prog): comm timestamp1 generic_perform_write pmu_value=0x1234 sample entry 2(from original kprobe): comm timestamp2 generic_perform_write: (ffffffff81140b60) Compared with current implementation: combined sample entry: comm timestamp generic_perform_write: (ffffffff81140b60) pmu_value=0x1234 The former two entries may be discontinuous as there are multiple threads and kprobes to be recorded, and there's a chance that one entry is missed but the other is recorded. What we need is the pmu_value read when 'generic_perform_write' enters, the two entries result is not intuitive enough and userspace tools have to do the work to find and combine those two sample entries to get the result. Thank you. > Also how about calling this helper: > bpf_trace_buf_submit(void *stack_ptr, int size) ? > bpf_output_sample, I think, is odd name. It's not a sample. > May be other name? > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/