Message-ID: <5594B569.60103@plumgrid.com>
Date: Wed, 01 Jul 2015 20:52:09 -0700
From: Alexei Starovoitov <ast@plumgrid.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: He Kuang <hekuang@huawei.com>, Peter Zijlstra <peterz@infradead.org>,
        "Wangnan (F)" <wangnan0@huawei.com>
CC: rostedt@goodmis.org, masami.hiramatsu.pt@hitachi.com, mingo@redhat.com,
        acme@redhat.com, jolsa@kernel.org, namhyung@kernel.org,
        linux-kernel@vger.kernel.org, pi3orama <pi3orama@163.com>
Subject: Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event
References: <1435719455-91155-1-git-send-email-hekuang@huawei.com> <20150701054458.GN19282@twins.programming.kicks-ass.net> <559386D7.1020208@huawei.com> <20150701115825.GV19282@twins.programming.kicks-ass.net> <5594A679.7010108@plumgrid.com> <5594B22F.8090500@huawei.com>
In-Reply-To: <5594B22F.8090500@huawei.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2316
Lines: 64

On 7/1/15 8:38 PM, He Kuang wrote:
>
>
> On 2015/7/2 10:48, Alexei Starovoitov wrote:
>> On 7/1/15 4:58 AM, Peter Zijlstra wrote:
>>>
>>> But why create a separate trace buffer, it should go into the regular
>>> perf buffer.
>>
>> +1
>>
>> I think
>> +static char __percpu *perf_extra_trace_buf[PERF_NR_CONTEXTS];
>> is redundant.
>> It adds quite a bit of unnecessary complexity to the whole patch set.
>>
>> Also the call to bpf_output_sample() is not effective unless program
>> returns 1. It's a confusing user interface.
>>
>> Also you cannot ever do:
>>       BPF_FUNC_probe_read,
>> +    BPF_FUNC_output_sample,
>>       BPF_FUNC_ktime_get_ns,
>> new functions must be added to the end.
>>
>> Why not just do:
>> perf_trace_buf_prepare() + perf_trace_buf_submit() from the helper?
>> No changes to current code.
>> No need to call __get_data_size() and other overhead.
>> The helper can be called multiple times from the same program.
>> imo much cleaner.
>>
>
> Invoke perf_trace_buf_submit() will generate a second perf
> event (header->type = PERF_RECORD_SAMPLE) entry which is
> different from the event entry outputed by the orignial
> kprobe. So the final result of the example in 00/00 patch may
> like this:
>
>    sample entry 1(from bpf_prog):
>      comm timestamp1 generic_perform_write pmu_value=0x1234
>    sample entry 2(from original kprobe):
>      comm timestamp2 generic_perform_write: (ffffffff81140b60)
> Compared with current implementation:
>    combined sample entry:
>      comm timestamp generic_perform_write: (ffffffff81140b60)
> pmu_value=0x1234
>
> The former two entries may be discontinuous as there are multiple
> threads and kprobes to be recorded, and there's a chance that one
> entry is missed but the other is recorded. What we need is the
> pmu_value read when 'generic_perform_write' enters, the two
> entries result is not intuitive enough and userspace tools have
> to do the work to find and combine those two sample entries to
> get the result.

Just change your example to return 0 and user space will see
one sample.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/