Message-ID: <559386D7.1020208@huawei.com>
Date: Wed, 1 Jul 2015 14:21:11 +0800
From: "Wangnan (F)"
To: Peter Zijlstra, He Kuang
CC: pi3orama
Subject: Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event
References: <1435719455-91155-1-git-send-email-hekuang@huawei.com> <20150701054458.GN19282@twins.programming.kicks-ass.net>
In-Reply-To: <20150701054458.GN19282@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/7/1 13:44, Peter Zijlstra wrote:
> On Wed, Jul 01, 2015 at 02:57:30AM +0000, He Kuang wrote:
>> This patch adds an extra perf trace buffer for other utilities like
>> bpf to fill extra data to perf events.
> What!, why?

The goal of this patchset is to give BPF programs a means to output
something through perf samples. BPF programs give us a way to filter
and aggregate events, which lets us do many interesting things. For
example, we can count the number of context switches in sys_write
system calls by attaching BPF programs to the entry and exit points of
the system call and to the entry of __schedule, then collecting the
count at the exit point. Combined with reading the PMU from BPF, which
we are working on, BPF programs can be used to profile kernel functions
in a fine-grained manner.

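To make the sys_write example concrete, here is a minimal userspace
model of the counting logic, written in Python purely for illustration.
It is not BPF code: the dictionary stands in for a BPF map, the three
handlers stand in for the programs attached to sys_write,
sys_write%return and __schedule, and the event tuples and the function
name count_switches are hypothetical.

```python
def count_switches(events):
    """Replay (probe_name, tid) events and return a list of
    (tid, context_switches) pairs, one per completed sys_write.

    Hypothetical model: `in_flight` plays the role of a BPF map keyed
    by thread id.
    """
    in_flight = {}  # tid -> switches seen since sys_write entry
    results = []    # what the exit probe would output

    for probe, tid in events:
        if probe == "sys_write":
            # Entry probe: start counting for this thread.
            in_flight[tid] = 0
        elif probe == "__schedule":
            # Only count switches of threads currently inside sys_write.
            if tid in in_flight:
                in_flight[tid] += 1
        elif probe == "sys_write%return":
            # Exit probe: emit the count and drop the map entry.
            count = in_flight.pop(tid, None)
            if count is not None:
                results.append((tid, count))
    return results


if __name__ == "__main__":
    trace = [
        ("sys_write", 1),
        ("__schedule", 1),
        ("__schedule", 2),         # tid 2 is not in sys_write: ignored
        ("sys_write", 2),
        ("__schedule", 1),
        ("sys_write%return", 1),
        ("__schedule", 2),
        ("sys_write%return", 2),
    ]
    print(count_switches(trace))
```

The interesting step is the last branch: the count only becomes
available when the system call exits, so that is the point where the
program needs some channel to hand the value to perf.
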
However, currently the only ways that BPF programs can transfer
something to perf are:

 1. By returning 0 or 1, a BPF program can tell perf whether to
    collect a sample;

 2. Through the map mechanism, user programs (perf) can read the
    aggregation results computed by a BPF program (not implemented
    yet);

 3. Through BPF_FUNC_trace_printk, they can output strings into the
    ftrace ring buffer.

For the task I mentioned above, the best way to do it today is to print
the results into the ring buffer from the program attached to
sys_write%return, and then merge them with perf.data using timestamps.

We believe this can be improved. These patches are an attempt at that:
they allow BPF programs to call something like 'BPF_FUNC_output_sample'
to output data, which is then collected together with the other data
output by a perf sample. Once perf support is added (not implemented
yet), perf will be able to extract that data through 'perf script' or
'perf data convert --to-ctf'. Further analysis can then be done.

The extra perf trace buffer is added for that reason. Currently, we use
perf_trace_buf as a per-CPU buffer for the other parts of a perf
sample. Letting BPF programs append information to that buffer is
possible, but it requires us to calculate the data size a perf sample
requires (by calling __get_data_size) before we can be sure that the
sample will not be filtered out. Alternatively, we could make the BPF
program write from the beginning of that buffer and append the perf
sample data after it. However, the resulting samples could not be
parsed by current perf.
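
For reference, the timestamp-based merge described above as the current
workaround can be modeled roughly as follows. This is only a sketch in
Python: it assumes both streams are already sorted and use the same
clock, and the (timestamp, payload) record shape and the function name
merge_by_timestamp are hypothetical, not the perf.data or ftrace
formats.

```python
import heapq


def merge_by_timestamp(perf_samples, bpf_records):
    """Merge two timestamp-sorted lists of (timestamp, payload) tuples.

    Returns (timestamp, source, payload) tuples in timestamp order.
    On equal timestamps the perf sample comes first, because
    heapq.merge keeps items from earlier iterables ahead on ties.
    """
    tagged_perf = ((ts, "perf", data) for ts, data in perf_samples)
    tagged_bpf = ((ts, "bpf", data) for ts, data in bpf_records)
    return list(heapq.merge(tagged_perf, tagged_bpf, key=lambda r: r[0]))


if __name__ == "__main__":
    perf_samples = [(100, "sample A"), (300, "sample B")]
    bpf_records = [(200, "switches=2")]
    for record in merge_by_timestamp(perf_samples, bpf_records):
        print(record)
```

Doing this as a post-processing step means the two data sources must be
collected and correlated separately, which is the inconvenience the
patchset tries to remove by letting the BPF output travel inside the
perf sample itself.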