Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933408AbcDEOFk (ORCPT ); Tue, 5 Apr 2016 10:05:40 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:53509 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751976AbcDEOFi (ORCPT ); Tue, 5 Apr 2016 10:05:38 -0400 Subject: Re: [PATCH 4/4] perf core: Add backward attribute to perf event To: Peter Zijlstra References: <1459147292-239310-1-git-send-email-wangnan0@huawei.com> <1459147292-239310-5-git-send-email-wangnan0@huawei.com> <20160329140439.GK3408@twins.programming.kicks-ass.net> <56FB39E9.7020006@huawei.com> <56FB3C3D.3050400@huawei.com> CC: Alexei Starovoitov , Arnaldo Carvalho de Melo , , Brendan Gregg , He Kuang , Jiri Olsa , Masami Hiramatsu , Namhyung Kim , , Zefan Li From: "Wangnan (F)" Message-ID: <5703C612.1080608@huawei.com> Date: Tue, 5 Apr 2016 22:05:06 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <56FB3C3D.3050400@huawei.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.111.66.109] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090201.5703C622.0329,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: da25edd64f6d47e9d11e4303bb1192de Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3289 Lines: 120 On 2016/3/30 10:38, Wangnan (F) wrote: > > > On 2016/3/30 10:28, Wangnan (F) wrote: >> >> >> On 2016/3/29 22:04, Peter Zijlstra wrote: >>> On Mon, Mar 28, 2016 at 06:41:32AM +0000, Wang Nan wrote: >>> >>> Could you maybe write a perf/tests thingy for this so that _some_ >>> userspace exists that exercises this new code? >>> >>> >>>> int perf_output_begin(struct perf_output_handle *handle, >>>> struct perf_event *event, unsigned int size) >>>> { >>>> + if (unlikely(is_write_backward(event))) >>>> + return __perf_output_begin(handle, event, size, true); >>>> return __perf_output_begin(handle, event, size, false); >>>> } >>> Would something like: >>> >>> int perf_output_begin(...) >>> { >>> if (unlikely(is_write_backward(event)) >>> return perf_output_begin_backward(...); >>> return perf_output_begin_forward(...); >>> } >>> >>> make sense; I'm not sure how much is still using this, but it seems >>> somewhat excessive to inline two copies of that thing into a single >>> function. >> >> [SNIP] > > Sorry. Your second suggestion seems also good: > > My implementation makes a big perf_output_begin(), but introduces only > one load and one branch. > > Your first suggestion introduces one load, one branch and one function > call. > > Your second suggestion introduces one load, and at least one (and at > most three) branches. > > I need some benchmarking result. > > Thank you. No obviously performance divergence among all 3 implementations. Here are some numbers: I tested the cost of generating PERF_RECORD_COMM event using prctl with following code: ... gettimeofday(&tv1, NULL); for (i = 0; i < 1000 * 1000 * 3; i++) { char proc_name[10]; snprintf(proc_name, sizeof(proc_name), "p:%d\n", i); prctl(PR_SET_NAME, proc_name); } gettimeofday(&tv2, NULL); us1 = tv1.tv_sec * 1000000 + tv1.tv_usec; us2 = tv2.tv_sec * 1000000 + tv2.tv_usec; printf("%ld\n", us2 - us1); ... Run this benchmark 100 time in each experiment. Bind benchmark to core 2 and perf to core 1 to ensure they are on a same CPU. Result: BASE : execute without perf 4.5 : pure v4.5 TIP : with only patch 1-3/4 in this patch set applied BIGFUNC : the implementation in my original patch FUNCCALL: the implememtation in Peter's first suggestion: int perf_output_begin(...) { if (unlikely(is_write_backward(event)) return perf_output_begin_backward(...); return perf_output_begin_forward(...); } BRANCH : the implememtation in Peter's second suggestion: int perf_output_begin(...) { return __perf_output_begin(..., unlikely(event->attr.backwards)); } 'perf' is executed using: # perf record -o /dev/null --no-buildid-cache -e syscalls:sys_enter_read ... Results: MEAN STDVAR BASE : 1122968.85 33492.52 4.5 : 2714200.70 26231.69 TIP : 2646260.46 32610.56 BIGFUNC : 2661308.46 52707.47 FUNCCALL: 2636061.10 52607.80 BRANCH : 2651335.74 34910.04 Considering the stdvar, the performance result is nearly identical. I'd like to choose 'BRANCH' because its code looks better. Thank you.