Subject: Re: [PATCH 3/5] perf core: Prepare writing into ring buffer from end
To: Peter Zijlstra
References: <1457949585-191064-1-git-send-email-wangnan0@huawei.com> <1457949585-191064-4-git-send-email-wangnan0@huawei.com> <20160323095007.GW6344@twins.programming.kicks-ass.net> <56F52E83.70409@huawei.com>
CC: He Kuang, Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg, Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li
From: "Wangnan (F)"
Message-ID: <56F530C1.9010106@huawei.com>
Date: Fri, 25 Mar 2016 20:36:17 +0800
In-Reply-To: <56F52E83.70409@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/3/25 20:26, Wangnan (F) wrote:
>
>
> On 2016/3/23 17:50, Peter Zijlstra wrote:
>> On Mon, Mar 14, 2016 at 09:59:43AM +0000, Wang Nan wrote:
>>> Convert perf_output_begin to __perf_output_begin and make the latter
>>> function able to write records from the end of the ring buffer.
>>> Following commits will utilize the 'backward' flag.
>>>
>>> This patch doesn't introduce any extra performance overhead since we
>>> use always_inline.
>> So while I agree that with __always_inline and constant propagation we
>> _should_ end up with the same code, we have:
>>
>> $ size defconfig-build/kernel/events/ring_buffer.o.{pre,post}
>>    text    data     bss     dec     hex filename
>>    3785       2       0    3787     ecb defconfig-build/kernel/events/ring_buffer.o.pre
>>    3673       2       0    3675     e5b defconfig-build/kernel/events/ring_buffer.o.post
>>
>> The patch actually makes the file shrink.
>>
>> So I think we still want to have some actual performance numbers.
>
> In my environment the two objects are nearly identical:
>
>
> $ objdump -d kernel/events/ring_buffer.o.new > ./out.new.S
> $ objdump -d kernel/events/ring_buffer.o.old > ./out.old.S
>
> --- ./out.old.S 2016-03-25 12:18:52.060656423 +0000
> +++ ./out.new.S 2016-03-25 12:18:45.376630269 +0000
> @@ -1,5 +1,5 @@
>
> -kernel/events/ring_buffer.o.old: file format elf64-x86-64
> +kernel/events/ring_buffer.o.new: file format elf64-x86-64
>
>
>  Disassembly of section .text:
> @@ -320,7 +320,7 @@
>   402: 4d 8d 04 0f           lea (%r15,%rcx,1),%r8
>   406: 48 89 c8              mov %rcx,%rax
>   409: 4c 0f b1 43 40        cmpxchg %r8,0x40(%rbx)
> - 40e: 48 39 c8              cmp %rcx,%rax
> + 40e: 48 39 c1              cmp %rax,%rcx
>   411: 75 b4                 jne 3c7
>   413: 48 8b 73 58           mov 0x58(%rbx),%rsi
>   417: 48 8b 43 68           mov 0x68(%rbx),%rax
> @@ -357,7 +357,7 @@
>   480: 85 c0                 test %eax,%eax
>   482: 0f 85 02 ff ff ff     jne 38a
>   488: 48 c7 c2 00 00 00 00  mov $0x0,%rdx
> - 48f: be 7c 00 00 00        mov $0x7c,%esi
> + 48f: be 89 00 00 00        mov $0x89,%esi
>   494: 48 c7 c7 00 00 00 00  mov $0x0,%rdi
>   49b: c6 05 00 00 00 00 01  movb $0x1,0x0(%rip) # 4a2
>   4a2: e8 00 00 00 00        callq 4a7
> @@ -874,7 +874,7 @@
>   c39: eb e7                 jmp c22
>   c3b: 80 3d 00 00 00 00 00  cmpb $0x0,0x0(%rip) # c42
>   c42: 75 93                 jne bd7
> - c44: be 2b 01 00 00        mov $0x12b,%esi
> + c44: be 49 01 00 00        mov $0x149,%esi
>   c49: 48 c7 c7 00 00 00 00  mov $0x0,%rdi
>   c50: e8 00 00 00 00        callq c55
>   c55: c6 05 00 00 00 00 01  movb $0x1,0x0(%rip) # c5c
>
> I think you enabled some unusual config options?
>

You must have enabled CONFIG_OPTIMIZE_INLINING. Now I get a similar result:

$ size kernel/events/ring_buffer.o*
   text    data     bss     dec     hex filename
   4545       4       8    4557    11cd kernel/events/ring_buffer.o.new
   4641       4       8    4653    122d kernel/events/ring_buffer.o.old

Thank you.
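
For readers following the thread, the pattern under discussion (one always-inlined helper taking a 'backward' flag, called from thin wrappers that pass a compile-time constant) can be sketched in userspace C as below. This is only an illustration under stated assumptions: struct demo_rb, demo_always_inline and the demo_* functions are hypothetical names, not the kernel's, and the head arithmetic is a simplified guess at the idea rather than the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring buffer; NOT the kernel's structure. */
struct demo_rb {
	size_t head;	/* running write position, allowed to wrap */
	size_t size;	/* buffer size in bytes, must be a power of two */
};

/* Userspace stand-in for the kernel's __always_inline (GCC/Clang attribute). */
#define demo_always_inline inline __attribute__((__always_inline__))

/*
 * Reserve len bytes and return the offset at which the record starts.
 * 'backward' is expected to be a compile-time constant at every call
 * site, so after inlining the compiler can drop the untaken branch.
 */
static demo_always_inline size_t
demo_output_begin_common(struct demo_rb *rb, size_t len, bool backward)
{
	size_t offset;

	assert(rb->size != 0 && (rb->size & (rb->size - 1)) == 0);

	if (backward) {
		/* Writing from the end: move head down, record starts there. */
		rb->head -= len;
		offset = rb->head;
	} else {
		/* Writing from the start: record starts at the old head. */
		offset = rb->head;
		rb->head += len;
	}
	return offset & (rb->size - 1);
}

/* Thin wrappers: each passes a constant flag to the shared helper. */
size_t demo_output_begin_forward(struct demo_rb *rb, size_t len)
{
	return demo_output_begin_common(rb, len, false);
}

size_t demo_output_begin_backward(struct demo_rb *rb, size_t len)
{
	return demo_output_begin_common(rb, len, true);
}
```

Compiling this file with and without the flag parameter (for example with gcc -O2 -c) and comparing the objects with size and objdump -d, as done above, is one way to check whether the extra parameter costs anything on a given compiler and configuration. As the numbers in this thread show, the answer can depend on inlining-related options, so the measurement is worth repeating per setup.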