Subject: Re: [PATCH 3/5] perf core: Prepare writing into ring buffer from end
From: "Wangnan (F)" <wangnan0@huawei.com>
To: Peter Zijlstra
CC: He Kuang, Alexei Starovoitov, Arnaldo Carvalho de Melo, Brendan Gregg,
 Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li
Date: Fri, 25 Mar 2016 20:26:43 +0800
Message-ID: <56F52E83.70409@huawei.com>
In-Reply-To: <20160323095007.GW6344@twins.programming.kicks-ass.net>
References: <1457949585-191064-1-git-send-email-wangnan0@huawei.com>
 <1457949585-191064-4-git-send-email-wangnan0@huawei.com>
 <20160323095007.GW6344@twins.programming.kicks-ass.net>
List: linux-kernel@vger.kernel.org

On 2016/3/23 17:50, Peter Zijlstra wrote:
> On Mon, Mar 14, 2016 at 09:59:43AM +0000, Wang Nan wrote:
>> Convert perf_output_begin to __perf_output_begin and make the latter
>> function able to write records from the end of the ring buffer.
>> Following commits will utilize the 'backward' flag.
>>
>> This patch doesn't introduce any extra performance overhead since we
>> use always_inline.
> So while I agree that with __always_inline and constant propagation we
> _should_ end up with the same code, we have:
>
> $ size defconfig-build/kernel/events/ring_buffer.o.{pre,post}
>    text    data     bss     dec     hex filename
>    3785       2       0    3787     ecb defconfig-build/kernel/events/ring_buffer.o.pre
>    3673       2       0    3675     e5b defconfig-build/kernel/events/ring_buffer.o.post
>
> The patch actually makes the file shrink.
>
> So I think we still want to have some actual performance numbers.

In my environment the two objects are nearly identical:

$ objdump -d kernel/events/ring_buffer.o.new > ./out.new.S
$ objdump -d kernel/events/ring_buffer.o.old > ./out.old.S

--- ./out.old.S	2016-03-25 12:18:52.060656423 +0000
+++ ./out.new.S	2016-03-25 12:18:45.376630269 +0000
@@ -1,5 +1,5 @@
-kernel/events/ring_buffer.o.old:     file format elf64-x86-64
+kernel/events/ring_buffer.o.new:     file format elf64-x86-64


 Disassembly of section .text:

@@ -320,7 +320,7 @@
  402:	4d 8d 04 0f          	lea    (%r15,%rcx,1),%r8
  406:	48 89 c8             	mov    %rcx,%rax
  409:	4c 0f b1 43 40       	cmpxchg %r8,0x40(%rbx)
- 40e:	48 39 c8             	cmp    %rcx,%rax
+ 40e:	48 39 c1             	cmp    %rax,%rcx
  411:	75 b4                	jne    3c7
  413:	48 8b 73 58          	mov    0x58(%rbx),%rsi
  417:	48 8b 43 68          	mov    0x68(%rbx),%rax
@@ -357,7 +357,7 @@
  480:	85 c0                	test   %eax,%eax
  482:	0f 85 02 ff ff ff    	jne    38a
  488:	48 c7 c2 00 00 00 00 	mov    $0x0,%rdx
- 48f:	be 7c 00 00 00       	mov    $0x7c,%esi
+ 48f:	be 89 00 00 00       	mov    $0x89,%esi
  494:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
  49b:	c6 05 00 00 00 00 01 	movb   $0x1,0x0(%rip)        # 4a2
  4a2:	e8 00 00 00 00       	callq  4a7
@@ -874,7 +874,7 @@
  c39:	eb e7                	jmp    c22
  c3b:	80 3d 00 00 00 00 00 	cmpb   $0x0,0x0(%rip)        # c42
  c42:	75 93                	jne    bd7
- c44:	be 2b 01 00 00       	mov    $0x12b,%esi
+ c44:	be 49 01 00 00       	mov    $0x149,%esi
  c49:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
  c50:	e8 00 00 00 00       	callq  c55
  c55:	c6 05 00 00 00 00 01 	movb   $0x1,0x0(%rip)        # c5c

I think you enabled some unusual config options?

Thank you.