Date: Mon, 27 Apr 2015 16:39:05 +0200
From: Borislav Petkov
To: Andy Lutomirski
Cc: Andy Lutomirski, X86 ML, "H. Peter Anvin", Denys Vlasenko, Linus Torvalds, Brian Gerst, Denys Vlasenko, Ingo Molnar, Steven Rostedt, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, Linux Kernel Mailing List
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Message-ID: <20150427143905.GK6774@pd.tnic>

On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
> I know it would be ugly, but would it be worth saving two bytes by
> using ALTERNATIVE "jmp 1f", "shl ...", ...?

Damn, it is actually visible that even saving the unconditional forward JMP makes the numbers marginally nicer (the E: row). So I guess we'll be dropping the forward JMP too.
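For context, a minimal sketch of the ALTERNATIVE pattern Andy is suggesting; the actual instruction sequence is elided as "shl ..." in the quote, so the body below is a hypothetical placeholder. The macro emits its first argument padded with NOPs to the length of the longer alternative; at boot, CPUs with the named feature/bug bit set get the second argument patched in instead, so unaffected CPUs take the short jump straight past the workaround:

```
	/*
	 * Hypothetical sketch, not the actual patch; "workaround insns"
	 * stands in for the elided "shl ..." sequence from the quote.
	 */
	ALTERNATIVE "jmp 1f", \
		    "workaround insns", \
		    X86_BUG_SYSRET_SS_ATTRS
1: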
A: 2835570.145246 cpu-clock (msec) ( +- 0.02% ) [100.00%]
B: 2833364.074970 cpu-clock (msec) ( +- 0.04% ) [100.00%]
C: 2834708.335431 cpu-clock (msec) ( +- 0.02% ) [100.00%]
D: 2835055.118431 cpu-clock (msec) ( +- 0.01% ) [100.00%]
E: 2833115.118624 cpu-clock (msec) ( +- 0.06% ) [100.00%]

A: 2835570.099981 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
B: 2833364.073633 task-clock (msec) # 3.996 CPUs utilized ( +- 0.04% ) [100.00%]
C: 2834708.350387 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
D: 2835055.094383 task-clock (msec) # 3.996 CPUs utilized ( +- 0.01% ) [100.00%]
E: 2833115.145292 task-clock (msec) # 3.996 CPUs utilized ( +- 0.06% ) [100.00%]

A: 5,591,213,166,613 cycles # 1.972 GHz ( +- 0.03% ) [75.00%]
B: 5,585,023,802,888 cycles # 1.971 GHz ( +- 0.03% ) [75.00%]
C: 5,587,983,212,758 cycles # 1.971 GHz ( +- 0.02% ) [75.00%]
D: 5,584,838,532,936 cycles # 1.970 GHz ( +- 0.03% ) [75.00%]
E: 5,583,979,727,842 cycles # 1.971 GHz ( +- 0.05% ) [75.00%]

cycles is the lowest, nice.
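For scale, the cycle saving of the E: variant over the A: baseline works out to roughly 0.13% — "marginally nicer" indeed. A quick back-of-the-envelope check (counts copied from the rows above):

```python
# Cycle counts copied from the A: and E: rows above.
cycles_a = 5_591_213_166_613  # A: baseline
cycles_e = 5_583_979_727_842  # E: forward JMP dropped

saving_pct = (cycles_a - cycles_e) / cycles_a * 100
print(f"{saving_pct:.2f}% fewer cycles")  # -> 0.13% fewer cycles
```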
A: 3,106,707,101,530 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]
B: 3,106,632,251,528 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
C: 3,106,265,958,142 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
D: 3,106,294,801,185 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
E: 3,106,381,223,355 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]

Understandable - we end up executing 5 insns more:

ffffffff815b90ac: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b0: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b4: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b8: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90bc: 90		nop

A: 683,676,044,429 branches # 241.107 M/sec ( +- 0.01% ) [75.00%]
B: 683,670,899,595 branches # 241.293 M/sec ( +- 0.01% ) [75.00%]
C: 683,675,772,858 branches # 241.180 M/sec ( +- 0.01% ) [75.00%]
D: 683,683,533,664 branches # 241.154 M/sec ( +- 0.00% ) [75.00%]
E: 683,648,518,667 branches # 241.306 M/sec ( +- 0.01% ) [75.00%]

Lowest.

A: 43,829,535,008 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
B: 43,844,118,416 branch-misses # 6.41% of all branches ( +- 0.03% ) [75.00%]
C: 43,819,871,086 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
D: 43,795,107,998 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
E: 43,801,985,070 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]

That looks like noise to me - we shouldn't be getting more branch misses with the E: version.
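One way to sanity-check the "noise" reading: the spread between the five branch-miss counts is about 0.11% of their mean, only a few multiples of the quoted ±0.02-0.03% per-run error, so no variant stands out. A rough check (counts copied from the rows above):

```python
# Branch-miss counts copied from the rows above.
misses = {
    "A": 43_829_535_008,
    "B": 43_844_118_416,
    "C": 43_819_871_086,
    "D": 43_795_107_998,
    "E": 43_801_985_070,
}

mean = sum(misses.values()) / len(misses)
spread_pct = (max(misses.values()) - min(misses.values())) / mean * 100
print(f"spread: {spread_pct:.2f}% of mean")  # -> spread: 0.11% of mean
```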
A: 2,030,357 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
B: 2,029,313 context-switches # 0.716 K/sec ( +- 0.05% ) [100.00%]
C: 2,028,566 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
D: 2,028,895 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
E: 2,031,008 context-switches # 0.717 K/sec ( +- 0.09% ) [100.00%]

A: 52,421 migrations # 0.018 K/sec ( +- 1.13% )
B: 52,049 migrations # 0.018 K/sec ( +- 1.02% )
C: 51,365 migrations # 0.018 K/sec ( +- 0.92% )
D: 51,766 migrations # 0.018 K/sec ( +- 1.11% )
E: 53,047 migrations # 0.019 K/sec ( +- 1.08% )

A: 709.528485252 seconds time elapsed ( +- 0.02% )
B: 708.976557288 seconds time elapsed ( +- 0.04% )
C: 709.312844791 seconds time elapsed ( +- 0.02% )
D: 709.400050112 seconds time elapsed ( +- 0.01% )
E: 708.914562508 seconds time elapsed ( +- 0.06% )

Nice.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.