User-Agent: K-9 Mail for Android
In-Reply-To: <20150427185344.GI28871@pd.tnic>
References: <CALCETrWvqyCYOzCYXz7ZnzaM0obbidqo5CxVp2Hn1ELEiC3m3g@mail.gmail.com> <20150427085305.GB6774@pd.tnic> <20150427113506.GG6774@pd.tnic> <CA+55aFykNBrdHFSVeLftG7Ujx4Tk=MuUJXD2rNYn9kwL+kLN5w@mail.gmail.com> <20150427154631.GB28871@pd.tnic> <CA+55aFxWyhc+ax_jnqH9zsvqruL63gcePMbRQGJ_wM06e1QKpA@mail.gmail.com> <20150427164024.GD28871@pd.tnic> <CA+55aFySbPjeCgetxeSQPSP-h+yKnrHPcTGuoorAHJbCj53-8A@mail.gmail.com> <20150427183854.GG28871@pd.tnic> <CA+55aFzYz=b887TLTuJ1u6Fv3B0eeXtnx+XzojJ8BBqm+Eha_g@mail.gmail.com> <20150427185344.GI28871@pd.tnic>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain;
 charset=UTF-8
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Mon, 27 Apr 2015 12:59:11 -0700
To: Borislav Petkov <bp@alien8.de>,
        Linus Torvalds <torvalds@linux-foundation.org>
CC: Andy Lutomirski <luto@amacapital.net>, Andy Lutomirski <luto@kernel.org>,
        X86 ML <x86@kernel.org>, Denys Vlasenko <vda.linux@googlemail.com>,
        Brian Gerst <brgerst@gmail.com>, Denys Vlasenko <dvlasenk@redhat.com>,
        Ingo Molnar <mingo@kernel.org>, Steven Rostedt <rostedt@goodmis.org>,
        Oleg Nesterov <oleg@redhat.com>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Alexei Starovoitov <ast@plumgrid.com>, Will Drewry <wad@chromium.org>,
        Kees Cook <keescook@chromium.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Message-ID: <61BCF405-8000-43EB-A6B1-2BF5677E4ADE@zytor.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2391
Lines: 66

It really comes down to this: it seems older cores from both Intel and AMD perform better with 66 66 66 90, whereas the 0F 1F series is better on newer cores.

When I measured it, the differences were sometimes dramatic.
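
For reference, a minimal sketch of the two 4-byte forms I mean, written as plain C byte arrays. The array names and the emit_nop4() helper are made up for illustration; only the byte values are the actual encodings.

```c
#include <string.h>

/* "66 66 66 90" style: three operand-size prefixes on the one-byte NOP. */
static const unsigned char nop4_prefixed[4] = { 0x66, 0x66, 0x66, 0x90 };

/* "0F 1F" style: long NOP with a ModRM byte and an 8-bit displacement. */
static const unsigned char nop4_0f1f[4] = { 0x0f, 0x1f, 0x40, 0x00 };

/* Hypothetical helper: copy one of the two 4-byte forms into buf. */
static void emit_nop4(unsigned char *buf, int use_0f1f)
{
	memcpy(buf, use_0f1f ? nop4_0f1f : nop4_prefixed, 4);
}
```

Both occupy four bytes and do nothing architecturally; the difference is only in how a given decoder handles the prefixes versus the 0F 1F opcode.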

On April 27, 2015 11:53:44 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Apr 27, 2015 at 11:47:30AM -0700, Linus Torvalds wrote:
>> On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov <bp@alien8.de> wrote:
>> >
>> > So our current NOP-infrastructure does ASM_NOP_MAX NOPs of 8 bytes so
>> > without more invasive changes, our longest NOPs are 8 bytes long and
>> > then we have to repeat.
>> 
>> Btw (and I'm too lazy to check) do we take alignment into account?
>> 
>> Because if you have to split, and use multiple nops, it is *probably*
>> a good idea to try to avoid 16-byte boundaries, since that can be
>> the I$ fetch granularity from L1 (although I guess 32B is getting
>> more common).
>
>Yeah, on F16h you have 32B fetch but the paths later in the machine
>get narrower, so to speak.
>
>> So the exact split might depend on the alignment of the nop replacement..
>
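
To make the alignment idea concrete, here is a rough sketch of a split that never lets a single NOP straddle a 16-byte boundary. This is purely hypothetical: split_nops_aligned(), NOP_MAX and FETCH_BOUNDARY are names invented for this example, nothing like it exists in the current code, and the 16-byte figure is just the assumption from the paragraph above.

```c
#include <stddef.h>

#define NOP_MAX        8   /* longest single NOP, as discussed in the thread */
#define FETCH_BOUNDARY 16  /* assumed fetch granularity; could be 32 */

/*
 * Hypothetical: compute the lengths of the NOPs that fill `len` bytes
 * starting at byte offset `addr`, so that no single NOP crosses a
 * FETCH_BOUNDARY. Chunk lengths go to out[] (capacity `cap`); the return
 * value is the number of chunks written. If `cap` is too small the result
 * is truncated, so callers should size out[] for at least `len` entries.
 */
static size_t split_nops_aligned(size_t addr, size_t len,
				 size_t *out, size_t cap)
{
	size_t n = 0;

	while (len > 0 && n < cap) {
		/* Bytes left before the next boundary: always 1..FETCH_BOUNDARY. */
		size_t to_boundary = FETCH_BOUNDARY - (addr % FETCH_BOUNDARY);
		size_t chunk = len;

		if (chunk > NOP_MAX)
			chunk = NOP_MAX;
		if (chunk > to_boundary)
			chunk = to_boundary;

		out[n++] = chunk;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

With 10 bytes to fill starting at offset 12, this yields a 4-byte NOP followed by a 6-byte one, whereas a plain "take up to 8 bytes" split gives 8 then 2, with the first NOP covering offsets 12..19 and so crossing the boundary at 16. Whether that actually helps on any given core would need measuring.
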
>Yeah, no. Our add_nops() is trivial:
>
>/* Use this to add nops to a buffer, then text_poke the whole buffer. */
>static void __init_or_module add_nops(void *insns, unsigned int len)
>{
>        while (len > 0) {
>                unsigned int noplen = len;
>                if (noplen > ASM_NOP_MAX)
>                        noplen = ASM_NOP_MAX;
>                memcpy(insns, ideal_nops[noplen], noplen);
>                insns += noplen;
>                len -= noplen;
>        }
>}
>
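
For anyone who wants to play with the splitting behaviour outside the kernel, the loop above can be sketched in userspace roughly as follows. The demo_nops table and demo_add_nops() are stand-ins invented for this example (the kernel selects its ideal_nops table at boot); the table here simply holds the 0F 1F style encodings for lengths 1 to 8, and the buffer is an unsigned char pointer so the pointer arithmetic is portable C.

```c
#include <string.h>

#define DEMO_NOP_MAX 8

/* Stand-in table indexed by NOP length; row 0 is unused. */
static const unsigned char demo_nops[DEMO_NOP_MAX + 1][DEMO_NOP_MAX] = {
	{ 0 },
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Fill len bytes at insns with NOPs, longest first, ignoring alignment. */
static void demo_add_nops(unsigned char *insns, unsigned int len)
{
	while (len > 0) {
		unsigned int noplen = len > DEMO_NOP_MAX ? DEMO_NOP_MAX : len;

		memcpy(insns, demo_nops[noplen], noplen);
		insns += noplen;
		len -= noplen;
	}
}
```

Filling 19 bytes this way produces two 8-byte NOPs followed by one 3-byte NOP, regardless of where the buffer sits relative to any fetch boundary.
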
>> Can we perhaps get rid of the distinction entirely, and just use one
>> set of 64-bit nops for both Intel/AMD?
>
>I *think* hpa would have an opinion here. I'm judging by looking at
>comments like this one in the code:
>
>        /*
>         * Due to a decoder implementation quirk, some
>         * specific Intel CPUs actually perform better with
>         * the "k8_nops" than with the SDM-recommended NOPs.
>         */
>
>which is a fun one in itself. :-)

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.