Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
From: "H. Peter Anvin"
Date: Mon, 27 Apr 2015 12:59:11 -0700
To: Borislav Petkov, Linus Torvalds
CC: Andy Lutomirski, X86 ML, Denys Vlasenko, Brian Gerst, Ingo Molnar,
 Steven Rostedt, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov,
 Will Drewry, Kees Cook, Linux Kernel Mailing List
Message-ID: <61BCF405-8000-43EB-A6B1-2BF5677E4ADE@zytor.com>
In-Reply-To: <20150427185344.GI28871@pd.tnic>
References: <20150427085305.GB6774@pd.tnic> <20150427113506.GG6774@pd.tnic>
 <20150427154631.GB28871@pd.tnic> <20150427164024.GD28871@pd.tnic>
 <20150427183854.GG28871@pd.tnic> <20150427185344.GI28871@pd.tnic>

It really comes down to this: it seems older cores from both Intel and AMD
perform better with 66 66 66 90, whereas the 0F 1F series is better on newer
cores. When I measured it, the differences were sometimes dramatic.

On April 27, 2015 11:53:44 AM PDT, Borislav Petkov wrote:
>On Mon, Apr 27, 2015 at 11:47:30AM -0700, Linus Torvalds wrote:
>> On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov wrote:
>> >
>> > So our current NOP-infrastructure does ASM_NOP_MAX NOPs of 8 bytes so
>> > without more invasive changes, our longest NOPs are 8 byte long and then
>> > we have to repeat.
>>
>> Btw (and I'm too lazy to check) do we take alignment into account?
>>
>> Because if you have to split, and use multiple nops, it is *probably*
>> a good idea to try to avoid 16-byte boundaries, since that can be
>> the I$ fetch granularity from L1 (although I guess 32B is getting more
>> common).
>
>Yeah, on F16h you have 32B fetch but the paths later in the machine
>get narrower, so to speak.
>
>> So the exact split might depend on the alignment of the nop replacement..
>
>Yeah, no. Our add_nops() is trivial:
>
>/* Use this to add nops to a buffer, then text_poke the whole buffer. */
>static void __init_or_module add_nops(void *insns, unsigned int len)
>{
>	while (len > 0) {
>		unsigned int noplen = len;
>		if (noplen > ASM_NOP_MAX)
>			noplen = ASM_NOP_MAX;
>		memcpy(insns, ideal_nops[noplen], noplen);
>		insns += noplen;
>		len -= noplen;
>	}
>}
>
>> Can we perhaps get rid of the distinction entirely, and just use one
>> set of 64-bit nops for both Intel/AMD?
>
>I *think* hpa would have an opinion here. I'm judging by looking at
>comments like this one in the code:
>
>	/*
>	 * Due to a decoder implementation quirk, some
>	 * specific Intel CPUs actually perform better with
>	 * the "k8_nops" than with the SDM-recommended NOPs.
>	 */
>
>which is a fun one in itself. :-)

-- 
Sent from my mobile phone. Please pardon brevity and lack of formatting.
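
For readers following the alignment question in the quoted thread, here is a
minimal user-space sketch of the greedy split that add_nops() performs, plus a
helper that counts how many of the resulting NOPs would straddle a fetch
boundary. This is an illustration only, not kernel code: NOP_MAX, nop_table,
fill_nops() and count_straddles() are hypothetical names, the table holds only
the 0F 1F family of NOPs, and the 16-byte granularity is the figure floated in
the thread rather than a measured value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_MAX 8 /* assumption: mirrors the 8-byte ASM_NOP_MAX in the thread */

/* Multi-byte NOP encodings of the 0F 1F family, indexed by length (1..8). */
static const unsigned char nop_table[NOP_MAX + 1][NOP_MAX] = {
    {0},
    {0x90},
    {0x66, 0x90},
    {0x0f, 0x1f, 0x00},
    {0x0f, 0x1f, 0x40, 0x00},
    {0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Greedy fill: emit NOP_MAX-byte NOPs until fewer than NOP_MAX bytes remain. */
static void fill_nops(unsigned char *buf, size_t len)
{
    while (len > 0) {
        size_t n = len > NOP_MAX ? NOP_MAX : len;

        memcpy(buf, nop_table[n], n);
        buf += n;
        len -= n;
    }
}

/*
 * Count the NOPs that would cross a gran-byte boundary if the same greedy
 * split were placed at address start. The split ignores alignment, so the
 * result depends only on start, len and gran.
 */
static unsigned int count_straddles(uintptr_t start, size_t len, size_t gran)
{
    unsigned int crossings = 0;

    assert(gran > 0);
    while (len > 0) {
        size_t n = len > NOP_MAX ? NOP_MAX : len;

        /* First and last byte of this NOP fall in different fetch blocks. */
        if (start / gran != (start + n - 1) / gran)
            crossings++;
        start += n;
        len -= n;
    }
    return crossings;
}
```

With these definitions, 11 bytes of padding are split as 8 + 3. If that padding
starts at address 12, the 8-byte NOP covers bytes 12..19 and crosses the 16-byte
boundary, so count_straddles(12, 11, 16) returns 1; an alignment-aware split
would have to pick different chunk sizes to avoid that.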