Date: Tue, 28 Apr 2015 10:16:33 -0700
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
From: Linus Torvalds
To: Borislav Petkov
Cc: "H. Peter Anvin", Andy Lutomirski, Andy Lutomirski, X86 ML, Denys Vlasenko, Brian Gerst, Denys Vlasenko, Ingo Molnar, Steven Rostedt, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, Linux Kernel Mailing List, Mel Gorman

On Tue, Apr 28, 2015 at 9:58 AM, Borislav Petkov wrote:
>
> Well, AFAIK, NOPs do require resources for tracking in the machine. I
> was hoping that hw would be smarter and discard at decode time but there
> probably are reasons that it can't be done (...yet).

I suspect it might be related to things like getting performance
counters and instruction debug traps etc right. There are quite
possibly also simply constraints where the front end has to generate
*something* just to keep the back end happy.

The front end can generally not just totally remove things without any
tracking, since the front end doesn't know if things are speculative
etc. So you can't do instruction debug traps in the front end afaik.
Or rather, I'm sure you *could*, but in general I suspect the best way
to handle nops without making them *too* special is to bunch up
several to make them look like one big instruction, and then associate
that bunch with some minimal tracking uop that uses minimal resources
in the back end without losing sight of the original nop entirely, so
that you can still do checks at retirement time.

So I think the "you can do ~5 nops per cycle" is not unreasonable.
Even in the uop cache, the nops have to take some space, and have to
do things like update eip, so I don't think they'll ever be entirely
free; the best you can do is minimize their impact.

> $ taskset -c 3 ./t
> Running 60 times, 1000000 loops per run.
> nop_0x90 average: 0.390625
> nop_3_byte average: 0.390625
>
> and those exact numbers are actually reproducible pretty reliably.

Yeah. That looks somewhat reasonable. I think the 16h architecture
technically decodes just two instructions per cycle, but I wouldn't be
surprised if there's some simple nop special casing going on so that
it can decode three nops in one go when things line up right. So you
might get 0.33 cycles for the best case, but then 0.5 cycles when it
crosses a 16-byte boundary or something. So you might have some
pattern where it decodes 32 bytes worth of nops as 12/8/12 bytes
(3/2/3 instructions), which would come out to 0.38 cycles. Add some
random overhead for the loop, and I could see the 0.39 cycles.

That was wild handwaving with no data to back it up, but I'm trying to
explain to myself why you could get some odd number like that. It
seems _possible_ at least.
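
For what it's worth, the arithmetic behind that guess can be written
out as a small sketch. The decode patterns are just the hypothetical
ones from the handwaving above (three nops per cycle, two per cycle,
and the mixed 3/2/3 case), not anything measured, and the helper name
cycles_per_insn is made up for illustration.

```python
from fractions import Fraction


def cycles_per_insn(pattern):
    """Average cycles per instruction for a repeating decode pattern.

    `pattern` lists how many instructions get decoded in each
    successive cycle, e.g. [3, 2, 3] means 8 instructions in 3 cycles.
    """
    return Fraction(len(pattern), sum(pattern))


# Hypothetical decode patterns (assumptions, not measured behavior).
best = cycles_per_insn([3])          # 1/3 cycle per nop
worst = cycles_per_insn([2])         # 1/2 cycle per nop
mixed = cycles_per_insn([3, 2, 3])   # 3/8 = 0.375 cycle per nop

# The average reported in the quoted benchmark output.
measured = Fraction("0.390625")      # exactly 25/64

# Whatever the mixed pattern does not account for would have to be
# loop overhead or something else entirely.
leftover = measured - mixed          # 1/64 cycle per nop

print(float(best), float(worst), float(mixed), float(leftover))
```

So the 3/2/3 pattern gives 0.375, and the gap to the measured
0.390625 is 1/64 of a cycle per nop, which at least looks like the
kind of thing a fixed per-iteration loop cost could produce. That is
still only consistent with the guess, not evidence for it.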

                Linus