Date: Mon, 27 Apr 2015 16:39:05 +0200
From: Borislav Petkov
To: Andy Lutomirski
Cc: Andy Lutomirski, X86 ML, "H. Peter Anvin", Denys Vlasenko, Linus Torvalds, Brian Gerst, Denys Vlasenko, Ingo Molnar, Steven Rostedt, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, Linux Kernel Mailing List
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Message-ID: <20150427143905.GK6774@pd.tnic>

On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
> I know it would be ugly, but would it be worth saving two bytes by
> using ALTERNATIVE "jmp 1f", "shl ...", ...?

Damn, it is actually visible that even saving the unconditional forward JMP makes the numbers marginally nicer (the E: row). So I guess we'll be dropping the forward JMP too.
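For context, a minimal sketch of the ALTERNATIVE pattern Andy is suggesting; the actual instruction sequence is elided as "shl ..." in the quote, so the body below is a hypothetical placeholder. The macro emits its first argument padded with NOPs to the length of the longer alternative; at boot, CPUs with the named feature/bug bit set get the second argument patched in instead, so unaffected CPUs take the short jump straight past the workaround:

```
	/*
	 * Hypothetical sketch, not the actual patch; "workaround insns"
	 * stands in for the elided "shl ..." sequence from the quote.
	 */
	ALTERNATIVE "jmp 1f", \
		    "workaround insns", \
		    X86_BUG_SYSRET_SS_ATTRS
1: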
A: 2835570.145246 cpu-clock (msec) ( +- 0.02% ) [100.00%]
B: 2833364.074970 cpu-clock (msec) ( +- 0.04% ) [100.00%]
C: 2834708.335431 cpu-clock (msec) ( +- 0.02% ) [100.00%]
D: 2835055.118431 cpu-clock (msec) ( +- 0.01% ) [100.00%]
E: 2833115.118624 cpu-clock (msec) ( +- 0.06% ) [100.00%]

A: 2835570.099981 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
B: 2833364.073633 task-clock (msec) # 3.996 CPUs utilized ( +- 0.04% ) [100.00%]
C: 2834708.350387 task-clock (msec) # 3.996 CPUs utilized ( +- 0.02% ) [100.00%]
D: 2835055.094383 task-clock (msec) # 3.996 CPUs utilized ( +- 0.01% ) [100.00%]
E: 2833115.145292 task-clock (msec) # 3.996 CPUs utilized ( +- 0.06% ) [100.00%]

A: 5,591,213,166,613 cycles # 1.972 GHz ( +- 0.03% ) [75.00%]
B: 5,585,023,802,888 cycles # 1.971 GHz ( +- 0.03% ) [75.00%]
C: 5,587,983,212,758 cycles # 1.971 GHz ( +- 0.02% ) [75.00%]
D: 5,584,838,532,936 cycles # 1.970 GHz ( +- 0.03% ) [75.00%]
E: 5,583,979,727,842 cycles # 1.971 GHz ( +- 0.05% ) [75.00%]

cycles is the lowest, nice.
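For scale, the cycle saving of the E: variant over the A: baseline works out to roughly 0.13% — "marginally nicer" indeed. A quick back-of-the-envelope check (counts copied from the rows above):

```python
# Cycle counts copied from the A: and E: rows above.
cycles_a = 5_591_213_166_613  # A: baseline
cycles_e = 5_583_979_727_842  # E: forward JMP dropped

saving_pct = (cycles_a - cycles_e) / cycles_a * 100
print(f"{saving_pct:.2f}% fewer cycles")  # -> 0.13% fewer cycles
```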
A: 3,106,707,101,530 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]
B: 3,106,632,251,528 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
C: 3,106,265,958,142 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
D: 3,106,294,801,185 instructions # 0.56 insns per cycle ( +- 0.00% ) [75.00%]
E: 3,106,381,223,355 instructions # 0.56 insns per cycle ( +- 0.01% ) [75.00%]

Understandable - we end up executing 5 insns more:

ffffffff815b90ac: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b0: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b4: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90b8: 66 66 66 90	data16 data16 xchg %ax,%ax
ffffffff815b90bc: 90		nop

A: 683,676,044,429 branches # 241.107 M/sec ( +- 0.01% ) [75.00%]
B: 683,670,899,595 branches # 241.293 M/sec ( +- 0.01% ) [75.00%]
C: 683,675,772,858 branches # 241.180 M/sec ( +- 0.01% ) [75.00%]
D: 683,683,533,664 branches # 241.154 M/sec ( +- 0.00% ) [75.00%]
E: 683,648,518,667 branches # 241.306 M/sec ( +- 0.01% ) [75.00%]

Lowest.

A: 43,829,535,008 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
B: 43,844,118,416 branch-misses # 6.41% of all branches ( +- 0.03% ) [75.00%]
C: 43,819,871,086 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
D: 43,795,107,998 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]
E: 43,801,985,070 branch-misses # 6.41% of all branches ( +- 0.02% ) [75.00%]

That looks like noise to me - we shouldn't be getting more branch misses with the E: version.
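One way to sanity-check the "noise" reading: the spread between the five branch-miss counts is about 0.11% of their mean, only a few multiples of the quoted ±0.02-0.03% per-run error, so no variant stands out. A rough check (counts copied from the rows above):

```python
# Branch-miss counts copied from the rows above.
misses = {
    "A": 43_829_535_008,
    "B": 43_844_118_416,
    "C": 43_819_871_086,
    "D": 43_795_107_998,
    "E": 43_801_985_070,
}

mean = sum(misses.values()) / len(misses)
spread_pct = (max(misses.values()) - min(misses.values())) / mean * 100
print(f"spread: {spread_pct:.2f}% of mean")  # -> spread: 0.11% of mean
```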
A: 2,030,357 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
B: 2,029,313 context-switches # 0.716 K/sec ( +- 0.05% ) [100.00%]
C: 2,028,566 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
D: 2,028,895 context-switches # 0.716 K/sec ( +- 0.06% ) [100.00%]
E: 2,031,008 context-switches # 0.717 K/sec ( +- 0.09% ) [100.00%]

A: 52,421 migrations # 0.018 K/sec ( +- 1.13% )
B: 52,049 migrations # 0.018 K/sec ( +- 1.02% )
C: 51,365 migrations # 0.018 K/sec ( +- 0.92% )
D: 51,766 migrations # 0.018 K/sec ( +- 1.11% )
E: 53,047 migrations # 0.019 K/sec ( +- 1.08% )

A: 709.528485252 seconds time elapsed ( +- 0.02% )
B: 708.976557288 seconds time elapsed ( +- 0.04% )
C: 709.312844791 seconds time elapsed ( +- 0.02% )
D: 709.400050112 seconds time elapsed ( +- 0.01% )
E: 708.914562508 seconds time elapsed ( +- 0.06% )

Nice.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.