Subject: Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel
From: Andrew Cooper
To: Paul Turner, David Woodhouse
CC: Andi Kleen, LKML, Linus Torvalds, Greg Kroah-Hartman, Tim Chen, Dave Hansen, Thomas Gleixner, Kees Cook, Rik van Riel, Peter Zijlstra, Andy Lutomirski, Jiri Kosina, One Thousand Gnomes
Date: Mon, 8 Jan 2018 11:16:24 +0000
Message-ID: <37ca6fc5-78d5-5750-051f-a712343d4a8f@citrix.com>
References: <1515363085-4219-1-git-send-email-dwmw@amazon.co.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/01/18 10:42, Paul Turner wrote:
> A sequence for efficiently refilling the RSB is:
>     mov $8, %rax;
>     .align 16;
> 3:  call 4f;
> 3p: pause; call 3p;
>     .align 16;
> 4:  call 5f;
> 4p: pause; call 4p;
>     .align 16;
> 5:  dec %rax;
>     jnz 3b;
>     add $(16*8), %rsp;
> This implementation uses 8 loops, with 2 calls per iteration. This is
> marginally faster than a single call per iteration. We did not
> observe useful benefit (particularly relative to text size) from
> further unrolling. This may also be usefully split into smaller (e.g.
> 4 or 8 call) segments where we can usefully pipeline/intermix with
> other operations. It includes retpoline type traps so that if an
> entry is consumed, it cannot lead to controlled speculation. On my
> test system it took ~43 cycles on average. Note that non-zero
> displacement calls should be used as these may be optimized to not
> interact with the RSB due to their use in fetching RIP for 32-bit
> relocations.

Guidance from both Intel and AMD still states that 32 calls are required
in general. Is your above code optimised for a specific processor which
you know the RSB to be smaller on?

~Andrew
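
For reference, the arithmetic behind the two figures in this exchange can be
sketched as below. This is only an illustration, not code from the thread:
the struct and function names are hypothetical, and it assumes 64-bit mode,
where each call pushes an 8-byte return address that the final add to %rsp
must reclaim.

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of the patch series. */
struct rsb_fill_params {
    unsigned iterations;   /* loop count loaded into %rax */
    unsigned stack_bytes;  /* immediate for the final "add $..., %rsp" */
};

/*
 * Derive the loop count and stack adjustment for a fill sequence that
 * issues `calls_per_iter` calls per loop iteration and must execute
 * `rsb_entries` calls in total.
 */
static struct rsb_fill_params compute_rsb_fill(unsigned rsb_entries,
                                               unsigned calls_per_iter)
{
    struct rsb_fill_params p;

    /* The total call count must divide evenly into whole iterations. */
    assert(calls_per_iter != 0 && rsb_entries % calls_per_iter == 0);

    p.iterations = rsb_entries / calls_per_iter;
    /* Assumption: 64-bit mode, so every call pushes 8 bytes. */
    p.stack_bytes = rsb_entries * 8;
    return p;
}
```

With 2 calls per iteration, compute_rsb_fill(16, 2) gives 8 iterations and
128 bytes, matching the quoted "mov $8, %rax" and "add $(16*8), %rsp".
Applying the same arithmetic to 32 calls, compute_rsb_fill(32, 2), gives
16 iterations and a 256-byte (32*8) adjustment.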