Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751741AbdIUQSQ (ORCPT ); Thu, 21 Sep 2017 12:18:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58680 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750790AbdIUQSO (ORCPT ); Thu, 21 Sep 2017 12:18:14 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 4C8197EA97 Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=jpoimboe@redhat.com Date: Thu, 21 Sep 2017 11:18:12 -0500 From: Josh Poimboeuf To: Ingo Molnar Cc: "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner , Andy Lutomirski , Linus Torvalds , Alexander Potapenko , Dmitriy Vyukov , Matthias Kaehlcke , Arnd Bergmann , Peter Zijlstra , Andrey Ryabinin Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang Message-ID: <20170921161812.lfwppclz3tpadg7o@treble> References: <31e96e6bcfcb47725e15a093b9c31660dfaad430.1505846562.git.jpoimboe@redhat.com> <20170920175125.2quikhfdy2okubsw@treble> <20170921153511.isc4oh4nmh6q37hh@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170921153511.isc4oh4nmh6q37hh@gmail.com> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 21 Sep 2017 16:18:14 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3327 Lines: 76 On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > > > On 09/19/17 11:45, Josh Poimboeuf wrote: > > > > For inline asm statements which have a CALL instruction, we list the > > > > stack pointer as a constraint to convince GCC to ensure the frame > > > > pointer is set up first: > > > > > > > > static inline void foo() > > > > { > > > > register void *__sp asm(_ASM_SP); > > > > asm("call bar" : "+r" (__sp)) > > > > } > > > > > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > > > > > There's actually an easier way to achieve the same goal in GCC, without > > > > causing trouble for clang. If we declare the stack pointer register > > > > variable as a global variable, and remove the constraint altogether, > > > > that convinces GCC to always set up the frame pointer before inserting > > > > *any* inline asm. > > > > > > > > It basically acts as if *every* inline asm statement has a CALL > > > > instruction. It's a bit overkill, but the performance impact should be > > > > negligible. > > > > > > > > > > Again, probably negligible, but why do we need a frame pointer just > > > because we have a call assembly instruction? > > > > It's frame pointer convention. Without it, if dumping the stack from > > the called function, a function will get skipped in the stack trace. > > BTW., could we perhaps relax this and simply phase out the frame pointer on x86, > and simplify all our assembly in a cycle or two? ORC unwinder is working out very > well so far. Live kernel patching can use ORC data just fine, and nothing else > actually relies on frame pointers, right? > > That would give one more register to assembly code. > > I realize that we just rewrote a whole bunch of assembly code... but that was the > price for ORC, in a way. I think ORC isn't *quite* ready for livepatch yet, though it's close. We could probably make it ready in a cycle or two. However, I'm not sure whether we would want to remove livepatch support for frame pointers yet: - There might be some embedded livepatch users who can't deal with the extra 3MB of RAM needed by ORC. (Though this is purely theoretical, I don't actually know anybody with this problem. I suppose we could float the idea on the livepatch/kpatch mailing lists and see if anybody objects.) - Objtool's ORC generation is working fine now, but objtool maintenance is currently a little heavy-handed, and I'm currently the only person who knows how to do it. I've got an idea brewing for how to improve its maintainability with the help of compiler plugins, but until then, if I get hit by a bus tomorrow, who's going to pick it up? It's nice to have frame pointers as a backup solution for live patching. But also, is this a problem that's even worth fixing? Do we have any assembly code that would be noticeably better off with that extra register? Most of the changes were: - adding FRAME_BEGIN/FRAME_END to some asm functions; - juggling register usage in a few crypto functions; and - adding the stack pointer constraint to 15 or so inline asm functions. I think those changes weren't all that disruptive to start with. -- Josh