Date: Wed, 14 Feb 2018 11:34:34 +0100
From: Peter Zijlstra
To: "Yatsina, Marina"
Cc: Kees Cook, David Woodhouse, Chandler Carruth, "Kreitzer, David L",
    "Grischenko, Andrei L", rnk@google.com, LLVM Developers,
    ehsan@mozilla.com, "Tayree, Coby", Matthias Braun, Dean Michael Berris,
    James Y Knight, Guenter Roeck, X86 ML, LKML, Alan Cox, Rik van Riel,
    Andi Kleen, Josh Poimboeuf, Tom Lendacky, Linus Torvalds, Jiri Kosina,
    Andy Lutomirski, "Hansen, Dave", Tim Chen, Greg Kroah-Hartman,
    Paul Turner, Stephen Hines, Nick Desaulniers, Will Deacon
Subject: Re: clang asm-goto support (Was Re: [PATCH v2] x86/retpoline: Add clang support)
Message-ID: <20180214103434.GY25181@hirez.programming.kicks-ass.net>
References: <20180214090851.GU25181@hirez.programming.kicks-ass.net>

On Wed, Feb 14, 2018 at 09:52:59AM +0000, Yatsina, Marina wrote:
> Hi Peter,
>
> When I started the original thread last year I was in favor of adding
> "asm goto" and didn't understand why it wasn't done by that time. The
> feedback I got is that this feature (optimizing tracepoints) is very
> useful and that we do want it in llvm, but perhaps there's a cleaner
> way of implementing it than "asm goto". An alternative suggestion arose
> as well.

So it's far more than just tracepoints. We use it all over the kernel to
do runtime branch patching. One example is avoiding the scheduler
preemption callbacks when we know there are no users; that shaves a few
percent off a context-switch micro-benchmark. But it really is _all_
over the place.
> I'm sure you can provide a lot of background for the decisions of why
> "asm goto" was chosen and which other alternatives were considered, as
> you were the one to implement this.

I have very few memories from back then, but it was mostly us asking for
label addresses in asm and them giving us asm-goto. Using asm we can
build our own primitives; I realize the llvm community doesn't like asm
much, but then again, we treat C like a glorified assembler and don't
like our compilers too smart :-)

> Anyway, I think we should consider the alternatives and not take "asm
> goto" as a given.

Far too late for that; 7+ years ago, when we did this, was the time to
talk about alternatives. Now we have this code base.

So we have the two jump_label things:

static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
	asm_volatile_goto("1:"
		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
		".popsection \n\t"
		: : "i" (key), "i" (branch) : : l_yes);

	return false;
l_yes:
	return true;
}

static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
	asm_volatile_goto("1:"
		".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
		"2:\n\t"
		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
		".popsection \n\t"
		: : "i" (key), "i" (branch) : : l_yes);

	return false;
l_yes:
	return true;
}

Where we emit either a 5-byte jump or a 5-byte nop and write a special
section with meta-data for the branch point.

You could possibly capture all that with a built-in, but it would have
to exactly match our meta-data section, and then we'd still be up some
creek without a paddle when we need to change it.
But we also have:

static __always_inline __pure bool _static_cpu_has(u16 bit)
{
	asm_volatile_goto("1: jmp 6f\n"
		 "2:\n"
		 ".skip -(((5f-4f) - (2b-1b)) > 0) * "
			 "((5f-4f) - (2b-1b)),0x90\n"
		 "3:\n"
		 ".section .altinstructions,\"a\"\n"
		 " .long 1b - .\n"		/* src offset */
		 " .long 4f - .\n"		/* repl offset */
		 " .word %P[always]\n"		/* always replace */
		 " .byte 3b - 1b\n"		/* src len */
		 " .byte 5f - 4f\n"		/* repl len */
		 " .byte 3b - 2b\n"		/* pad len */
		 ".previous\n"
		 ".section .altinstr_replacement,\"ax\"\n"
		 "4: jmp %l[t_no]\n"
		 "5:\n"
		 ".previous\n"
		 ".section .altinstructions,\"a\"\n"
		 " .long 1b - .\n"		/* src offset */
		 " .long 0\n"			/* no replacement */
		 " .word %P[feature]\n"		/* feature bit */
		 " .byte 3b - 1b\n"		/* src len */
		 " .byte 0\n"			/* repl len */
		 " .byte 0\n"			/* pad len */
		 ".previous\n"
		 ".section .altinstr_aux,\"ax\"\n"
		 "6:\n"
		 " testb %[bitnum],%[cap_byte]\n"
		 " jnz %l[t_yes]\n"
		 " jmp %l[t_no]\n"
		 ".previous\n"
		 : : [feature]  "i" (bit),
		     [always]   "i" (X86_FEATURE_ALWAYS),
		     [bitnum]   "i" (1 << (bit & 7)),
		     [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3])
		 : : t_yes, t_no);
t_yes:
	return true;
t_no:
	return false;
}

Which does something similar, but with a completely different meta-data
section and a different pre-patch fallback path.

But we also do things like:

#define __GEN_RMWcc(fullop, var, cc, clobbers, ...)			\
do {									\
	asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"		\
			: : [counter] "m" (var), ## __VA_ARGS__		\
			: clobbers : cc_label);				\
	return 0;							\
cc_label:								\
	return 1;							\
} while (0)

#define GEN_UNARY_RMWcc(op, var, arg0, cc)				\
	__GEN_RMWcc(op " " arg0, var, cc, __CLOBBERS_MEM)

static __always_inline bool atomic_dec_and_test(atomic_t *v)
{
	GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", e);
}

in order to not generate crap asm with SETcc + TEST.

Of course, that last one is superseded by asm-cc-output, which you
_also_ don't support.
And I know you're going to tell me you guys would prefer it if we
switched to intrinsics for atomics, but then I'd have to tell you that
the C11 memory model doesn't match the Linux kernel memory model [*],
another result of being late to the game. Also, we still support
compilers from before that.

So no, you're not going to give us something different.

[*] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html

So ideally, the compilers would actually commit to also implementing the
linux-kernel memory model; otherwise we'll be fighting the compiler
(like we have been for a while now) for even longer. Especially with LTO
we run a real risk of the compiler doing BAD things to our code.