From: Nadav Amit
Cc: Nadav Amit, Alok Kataria, Christopher Li, "H. Peter Anvin",
    Ingo Molnar, Jan Beulich, Jonathan Corbet, Josh Poimboeuf,
    Juergen Gross, Kees Cook, Peter Zijlstra, Randy Dunlap,
    Thomas Gleixner
Subject: [RFC 0/8] Improving compiler inlining decisions
Date: Tue, 15 May 2018 07:11:16 -0700
Message-ID: <20180515141124.84254-10-namit@vmware.com>
In-Reply-To: <20180515141124.84254-1-namit@vmware.com>
References: <20180515141124.84254-1-namit@vmware.com>
X-Mailer: git-send-email 2.17.0
X-Mailing-List: linux-kernel@vger.kernel.org

This patch-set deals with an interesting yet stupid problem: code that
does not get inlined despite its simplicity. I find 5 classes of causes:

1. Inline assembly blocks in which code and data are added to
   alternative sections. The compiler is oblivious to the content of the
   blocks and assumes that their cost in space and time is proportional
   to the number of perceived assembly "instructions", which it estimates
   from the number of newlines and semicolons. Alternatives, paravirt and
   other mechanisms are affected.

2. Inline assembly with redundant newlines and semicolons. Similarly to
   (1), this code is considered "heavier" than it actually is.

3. Code with constant value optimizations. Quite a few parts of the
   kernel check whether a variable is constant (using
   __builtin_constant_p()) and perform heavy computations in that case.
   These computations are eventually optimized out, so they do not land
   in the binary. However, the cost of these computations is also
   attributed to the calling function, which might prevent inlining of
   the calling function. ilog2() is an example of such a case.

4. Code that is marked with the "cold" attribute, including all the
   __init functions. Some may consider this the desired behavior.

5. Code that is marked with a different optimization level.
   This affects, for example, vmx_vcpu_run(), inducing overheads of up
   to 10% on exit.

This patch-set deals with some instances of the first 3 classes.

For (1) we insert an assembly macro, and call it from the inline
assembly block. As a result, the compiler sees a single "instruction"
and assigns a more appropriate cost to the code.

For (2) the solution is trivial: just remove the newlines.

(3) is somewhat tricky. The proposed solution is to use
__builtin_choose_expr() to check whether a variable is actually constant
instead of using an if-condition or the C ternary operator.
__builtin_choose_expr() is evaluated earlier in the compilation, so it
allows the compiler to assign the right cost to the variable case before
the inlining decisions take place. So far so good.

Still, there is a drawback. Since __builtin_choose_expr() is evaluated
earlier, it can fail to recognize constants that an if-condition would
recognize correctly. As a result, this patch-set only applies it to the
simplest cases.

Overall this patch-set slightly increases the kernel size (my build was
done using localmodconfig + localyesconfig for the record):

    text     data     bss      dec     hex  filename
18126699 10066728 2936832 31130259 1db0293  ./vmlinux before
18149210 10064048 2936832 31150090 1db500a  ./vmlinux after (+0.06%)

The patch-set eliminates some of the static text symbols:

Before: 40033
After:  39632 (-1%)

There is a measurable effect on performance in some cases. A loop of
MADV_DONTNEED/page-fault shows a 2% performance improvement with this
patch-set.

Some inline comments or self-explaining C macros might still be needed.

[1] https://lkml.org/lkml/2018/5/5/159

Cc: Alok Kataria
Cc: Christopher Li
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jan Beulich
Cc: Jonathan Corbet
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (8):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines
  ilog2: preventing compiler distortion due to big condition
  bitops: prevent compiler inline decision distortion

 arch/x86/include/asm/alternative.h    | 28 ++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bitops.h         |  8 ++--
 arch/x86/include/asm/bug.h            | 48 ++++++++++++++---------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 53 +++++++++++++++-----------
 arch/x86/include/asm/refcount.h       | 55 ++++++++++++++++-----------
 arch/x86/include/asm/special_insns.h  | 12 +++---
 include/linux/compiler.h              | 29 ++++++++++----
 include/linux/log2.h                  | 11 +++---
 10 files changed, 156 insertions(+), 102 deletions(-)

-- 
2.17.0
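
For readers who want to see the difference described for class (3), here
is a minimal user-space sketch. It is not taken from the patches: the
names CONST_LOG2, runtime_log2, LOG2_TERNARY, LOG2_CHOOSE and check_log2
are hypothetical, and the constant arm is a deliberately tiny stand-in
for the much larger ladder in ilog2(). It assumes GCC or Clang, since
both builtins are compiler extensions.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a "heavy" constant-only computation.
 * Deliberately truncated: only valid for 1 <= n <= 15.
 */
#define CONST_LOG2(n)   \
	((n) >= 8 ? 3 : \
	 (n) >= 4 ? 2 : \
	 (n) >= 2 ? 1 : 0)

/* Runtime path: one count-leading-zeros operation. Requires v != 0. */
static inline int runtime_log2(unsigned int v)
{
	return (int)(sizeof(v) * 8 - 1) - __builtin_clz(v);
}

/*
 * Ternary form: both arms are part of the expression, and the dead one
 * is only removed once the optimizer has folded __builtin_constant_p().
 */
#define LOG2_TERNARY(n) \
	(__builtin_constant_p(n) ? CONST_LOG2(n) : runtime_log2(n))

/*
 * __builtin_choose_expr() form: the condition is resolved while parsing,
 * so only the selected arm remains in the expression. The drawback from
 * the cover letter applies: a value that would only become constant
 * after inlining takes the runtime arm here.
 */
#define LOG2_CHOOSE(n) \
	__builtin_choose_expr(__builtin_constant_p(n), \
			      CONST_LOG2(n), runtime_log2(n))

/* Both forms must agree with the runtime path for 1 <= v <= 15. */
static inline void check_log2(unsigned int v)
{
	assert(LOG2_TERNARY(v) == runtime_log2(v));
	assert(LOG2_CHOOSE(v) == runtime_log2(v));
}
```

Both macros return the same value; what differs is when the unused arm
disappears, and therefore how large the expression looks when inlining
costs are estimated.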