From: Nadav Amit
Subject: [RFC 0/8] Improving compiler inlining decisions
Date: Tue, 15 May 2018 07:11:07 -0700
Message-ID: <20180515141124.84254-1-namit@vmware.com>

This patch-set deals with an interesting yet stupid problem: code that
does not get inlined despite its simplicity. I found five classes of
causes:

1. Inline assembly blocks in which code and data are added to
   alternative sections. The compiler is oblivious to the content of
   the blocks. It assumes their cost in space and time is proportional
   to the number of perceived assembly "instructions", which it derives
   from the number of newlines and semicolons. Alternatives, paravirt
   and other mechanisms are affected.

2. Inline assembly with redundant newlines and semicolons. As with (1),
   this code is considered "heavier" than it actually is.

3. Code with constant-value optimizations. Quite a few parts of the
   kernel check whether a variable is constant (using
   __builtin_constant_p()) and perform heavy computations in that case.
   These computations are eventually optimized out, so they do not land
   in the binary. However, their cost is still attributed to the calling
   function, which might prevent that function from being inlined.
   ilog2() is an example of such a case.

4. Code that is marked with the "cold" attribute, including all the
   __init functions. Some may consider this the desired behavior.

5. Code that is marked with a different optimization level. For
   example, this affects vmx_vcpu_run() and induces overheads of up to
   10% on exit.
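For class (3), this series proposes choosing the constant-only branch with __builtin_choose_expr() rather than with an if-condition or a ternary. What follows is a minimal sketch, assuming GCC or Clang. The macros CONST_ILOG2_4/8/16/32 and ilog2_sketch, and the function runtime_ilog2(), are hypothetical and are not the kernel's log2.h definitions. The constant branch only handles nonzero values that fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Constant-only branch: a ternary chain that folds away for literals.
 * Hypothetical; valid for nonzero values that fit in 32 bits. */
#define CONST_ILOG2_4(n)  ((n) & 0x8 ? 3 : (n) & 0x4 ? 2 : (n) & 0x2 ? 1 : 0)
#define CONST_ILOG2_8(n)  ((n) & 0xF0 ? 4 + CONST_ILOG2_4((n) >> 4) : CONST_ILOG2_4(n))
#define CONST_ILOG2_16(n) ((n) & 0xFF00 ? 8 + CONST_ILOG2_8((n) >> 8) : CONST_ILOG2_8(n))
#define CONST_ILOG2_32(n) \
	((n) & 0xFFFF0000u ? 16 + CONST_ILOG2_16((n) >> 16) : CONST_ILOG2_16(n))

/* Runtime branch, used when the argument is not a compile-time constant. */
static inline int runtime_ilog2(uint32_t n)
{
	int log = 0;

	assert(n != 0);
	while (n >>= 1)
		log++;
	return log;
}

/*
 * The choice is made while the expression is parsed, before inlining
 * decisions. The branch that is not chosen is not evaluated and generates
 * no code, so a non-constant caller never carries the ternary chain.
 */
#define ilog2_sketch(n)                                  \
	__builtin_choose_expr(__builtin_constant_p(n),   \
			      CONST_ILOG2_32(n),         \
			      runtime_ilog2(n))
```

For a literal argument such as 1024, the macro takes the constant branch and evaluates to 10. For a value computed at run time, it calls runtime_ilog2(). The drawback noted in this cover letter still applies: a value that only becomes constant after inlining or constant propagation is classified as non-constant here, so it takes the runtime path.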
This patch-set deals with some instances of the first three classes.

For (1), we define an assembly macro and invoke it from the inline
assembly block. As a result, the compiler sees a single "instruction"
and assigns a more appropriate cost to the code.

For (2), the solution is trivial: just remove the newlines.

(3) is somewhat tricky. The proposed solution is to use
__builtin_choose_expr() to check whether a variable is actually
constant, instead of using an if-condition or the C ternary operator.
__builtin_choose_expr() is evaluated earlier in the compilation. This
lets the compiler associate the right cost with the non-constant case
before the inlining decisions take place. So far so good.

Still, there is a drawback. Because __builtin_choose_expr() is evaluated
earlier, it can fail to recognize constants that an if-condition would
recognize correctly. As a result, this patch-set only applies it to the
simplest cases.

Overall, this patch-set slightly increases the kernel size (for the
record, my build used localmodconfig + localyesconfig):

      text     data     bss      dec     hex filename
  18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
  18149210 10064048 2936832 31150090 1db500a ./vmlinux after (+0.06%)

The patch-set eliminates many of the static text symbols:

  Before: 40033
  After:  39632 (-1%)

There is a measurable effect on performance in some cases. A loop of
MADV_DONTNEED/page-fault shows a 2% performance improvement with this
patch-set.

Some inline comments or self-explanatory C macros might still be needed.

Cc: Alok Kataria
Cc: Christopher Li
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jan Beulich
Cc: Jonathan Corbet
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Kees Cook
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (8):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines
  ilog2: preventing compiler distortion due to big condition
  bitops: prevent compiler inline decision distortion

 arch/x86/include/asm/alternative.h    | 28 ++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bitops.h         |  8 ++--
 arch/x86/include/asm/bug.h            | 48 ++++++++++++++---------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 53 +++++++++++++++-----------
 arch/x86/include/asm/refcount.h       | 55 ++++++++++++++++-----------
 arch/x86/include/asm/special_insns.h  | 12 +++---
 include/linux/compiler.h              | 29 ++++++++++----
 include/linux/log2.h                  | 11 +++---
 10 files changed, 156 insertions(+), 102 deletions(-)

-- 
2.17.0