Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750995AbbDLHlb (ORCPT ); Sun, 12 Apr 2015 03:41:31 -0400 Received: from mail-wg0-f42.google.com ([74.125.82.42]:32987 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750717AbbDLHl3 (ORCPT ); Sun, 12 Apr 2015 03:41:29 -0400 Date: Sun, 12 Apr 2015 09:41:24 +0200 From: Ingo Molnar To: Linus Torvalds Cc: Thomas Gleixner , Jakub Jelinek , Denys Vlasenko , Borislav Petkov , Tim Chen , Andy Lutomirski , Jason Low , Brian Gerst , Aswin Chandramouleeswaran , "Paul E. McKenney" , Davidlohr Bueso , Peter Zijlstra , "H. Peter Anvin" , LKML , Peter Zijlstra , Richard Henderson Subject: Re: [PATCH] x86: Turn off GCC branch probability heuristics Message-ID: <20150412074123.GB9062@gmail.com> References: <20150410090051.GA28549@gmail.com> <20150410091252.GA27630@gmail.com> <20150410092152.GA21332@gmail.com> <20150410111427.GA30477@gmail.com> <20150410112748.GB30477@gmail.com> <20150410120846.GA17101@gmail.com> <20150411092021.GA9478@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7930 Lines: 206 * Linus Torvalds wrote: > On Sat, Apr 11, 2015 at 11:57 AM, Thomas Gleixner wrote: > > > > I thinks its just the no-guess one: > > > > text data dec patch reduction > > 7563475 1781048 10302987 > > 7192973 1780024 9931461 no-guess -4.8% > > 7354819 1781048 958464 align-1 -2.7% > > 7192973 1780024 9931461 no-guess + align-1 -4.8% > > Yeah, a 5% code expansion is a big deal. Sadly, it looks like > 'no-guess' also disables our explicit likely/unlikely handling. > > Damn. If it actually honored likely/unlikely, then we should just do > it - and manually fix up any places where we really care. > > But the fact that it apparently entirely disables not just the > guesses, but our *explicit* likely/unlikely, means that we can't fix > up the mistakes. > > And in many of the hot codepaths that likely/unlikely really does > matter. Some of our hottest paths have known "this basically never > happens" situations that we do *not* want to break up our L1 I$ over. > There's a number of functions that have been optimized to really > generate good code, and "-fno-guess-branch-probability" disables those > manual optimizations. > > So we'd have no way to fix it for the cases that matter. > > Sad. > > It might be worth bringing this up with some gcc people. I added Jakub > to the cc. Any other gcc people suggestions? (Skip to the 'More numbers' section below to see more measurements.) So what would be nice to have is if GCC had an optimization option to disable all branch probability heuristics (like -fno-guess-branch-probability), except explicit __builtin_expect() hints (which -fno-guess-branch-probability unfortunately disables). So what would be useful to have is a -fno-branch-probability-heuristics GCC option or so, or a "--param branch-probability-heuristics=0" option. I found one related option: --param predictable-branch-outcome=N predictable-branch-outcome When branch is predicted to be taken with probability lower than this threshold (in percent), then it is considered well predictable. The default is 10. I tried the values 1 and 0, on a -O2 kernel (i.e. guess-branch-probability was enabled), in the hope that it maybe turns off all non-__builtin_expect() branch heuristics, but it only had very minor size impact in the 0.001% range. More numbers: ------------- I also measured the effect of our __builtin_expect() (likely()/unlikely()) branch annotations. As an experiment I mapped both likely()/unlikely() to the four possible __builtin_expect() probability settings: likely(): __builtin_expect(, 1) unlikely(): __builtin_expect(, 0) likely(): __builtin_expect(, 0) unlikely(): __builtin_expect(, 1) likely(): __builtin_expect(, 0) unlikely(): __builtin_expect(, 0) likely(): __builtin_expect(, 1) unlikely(): __builtin_expect(, 1) (the first mapping is the only one that makes sense, and it is the one that is used by the kernel currently.) The goal of mixing up the probability mappings was to measure the full 'range' of code size impact that the kernel's explicit hints are causing, versus the impact that GCC's own branch probability heuristics are causing: text data bss dec filename 12566383 1617840 1089536 15273759 vmlinux.expect=10 [==vanilla] 12460250 1617840 1089536 15167626 vmlinux.expect=01 12563332 1617840 1089536 15270708 vmlinux.expect=00 12463035 1617840 1089536 15170411 vmlinux.expect=11 12533382 1617840 1089536 15240758 vmlinux.no-expect 11923529 1617840 1089536 14630905 vmlinux.-fno-guess-branch-probability [ This was done on a v4.1-rc7-ish vanilla -O2 kernel (no alignment tweaks), using 'make defconfig' and 'make kvmconfig' on x86-64 to turn it into a minimally bootable kernel. I used GCC 4.9.1. ] the 'vmlinux.none' kernel is a vanilla kernel with all __builtin_expect() hints removed: i.e. GCC heuristics are deciding all branch probabilities in the kernel. So the code size 'range' that the kernel's own probability hints are moving in is around 0.4%: - the 'vanilla' mapping is the largest (not unexpectedly) - the 'inverse' mapping is the smallest, by 0.8% - the 00 and 11 mappings are about mid range, 0.4% away from the extremes - the 'none' mapping is also mid-range, which too is somewhat expected: GCC heuristics would pick only part of our hints. But note how the 'no GCC heuristics at all' setting reduces size brutally, by 5.4%. So if we were able to do that, while keeping __builtin_expect(), and used our own heuristics only via __builtin_expect(), then we could still possibly expect a total code size shrinkage of around 5.0%: -5.4% code size reduction that comes from removal of all hints, +0.4% code size increase between the 'none' and the '10' mappings in the experiment above. Note that even if I apply all the align=1 patches, -fno-guess-branch-probability on top of that still gives me 2.6% code size savings: text data bss dec filename 12566383 1617840 1089536 15273759 vmlinux.expect=10 [==vanilla] 11923529 1617840 1089536 14630905 vmlinux.-fno-guess-branch-probability 11903663 1617840 1089536 14611039 vmlinux.align=1 11646102 1617840 1089536 14353478 vmlinux.align=1+fno-guess-branch-probability The smallest vmlinux has: - about 41,000 functions ('tT' symbols in System.map) - about 300,000 branch/jump instructions (all objdump -d asm mnemonics starting with 'j') - about 165,000 function calls (all objdump -d asm mnemonics matching 'call') - about 2,330,000 instructions (all objdump -d asm mnemonics) With align=1, GCC's heuristics added about 1,200 new branches and 1,350 new function calls, 76,900 instructions, altogether +257,561 bytes of code: # of x86 instructions 2549742 vmlinux.expect=10 [==vanilla] 2391069 vmlinux.-fno-guess-branch-probability 2411568 vmlinux.align=1 2334595 vmlinux.align=1+fno-guess-branch-probability (For completeness, the patch generating the smallest kernel is attached below.) The takeaway: the code size savings above are 7.9%. Even if we restored __builtin_expect() hints, we'd probably still see a combined 7.5% code shrinkage. That would be a rather significant I$ win, with very little cost that I can see! Thanks, Ingo --- arch/x86/Makefile | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 5ba2d9ce82dc..a6d3feb90b97 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -77,10 +77,25 @@ else KBUILD_AFLAGS += -m64 KBUILD_CFLAGS += -m64 + # Pack jump targets tightly, don't align them to the default 16 bytes: + KBUILD_CFLAGS += -falign-jumps=1 + + # Pack functions tightly as well: + KBUILD_CFLAGS += -falign-functions=1 + + # Pack loops tightly as well: + KBUILD_CFLAGS += -falign-loops=1 + # Don't autogenerate traditional x87 instructions KBUILD_CFLAGS += $(call cc-option,-mno-80387) KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387) + # + # Don't guess branch probabilities, follow the code and unlikely()/likely() hints, + # which reduces vmlinux size by about 5.4%: + # + KBUILD_CFLAGS += -fno-guess-branch-probability + # Use -mpreferred-stack-boundary=3 if supported. KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/