Date: Fri, 10 Apr 2015 11:48:32 -0700
From: Linus Torvalds
To: Denys Vlasenko
Cc: Ingo Molnar, "Paul E. McKenney", Jason Low, Peter Zijlstra, Davidlohr Bueso, Tim Chen, Aswin Chandramouleeswaran, LKML, Borislav Petkov, Andy Lutomirski, Brian Gerst, "H. Peter Anvin", Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries

On Fri, Apr 10, 2015 at 5:50 AM, Denys Vlasenko wrote:
>
> However, I'm an -Os guy. Expect -O2 people to disagree :)

I used to be an -Os guy too. I'm a big believer in I$ density.

HOWEVER.

It turns out that gcc's -Os is just horrible nasty crap. It doesn't
actually make good tradeoffs for code density, because it doesn't make
any tradeoffs at all. It tries to choose small code, even when it's
ridiculously bad small code.

For example, a 24-byte static memcpy is best done as three quad-word
load/store pairs. That's very cheap, and not at all unreasonable.

But what does gcc do? It does a "rep movsl".

Seriously. That's *shit*.
It absolutely kills performance on some very critical code. I'm not
making that up. Try "-O2" and "-Os" on the appended trivial code. Yes,
the "rep movsl" is smaller, but it's incredibly expensive, particularly
if the result is partially used afterwards.

And I'm not a hater of "rep movs" - not at all. I think that "rep movsb"
is basically a perfect way to tell the CPU "do an optimized memcpy with
whatever cache situation you have". So I'm a big fan of the string
instructions, but only when appropriate. And "appropriate" here very
much includes "I don't know the memory copy size, so I'm going to call
out to some complex generic code that does all kinds of size checks and
tricks".

Replacing three pairs of "mov" instructions with a "rep movs" is insane.

(There are a couple of other examples of that kind of issue with "-Os",
like using "imul $15" instead of a single shift-by-4 and subtract. Again,
the "imul" is certainly smaller, but can have quite bad latency and
throughput issues.)

So I'm no longer a fan of -Os. It disables too many obviously good code
optimizations.

                    Linus

---
/* Struct copy test case: compare "gcc -O2 -S" against "gcc -Os -S". */
struct dummy {
	unsigned long a, b, c;
};

void test(struct dummy *a, struct dummy *b)
{
	*b = *a;
}
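For comparison with the appended test case, here is a minimal sketch of the copy described above as cheap: three word-sized loads followed by three word-sized stores, written out by hand. It assumes an LP64 target such as x86-64 Linux, where unsigned long is 8 bytes, so the three pairs cover the 24-byte struct. The names copy_by_words and check_copy are hypothetical. Writing the copy this way does not force any particular instruction sequence: the compiler is still free to merge it into a block move or use vector registers, so inspect the generated code with gcc -S at each optimization level.

```c
#include <assert.h>

struct dummy {
	unsigned long a, b, c;
};

/*
 * Hand-written 24-byte copy (assuming LP64, i.e. 8-byte unsigned long):
 * three word-sized loads, then three word-sized stores. This is a
 * source-level hint only; the compiler may still emit something else.
 */
static void copy_by_words(struct dummy *dst, const struct dummy *src)
{
	unsigned long a = src->a;
	unsigned long b = src->b;
	unsigned long c = src->c;

	dst->a = a;
	dst->b = b;
	dst->c = c;
}

/* Copy a known value and confirm every field arrived intact. */
static void check_copy(void)
{
	struct dummy src = { 1, 2, 3 };
	struct dummy dst = { 0, 0, 0 };

	copy_by_words(&dst, &src);
	assert(dst.a == 1 && dst.b == 2 && dst.c == 3);
}
```

Compiling this next to the original test() with both -O2 and -Os shows whether the explicit form survives or gets folded back into a string instruction.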
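The shift-and-subtract alternative to "imul $15" rests on the identity 15*x = 16*x - x = (x << 4) - x. A minimal sketch of both forms follows. The function names are hypothetical, and which instructions the compiler actually emits for either one depends on its version, target and flags. With unsigned 64-bit arithmetic both sides wrap modulo 2^64, so the identity holds for every input, including ones that overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain multiplication by 15 (the form an imul would implement). */
static uint64_t times15_mul(uint64_t x)
{
	return x * 15;
}

/* Same value via one shift by 4 and one subtract: 16*x - x. */
static uint64_t times15_shift(uint64_t x)
{
	return (x << 4) - x;
}

/* Spot-check that both forms agree, including a wrapping input. */
static void check_times15(void)
{
	const uint64_t samples[] = { 0, 1, 7, 1000, UINT64_MAX };

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(times15_mul(samples[i]) == times15_shift(samples[i]));
}
```

Both functions compute the same value; the trade-off discussed above is only about code size versus latency and throughput of the instructions chosen for them.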