Date: Mon, 12 Jan 2009 16:21:20 -0800 (PST)
From: Linus Torvalds
To: Bernd Schmidt
cc: Andi Kleen, David Woodhouse, Andrew Morton, Ingo Molnar,
    Harvey Harrison, "H. Peter Anvin", Chris Mason, Peter Zijlstra,
    Steven Rostedt, paulmck@linux.vnet.ibm.com, Gregory Haskins,
    Matthew Wilcox, Linux Kernel Mailing List, linux-fsdevel, linux-btrfs,
    Thomas Gleixner, Nick Piggin, Peter Morreale, Sven Dietrich, jh@suse.cz
Subject: Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

On Mon, 12 Jan 2009, Bernd Schmidt wrote:
>
> Too lazy to construct one myself, I googled for examples, and here's a
> trivial one that shows how it affects the ability of the compiler to
> eliminate memory references:

Do you really think this is realistic or even relevant?

The fact is

 (a) most people use similar types, so your example of "short" vs "int" is
     actually not very common. Type-based alias analysis is wonderful for
     finding specific examples of something you can optimize, but it's not
     actually all that wonderful in general. It _particularly_ isn't
     wonderful once you start looking at the downsides.

     When you're adding arrays of integers, you're usually adding
     integers. Not "short"s. The shorts may be a great example of a
     special case, but it's a special case!

 (b) instructions with memory accesses aren't the problem - instructions
     that take cache misses are. Your example is an excellent example of
     that - eliding the simple load out of the loop makes just about
     absolutely _zero_ difference in any somewhat more realistic scenario,
     because that one isn't the one that is going to make any real
     difference anyway.

The thing is, the way to optimize for modern CPU's isn't to worry
over-much about instruction scheduling. Yes, it matters for the broken
ones, but it matters in the embedded world where you still find in-order
CPU's, and there the size of code etc matters even more.

> I'll grant you that if you're writing a kernel or maybe a malloc
> library, you have reason to be unhappy about it. But that's what
> compiler switches are for: -fno-strict-aliasing allows you to write code
> in a superset of C.

Oh, I'd use that flag regardless yes. But what you didn't seem to react to
was that gcc - for no valid reason what-so-ever - actually trusts (or at
least trusted: I haven't looked at that code for years) provably true
static alias information _less_ than the idiotic weaker type-based one.

You make all this noise about how type-based alias analysis improves code,
but then you can't seem to just look at the example I gave you. Type-based
alias analysis didn't improve code. It just made things worse, for no
actual gain.
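
To make the "short" vs "int" case concrete, here is a minimal sketch of the
kind of loop being argued about. It is an assumption, not the example that
was actually posted in the thread, and the function names are made up for
illustration. With strict aliasing enabled, a compiler is allowed to load
*addend once before the loop in add_short(); in add_int() the pointers may
legitimately overlap, so it cannot blindly do the same.

```c
#include <assert.h>

/* Hypothetical illustration; names are not taken from the thread. */

/*
 * dst is "int", addend is "short". Under type-based alias analysis the
 * compiler may assume a store to dst[i] never changes *addend, so the
 * load of *addend can be hoisted out of the loop.
 */
void add_short(int *dst, const short *addend, int n)
{
	for (int i = 0; i < n; i++)
		dst[i] += *addend;
}

/*
 * Same loop with matching types (the common case). addend may point
 * into dst, so *addend has to be treated as possibly changed after
 * each store to dst[i].
 */
void add_int(int *dst, const int *addend, int n)
{
	for (int i = 0; i < n; i++)
		dst[i] += *addend;
}

/* Well-defined overlap: addend points at dst[0]. */
void demo_overlap(void)
{
	int v[4] = { 1, 1, 1, 1 };

	add_int(v, &v[0], 4);
	/* v[0] doubles to 2 first; every later element then adds that 2. */
	assert(v[0] == 2 && v[1] == 3 && v[2] == 3 && v[3] == 3);
}
```

Building the same file with and without -fno-strict-aliasing and comparing
the generated assembly for add_short() is one way to see whether the hoisted
load is the only difference; whether it is depends on the compiler version
and optimization level.
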

Moving those accesses to the stack around just causes worse behavior, and
a bigger stack frame, which causes more cache misses.

[ Again, I do admit that kernel code is "different": we tend to have a
  cold stack, in ways that many other code sequences do not have. System
  code tends to get a lot more I$ and D$ misses. Deep call-chains _will_
  take cache misses on the stack, simply because the user will do things
  between system calls or page faults that almost guarantee that things
  are not in L1, and often not in L2 either.

  Also, sadly, microbenchmarks often hide this, since they are often
  exactly the unrealistic kinds of back-to-back system calls that almost
  no real program ever has, since real programs actually _do_ something
  with the data. ]

My point is, you're making all these arguments and avoiding looking at the
downsides of what you are arguing for.

So we use -Os - because it generally generates better (and simpler) code.
We use -fno-strict-aliasing for the same reason.

		Linus