Date: Wed, 21 Jan 2009 11:14:10 +0100
From: Nick Piggin
To: Andi Kleen
Cc: Ingo Molnar, Linus Torvalds, David Woodhouse, Bernd Schmidt,
    Andrew Morton, Harvey Harrison, "H. Peter Anvin", Chris Mason,
    Peter Zijlstra, Steven Rostedt, paulmck@linux.vnet.ibm.com,
    Gregory Haskins, Matthew Wilcox, Linux Kernel Mailing List,
    linux-fsdevel, linux-btrfs, Thomas Gleixner, Peter Morreale,
    Sven Dietrich, jh@suse.cz
Subject: Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Message-ID: <20090121101409.GQ24891@wotan.suse.de>
In-Reply-To: <20090121095418.GG15750@one.firstfloor.org>
References: <20090120123824.GD7790@elte.hu>
    <1232480940.22233.1435.camel@macbook.infradead.org>
    <20090120210515.GC19710@elte.hu>
    <20090120220516.GA10483@elte.hu>
    <20090121085402.GD15750@one.firstfloor.org>
    <20090121085208.GO24891@wotan.suse.de>
    <20090121092049.GE15750@one.firstfloor.org>
    <20090121092550.GP24891@wotan.suse.de>
    <20090121095418.GG15750@one.firstfloor.org>

On Wed, Jan 21, 2009 at 10:54:18AM +0100, Andi Kleen wrote:
> > The point is that the compiler is then free to do it. If things
> > slow down after the compiler gets *more* information, then that
> > is a problem with the compiler heuristics rather than the
> > information we give it.
>
> The point was that -Os typically disables it then.
> (not always, compiler heuristics are far from perfect)

That'd be just another gcc failing. If it can make the code faster
without a size increase, then it should (of course, if it has to start
spilling registers etc. then that's a different matter, but we're not
talking about only 32-bit x86 here).

> > > Then x86s tend to have very, very fast L1 caches, and if
> > > something is not in L1 on reads then the cost of fetching it
> > > dwarfs the few cycles you can typically get out of this.
> >
> > Well, most architectures have L1 caches of several cycles, and an
> > L1 miss typically means going to L2, whose latency the compiler is
> > in some cases expected to cover as much as possible (eg. on
> > in-order architectures).
>
> L2 cache is so much slower that scheduling a few more instructions
> doesn't help much.

I think on a lot of CPUs that is actually not the case, including
Nehalem and Montecito, where L2 latency is, what, under 15 cycles?

Even in cases where you have a high-latency last-level cache or have
to go to memory, you want to be able to start as many loads as
possible before stalling. That goes especially for in-order
architectures, but even an OOOE core can stall if it can't resolve
store addresses early enough or speculate past them.
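
As a rough illustration of the kind of thing I mean (a made-up sketch,
not taken from any real kernel code): with aliasing information gcc is
free to pull the loads up in front of the stores; without it, it has
to assume the worst and keep them in program order.

/*
 * No aliasing information: gcc must assume dst and src can overlap,
 * so the load of src[1] cannot be moved above the store to dst[0].
 */
void copy2(int *dst, const int *src)
{
        dst[0] = src[0];
        dst[1] = src[1];
}

/*
 * With restrict (i.e. with the extra points-to information), gcc may
 * issue both loads before either store, so a cache miss on src[] is
 * started as early as possible instead of waiting behind the store.
 */
void copy2_restrict(int *restrict dst, const int *restrict src)
{
        dst[0] = src[0];
        dst[1] = src[1];
}

Whether it actually does that is of course up to its heuristics again.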
> > stall, so you still want to get loads out early if possible.
> >
> > Even a lot of OOOE CPUs I think won't have the best alias
> > analysis, so all else being equal, it wouldn't hurt them to
> > move loads earlier.
>
> Hmm, but if the load is nearby it won't matter if a
> store is in the middle, because the CPU will just execute
> over it.

If the address is not known, or the store buffer fills up, etc., then
it may not be able to. It could also be hundreds of instructions away,
which is too much even for an OOOE processor's window. We have a lot
of huge functions (although, granted, they'll often contain barriers
for other reasons like locks or function calls anyway).

> The real big win is if you do some computation in between,
> but at least for typical list manipulation there isn't
> really any.

Well, I have a feeling that the MLP side of it could be more
significant than ILP. But no data.
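
Roughly what I mean by MLP rather than ILP, as a hypothetical sketch
(made-up types and functions, nothing from real code):

struct item {
        struct item *next;
        long key;
};

/*
 * Almost no computation per element, so there is little ILP for the
 * compiler to expose by scheduling.  But the a[i]->key loads are
 * independent of one another, so if they get issued early enough,
 * several cache misses can be in flight at the same time (MLP).
 */
long sum_array(struct item **a, int n)
{
        long sum = 0;
        int i;

        for (i = 0; i < n; i++)
                sum += a[i]->key;
        return sum;
}

/*
 * Pointer chasing is the opposite: each p->next load depends on the
 * previous one, so the misses serialize no matter how the
 * instructions are scheduled.
 */
long sum_list(struct item *p)
{
        long sum = 0;

        while (p) {
                sum += p->key;
                p = p->next;
        }
        return sum;
}

In the second loop no amount of load scheduling helps, because the
next address isn't known until the previous miss completes; in the
first, getting the loads out early and keeping several misses in
flight is where the win would come from.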