Date: Wed, 21 Jan 2009 10:25:50 +0100
From: Nick Piggin
To: Andi Kleen
Cc: Ingo Molnar, Linus Torvalds, David Woodhouse, Bernd Schmidt,
	Andrew Morton, Harvey Harrison, "H. Peter Anvin", Chris Mason,
	Peter Zijlstra, Steven Rostedt, paulmck@linux.vnet.ibm.com,
	Gregory Haskins, Matthew Wilcox, Linux Kernel Mailing List,
	linux-fsdevel, linux-btrfs, Thomas Gleixner, Peter Morreale,
	Sven Dietrich, jh@suse.cz
Subject: Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Message-ID: <20090121092550.GP24891@wotan.suse.de>
References: <20090120005124.GD16304@wotan.suse.de> <20090120123824.GD7790@elte.hu>
	<1232480940.22233.1435.camel@macbook.infradead.org> <20090120210515.GC19710@elte.hu>
	<20090120220516.GA10483@elte.hu> <20090121085402.GD15750@one.firstfloor.org>
	<20090121085208.GO24891@wotan.suse.de> <20090121092049.GE15750@one.firstfloor.org>
In-Reply-To: <20090121092049.GE15750@one.firstfloor.org>

On Wed, Jan 21, 2009 at 10:20:49AM +0100, Andi Kleen wrote:
> On Wed, Jan 21, 2009 at 09:52:08AM +0100, Nick Piggin wrote:
> > On Wed, Jan 21, 2009 at 09:54:02AM +0100, Andi Kleen wrote:
> > > > GCC 4.3.2. Maybe I missed something obvious?
> > >
> > > The typical use case of restrict is to tell it that multiple given
> > > arrays are independent and then give the loop optimizer more freedom
> > > to handle expressions in the loop that access these arrays.
> > >
> > > Since there are no loops in the list functions nothing changed.
> > >
> > > Ok, presumably there are some other optimizations which rely on that
> > > alias information too, but again the list_* stuff is probably too
> > > simple to trigger any of them.
> >
> > Any function that does several interleaved loads and stores through
> > different pointers could have much more freedom to move loads early
> > and stores late.
>
> For one, that would require more live registers. It's not a clear and
> obvious win, especially if you have only very few registers, like on
> 32-bit x86.
>
> Then it would typically increase code size.

The point is that the compiler is then free to do it. If things slow
down after the compiler gets *more* information, then that is a problem
with the compiler heuristics rather than the information we give it.

> Then x86s tend to have very fast L1 caches, and if something is not in
> L1 on a read then the cost of fetching it dwarfs the few cycles you can
> typically get out of this.

Well, most architectures have L1 caches that are several cycles away,
and an L1 miss typically means going to L2, whose latency the compiler
is in some cases expected to cover as much as possible (eg. on in-order
architectures). If the caches are missed completely, then especially on
an in-order architecture you want to issue as many parallel loads as
possible during the stall. If the compiler can't resolve aliases, then
it simply won't be able to bring some of those loads forward.
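
As an aside (this sketch is not from the thread; the function and its
names are made up), this is the kind of interleaved load/store code
being talked about. With restrict the compiler may assume dst and src
never overlap, so it is free to issue both loads early and sink both
stores; without it, it has to assume dst[0] might alias src[1] and keep
the original order:

/*
 * Hypothetical example: interleaved loads and stores through two
 * pointers.  'restrict' tells the compiler the regions do not alias,
 * so both loads may be hoisted to the top (and issued in parallel)
 * and both stores sunk to the bottom.
 */
void copy_pair(int *restrict dst, const int *restrict src)
{
	int a = src[0];
	int b = src[1];		/* both loads can move up together */

	dst[0] = a;
	dst[1] = b;		/* both stores can be done late */
}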
> And lastly, even on an in-order system stores can typically be queued
> without stalling, so it doesn't hurt to do them early.

Store queues are, what? On the order of tens of entries for big,
power-hungry x86? I'd guess much smaller for low-power in-order x86 and
ARM etc. These can definitely fill up and stall, so you still want to
get loads out early if possible.

Even a lot of OOOE CPUs, I think, won't have the best alias analysis,
so all else being equal it wouldn't hurt them to move loads earlier.

> Also, at least x86 gcc normally doesn't do scheduling beyond basic
> blocks, so any if () shuts it up.

I don't think any of this is a reason not to use restrict, though.
But... there are so many places we could add it to the kernel, and
probably so few where it makes much difference. Maybe it could help
some critical core code, though.
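
To make the basic-block point concrete, here is a minimal sketch (again
hypothetical, not from this thread or the kernel): even with
restrict-qualified pointers, a branch ends the basic block, and gcc's
block-local scheduler will generally not hoist the second load above it.

/*
 * Illustrative only: gcc's x86 scheduler mostly works within a single
 * basic block, so the if () below acts as a scheduling barrier.  Even
 * though 'restrict' says a and b never alias, the load of b[1] is
 * unlikely to be moved above the branch.
 */
void scatter_pair(int *restrict a, const int *restrict b, int cond)
{
	a[0] = b[0];		/* first basic block */
	if (cond)		/* closes the basic block */
		a[1] = b[1];	/* this load stays behind the branch */
}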