Date: Fri, 9 Jan 2009 11:44:19 -0800 (PST)
From: Linus Torvalds
To: Richard Guenther
Cc: Matthew Wilcox, Andi Kleen, Dirk Hohndel, "H. Peter Anvin",
    Ingo Molnar, jim owens, Chris Mason, Peter Zijlstra, Steven Rostedt,
    paulmck@linux.vnet.ibm.com, Gregory Haskins, Andrew Morton,
    Linux Kernel Mailing List, linux-fsdevel, linux-btrfs,
    Thomas Gleixner, Nick Piggin, Peter Morreale, Sven Dietrich,
    jh@suse.cz
Subject: Re: [patch] measurements, numbers about CONFIG_OPTIMIZE_INLINING=y impact
In-Reply-To: <84fc9c000901091109t2c2aef2fu596f8807b0962688@mail.gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 9 Jan 2009, Richard Guenther wrote:
>
> -fno-inline-functions-called-once disables the heuristic that always
> inlines (static!) functions that are called once.
> Other heuristics still apply, like inlining the static function if it
> is small. Everything else would be totally stupid - which seems to be
> the "default mode" you think GCC developers are in.

Well, I don't know about you, but the "don't inline a single instruction"
sounds a bit stupid to me. And yes, that's exactly what triggered this
whole thing.

We have two examples of gcc doing that, one of which was even a modern
version of gcc, where we had done absolutely _everything_ on a source
level to make sure that gcc could not possibly screw up. Yet it did:

	static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
	{
		return ((1UL << (nr % BITS_PER_LONG)) &
			(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
	}

	#define test_bit(nr, addr)			\
		(__builtin_constant_p((nr))		\
		 ? constant_test_bit((nr), (addr))	\
		 : variable_test_bit((nr), (addr)))

In this case, Ingo said that changing that _single_ inline to forcing
inlining made a difference.

That's CRAZY. The thing isn't even called unless "nr" is constant, so
absolutely _everything_ optimizes away, and that whole function was
designed to give us a single instruction:

	testl $constant,constant_offset(addr)

and nothing else.

Maybe there was something else going on, and maybe Ingo's tests were off,
but this is an example of gcc not inlining WHEN WE TOLD IT TO, and when
the function was a single instruction.

How can anybody possibly not consider that to be "stupid"?

The other case (with a single "cmpxchg" inline asm instruction) was at
least _slightly_ more understandable, in that (a) Ingo claims modern gcc's
did inline it and (b) the original function actually has a "switch()"
statement that depends on the argument that is constant, so a stupid
inliner might believe that it's a big function. But again, we _told_ the
compiler to inline the damn thing, because we knew better. But gcc didn't.
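[Editorial illustration, not part of the original message.] The "forcing
inlining" change mentioned above can be sketched with GCC's
always_inline function attribute, which the kernel wraps as
__always_inline. This is a minimal userspace sketch under stated
assumptions: the force_inline macro name, the BITS_PER_LONG definition,
and the body of variable_test_bit are hypothetical stand-ins, and the
code is not the kernel's actual implementation.

```c
#include <limits.h>	/* CHAR_BIT */

/* Assumption: derived from the host ABI here; the kernel defines
 * BITS_PER_LONG per architecture. */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Assumption: local stand-in for a forced-inline annotation. With plain
 * "inline" the compiler may still decide not to inline; always_inline
 * removes that choice. */
#define force_inline inline __attribute__((always_inline))

static force_inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
	return ((1UL << (nr % BITS_PER_LONG)) &
		(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
}

/* Hypothetical fallback for a bit number that is not a compile-time
 * constant; the real kernel version is architecture specific. */
static inline int variable_test_bit(int nr, const volatile unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Same dispatch as in the message: the constant path is only chosen
 * when the compiler can prove "nr" is constant. */
#define test_bit(nr, addr)			\
	(__builtin_constant_p((nr))		\
	 ? constant_test_bit((nr), (addr))	\
	 : variable_test_bit((nr), (addr)))
```

With a literal bit number, for example test_bit(2, map), the intent is
that the constant path folds down to a single test against one word of
the bitmap. Whether a particular gcc version emits exactly one
instruction is not something this sketch shows.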
The other part that is crazy is when gcc inlines large functions that
aren't even called most of the time (the "ioctl()" switch statements tend
to be a great example of this - gcc inlines ten or twenty functions, and
we can guarantee that only one of them is ever called). Yes, maybe it
makes the code smaller, but it makes the code also undebuggable and often
BUGGY, because we now have the stack frame of all ten-to-twenty functions
to contend with.

And notice how "static" has absolutely _zero_ meaning for the above
example. Yes, the thing is called just from one place - that's how
something like that very much works. It's a special case. It's not _worth_
inlining, especially if it causes bugs. So "called once" or "static" is
actually totally irrelevant.

And no, they are not marked "inline" (although they are clearly also not
marked "uninline", until we figure out that gcc is causing system crashes,
and we add the thing).

If these two small problems were fixed, gcc inlining would work much
better. But the first one, in particular, means that the "do I inline or
not" decision would have to happen after expanding and simplifying
constants. And then, if the end result is big, the inlining gets aborted.

			Linus
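[Editorial illustration, not part of the original message.] The
"ioctl()" pattern described above, where a switch dispatches to many
static handlers that are each called from one place, can be sketched
with GCC's noinline function attribute. Everything here is a
hypothetical example: the no_inline macro, the handler names, the
command numbers and the buffer sizes are invented for illustration and
do not come from any kernel driver.

```c
#include <string.h>	/* memset */

/* Assumption: local stand-in for a "do not inline" annotation. */
#define no_inline __attribute__((noinline))

/* Hypothetical command numbers. */
#define DEMO_GET 1u
#define DEMO_SET 2u

/* Each handler is static and called from exactly one place, so a
 * "called once" heuristic could merge it into the dispatcher. Marking
 * it noinline keeps its scratch buffer in its own stack frame. */
static no_inline int handle_get(int arg)
{
	char scratch[256];

	memset(scratch, 0, sizeof(scratch));
	scratch[0] = (char)arg;
	return scratch[0] + 1;
}

static no_inline int handle_set(int arg)
{
	char scratch[512];

	memset(scratch, 0, sizeof(scratch));
	scratch[0] = (char)arg;
	return scratch[0] * 2;
}

/* Only one handler runs per call, whatever the command is. */
static int demo_ioctl(unsigned int cmd, int arg)
{
	switch (cmd) {
	case DEMO_GET:
		return handle_get(arg);
	case DEMO_SET:
		return handle_set(arg);
	default:
		return -1;
	}
}
```

The design point is that the dispatcher's own frame stays small because
each handler's locals live only for the duration of that one call. How
much stack a given compiler would actually have used with the handlers
inlined is not measured here.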