Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1761588AbXHUQvp (ORCPT ); Tue, 21 Aug 2007 12:51:45 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1759789AbXHUQve (ORCPT ); Tue, 21 Aug 2007 12:51:34 -0400
Received: from smtp2.linux-foundation.org ([207.189.120.14]:35508 "EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759459AbXHUQvc (ORCPT ); Tue, 21 Aug 2007 12:51:32 -0400
Date: Tue, 21 Aug 2007 09:43:52 -0700 (PDT)
From: Linus Torvalds
To: Chris Snook
cc: David Miller, piggin@cyberone.com.au, satyam@infradead.org, herbert@gondor.apana.org.au, paulus@samba.org, clameter@sgi.com, ilpo.jarvinen@helsinki.fi, paulmck@linux.vnet.ibm.com, stefanr@s5r6.in-berlin.de, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, netdev@vger.kernel.org, akpm@linux-foundation.org, ak@suse.de, heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com, wensong@linux-vs.org, horms@verge.net.au, wjiang@resilience.com, cfriesen@nortel.com, zlynx@acm.org, rpjday@mindspring.com, jesper.juhl@gmail.com, segher@kernel.crashing.org
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
In-Reply-To: <46CAED8B.9030006@redhat.com>
Message-ID:
References: <46C993DF.4080400@redhat.com> <20070821.000404.39159401.davem@davemloft.net> <46CAED8B.9030006@redhat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=us-ascii
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3400
Lines: 70

On Tue, 21 Aug 2007, Chris Snook wrote:
>
> Moore's law is definitely working against us here. Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier.
It has zero impact on the CPU pipeline itself, and also has zero impact
on anything that gcc knows isn't visible in memory (ie local variables
that don't have their address taken), so barrier() really is pretty cheap.

Now, it's possible that gcc messes up in some circumstances, and that the
memory clobber will cause gcc to also do things like flush local registers
unnecessarily to their stack slots, but quite frankly, if that happens,
it's a gcc problem, and I also have to say that I've not seen that myself.

So in a very real sense, "barrier()" will just make sure that there is a
stronger sequence point for the compiler where things are stable. In most
cases it has absolutely zero performance impact - apart from the
-intended- impact of making sure that the compiler doesn't re-order or
cache stuff around it.

And sure, we could make it more fine-grained, and also introduce a
per-variable barrier, but the fact is, people _already_ have problems with
thinking about these kinds of things, and adding new abstraction issues
with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big
difference on powerpc, but it makes a real difference on x86 - and more
importantly, it removes something that people clearly don't know how it
works, and incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers
  to the CPU - that really can be quite expensive. The good news is that
  the expense is going down rather than up: both Intel and AMD are not
  only removing the need for some of them (ie "smp_rmb()" will become a
  compiler-only barrier), but we're _also_ seeing the whole "pipeline
  flush" approach go away, and be replaced by the CPU itself actually
  being better - so even the actual CPU pipeline barriers are getting
  cheaper, not more expensive.
]

For example, did anybody even _test_ how expensive "barrier()" is? Just
as a lark, I did

	#undef barrier
	#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's
more than most files), and there were _zero_ code generation downsides.
One instruction was moved (and a few line numbers changed), so it wasn't
like the assembly language was identical, but the point is, barrier()
simply doesn't have the same kinds of downsides that "volatile" has.

(That may not be true on other architectures or in other source files, of
course. This *does* depend on code generation details. But anybody who
thinks that "barrier()" is fundamentally expensive is simply incorrect. It
is *fundamentally* a no-op).

		Linus
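
The compiler-only nature of barrier() described above can be sketched in
a few lines of C. This is only an illustration, assuming a gcc-compatible
compiler: the macro is the usual empty asm with a "memory" clobber, while
"flag", "read_twice()", "sum_with_barrier()" and "check_barrier_demo()"
are hypothetical names made up for the example and do not come from the
kernel or from this thread.

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: no instruction is emitted, but the
 * compiler must assume that memory may have changed across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical variable that lives in memory and is visible to gcc as such. */
static int flag;

/* Hypothetical helper: reads flag on both sides of a barrier. The second
 * read may not reuse a value cached in a register from the first read. */
static int read_twice(void)
{
	int a = flag;	/* first load */
	barrier();	/* compiler forgets what it knew about memory */
	int b = flag;	/* reloaded from memory */
	return a + b;
}

/* Hypothetical helper: sum and i never have their address taken, so the
 * memory clobber should not force them out of registers. */
static int sum_with_barrier(int n)
{
	int sum = 0;
	for (int i = 0; i < n; i++) {
		sum += i;
		barrier();	/* has no effect on the computed result */
	}
	return sum;
}

/* Hypothetical self-check: the barrier changes what the compiler may
 * assume, never what the program computes. */
static void check_barrier_demo(void)
{
	flag = 1;
	assert(read_twice() == 2);
	assert(sum_with_barrier(3) == 3);
}
```

Building this with "gcc -O2 -S" and comparing against a build where
barrier() is defined as "do { } while (0)" is one way to repeat the kind
of experiment described for kernel/sched.c; what the diff shows will
depend on the compiler version and the architecture.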