Date: Tue, 16 Oct 2007 02:50:33 +0200
From: Nick Piggin
To: Jarek Poplawski
Cc: Linus Torvalds, Helge Hafting, Linux Kernel Mailing List, Andi Kleen
Subject: Re: [rfc][patch 3/3] x86: optimise barriers
Message-ID: <20071016005033.GB5851@wotan.suse.de>
In-Reply-To: <20071015090959.GB1875@ff.dom.local>

On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote:
> On Mon, Oct 15, 2007 at 10:09:24AM +0200, Nick Piggin wrote:
> ...
> > Has performance really been much of a problem for you? (Even before
> > the lfence instruction, when you theoretically had to use a locked
> > op.) I mean, I'd struggle to find a place in the Linux kernel where
> > there is actually a measurable difference anywhere... and we're
> > pretty performance critical and I think we have a reasonable amount
> > of lockless code (I guess we may not have a lot of tight
> > computational loops, though). I'd be interested to know what, if
> > any, application has found these barriers to be problematic...
>
> I'm no performance expert at all, so I can't help you, sorry. But I
> understand people who care about this, and I think there is a popular
> conviction that barriers and locked instructions are costly, so I'm
> surprised there is any "problem" now with finding these gains...

It's more expensive than nothing, sure. However, in real code,
algorithmic complexity, cache misses and cacheline bouncing tend to be
much bigger issues. I can't think of a place in the kernel where
smp_rmb matters _that_ much. Seqlocks, maybe (timers, dcache lookup),
vmscan... Obviously removing the lfence is not going to hurt. Maybe we
even gain 0.01% performance in someone's workload.

Also, remember: if loads are already in-order, then lfence is a noop,
right? (In practice it seems to have to do a _little_ bit of work, but
it's on the order of a dozen cycles.)
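To make that concrete, here is roughly the shape of reader that
smp_rmb is there for -- a stripped-down, userspace-only seqcount-style
sketch, not actual kernel code. The names and the raw lfence/sfence
are just stand-ins for the kernel's smp_rmb()/smp_wmb():

/* Minimal seqcount-style reader/writer sketch (illustrative only). */
#include <stdint.h>
#include <stdio.h>

static volatile unsigned int seq;  /* even = stable, odd = update in progress */
static volatile uint64_t time_sec, time_nsec;  /* the protected data */

/* Stand-ins for smp_rmb()/smp_wmb() as they compiled on x86 back then. */
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")

static void writer_update(uint64_t sec, uint64_t nsec)
{
	seq++;          /* odd: update in progress */
	wmb();          /* order the seq store before the data stores */
	time_sec = sec;
	time_nsec = nsec;
	wmb();          /* order the data stores before the final seq store */
	seq++;          /* even: update complete */
}

static void reader_snapshot(uint64_t *sec, uint64_t *nsec)
{
	unsigned int start;

	do {
		start = seq;
		rmb();  /* the barrier whose cost we're arguing about */
		*sec = time_sec;
		*nsec = time_nsec;
		rmb();  /* order the data loads before the seq re-check */
	} while ((start & 1) || start != seq);
}

int main(void)
{
	uint64_t s, ns;

	writer_update(1192493433, 0);
	reader_snapshot(&s, &ns);
	printf("%llu.%09llu\n", (unsigned long long)s, (unsigned long long)ns);
	return 0;
}

The only per-read cost this patch touches is the rmb() in that loop:
once loads are guaranteed in-order it collapses to a plain compiler
barrier, and everything left is ordinary loads.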
> > The thing is that those documents are not defining what a particular
> > implementation does, but how the architecture is defined (i.e. what
> > must some arbitrary software/hardware provide and what may it
> > expect).
>
> I'm not sure this is the right way to put it. If there is no
> distinction between what is and what could be, how can I believe in
> similar Alpha or Itanium stuff? IMHO, these manuals sometimes look
> like they describe some real hardware mechanisms, and sometimes they
> mention possible changes and reserved features too. So, when they
> don't mention something, you could think it describes present
> behaviour.

No. Why are you reading that much into it? I know for a fact that some
non-x86 architectures' actual implementations have stronger ordering
than their ISAs allow. It has nothing to do with you "believing" how
the hardware works. That's not what the document is describing
(directly).

> > It's pretty natural that Intel started out with a weaker guarantee
> > than their CPUs of the time actually supported, and tightened it up
> > after (presumably) deciding not to implement such relaxed semantics
> > for the foreseeable future.
>
> As a matter of fact it's not natural for me at all. I expected the
> other direction, and I still doubt programmers' intentions could be
> "automatically" predicted well enough, so IMHO, it won't last long.

Really? Consider the consequences if, instead of releasing this latest
document tightening consistency, Intel had found that out-of-order
loads were worth 5% more performance and implemented them in their
next chip. The chip could be completely backwards compatible, yet all
your old code would break, because it was broken to begin with (it was
outside the spec). IMO Intel did exactly the right thing from an
engineering perspective, and so did Linux in always following the
spec.
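Concretely -- and this is a made-up userspace sketch, not any
particular piece of kernel code -- the sort of pattern that would have
silently broken is the usual flag/data publication below. The names
and the raw lfence/sfence are again just stand-ins for
smp_rmb()/smp_wmb():

/* Message-passing sketch: why "works on today's chips" is not "correct". */
#include <pthread.h>
#include <stdio.h>

static volatile int payload;   /* data published by the producer */
static volatile int ready;     /* flag saying the payload is valid */

#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	wmb();          /* publish the data before setting the flag */
	ready = 1;
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	while (!ready)
		;       /* spin until the flag is seen */
	/*
	 * Skipping this barrier happens to work on every CPU whose loads
	 * are in order -- correct only by accident, outside the written
	 * spec, and broken the day a compatible chip reorders loads.
	 */
	rmb();
	printf("payload = %d\n", payload);
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}

(Build with -pthread; the point is only where the read barrier has to
sit, not the threading boilerplate.)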