Date: Tue, 16 Oct 2007 02:50:33 +0200
From: Nick Piggin
To: Jarek Poplawski
Cc: Linus Torvalds, Helge Hafting, Linux Kernel Mailing List, Andi Kleen
Subject: Re: [rfc][patch 3/3] x86: optimise barriers
Message-ID: <20071016005033.GB5851@wotan.suse.de>
In-Reply-To: <20071015090959.GB1875@ff.dom.local>

On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote:
> On Mon, Oct 15, 2007 at 10:09:24AM +0200, Nick Piggin wrote:
> ...
> > Has performance really been much of a problem for you? (Even before
> > the lfence instruction, when you theoretically had to use a locked
> > op.) I mean, I'd struggle to find a place in the Linux kernel where
> > there is actually a measurable difference anywhere... and we're
> > pretty performance critical and I think we have a reasonable amount
> > of lockless code (I guess we may not have a lot of tight
> > computational loops, though). I'd be interested to know what, if
> > any, application has found these barriers to be problematic...
>
> I'm no performance expert at all, so I can't help you, sorry. But I
> understand people who care about this, and I think there is a popular
> conviction that barriers and locked instructions are costly, so I'm
> surprised there is any "problem" now with finding these gains...

It's more expensive than nothing, sure. However, in real code,
algorithmic complexity, cache misses and cacheline bouncing tend to be
much bigger issues. I can't think of a place in the kernel where
smp_rmb matters _that_ much. Seqlocks, maybe (timers, dcache lookup),
vmscan... Obviously removing the lfence is not going to hurt. Maybe we
even gain 0.01% performance in someone's workload.

Also, remember: if loads are already in-order, then lfence is a noop,
right? (In practice it seems to have to do a _little_ bit of work, but
it's on the order of a dozen cycles.)
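To make that concrete, here is roughly the shape of reader that
smp_rmb is there for -- a stripped-down, userspace-only seqcount-style
sketch, not actual kernel code. The names and the raw lfence/sfence
are just stand-ins for the kernel's smp_rmb()/smp_wmb():

/* Minimal seqcount-style reader/writer sketch (illustrative only). */
#include <stdint.h>
#include <stdio.h>

static volatile unsigned int seq;  /* even = stable, odd = update in progress */
static volatile uint64_t time_sec, time_nsec;  /* the protected data */

/* Stand-ins for smp_rmb()/smp_wmb() as they compiled on x86 back then. */
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")

static void writer_update(uint64_t sec, uint64_t nsec)
{
	seq++;          /* odd: update in progress */
	wmb();          /* order the seq store before the data stores */
	time_sec = sec;
	time_nsec = nsec;
	wmb();          /* order the data stores before the final seq store */
	seq++;          /* even: update complete */
}

static void reader_snapshot(uint64_t *sec, uint64_t *nsec)
{
	unsigned int start;

	do {
		start = seq;
		rmb();  /* the barrier whose cost we're arguing about */
		*sec = time_sec;
		*nsec = time_nsec;
		rmb();  /* order the data loads before the seq re-check */
	} while ((start & 1) || start != seq);
}

int main(void)
{
	uint64_t s, ns;

	writer_update(1192493433, 0);
	reader_snapshot(&s, &ns);
	printf("%llu.%09llu\n", (unsigned long long)s, (unsigned long long)ns);
	return 0;
}

The only per-read cost this patch touches is the rmb() in that loop:
once loads are guaranteed in-order it collapses to a plain compiler
barrier, and everything left is ordinary loads.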
> > The thing is that those documents are not defining what a particular
> > implementation does, but how the architecture is defined (i.e. what
> > must some arbitrary software/hardware provide and what may it
> > expect).
>
> I'm not sure this is the right way to put it. If there is no
> distinction between what is and what could be, how can I believe in
> similar Alpha or Itanium stuff? IMHO, these manuals sometimes look
> like they describe some real hardware mechanisms, and sometimes they
> mention possible changes and reserved features too. So, when they
> don't mention something, you could think it describes present
> behaviour.

No. Why are you reading that much into it? I know for a fact that some
non-x86 architectures' actual implementations have stronger ordering
than their ISAs allow. It has nothing to do with you "believing" how
the hardware works. That's not what the document is describing
(directly).

> > It's pretty natural that Intel started out with a weaker guarantee
> > than their CPUs of the time actually supported, and tightened it up
> > after (presumably) deciding not to implement such relaxed semantics
> > for the foreseeable future.
>
> As a matter of fact it's not natural for me at all. I expected the
> other direction, and I still doubt programmers' intentions could be
> "automatically" predicted well enough, so IMHO, it won't last long.

Really? Consider the consequences if, instead of releasing this latest
document tightening consistency, Intel had found that out-of-order
loads were worth 5% more performance and implemented them in their
next chip. The chip could be completely backwards compatible, yet all
your old code would break, because it was broken to begin with (it was
outside the spec). IMO Intel did exactly the right thing from an
engineering perspective, and so did Linux in always following the
spec.
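Concretely -- and this is a made-up userspace sketch, not any
particular piece of kernel code -- the sort of pattern that would have
silently broken is the usual flag/data publication below. The names
and the raw lfence/sfence are again just stand-ins for
smp_rmb()/smp_wmb():

/* Message-passing sketch: why "works on today's chips" is not "correct". */
#include <pthread.h>
#include <stdio.h>

static volatile int payload;   /* data published by the producer */
static volatile int ready;     /* flag saying the payload is valid */

#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#define wmb() __asm__ __volatile__("sfence" ::: "memory")

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	wmb();          /* publish the data before setting the flag */
	ready = 1;
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	while (!ready)
		;       /* spin until the flag is seen */
	/*
	 * Skipping this barrier happens to work on every CPU whose loads
	 * are in order -- correct only by accident, outside the written
	 * spec, and broken the day a compatible chip reorders loads.
	 */
	rmb();
	printf("payload = %d\n", payload);
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}

(Build with -pthread; the point is only where the read barrier has to
sit, not the threading boilerplate.)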