2015-11-02 20:15:47

by Davidlohr Bueso

Subject: Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Tue, 27 Oct 2015, Peter Zijlstra wrote:

>On Wed, Oct 28, 2015 at 06:33:56AM +0900, Linus Torvalds wrote:
>> On Wed, Oct 28, 2015 at 4:53 AM, Davidlohr Bueso wrote:
>> >
>> > Note that this might affect callers that could/would rely on the
>> > atomicity semantics, but there are no guarantees of that for
>> > smp_store_mb() mentioned anywhere, plus most archs use this anyway.
>> > Thus we continue to be consistent with the memory-barriers.txt file,
>> > and more importantly, maintain the semantics of the smp_ nature.
>>
>
>> So with this patch, the whole thing becomes pointless, I feel. (Ok, so
>> it may have been pointless before too, but at least before this patch
>> it generated special code, now it doesn't). So why carry it along at
>> all?
>
>So I suppose this boils down to whether XCHG ends up being cheaper than
>MOV+FENCE.

So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
consistently cheaper (by at least half the latency) than MFENCE. While there
was a decent amount of variation, this difference remained rather constant.

Then again, I'm not sure this matters. Thoughts?

Thanks,
Davidlohr


2015-11-03 00:06:49

by Linus Torvalds

Subject: Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso wrote:
>
> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> consistently cheaper (by at least half the latency) than MFENCE. While there
> was a decent amount of variation, this difference remained rather constant.

Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
use on old CPUs without one (i.e. 32-bit).

I'm not actually convinced that mfence is necessarily a good idea. I
could easily see it being microcode, for example.

At least on my Haswell, the "lock addq" is pretty much exactly half
the cost of "mfence".

Linus

2015-11-03 01:36:36

by Davidlohr Bueso

Subject: Re: [PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Mon, 02 Nov 2015, Linus Torvalds wrote:

>On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso wrote:
>>
>> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
>> consistently cheaper (by at least half the latency) than MFENCE. While there
>> was a decent amount of variation, this difference remained rather constant.
>
>Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
>use on old CPUs without one (i.e. 32-bit).

I'm getting results very close to xchg.

>I'm not actually convinced that mfence is necessarily a good idea. I
>could easily see it being microcode, for example.

Interesting.

>
>At least on my Haswell, the "lock addq" is pretty much exactly half
>the cost of "mfence".

Ok, this coincides with my results on IvB.
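
For readers who want to reproduce the comparison, the three sequences discussed in this thread (XCHG, MOV+MFENCE, and MOV plus "lock addq $0,0(%rsp)") can be approximated in user space with something like the sketch below. This is not the benchmark either poster ran, and it is not the kernel's smp_store_mb() definition. The names store_xchg, store_mfence, store_lock_add and bench_ns_per_op are made up for illustration, and the non-x86-64 fallbacks using compiler builtins are an assumption added only so the code builds elsewhere.

```c
#include <time.h>

/* Shared signature: store v to *p, then act as a full memory barrier.
 * All function names in this sketch are illustrative, not kernel APIs. */
typedef void (*store_mb_fn)(volatile long *p, long v);

/* Variant 1: XCHG. With a memory operand it is implicitly locked on x86,
 * so the store and the barrier are a single instruction. */
static void store_xchg(volatile long *p, long v)
{
#if defined(__x86_64__)
    __asm__ __volatile__("xchgq %0, %1" : "+r"(v), "+m"(*p) : : "memory");
#else
    /* Assumed portable stand-in for other architectures. */
    (void)__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Variant 2: plain store followed by MFENCE. */
static void store_mfence(volatile long *p, long v)
{
    *p = v;
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Variant 3: plain store followed by a locked add of zero to the stack.
 * Adding zero leaves the stack slot unchanged; only the LOCK prefix matters. */
static void store_lock_add(volatile long *p, long v)
{
    *p = v;
#if defined(__x86_64__)
    __asm__ __volatile__("lock addq $0,0(%%rsp)" : : : "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Average cost in nanoseconds of one call to fn, measured in CPU time.
 * The indirect call adds the same overhead to every variant. */
static double bench_ns_per_op(store_mb_fn fn, long iters)
{
    static volatile long slot;
    clock_t t0, t1;
    long i;

    if (iters <= 0)
        return 0.0;

    t0 = clock();
    for (i = 0; i < iters; i++)
        fn(&slot, i);
    t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling bench_ns_per_op() once per variant from a small main(), with a large iteration count, gives per-variant figures. Because the call overhead is common to all three, the differences between variants are more meaningful than the absolute numbers, and given the run-to-run variation reported above, repeated runs on a pinned CPU are advisable.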