On Tue, 27 Oct 2015, Peter Zijlstra wrote:
>On Wed, Oct 28, 2015 at 06:33:56AM +0900, Linus Torvalds wrote:
>> On Wed, Oct 28, 2015 at 4:53 AM, Davidlohr Bueso <[email protected]> wrote:
>> >
>> > Note that this might affect callers that rely on the atomicity
>> > semantics, but no such guarantee is documented anywhere for
>> > smp_store_mb(), and most archs already use this implementation anyway.
>> > Thus we continue to be consistent with the memory-barriers.txt file,
>> > and more importantly, maintain the semantics of the smp_ nature.
>>
>
>> So with this patch, the whole thing becomes pointless, I feel. (Ok, so
>> it may have been pointless before too, but at least before this patch
>> it generated special code, now it doesn't). So why carry it along at
>> all?
>
>So I suppose this boils down to if: XCHG ends up being cheaper than
>MOV+FENCE.
So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
consistently cheaper (by at least half the latency) than MFENCE. While there
was a decent amount of variation, this difference remained rather constant.
Then again, I'm not sure this matters. Thoughts?
Thanks,
Davidlohr
On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <[email protected]> wrote:
>
> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> consistently cheaper (by at least half the latency) than MFENCE. While there
> was a decent amount of variation, this difference remained rather constant.
Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
use on old CPUs without one (i.e. 32-bit).
I'm not actually convinced that mfence is necessarily a good idea. I
could easily see it being microcode, for example.
At least on my Haswell, the "lock addq" is pretty much exactly half
the cost of "mfence".
Linus
On Mon, 02 Nov 2015, Linus Torvalds wrote:
>On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <[email protected]> wrote:
>>
>> So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
>> consistently cheaper (by at least half the latency) than MFENCE. While there
>> was a decent amount of variation, this difference remained rather constant.
>
>Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
>use on old CPUs without one (i.e. 32-bit).
I'm getting results very close to xchg.
>I'm not actually convinced that mfence is necessarily a good idea. I
>could easily see it being microcode, for example.
Interesting.
>
>At least on my Haswell, the "lock addq" is pretty much exactly half
>the cost of "mfence".
Ok, this coincides with my results on IvB.