2018-10-11 19:34:46

by Andres Freund

Subject: Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

Hi,

On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > >
> > > So let's use the locked variant everywhere - helps keep the code simple as
> > > well.
> > >
> > > While I was at it, I found some inconsistencies in comments in
> > > arch/x86/include/asm/barrier.h
> > >
> > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > the code changes (that people might want to test for performance) from comment
> > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > with myself.
> > >
> > > Lightly tested on my system.
> > >
> > > Michael S. Tsirkin (3):
> > > x86: drop mfence in favor of lock+addl
> > > x86: drop a comment left over from X86_OOSTORE
> > > x86: tweak the comment about use of wmb for IO
> > >
> >
> > I would like to get feedback from the hardware team about the
> > implications of this change, first.

> Any luck getting some feedback on this one?

Ping? I just saw a bunch of kernel fences in a benchmark, making me
wonder why linux uses mfence rather than lock addl. Leading me to this
thread.

Greetings,

Andres Freund


2018-10-11 19:36:01

by Michael S. Tsirkin

Subject: Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

On Thu, Oct 11, 2018 at 10:37:07AM -0700, Andres Freund wrote:
> Hi,
>
> On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > >
> > > > So let's use the locked variant everywhere - helps keep the code simple as
> > > > well.
> > > >
> > > > While I was at it, I found some inconsistencies in comments in
> > > > arch/x86/include/asm/barrier.h
> > > >
> > > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > > the code changes (that people might want to test for performance) from comment
> > > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > > with myself.
> > > >
> > > > Lightly tested on my system.
> > > >
> > > > Michael S. Tsirkin (3):
> > > > x86: drop mfence in favor of lock+addl
> > > > x86: drop a comment left over from X86_OOSTORE
> > > > x86: tweak the comment about use of wmb for IO
> > > >
> > >
> > > I would like to get feedback from the hardware team about the
> > > implications of this change, first.
>
> > Any luck getting some feedback on this one?
>
> Ping? I just saw a bunch of kernel fences in a benchmark, making me
> wonder why linux uses mfence rather than lock addl. Leading me to this
> thread.
>
> Greetings,
>
> Andres Freund

It doesn't do it for smp_mb any longer:

commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
Author: Michael S. Tsirkin <[email protected]>
Date: Fri Oct 27 19:14:31 2017 +0300

locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE


I didn't bother with mb() since I didn't think it's performance
critical, and one needs to worry about drivers possibly doing
non-temporals etc which do need mfence.
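
For readers following along, here is a rough user-space sketch of the two
barrier flavours being compared. It is not the kernel's barrier.h: the
function names (barrier_mfence, barrier_lock_add, store_then_load) are made
up for illustration, the -4 offset from the stack pointer is my recollection
of what the commit above uses, and the non-x86 fallback to the compiler
builtin is only there so the snippet builds elsewhere.

```c
/* Illustrative sketch only; names are hypothetical, not kernel API. */

/* Full barrier via MFENCE. Also orders non-temporal (streaming) stores,
 * which is the reason given above for keeping it in mb(). */
static inline void barrier_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* assumption: generic fallback off x86 */
#endif
}

/* Full barrier via a locked no-op add to a stack slot. Adding 0 leaves
 * the slot's value unchanged; the LOCK prefix supplies the ordering for
 * ordinary cacheable memory, which is what smp_mb() needs. */
static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,-4(%%esp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* assumption: generic fallback off x86 */
#endif
}

static volatile int flag_a, flag_b;

/* Store-then-load pattern, the one case x86 may reorder without a full
 * barrier: the store to flag_a must be visible before flag_b is read. */
static int store_then_load(void (*full_barrier)(void))
{
    flag_a = 1;
    full_barrier();
    return flag_b;
}
```

Either function can be passed to store_then_load(); timing a tight loop of
each would be one way to reproduce the kind of micro-benchmark mentioned at
the top of the thread, though results will vary by CPU.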

Do you see mb() in a benchmark then?

--
MST

2018-10-11 19:37:40

by Andres Freund

Subject: Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

Hi,

On 2018-10-11 14:11:42 -0400, Michael S. Tsirkin wrote:
> On Thu, Oct 11, 2018 at 10:37:07AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > > > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > > >
> > > > > So let's use the locked variant everywhere - helps keep the code simple as
> > > > > well.
> > > > >
> > > > > While I was at it, I found some inconsistencies in comments in
> > > > > arch/x86/include/asm/barrier.h
> > > > >
> > > > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > > > the code changes (that people might want to test for performance) from comment
> > > > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > > > with myself.
> > > > >
> > > > > Lightly tested on my system.
> > > > >
> > > > > Michael S. Tsirkin (3):
> > > > > x86: drop mfence in favor of lock+addl
> > > > > x86: drop a comment left over from X86_OOSTORE
> > > > > x86: tweak the comment about use of wmb for IO
> > > > >
> > > >
> > > > I would like to get feedback from the hardware team about the
> > > > implications of this change, first.
> >
> > > Any luck getting some feedback on this one?
> >
> > Ping? I just saw a bunch of kernel fences in a benchmark, making me
> > wonder why linux uses mfence rather than lock addl. Leading me to this
> > thread.
> >
> > Greetings,
> >
> > Andres Freund
>
> It doesn't do it for smp_mb any longer:
>
> commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
> Author: Michael S. Tsirkin <[email protected]>
> Date: Fri Oct 27 19:14:31 2017 +0300
>
> locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE

Ooh, missed that one.


> I didn't bother with mb() since I didn't think it's performance
> critical, and one needs to worry about drivers possibly doing
> non-temporals etc which do need mfence.
>
> Do you see mb() in a benchmark then?

No, it was an smp_mb(). It was on an older kernel (I was profiling
postgres, not the kernel, on hardware I have limited control over, and
just noticed the barrier while looking at perf output). I quickly looked
at a current arch/x86/include/asm/barrier.h, still saw mfences, and then
found this thread. I should have looked more carefully.

Sorry for the noise, and thanks for the quick answer.

- Andres