2021-01-28 00:43:16

by Peter Zijlstra

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

On Wed, Jan 27, 2021 at 09:36:22PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <[email protected]>
>
> On Octeon, smp_mb() translates to SYNC, while wmb+rmb translates to SYNCW
> only. This gives around a 10% performance improvement in tight uncontended
> spinlock loops.
>
> Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
> below.
>
> On 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
>
> w/o patch: 1.60s
> with patch: 1.51s
>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Alexander Sverdlin <[email protected]>
> ---
> arch/mips/include/asm/barrier.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 49ff172..24c3f2c 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -113,6 +113,15 @@ static inline void wmb(void)
> ".set arch=octeon\n\t" \
> "syncw\n\t" \
> ".set pop" : : : "memory")
> +
> +#define __smp_store_release(p, v) \
> +do { \
> + compiletime_assert_atomic_type(*p); \
> + __smp_wmb(); \
> + __smp_rmb(); \
> + WRITE_ONCE(*p, v); \
> +} while (0)

This is wrong in general since smp_rmb() will only provide order between
two loads and smp_store_release() is a store.

If this is correct for all MIPS, this needs a giant comment on exactly
how that smp_rmb() makes sense here.
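To make the objection concrete: a store-release has to order every earlier load and store of the releasing CPU before the release store itself, whereas smp_rmb() only orders loads against later loads. The sketch below is a userspace C11 analogue of the message-passing pattern, not kernel code. memory_order_release and memory_order_acquire stand in for smp_store_release() and smp_load_acquire(), and payload, ready, producer(), consumer() and run_message_passing() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for a message-passing (MP) litmus test. */
static int payload;		/* plain data published by the producer */
static atomic_int ready;	/* flag written with release semantics */
static int observed = -1;	/* value the consumer read after the flag */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;		/* ordinary store ... */
	/*
	 * ... ordered before the flag store: a release orders ALL earlier
	 * loads and stores before it.  smp_rmb() alone orders only loads
	 * against later loads and would not cover this store.
	 */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	/* The acquire load pairs with the release store in producer(). */
	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;
	observed = payload;
	assert(observed == 42);	/* guaranteed by the release/acquire pair */
	return NULL;
}

/* Runs one MP round and returns the value the consumer observed. */
static int run_message_passing(void)
{
	pthread_t p, c;

	payload = 0;
	observed = -1;
	atomic_store_explicit(&ready, 0, memory_order_relaxed);
	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return observed;
}
```

Weakening the release store to memory_order_relaxed removes the ordering: the plain read of payload would then race with the write, which is undefined behaviour in C11.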


2021-01-28 07:31:25

by Alexander Sverdlin

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

Hello Peter,

On 27/01/2021 23:32, Peter Zijlstra wrote:
>> Link: https://lore.kernel.org/lkml/[email protected]/

please check the discussion pointed to by the link above...

>> Signed-off-by: Alexander Sverdlin <[email protected]>
>> ---
>> arch/mips/include/asm/barrier.h | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
>> index 49ff172..24c3f2c 100644
>> --- a/arch/mips/include/asm/barrier.h
>> +++ b/arch/mips/include/asm/barrier.h
>> @@ -113,6 +113,15 @@ static inline void wmb(void)
>> ".set arch=octeon\n\t" \
>> "syncw\n\t" \
>> ".set pop" : : : "memory")
>> +
>> +#define __smp_store_release(p, v) \
>> +do { \
>> + compiletime_assert_atomic_type(*p); \
>> + __smp_wmb(); \
>> + __smp_rmb(); \
>> + WRITE_ONCE(*p, v); \
>> +} while (0)
> This is wrong in general since smp_rmb() will only provide order between
> two loads and smp_store_release() is a store.
>
> If this is correct for all MIPS, this needs a giant comment on exactly
> how that smp_rmb() makes sense here.

... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
there, but I thought to "document" the flow of thoughts from the discussion
above by including it anyway.

--
Best regards,
Alexander Sverdlin.

2021-01-28 11:35:20

by Peter Zijlstra

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:

> >> +#define __smp_store_release(p, v) \
> >> +do { \
> >> + compiletime_assert_atomic_type(*p); \
> >> + __smp_wmb(); \
> >> + __smp_rmb(); \
> >> + WRITE_ONCE(*p, v); \
> >> +} while (0)
> > This is wrong in general since smp_rmb() will only provide order between
> > two loads and smp_store_release() is a store.
> >
> > If this is correct for all MIPS, this needs a giant comment on exactly
> > how that smp_rmb() makes sense here.
>
> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
> there, but I thought to "document" the flow of thoughts from the discussion
> above by including it anyway.

Random discussions on the internet do not absolve you from having to
write coherent comments. Especially so where memory ordering is
concerned.

This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
barrier primitives."):

#define smp_mb__before_llsc() smp_wmb()
#define __smp_mb__before_llsc() __smp_wmb()

is also dodgy as hell and really wants a comment too. I'm not buying the
Changelog of that commit either, __smp_mb__before_llsc should also
ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
So what stops the load from being speculated?
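The concern maps onto the usual LL/SC retry loop. The sketch below is a C11 model under stated assumptions, not MIPS code: atomic_compare_exchange_weak_explicit() plays the role of the LL/SC pair (it may fail spuriously, as SC can), a memory_order_seq_cst fence stands in for a full __smp_mb__before_llsc(), and fetch_add_full(), counter, worker() and run_counter() are hypothetical names. The comments mark what the barrier has to cover: earlier loads as well as earlier stores must not move past the LL, and a store-only barrier such as SYNCW does not order loads by itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: an atomic add written as an explicit retry loop,
 * modelling a MIPS LL/SC sequence.  The leading seq_cst fence stands in for
 * a full __smp_mb__before_llsc(): it must keep earlier loads AND stores from
 * being reordered after the "LL".
 */
static int fetch_add_full(atomic_int *v, int inc)
{
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier before "LL" */

	int old = atomic_load_explicit(v, memory_order_relaxed);	/* ~ LL */

	/*
	 * ~ SC: fails if *v no longer equals old (or spuriously); on failure
	 * old is refreshed with the current value and the loop retries.
	 */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + inc,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
	return old;
}

static atomic_int counter;

static void *worker(void *arg)
{
	int iters = *(const int *)arg;

	for (int i = 0; i < iters; i++)
		fetch_add_full(&counter, 1);
	return NULL;
}

/* Runs nthreads workers, each adding 1 iters times; returns the final count. */
static int run_counter(int nthreads, int iters)
{
	pthread_t tids[8];

	assert(nthreads > 0 && nthreads <= 8);
	atomic_store(&counter, 0);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &iters);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&counter);
}
```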


2021-01-28 11:57:20

by Alexander Sverdlin

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

Hello Peter,

On 28/01/2021 12:33, Peter Zijlstra wrote:
> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
>
> #define smp_mb__before_llsc() smp_wmb()
> #define __smp_mb__before_llsc() __smp_wmb()
>
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either, __smp_mb__before_llsc should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?

Hmm, the commit message you point to above says:

"Since Octeon does not do speculative reads, this functions as a full barrier."

--
Best regards,
Alexander Sverdlin.
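Whether a write barrier can stand in for a full barrier is usually checked with the store-buffering (SB) litmus test: each CPU stores to one location and then loads the other. A barrier that orders only stores against stores allows both loads to return 0; a full barrier forbids that outcome. The sketch below is a C11 model in which atomic_thread_fence(memory_order_seq_cst) is the full barrier; x, y, go, r0, r1, spawn() and run_sb() are hypothetical names, and the code says nothing about how SYNC or SYNCW behave on actual Octeon hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for a store-buffering (SB) litmus test. */
static atomic_int x, y;
static atomic_int go;		/* start gate so both threads race */
static int r0, r1;

static void wait_for_go(void)
{
	while (!atomic_load_explicit(&go, memory_order_acquire))
		;
}

static void spawn(pthread_t *t, void *(*fn)(void *))
{
	int rc = pthread_create(t, NULL, fn, NULL);

	assert(rc == 0);
	(void)rc;
}

static void *sb_cpu0(void *arg)
{
	(void)arg;
	wait_for_go();
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier: store->load */
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

static void *sb_cpu1(void *arg)
{
	(void)arg;
	wait_for_go();
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier: store->load */
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Returns true if the outcome r0 == 0 && r1 == 0 was ever observed. */
static bool run_sb(int rounds)
{
	for (int i = 0; i < rounds; i++) {
		pthread_t a, b;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&go, 0);
		spawn(&a, sb_cpu0);
		spawn(&b, sb_cpu1);
		atomic_store_explicit(&go, 1, memory_order_release);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (r0 == 0 && r1 == 0)
			return true;	/* forbidden with two full fences */
	}
	return false;
}
```

Replacing the fences with memory_order_release fences, the nearest C11 analogue of a store-store barrier, makes the r0 == 0 && r1 == 0 outcome legal, although a given machine may show it rarely or never in a short run.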

2021-01-28 12:11:53

by Alexander Sverdlin

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

Hi!

On 28/01/2021 12:33, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
>
>>>> +#define __smp_store_release(p, v) \
>>>> +do { \
>>>> + compiletime_assert_atomic_type(*p); \
>>>> + __smp_wmb(); \
>>>> + __smp_rmb(); \
>>>> + WRITE_ONCE(*p, v); \
>>>> +} while (0)
>>> This is wrong in general since smp_rmb() will only provide order between
>>> two loads and smp_store_release() is a store.
>>>
>>> If this is correct for all MIPS, this needs a giant comment on exactly
>>> how that smp_rmb() makes sense here.
>>
>> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
>> there, but I thought to "document" the flow of thoughts from the discussion
>> above by including it anyway.
>
> Random discussions on the internet do not absolve you from having to
> write coherent comments. Especially so where memory ordering is
> concerned.

I actually hoped you would remember the discussion you participated in 5 years
ago, where (in my understanding) you already agreed that the solution itself
is not broken:

https://lore.kernel.org/lkml/[email protected]/

Could you please just suggest the proper comment you expect to be added here,
since there is no doubt you have much more experience here than me?

> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
>
> #define smp_mb__before_llsc() smp_wmb()
> #define __smp_mb__before_llsc() __smp_wmb()
>
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either, __smp_mb__before_llsc should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?
>
>

--
Best regards,
Alexander Sverdlin.

2021-01-28 15:03:17

by Peter Zijlstra

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> Hello Peter,
>
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > barrier primitives."):
> >
> > #define smp_mb__before_llsc() smp_wmb()
> > #define __smp_mb__before_llsc() __smp_wmb()
> >
> > is also dodgy as hell and really wants a comment too. I'm not buying the
> > Changelog of that commit either, __smp_mb__before_llsc should also
> > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > So what stops the load from being speculated?
>
> Hmm, the commit message you point to above says:
>
> "Since Octeon does not do speculative reads, this functions as a full barrier."

So then the only difference between SYNC and SYNCW is a pipeline drain?

I still worry about the transitivity thing.. ISTR that being a sticky
point back then too.
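The "transitivity" worry is usually illustrated with the write-to-read causality (WRC) pattern: CPU0 writes x, CPU1 reads that write and then publishes y, and CPU2 reads y and then reads x. If CPU1's ordering carries CPU0's write along with it, CPU2 cannot see y == 1 and still read the old x. In the C11 sketch below, a release store of y paired with an acquire load makes CPU1's read of x happen before CPU2's read of x, so C11 coherence forbids r1 == 1 && r2 == 1 && r3 == 0. All names are hypothetical, and whether SYNCW gives the same guarantee on Octeon is exactly the open question here; the code does not answer it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for a write-to-read causality (WRC) litmus test. */
static atomic_int x, y;
static atomic_int go;
static int r1, r2, r3;

static void wait_for_go(void)
{
	while (!atomic_load_explicit(&go, memory_order_acquire))
		;
}

static void *wrc_cpu0(void *arg)	/* writes x */
{
	(void)arg;
	wait_for_go();
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

static void *wrc_cpu1(void *arg)	/* reads x, then publishes y */
{
	(void)arg;
	wait_for_go();
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

static void *wrc_cpu2(void *arg)	/* reads y, then reads x */
{
	(void)arg;
	wait_for_go();
	r2 = atomic_load_explicit(&y, memory_order_acquire);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Returns true if r1 == 1 && r2 == 1 && r3 == 0 was ever observed. */
static bool run_wrc(int rounds)
{
	void *(*fn[3])(void *) = { wrc_cpu0, wrc_cpu1, wrc_cpu2 };

	for (int i = 0; i < rounds; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&go, 0);
		for (int j = 0; j < 3; j++) {
			int rc = pthread_create(&t[j], NULL, fn[j], NULL);

			assert(rc == 0);
			(void)rc;
		}
		atomic_store_explicit(&go, 1, memory_order_release);
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		/*
		 * cpu1's read of x happens before cpu2's read of x (via the
		 * release/acquire pair on y), so cpu2 cannot see an older x.
		 */
		if (r1 == 1 && r2 == 1 && r3 == 0)
			return true;
	}
	return false;
}
```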

2021-01-28 15:10:02

by Peter Zijlstra

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

On Thu, Jan 28, 2021 at 01:09:39PM +0100, Alexander Sverdlin wrote:
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
> >
> >>>> +#define __smp_store_release(p, v) \
> >>>> +do { \
> >>>> + compiletime_assert_atomic_type(*p); \
> >>>> + __smp_wmb(); \
> >>>> + __smp_rmb(); \
> >>>> + WRITE_ONCE(*p, v); \
> >>>> +} while (0)

> I actually hoped you would remember the discussion you participated in 5 years
> ago, where (in my understanding) you already agreed that the solution itself
> is not broken:
>
> https://lore.kernel.org/lkml/[email protected]/

My memory really isn't that good. I can barely remember what I did 5
weeks ago, 5 years ago might as well have never happened.

> Could you please just suggest the proper comment you expect to be added here,
> since there is no doubt you have much more experience here than me?

So for store_release I'm not too worried, and provided there is no read
speculation, wmb is indeed sufficient. This is because our store_release
is RCpc.

Something like:

/*
 * Because Octeon does not do read speculation, an smp_wmb()
 * is sufficient to ensure {load,store}->{store} order.
 */
#define __smp_store_release(p, v) \
do { \
	compiletime_assert_atomic_type(*p); \
	__smp_wmb(); \
	WRITE_ONCE(*p, v); \
} while (0)
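For comparison, a userspace C11 analogue of the proposed macro is a release fence followed by a relaxed store. A C11 release fence orders all earlier loads and stores against later stores, which is the {load,store}->{store} ordering the comment claims for smp_wmb() on Octeon; on a CPU that does speculate reads, a store-store barrier alone would not supply the load->store half. store_release_sketch(), data, flag, writer(), reader() and run_release_sketch() are hypothetical names, and this is not the kernel's implementation. Like the RCpc store_release described above, it does not order the release store against a later acquire load from a different location.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical userspace analogue of the proposed macro: a release fence
 * (orders earlier loads and stores before later stores) followed by a
 * relaxed store.  The atomic_int type takes the place of the kernel's
 * compiletime_assert_atomic_type() check.
 */
#define store_release_sketch(p, v) \
do { \
	atomic_thread_fence(memory_order_release); \
	atomic_store_explicit((p), (v), memory_order_relaxed); \
} while (0)

static int data;
static atomic_int flag;

static void *writer(void *arg)
{
	(void)arg;
	data = 1;
	store_release_sketch(&flag, 1);
	return NULL;
}

static void *reader(void *arg)
{
	int *out = arg;

	/* The acquire load synchronizes with the fence in writer(). */
	while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
		;
	*out = data;
	assert(*out == 1);	/* the release fence ordered data = 1 first */
	return NULL;
}

/* Runs one round and returns the value of data seen by the reader. */
static int run_release_sketch(void)
{
	pthread_t w, r;
	int seen = -1;

	data = 0;
	atomic_store(&flag, 0);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```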

2021-01-28 15:18:01

by Peter Zijlstra

Subject: Re: [PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

On Thu, Jan 28, 2021 at 03:57:58PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> > Hello Peter,
> >
> > On 28/01/2021 12:33, Peter Zijlstra wrote:
> > > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > > barrier primitives."):
> > >
> > > #define smp_mb__before_llsc() smp_wmb()
> > > #define __smp_mb__before_llsc() __smp_wmb()
> > >
> > > is also dodgy as hell and really wants a comment too. I'm not buying the
> > > Changelog of that commit either, __smp_mb__before_llsc should also
> > > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > > So what stops the load from being speculated?
> >
> > Hmm, the commit message you point to above says:
> >
> > "Since Octeon does not do speculative reads, this functions as a full barrier."
>
> So then the only difference between SYNC and SYNCW is a pipeline drain?
>
> I still worry about the transitivity thing.. ISTR that being a sticky
> point back then too.

Ah, there we are, it's called multi-copy-atomic these days:

f1ab25a30ce8 ("memory-barriers: Replace uses of "transitive"")

Do those SYNCW / write-completion barriers guarantee this?
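Multi-copy atomicity is commonly probed with the independent-reads-of-independent-writes (IRIW) pattern: two CPUs write x and y independently, and two readers load them in opposite orders. If all CPUs agree on a single order of the two writes, the readers cannot report r0 == 1, r1 == 0 (x before y) and r2 == 1, r3 == 0 (y before x) at the same time. In the C11 sketch below every access is memory_order_seq_cst, whose single total order forbids that outcome; with release stores and acquire loads instead, C11 allows it. All names are hypothetical, and the sketch cannot say which of the two behaviours SYNCW gives on Octeon.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for an IRIW litmus test; every access is seq_cst. */
static atomic_int x, y;
static atomic_int go;
static int r0, r1, r2, r3;

static void wait_for_go(void)
{
	while (!atomic_load(&go))
		;
}

static void *write_x(void *arg)
{
	(void)arg;
	wait_for_go();
	atomic_store(&x, 1);	/* seq_cst store */
	return NULL;
}

static void *write_y(void *arg)
{
	(void)arg;
	wait_for_go();
	atomic_store(&y, 1);	/* seq_cst store */
	return NULL;
}

static void *read_xy(void *arg)
{
	(void)arg;
	wait_for_go();
	r0 = atomic_load(&x);	/* seq_cst loads: x, then y */
	r1 = atomic_load(&y);
	return NULL;
}

static void *read_yx(void *arg)
{
	(void)arg;
	wait_for_go();
	r2 = atomic_load(&y);	/* seq_cst loads: y, then x */
	r3 = atomic_load(&x);
	return NULL;
}

/* Returns true if the two readers ever disagreed on the order of the writes. */
static bool run_iriw(int rounds)
{
	void *(*fn[4])(void *) = { write_x, write_y, read_xy, read_yx };

	for (int i = 0; i < rounds; i++) {
		pthread_t t[4];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&go, 0);
		for (int j = 0; j < 4; j++) {
			int rc = pthread_create(&t[j], NULL, fn[j], NULL);

			assert(rc == 0);
			(void)rc;
		}
		atomic_store(&go, 1);
		for (int j = 0; j < 4; j++)
			pthread_join(t[j], NULL);
		if (r0 == 1 && r1 == 0 && r2 == 1 && r3 == 0)
			return true;	/* forbidden by the single seq_cst order */
	}
	return false;
}
```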