On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <[email protected]>
>
> Flushing the write buffer brings around 10% performance on the tight
> uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
> ("MIPS: Optimize spinlocks.").
No objection to the patch, but I don't find the above referenced commit
to be enlightening wrt nudge_writes(). The best it has to offer is the
comment that's already in the code.
> Signed-off-by: Alexander Sverdlin <[email protected]>
> ---
> arch/mips/include/asm/spinlock.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 8a88eb2..0a707f3 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -24,6 +24,9 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
>  	/* This could be optimised with ARCH_HAS_MMIOWB */
>  	mmiowb();
>  	smp_store_release(&lock->locked, 0);
> +#ifdef CONFIG_CPU_CAVIUM_OCTEON
> +	nudge_writes();
> +#endif
>  }
>
> #include <asm/qspinlock.h>
> --
> 2.10.2
>
Hi!
On 27/01/2021 23:34, Peter Zijlstra wrote:
> On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
>> From: Alexander Sverdlin <[email protected]>
>>
>> Flushing the write buffer brings around 10% performance on the tight
>> uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
>> ("MIPS: Optimize spinlocks.").
> No objection to the patch, but I don't find the above referenced commit
> to be enlightening wrt nudge_writes(). The best it has to offer is the
> comment that's already in the code.
My point was that original MIPS spinlocks had this write-buffer-flush and
it got lost on the conversion to qspinlocks. The referenced commit just
allows one to see the last MIPS-specific implementation before deletion.
>> Signed-off-by: Alexander Sverdlin <[email protected]>
>> ---
>> arch/mips/include/asm/spinlock.h | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
>> index 8a88eb2..0a707f3 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -24,6 +24,9 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
>>  	/* This could be optimised with ARCH_HAS_MMIOWB */
>>  	mmiowb();
>>  	smp_store_release(&lock->locked, 0);
>> +#ifdef CONFIG_CPU_CAVIUM_OCTEON
>> +	nudge_writes();
>> +#endif
>>  }
>>
>> #include <asm/qspinlock.h>
--
Best regards,
Alexander Sverdlin.
On Thu, Jan 28, 2021 at 08:29:57AM +0100, Alexander Sverdlin wrote:
> Hi!
>
> On 27/01/2021 23:34, Peter Zijlstra wrote:
> > On Wed, Jan 27, 2021 at 09:36:24PM +0100, Alexander A Sverdlin wrote:
> >> From: Alexander Sverdlin <[email protected]>
> >>
> >> Flushing the write buffer brings around 10% performance on the tight
> >> uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc
> >> ("MIPS: Optimize spinlocks.").
> > No objection to the patch, but I don't find the above referenced commit
> > to be enlightening wrt nudge_writes(). The best it has to offer is the
> > comment that's already in the code.
>
> My point was that original MIPS spinlocks had this write-buffer-flush and
> it got lost on the conversion to qspinlocks. The referenced commit just
> allows one to see the last MIPS-specific implementation before deletion.
Hardware that needs a store-buffer flush after release is highly suspect
and needs big and explicit comments. Not vague hints.
Hi!
On 28/01/2021 12:35, Peter Zijlstra wrote:
>> My point was that original MIPS spinlocks had this write-buffer-flush and
>> it got lost on the conversion to qspinlocks. The referenced commit just
>> allows one to see the last MIPS-specific implementation before deletion.
> Hardware that needs a store-buffer flush after release is highly suspect
> and needs big and explicit comments. Not vague hints.
I have a feeling that you are not going to suggest the comments for the code
and one has to guess what it is you have in mind?
Do you think the proper approach would be to undelete MIPS spinlocks and
make these broken qspinlocks a configurable option for MIPS? I don't even
mind if they will be default option for those not interested in performance
or latency.
--
Best regards,
Alexander Sverdlin.
On Thu, Jan 28, 2021 at 01:13:03PM +0100, Alexander Sverdlin wrote:
> Hi!
>
> On 28/01/2021 12:35, Peter Zijlstra wrote:
> >> My point was that original MIPS spinlocks had this write-buffer-flush and
> >> it got lost on the conversion to qspinlocks. The referenced commit just
> >> allows one to see the last MIPS-specific implementation before deletion.
> > Hardware that needs a store-buffer flush after release is highly suspect
> > and needs big and explicit comments. Not vague hints.
>
> I have a feeling that you are not going to suggest the comments for the code
> and one has to guess what it is you have in mind?
I've no insight into the specific microarch that causes this weirdness, so
it's very hard for me to suggest something here.
Find inspiration in the loongson commit.
> Do you think the proper approach would be to undelete MIPS spinlocks and
> make these broken qspinlocks a configurable option for MIPS? I don't even
> mind if they will be default option for those not interested in performance
> or latency.
qspinlock really isn't the only generic code that relies on this. I
would seriously consider doing the loongson-v3 thing, possibly also
adding that nudge_writes() thing to your smp_store_release(), you
already have it in __clear_bit_unlock().
It would then look something like:
/*
 * Octeon is special; it does not do read speculation, therefore an
 * smp_wmb() is sufficient to generate {load,store}->{store} order
 * required for RELEASE. It however has store-buffer weirdness
 * that requires an additional smp_wmb() (which is a completion barrier
 * for them) to flush the store-buffer, otherwise visibility of the
 * store can be arbitrarily delayed, also see __SYNC_loongson3_war.
 */
#define __smp_store_release(p, v)                               \
do {                                                            \
        compiletime_assert_atomic_type(*p);                     \
        __smp_wmb();                                            \
        WRITE_ONCE(*p, v);                                      \
        __smp_wmb();                                            \
} while (0)

/*
 * Octeon also likes to retain stores, see __SYNC_loongson3_war.
 */
#define cpu_relax()     __smp_wmb()
Or something...
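
For anyone who wants to play with the shape of this outside the kernel, here
is a minimal userspace sketch of "release store, then nudge the store buffer",
assuming C11 atomics. Everything named sketch_* is hypothetical and exists only
for illustration. In particular, sketch_nudge_writes() uses a full fence as a
stand-in; it is not the Octeon write-buffer flush and says nothing about how
that hardware behaves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not the kernel's qspinlock. */
struct sketch_lock {
	atomic_int locked;
};

/*
 * Assumption: a full fence stands in for nudge_writes(). The real
 * Octeon macro is architecture-specific and is not reproduced here.
 */
static inline void sketch_nudge_writes(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Try to take the lock once; returns true on success (ACQUIRE order). */
static inline bool sketch_trylock(struct sketch_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release store first, then the extra nudge, mirroring the patch order. */
static inline void sketch_unlock(struct sketch_lock *l)
{
	/* Only a held lock may be released. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);

	atomic_store_explicit(&l->locked, 0, memory_order_release);
	sketch_nudge_writes();
}
```

The fence after the store is not needed for correctness in the C11 model; it
only marks where a visibility nudge would sit if the hardware retained stores.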