2023-02-22 06:01:43

by Kautuk Consul

Subject: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

A page on ibm.com states:
"Ensures that all instructions preceding the call to __lwsync
complete before any subsequent store instructions can be executed
on the processor that executed the function. Also, it ensures that
all load instructions preceding the call to __lwsync complete before
any subsequent load instructions can be executed on the processor
that executed the function. This allows you to synchronize between
multiple processors with minimal performance impact, as __lwsync
does not wait for confirmation from each processor."

That's why smp_rmb() and smp_wmb() are defined to lwsync.
But the same reasoning applies to parallel pipeline
execution on each PowerPC processor.
So, use the lwsync instruction for rmb() and wmb() on the PPC
architectures that support it.

Also removed some useless spaces.

Signed-off-by: Kautuk Consul <[email protected]>
---
arch/powerpc/include/asm/barrier.h | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index e80b2c0e9315..553f5a5d20bd 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -41,11 +41,17 @@

/* The sub-arch has lwsync */
#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-# define SMPWMB LWSYNC
+#undef rmb
+#undef wmb
+/* Redefine rmb() to lwsync. */
+#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
+/* Redefine wmb() to lwsync. */
+#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
+#define SMPWMB LWSYNC
#elif defined(CONFIG_BOOKE)
-# define SMPWMB mbar
+#define SMPWMB mbar
#else
-# define SMPWMB eieio
+#define SMPWMB eieio
#endif

/* clang defines this macro for a builtin, which will not work with runtime patching */
--
2.31.1
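
For readers skimming the patch: rmb() and wmb() only make sense in pairs,
one on the writer side and one on the reader side. Below is a minimal,
hypothetical producer/consumer sketch of that pairing (shared_data, ready,
producer and consumer are illustrative names, not from the patch); with
this patch applied, both barriers compile to lwsync instead of sync on
PPC64:

static int shared_data;
static int ready;

/* CPU 0: publish the data, then set the flag. */
void producer(int v)
{
	shared_data = v;
	wmb();			/* order the data store before the flag store */
	WRITE_ONCE(ready, 1);
}

/* CPU 1: wait for the flag, then read the data. */
int consumer(void)
{
	while (!READ_ONCE(ready))
		;
	rmb();			/* order the flag load before the data load */
	return shared_data;
}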



2023-02-22 06:05:37

by Kautuk Consul

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

Hi All,

On Wed, Feb 22, 2023 at 11:31:07AM +0530, Kautuk Consul wrote:
> /* The sub-arch has lwsync */
> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> -# define SMPWMB LWSYNC
> +#undef rmb
> +#undef wmb
> +/* Redefine rmb() to lwsync. */
> +#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +/* Redefine wmb() to lwsync. */
> +#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +#define SMPWMB LWSYNC
> #elif defined(CONFIG_BOOKE)
> -# define SMPWMB mbar
> +#define SMPWMB mbar
> #else
> -# define SMPWMB eieio
> +#define SMPWMB eieio
> #endif

I think I am conceptually right about this patch, but I currently lack
the resources to test this out on PowerPC 64-bit servers.

I request IBM/non-IBM employees to test this patch out for:
a) functionality breakage: this patch is no good if it breaks current
kernel functionality.
b) performance impact: if functionality doesn't break, can anyone do
some reliable kernel load testing on ppc64 servers to see if there
is any significant performance gain?

Thanks a lot!
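
On the performance question, a rough userspace microbenchmark sketch of
the raw instruction cost (assumptions: a ppc64 host and GCC; this measures
only the barrier overhead in a tight loop, not a realistic kernel load):

#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

static double seconds(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
	struct timespec t0, t1, t2;
	unsigned long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)
		__asm__ __volatile__("sync" : : : "memory");	/* full barrier */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	for (i = 0; i < ITERS; i++)
		__asm__ __volatile__("lwsync" : : : "memory");	/* lightweight */
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("sync:   %.3fs\n", seconds(t0, t1));
	printf("lwsync: %.3fs\n", seconds(t1, t2));
	return 0;
}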

2023-02-22 07:02:43

by Christophe Leroy

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync



On 22/02/2023 at 07:01, Kautuk Consul wrote:
> A page on ibm.com states:
> "Ensures that all instructions preceding the call to __lwsync
> complete before any subsequent store instructions can be executed
> on the processor that executed the function. Also, it ensures that
> all load instructions preceding the call to __lwsync complete before
> any subsequent load instructions can be executed on the processor
> that executed the function. This allows you to synchronize between
> multiple processors with minimal performance impact, as __lwsync
> does not wait for confirmation from each processor."
>
> That's why smp_rmb() and smp_wmb() are defined to lwsync.
> But the same reasoning applies to parallel pipeline
> execution on each PowerPC processor.
> So, use the lwsync instruction for rmb() and wmb() on the PPC
> architectures that support it.
>
> Also removed some useless spaces.
>
> Signed-off-by: Kautuk Consul <[email protected]>
> ---
> arch/powerpc/include/asm/barrier.h | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index e80b2c0e9315..553f5a5d20bd 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -41,11 +41,17 @@
>
> /* The sub-arch has lwsync */
> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> -# define SMPWMB LWSYNC

This line shouldn't be changed by your patch

> +#undef rmb
> +#undef wmb

I see no point in defining something and undefining it a few lines
later.

Instead, why not do:

#define mb() __asm__ __volatile__ ("sync" : : : "memory")

#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
#else
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#endif

By the way, why put it inside a ({ })?
And I think nowadays we use "asm volatile", not "__asm__ __volatile__".

Shouldn't you also consider the same for mb()?

Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
Memory barriers semantics") in powerpc/next tree.

> +/* Redefine rmb() to lwsync. */

What's the added value of this comment? Isn't it obvious in the line
below that rmb() is being defined to lwsync? Please avoid useless comments.

> +#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +/* Redefine wmb() to lwsync. */
> +#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> +#define SMPWMB LWSYNC
> #elif defined(CONFIG_BOOKE)
> -# define SMPWMB mbar

This line shouldn't be changed by your patch

> +#define SMPWMB mbar
> #else
> -# define SMPWMB eieio

This line shouldn't be changed by your patch

> +#define SMPWMB eieio
> #endif
>
> /* clang defines this macro for a builtin, which will not work with runtime patching */

2023-02-22 08:17:11

by Kautuk Consul

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
>
>
On 22/02/2023 at 07:01, Kautuk Consul wrote:
> > A page on ibm.com states:
> > "Ensures that all instructions preceding the call to __lwsync
> > complete before any subsequent store instructions can be executed
> > on the processor that executed the function. Also, it ensures that
> > all load instructions preceding the call to __lwsync complete before
> > any subsequent load instructions can be executed on the processor
> > that executed the function. This allows you to synchronize between
> > multiple processors with minimal performance impact, as __lwsync
> > does not wait for confirmation from each processor."
> >
> > That's why smp_rmb() and smp_wmb() are defined to lwsync.
> > But the same reasoning applies to parallel pipeline
> > execution on each PowerPC processor.
> > So, use the lwsync instruction for rmb() and wmb() on the PPC
> > architectures that support it.
> >
> > Also removed some useless spaces.
> >
> > Signed-off-by: Kautuk Consul <[email protected]>
> > ---
> > arch/powerpc/include/asm/barrier.h | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index e80b2c0e9315..553f5a5d20bd 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -41,11 +41,17 @@
> >
> > /* The sub-arch has lwsync */
> > #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> > -# define SMPWMB LWSYNC
>
> This line shouldn't be changed by your patch
I mentioned it in the commit message.
But if you want I'll remove this. I did this because the rest
of the file doesn't have these spaces.
>
> > +#undef rmb
> > +#undef wmb
>
> I see no point in defining something and undefining it a few lines
> later.
>
> Instead, why not do:
>
> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
>
> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> #define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> #define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> #else
> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> #endif
>
I thought of doing that earlier, but there exists one more #elif
for CONFIG_BOOKE and then the #else.
That way we would have to put 3 different #defines for rmb and wmb
and I wanted to avoid that.
> By the way, why put it inside a ({ })?
checkpatch.pl asks for ({}).
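For reference: ({ ... }) is a GCC statement expression. The block can be
used where an expression is expected and evaluates to its last statement.
A tiny standalone illustration, unrelated to the kernel sources (for a
barrier like rmb() the block simply yields void):

#define clamp_nonneg(x) ({ int __v = (x); if (__v < 0) __v = 0; __v; })

int main(void)
{
	return clamp_nonneg(-5);	/* returns 0: the last statement, __v */
}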
> And I think nowadays we use "asm volatile", not "__asm__ __volatile__".
I was just following what was there in the file already.
Can change this if required.
>
> Shouldn't you also consider the same for mb()?
My change wasn't meant to address newer usages of asm volatile
#defines. I just wanted to redefine the rmb and wmb #defines
to lwsync.
>
> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
> Memory barriers semantics") in powerpc/next tree.
I thought of defining the __rmb and __wmb macros but I decided against
it because I didn't understand kcsan completely.
I used the standard Linus' tree, not powerpc/next.
Can you tell me which branch or git repo I should make this patch on?
>
> > +/* Redefine rmb() to lwsync. */
>
> What's the added value of this comment? Isn't it obvious in the line
> below that rmb() is being defined to lwsync? Please avoid useless comments.
Sure.
>
> > +#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> > +/* Redefine wmb() to lwsync. */
> > +#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
> > +#define SMPWMB LWSYNC
> > #elif defined(CONFIG_BOOKE)
> > -# define SMPWMB mbar
>
> This line shouldn't be changed by your patch
>
> > +#define SMPWMB mbar
> > #else
> > -# define SMPWMB eieio
Ok. Can change my patch.
>
> This line shouldn't be changed by your patch
>
> > +#define SMPWMB eieio
> > #endif
Sure. Will retain this too.
> >
> > /* clang defines this macro for a builtin, which will not work with runtime patching */

2023-02-22 08:21:38

by Kautuk Consul

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
> > > +/* Redefine rmb() to lwsync. */
> >
> > What's the added value of this comment? Isn't it obvious in the line
> > below that rmb() is being defined to lwsync? Please avoid useless comments.
> Sure.
Sorry, I forgot to mention that I didn't add this comment by choice.
It's just that checkpatch.pl complains that the memory barrier #define
doesn't have a comment for it.
> >

2023-02-22 08:28:29

by Christophe Leroy

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync



On 22/02/2023 at 09:21, Kautuk Consul wrote:
>> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
>>>> +/* Redefine rmb() to lwsync. */
>>>
>>> What's the added value of this comment? Isn't it obvious in the line
>>> below that rmb() is being defined to lwsync? Please avoid useless comments.
>> Sure.
> Sorry, I forgot to mention that I didn't add this comment by choice.
> It's just that checkpatch.pl complains that the memory barrier #define
> doesn't have a comment for it.
>>>

See https://docs.kernel.org/dev-tools/checkpatch.html, it says:

Checkpatch is not always right. Your judgement takes precedence over
checkpatch messages. If your code looks better with the violations, then
it's probably best left alone.

checkpatch wants a comment for uses of memory barriers. Here I think it
is a false positive.
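
For context, the call sites checkpatch is really aimed at look like the
hypothetical writer below, where the comment documents which barrier it
pairs with (p, data and ready are illustrative names):

	WRITE_ONCE(p->data, v);
	/*
	 * Order the data store above before the flag store below;
	 * pairs with the smp_rmb() in the reader.
	 */
	smp_wmb();
	WRITE_ONCE(p->ready, 1);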

2023-02-22 08:34:52

by Kautuk Consul

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

On Wed, Feb 22, 2023 at 08:28:19AM +0000, Christophe Leroy wrote:
>
>
On 22/02/2023 at 09:21, Kautuk Consul wrote:
> >> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
> >>>> +/* Redefine rmb() to lwsync. */
> >>>
> >>> What's the added value of this comment? Isn't it obvious in the line
> >>> below that rmb() is being defined to lwsync? Please avoid useless comments.
> >> Sure.
> > Sorry, I forgot to mention that I didn't add this comment by choice.
> > It's just that checkpatch.pl complains that the memory barrier #define
> > doesn't have a comment for it.
> >>>
>
> See https://docs.kernel.org/dev-tools/checkpatch.html, it says:
>
> Checkpatch is not always right. Your judgement takes precedence over
> checkpatch messages. If your code looks better with the violations, then
> it's probably best left alone.
>
> checkpatch wants a comment for uses of memory barriers. Here I think it
> is a false positive.
Cool. I will make the changes you mentioned.
Can you tell me which branch or git repo I should re-make this patch on?

2023-02-22 08:40:55

by Christophe Leroy

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync



On 22/02/2023 at 09:16, Kautuk Consul wrote:
> On Wed, Feb 22, 2023 at 07:02:34AM +0000, Christophe Leroy wrote:
>>
>>
>> On 22/02/2023 at 07:01, Kautuk Consul wrote:
>>> A page on ibm.com states:
>>> "Ensures that all instructions preceding the call to __lwsync
>>> complete before any subsequent store instructions can be executed
>>> on the processor that executed the function. Also, it ensures that
>>> all load instructions preceding the call to __lwsync complete before
>>> any subsequent load instructions can be executed on the processor
>>> that executed the function. This allows you to synchronize between
>>> multiple processors with minimal performance impact, as __lwsync
>>> does not wait for confirmation from each processor."
>>>
>>> That's why smp_rmb() and smp_wmb() are defined to lwsync.
>>> But the same reasoning applies to parallel pipeline
>>> execution on each PowerPC processor.
>>> So, use the lwsync instruction for rmb() and wmb() on the PPC
>>> architectures that support it.
>>>
>>> Also removed some useless spaces.
>>>
>>> Signed-off-by: Kautuk Consul <[email protected]>
>>> ---
>>> arch/powerpc/include/asm/barrier.h | 12 +++++++++---
>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
>>> index e80b2c0e9315..553f5a5d20bd 100644
>>> --- a/arch/powerpc/include/asm/barrier.h
>>> +++ b/arch/powerpc/include/asm/barrier.h
>>> @@ -41,11 +41,17 @@
>>>
>>> /* The sub-arch has lwsync */
>>> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
>>> -# define SMPWMB LWSYNC
>>
>> This line shouldn't be changed by your patch
> I mentioned it in the commit message.
> But if you want I'll remove this. I did this because the rest
> of the file doesn't have these spaces.
>>
>>> +#undef rmb
>>> +#undef wmb
>>
>> I see no point in defining something and undefining it a few lines
>> later.
>>
>> Instead, why not do:
>>
>> #define mb() __asm__ __volatile__ ("sync" : : : "memory")
>>
>> #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
>> #define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>> #define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>> #else
>> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
>> #endif
>>
> I thought of doing that earlier, but there exists one more #elif
> for CONFIG_BOOKE and then the #else.
> That way we would have to put 3 different #defines for rmb and wmb
> and I wanted to avoid that.

No, I don't mean to use the existing #ifdef/elif/else.

Define an #ifdef/#else dedicated to the xmb macros.

Something like that:

@@ -35,9 +35,15 @@
* However, on CPUs that don't support lwsync, lwsync actually maps to a
* heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
*/
+#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
+#define __mb() asm volatile ("lwsync" : : : "memory")
+#define __rmb() asm volatile ("lwsync" : : : "memory")
+#define __wmb() asm volatile ("lwsync" : : : "memory")
+#else
#define __mb() __asm__ __volatile__ ("sync" : : : "memory")
#define __rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define __wmb() __asm__ __volatile__ ("sync" : : : "memory")
+#endif

/* The sub-arch has lwsync */
#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)



>> By the way, why put it inside a ({ })?
> checkpatch.pl asks for ({}).
>> And I think nowadays we use "asm volatile", not "__asm__ __volatile__".
> I was just following what was there in the file already.
> Can change this if required.
>>
>> Shouldn't you also consider the same for mb()?
> My change wasn't meant to address newer usages of asm volatile
> #defines. I just wanted to redefine the rmb and wmb #defines
> to lwsync.

That's my point, why not also redefine mb to lwsync?

>>
>> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
>> Memory barriers semantics") in powerpc/next tree.
> I thought of defining the __rmb and __wmb macros but I decided against
> it because I didn't understand kcsan completely.
> I used the standard Linus' tree, not powerpc/next.
> Can you tell me which branch or git repo I should make this patch on?

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

'merge' branch is a merge of branches 'master', 'fixes' and 'next'.

That's the branch to use, although it is not always in sync with 'fixes'
and 'next'; in that case all you have to do is locally merge the 'next'
and 'fixes' branches until it's done remotely.

But just using 'next' branch also works most of the time.

Note that 'next' branch should already be part of linux-next so you may
also use linux-next if you prefer.

>>
>>> +/* Redefine rmb() to lwsync. */
>>
>> What's the added value of this comment? Isn't it obvious in the line
>> below that rmb() is being defined to lwsync? Please avoid useless comments.
> Sure.
>>
>>> +#define rmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>>> +/* Redefine wmb() to lwsync. */
>>> +#define wmb() ({__asm__ __volatile__ ("lwsync" : : : "memory"); })
>>> +#define SMPWMB LWSYNC
>>> #elif defined(CONFIG_BOOKE)
>>> -# define SMPWMB mbar
>>
>> This line shouldn't be changed by your patch
>>
>>> +#define SMPWMB mbar
>>> #else
>>> -# define SMPWMB eieio
> Ok. Can change my patch.
>>
>> This line shouldn't be changed by your patch
>>
>>> +#define SMPWMB eieio
>>> #endif
> Sure. Will retain this too.
>>>
>>> /* clang defines this macro for a builtin, which will not work with runtime patching */

2023-02-22 08:47:38

by Kautuk Consul

Subject: Re: [PATCH] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync


> No, I don't mean to use the existing #ifdef/elif/else.
>
> Define an #ifdef/#else dedicated to the xmb macros.
>
> Something like that:
>
> @@ -35,9 +35,15 @@
> * However, on CPUs that don't support lwsync, lwsync actually maps to a
> * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
> */
> +#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> +#define __mb() asm volatile ("lwsync" : : : "memory")
> +#define __rmb() asm volatile ("lwsync" : : : "memory")
> +#define __wmb() asm volatile ("lwsync" : : : "memory")
> +#else
> #define __mb() __asm__ __volatile__ ("sync" : : : "memory")
> #define __rmb() __asm__ __volatile__ ("sync" : : : "memory")
> #define __wmb() __asm__ __volatile__ ("sync" : : : "memory")
> +#endif
Ok. Got it. Will do.

> >> Shouldn't you also consider the same for mb()?
> > My change wasn't meant to address newer usages of asm volatile
> > #defines. I just wanted to redefine the rmb and wmb #defines
> > to lwsync.
>
> That's my point, why not also redefine mb to lwsync?
That would be incorrect. lwsync orders loads against loads and stores
against stores, but it does not order a store against a subsequent load.
mb() is meant to order both loads and stores, so the sync instruction is
correct for that one.
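
A Dekker-style sketch of why (flag0, flag1, r0 and r1 are hypothetical
variables; only a full store->load barrier forbids both CPUs reading 0):

/* CPU 0 */
WRITE_ONCE(flag0, 1);
mb();	/* must order the store above against the load below */
r0 = READ_ONCE(flag1);

/* CPU 1 */
WRITE_ONCE(flag1, 1);
mb();	/* ditto */
r1 = READ_ONCE(flag0);

/*
 * With sync, the outcome r0 == 0 && r1 == 0 is forbidden. With only
 * lwsync, the store->load pairs are unordered and both CPUs could
 * still see the other's flag as 0.
 */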
>
> >>
> >> Note that your series will conflict with b6e259297a6b ("powerpc/kcsan:
> >> Memory barriers semantics") in powerpc/next tree.
> > I thought of defining the __rmb and __wmb macros but I decided against
> > it because I didn't understand kcsan completely.
> > I used the standard Linus' tree, not powerpc/next.
> > Can you tell me which branch or git repo I should make this patch on?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
>
> 'merge' branch is a merge of branches 'master', 'fixes' and 'next'.
>
> That's the branch to use, although it is not always in sync with 'fixes'
> and 'next'; in that case all you have to do is locally merge the 'next'
> and 'fixes' branches until it's done remotely.
>
> But just using 'next' branch also works most of the time.
>
> Note that 'next' branch should already be part of linux-next so you may
> also use linux-next if you prefer.
>
Will send another patch for this.
Thanks. Will use the linux-next branch on this git repo.