From: Guo Ren <[email protected]>
No memory barrier is needed when cmpxchg fails. A memory barrier is
only needed when cmpxchg succeeds and the new value is written.
- cmpxchg_asm: An unnecessary __WEAK_LLSC_MB on the failure path
  reduces the performance of the cmpxchg retry loop.
- cmpxchg_small: A necessary __WEAK_LLSC_MB is missing when sc
  succeeds. This is a bug because there is no memory synchronization
  on the success path.
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/loongarch/include/asm/cmpxchg.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
index 979fde61bba8..6a05b92814b6 100644
--- a/arch/loongarch/include/asm/cmpxchg.h
+++ b/arch/loongarch/include/asm/cmpxchg.h
@@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
" move $t0, %z4 \n" \
" " st " $t0, %1 \n" \
" beqz $t0, 1b \n" \
- "2: \n" \
__WEAK_LLSC_MB \
+ "2: \n" \
: "=&r" (__ret), "=ZB"(*m) \
: "ZB"(*m), "Jr" (old), "Jr" (new) \
: "t0", "memory"); \
@@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
" or %1, %1, %z6 \n"
" sc.w %1, %2 \n"
" beqz %1, 1b \n"
- " b 3f \n"
- "2: \n"
__WEAK_LLSC_MB
- "3: \n"
+ "2: \n"
: "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
: "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
: "memory");
--
2.36.1
Hello,
On Tue, Aug 1, 2023 at 9:16 AM <[email protected]> wrote:
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> " move $t0, %z4 \n" \
> " " st " $t0, %1 \n" \
> " beqz $t0, 1b \n" \
> - "2: \n" \
> __WEAK_LLSC_MB \
> + "2: \n" \
Thanks for the patch.
This would look pretty good if it weren't for the special memory
barrier semantics of LoongArch's LL and SC instructions.
The LL/SC memory barrier behavior of LoongArch:
* LL: <memory-barrier> + <load-exclusive>
* SC: <store-conditional> + <memory-barrier>
and LoongArch's weak memory model allows load/load reordering of
accesses to the same address.
So __WEAK_LLSC_MB [1] is used to prevent that load/load reordering, and
no explicit barrier instruction is required after SC.
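To make that concrete, here is a minimal sketch (my own illustration,
not code from the patch; it assumes __WEAK_LLSC_MB expands to the
"dbar 0" full barrier introduced in [1]) of the failure path the patch
changes:

  /* Hypothetical caller of cmpxchg(); suppose the cmpxchg fails. */
  static unsigned int observe_after_fail(unsigned int *p)
  {
          unsigned int old, v;

          old = cmpxchg(p, 1, 2); /* the ll.w inside observed, say, *p == 5 */
          v = READ_ONCE(*p);      /* plain ld of the same address */

          /*
           * With __WEAK_LLSC_MB on the failure path, v is at least as
           * new as the value the ll.w observed (per-location
           * coherence). Without it, the plain ld may be satisfied
           * before the ll.w and return an even older value.
           */
          return v;
  }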
[1] https://lore.kernel.org/loongarch/[email protected]/
Regards,
--
WANG Rui
On Mon, 2023-07-31 at 21:15 -0400, [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> No memory barrier is needed when cmpxchg fails. A memory barrier is
> only needed when cmpxchg succeeds and the new value is written.
>
> - cmpxchg_asm: An unnecessary __WEAK_LLSC_MB on the failure path
>   reduces the performance of the cmpxchg retry loop.
I'm not an expert in memory models, but in practice this barrier is
really needed or cmpxchg will be "not atomic". See
https://lore.kernel.org/linux-mips/[email protected]/.
> - cmpxchg_small: A necessary __WEAK_LLSC_MB is missing when sc
>   succeeds. This is a bug because there is no memory synchronization
>   on the success path.
>
> Signed-off-by: Guo Ren <[email protected]>
> Signed-off-by: Guo Ren <[email protected]>
> ---
> arch/loongarch/include/asm/cmpxchg.h | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> index 979fde61bba8..6a05b92814b6 100644
> --- a/arch/loongarch/include/asm/cmpxchg.h
> +++ b/arch/loongarch/include/asm/cmpxchg.h
> @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> " move $t0, %z4 \n" \
> " " st " $t0, %1 \n" \
> " beqz $t0, 1b \n" \
> - "2: \n" \
> __WEAK_LLSC_MB \
> + "2: \n" \
> : "=&r" (__ret), "=ZB"(*m) \
> : "ZB"(*m), "Jr" (old), "Jr" (new) \
> : "t0", "memory"); \
> @@ -148,10 +148,8 @@ static inline unsigned int __cmpxchg_small(volatile void *ptr, unsigned int old,
> " or %1, %1, %z6 \n"
> " sc.w %1, %2 \n"
> " beqz %1, 1b \n"
> - " b 3f \n"
> - "2: \n"
> __WEAK_LLSC_MB
> - "3: \n"
> + "2: \n"
> : "=&r" (old32), "=&r" (temp), "=ZC" (*ptr32)
> : "ZC" (*ptr32), "Jr" (mask), "Jr" (old), "Jr" (new)
> : "memory");
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <[email protected]> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 9:16 AM <[email protected]> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > " move $t0, %z4 \n" \
> > " " st " $t0, %1 \n" \
> > " beqz $t0, 1b \n" \
> > - "2: \n" \
> > __WEAK_LLSC_MB \
> > + "2: \n" \
>
> Thanks for the patch.
>
> This would look pretty good if it weren't for the special memory
> barrier semantics of LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and LoongArch's weak memory model allows load/load reordering of
> accesses to the same address.
The CoRR problem would cause wider problems than this. For this case,
do you mean your LL -> LL would be reordered?
CPU 0
CPU1
LL(2) (set ex-monitor)
store (break the ex-monitor)
LL(1) (reordered instruction sets ex-monitor)
SC(3) (succeeded?)
>
> So __WEAK_LLSC_MB [1] is used to prevent that load/load reordering, and
> no explicit barrier instruction is required after SC.
>
> [1] https://lore.kernel.org/loongarch/[email protected]/
>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
On Tue, Aug 1, 2023 at 5:02 PM Guo Ren <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 10:29 AM WANG Rui <[email protected]> wrote:
> >
> > Hello,
> >
> > On Tue, Aug 1, 2023 at 9:16 AM <[email protected]> wrote:
> > > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > > index 979fde61bba8..6a05b92814b6 100644
> > > --- a/arch/loongarch/include/asm/cmpxchg.h
> > > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > > " move $t0, %z4 \n" \
> > > " " st " $t0, %1 \n" \
> > > " beqz $t0, 1b \n" \
> > > - "2: \n" \
> > > __WEAK_LLSC_MB \
> > > + "2: \n" \
> >
> > Thanks for the patch.
> >
> > This would look pretty good if it weren't for the special memory
> > barrier semantics of LoongArch's LL and SC instructions.
> >
> > The LL/SC memory barrier behavior of LoongArch:
> >
> > * LL: <memory-barrier> + <load-exclusive>
> > * SC: <store-conditional> + <memory-barrier>
> >
> > and LoongArch's weak memory model allows load/load reordering of
> > accesses to the same address.
> The CoRR problem would cause wider problems than this. For this case,
> do you mean your LL -> LL would be reordered?
>
> CPU 0
> CPU1
> LL(2) (set ex-monitor)
>
> store (break the ex-monitor)
> LL(1) (reordered instruction sets ex-monitor)
> SC(3) (succeeded?)
Sorry for the mail client reformat, I mean:
CPU0 LL(2) (set ex-monitor)
CPU1 STORE (break the ex-monitor)
> CPU0 LL(1) (reordered instruction sets ex-monitor)
CPU0 SC(3) (success?)
>
> >
> > So __WEAK_LLSC_MB [1] is used to prevent that load/load reordering, and
> > no explicit barrier instruction is required after SC.
> >
> > [1] https://lore.kernel.org/loongarch/[email protected]/
> >
> > Regards,
> > --
> > WANG Rui
> >
>
>
> --
> Best Regards
> Guo Ren
--
Best Regards
Guo Ren
On Tue, Aug 01, 2023 at 10:29:31AM +0800, WANG Rui wrote:
> On Tue, Aug 1, 2023 at 9:16 AM <[email protected]> wrote:
> > diff --git a/arch/loongarch/include/asm/cmpxchg.h b/arch/loongarch/include/asm/cmpxchg.h
> > index 979fde61bba8..6a05b92814b6 100644
> > --- a/arch/loongarch/include/asm/cmpxchg.h
> > +++ b/arch/loongarch/include/asm/cmpxchg.h
> > @@ -102,8 +102,8 @@ __arch_xchg(volatile void *ptr, unsigned long x, int size)
> > " move $t0, %z4 \n" \
> > " " st " $t0, %1 \n" \
> > " beqz $t0, 1b \n" \
> > - "2: \n" \
> > __WEAK_LLSC_MB \
> > + "2: \n" \
>
> Thanks for the patch.
>
> This would look pretty good if it weren't for the special memory
> barrier semantics of LoongArch's LL and SC instructions.
>
> The LL/SC memory barrier behavior of LoongArch:
>
> * LL: <memory-barrier> + <load-exclusive>
> * SC: <store-conditional> + <memory-barrier>
>
> and LoongArch's weak memory model allows load/load reordering of
> accesses to the same address.
Hmm, somehow this one passed me by, but I think that puts you in the naughty
corner with Itanium. It probably also means your READ_ONCE() is broken,
unless the compiler emits barriers for volatile reads (like ia64)?
Will
Hello,
On Tue, Aug 1, 2023 at 4:32 PM Will Deacon <[email protected]> wrote:
>
> Hmm, somehow this one passed me by, but I think that puts you in the naughty
> corner with Itanium. It probably also means your READ_ONCE() is broken,
> unless the compiler emits barriers for volatile reads (like ia64)?
Hmm, I agree with your perspective. Allowing out-of-order loads for
the same address in the memory model provides certain performance
benefits, but it also poses challenges to software. Fortunately, the
hardware allows software to disable this feature when needed.
Regards,
--
WANG Rui
Hello,
On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <[email protected]> wrote:
>
> > The CoRR problem would cause wider problems than this. For this case,
> > do you mean your LL -> LL would be reordered?
> >
> > CPU 0
> > CPU1
> > LL(2) (set ex-monitor)
> >
> > store (break the ex-monitor)
> > LL(1) (reordered instruction sets ex-monitor)
> > SC(3) (succeeded?)
> Sorry for the mail client reformat, I mean:
>
> CPU0 LL(2) (set ex-monitor)
> CPU1 STORE (break the ex-monitor)
> CPU0 LL(1) (reordered instruction sets ex-monitor)
> CPU0 SC(3) (success?)
No. LL and LL won't reorder, because LL implies a memory barrier
(though not acquire semantics).
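Spelling out the semantics listed earlier (my annotation, not text from
the architecture manual):

  ll.w  rd, rj, 0   =>  <memory-barrier>; <load-exclusive>
  sc.w  rd, rj, 0   =>  <store-conditional>; <memory-barrier>

The second ll.w's implied leading barrier keeps it ordered after the
first one, so the reordered-LL scenario above cannot happen.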
Regards,
--
WANG Rui
On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <[email protected]> wrote:
>
> Hello,
>
> On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <[email protected]> wrote:
> >
> > > The CoRR problem would cause wider problems than this. For this case,
> > > do you mean your LL -> LL would be reordered?
> > >
> > > CPU 0
> > > CPU1
> > > LL(2) (set ex-monitor)
> > >
> > > store (break the ex-monitor)
> > > LL(1) (reordered instruction sets ex-monitor)
> > > SC(3) (succeeded?)
> > Sorry for the mail client reformat, I mean:
> >
> > CPU0 LL(2) (set ex-monitor)
> > CPU1 STORE (break the ex-monitor)
> > CPU0 LL(1) (reordered instruction sets ex-monitor)
> > CPU0 SC(3) (success?)
>
> No. LL and LL won't reorder, because LL implies a memory barrier
> (though not acquire semantics).
That means we could remove __WEAK_LLSC_MB totally, right?
>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <[email protected]> wrote:
> > >
> > > > The CoRR problem would cause wider problems than this. For this case,
> > > > do you mean your LL -> LL would be reordered?
> > > >
> > > > CPU 0
> > > > CPU1
> > > > LL(2) (set ex-monitor)
> > > >
> > > > store (break the ex-monitor)
> > > > LL(1) (reordered instruction sets ex-monitor)
> > > > SC(3) (succeeded?)
> > > Sorry for the mail client reformat, I mean:
> > >
> > > CPU0 LL(2) (set ex-monitor)
> > > CPU1 STORE (break the ex-monitor)
> > > CPU0 LL(1) (reordered instruction sets ex-monitor)
> > > CPU0 SC(3) (success?)
> >
> > No. LL and LL won't reorder, because LL implies a memory barrier
> > (though not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?
As I've said, to implement CAS on LA464 this barrier is *really* needed.
I initially didn't believe it, until I spent one night (from 11 PM to
4 AM) debugging GCC libgomp test failures.
On LA664 (3A6000) things may change though.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Tue, 2023-08-01 at 19:23 +0800, Xi Ruoyao wrote:
> On Tue, 2023-08-01 at 18:49 +0800, Guo Ren wrote:
> > > On Tue, Aug 1, 2023 at 5:05 PM Guo Ren <[email protected]> wrote:
> > > >
> > > > > The CoRR problem would cause wider problems than this. For this case,
> > > > > do you mean your LL -> LL would be reordered?
> > > > >
> > > > > CPU 0
> > > > > CPU1
> > > > > LL(2) (set ex-monitor)
> > > > >
> > > > > store (break the ex-monitor)
> > > > > LL(1) (reordered instruction sets ex-monitor)
> > > > > SC(3) (succeeded?)
> > > > Sorry for the mail client reformat, I mean:
> > > >
> > > > CPU0 LL(2) (set ex-monitor)
> > > > CPU1 STORE (break the ex-monitor)
> > > > CPU0 LL(1) (reordered instruction sets ex-monitor)
> > > > CPU0 SC(3) (success?)
> > >
> > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > (though not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
>
> As I've said, to implement CAS on LA464 this barrier is *really* needed.
> I initially didn't believe it, until I spent one night (from 11 PM to
> 4 AM) debugging GCC libgomp test failures.
>
> On LA664 (3A6000) things may change though.
And I consider this a hardware bug, so it falls outside the general
concept of memory ordering. We are basically just applying a workaround
provided by the uarch designers.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <[email protected]> wrote:
> > No. LL and LL won't reorder, because LL implies a memory barrier
> > (though not acquire semantics).
> That means we could remove __WEAK_LLSC_MB totally, right?
More precisely, __WEAK_LLSC_MB is intended to prevent reordering
between LL and the normal LD used to fetch the expected value for
cmpxchg.
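For example (a hedged sketch of a hypothetical caller, not code from
the patch):

  static void set_flag(unsigned int *p, unsigned int flag)
  {
          unsigned int old;

          do {
                  /* Normal ld: fetches the expected value. */
                  old = READ_ONCE(*p);
                  /*
                   * cmpxchg() does ll.w; compare; sc.w. Without
                   * __WEAK_LLSC_MB on its failure path, the READ_ONCE
                   * above could (in the next iteration) be satisfied
                   * with a value older than what the failed ll.w
                   * observed, so the loop would retry on stale data.
                   */
          } while (cmpxchg(p, old, old | flag) != old);
  }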
Regards,
--
WANG Rui
On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <[email protected]> wrote:
> >
> > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <[email protected]> wrote:
> > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > (though not acquire semantics).
> > That means we could remove __WEAK_LLSC_MB totally, right?
>
> More precisely, __WEAK_LLSC_MB is intended to prevent reordering
> between LL and the normal LD used to fetch the expected value for
> cmpxchg.
Oh, that's unnecessary when cmpxchg fails.
Maybe you are treating cmpxchg as a CoRR antidote by coincidence. Please
solve the CoRR problem with READ_ONCE instead.
See the Alpha architecture.
>
> Regards,
> --
> WANG Rui
>
--
Best Regards
Guo Ren
On Wed, Aug 2, 2023 at 7:17 AM Guo Ren <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 12:37 PM WANG Rui <[email protected]> wrote:
> >
> > On Tue, Aug 1, 2023 at 6:50 PM Guo Ren <[email protected]> wrote:
> > >
> > > On Tue, Aug 1, 2023 at 5:32 PM WANG Rui <[email protected]> wrote:
> > > > No. LL and LL won't reorder, because LL implies a memory barrier
> > > > (though not acquire semantics).
> > > That means we could remove __WEAK_LLSC_MB totally, right?
> >
> > More precisely, __WEAK_LLSC_MB is intended to prevent reordering
> > between LL and the normal LD used to fetch the expected value for
> > cmpxchg.
> Oh, that's unnecessary when cmpxchg fails.
>
> Maybe you are treating cmpxchg as a CoRR antidote by coincidence. Please
> solve the CoRR problem with READ_ONCE instead.
>
> See the Alpha architecture.
Unfortunately, the LL instruction has no acquire semantics. Even if
our kernel team improves READ_ONCE, it cannot prevent reordering
between LL and READ_ONCE after cmpxchg fails.
  LL (<memory-barrier> + <load-exclusive>); __WEAK_LLSC_MB; READ_ONCE (<normal-load>); ...

vs

  LL (<memory-barrier> + <load-exclusive>); READ_ONCE (<normal-load> + <memory-barrier>); ...
Improving READ_ONCE is really important.
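To spell that out (my formulation; the ia64-style READ_ONCE below is
hypothetical, not something proposed in this thread):

  /* Hypothetical ia64-style READ_ONCE: volatile load + trailing barrier. */
  #define my_read_once(x)                                         \
  ({                                                              \
          typeof(x) ___v = *(const volatile typeof(x) *)&(x);     \
          __asm__ __volatile__("dbar 0" ::: "memory");            \
          ___v;                                                   \
  })

  /*
   * Even so, nothing separates a failed cmpxchg's ll.w from
   * my_read_once()'s plain load: the trailing barrier only orders the
   * load against *later* accesses. So the two same-address loads can
   * still be reordered, and the barrier has to sit on the cmpxchg
   * failure path itself.
   */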
Regards,
--
WANG Rui