2018-03-05 18:26:02

by Andrea Parri

Subject: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

Current implementations map locking operations using .rl and .aq
annotations. However, this mapping is unsound w.r.t. the kernel
memory consistency model (LKMM) [1]:

Referring to the "unlock-lock-read-ordering" test reported below,
Daniel wrote:

"I think an RCpc interpretation of .aq and .rl would in fact
allow the two normal loads in P1 to be reordered [...]

The intuition would be that the amoswap.w.aq can forward from
the amoswap.w.rl while that's still in the store buffer, and
then the lw x3,0(x4) can also perform while the amoswap.w.rl
is still in the store buffer, all before the lw x1,0(x2)
executes. That's not forbidden unless the amoswaps are RCsc,
unless I'm missing something.

Likewise even if the unlock()/lock() is between two stores.
A control dependency might originate from the load part of
the amoswap.w.aq, but there still would have to be something
to ensure that this load part in fact performs after the store
part of the amoswap.w.rl performs globally, and that's not
automatic under RCpc."

Simulation of the RISC-V memory consistency model confirmed this
expectation.

In order to "synchronize" LKMM and RISC-V's implementation, this
commit strengthens the implementations of the locking operations
by replacing .rl and .aq with the use of ("lightweight") fences,
resp., "fence rw, w" and "fence r , rw".

C unlock-lock-read-ordering

{}
/* s initially owned by P1 */

P0(int *x, int *y)
{
WRITE_ONCE(*x, 1);
smp_wmb();
WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
int r0;
int r1;

r0 = READ_ONCE(*y);
spin_unlock(s);
spin_lock(s);
r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)
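
As a rough illustration of the fence-based mapping described above, here is a minimal user-space sketch in C11 atomics. It is not the kernel code: the `sketch_*` names are hypothetical, and it only assumes that a relaxed atomic operation followed by an acquire fence, or a release fence followed by a relaxed store, plays the role of the "amoswap.w" plus "fence r , rw" and "fence rw, w" plus store sequences in the patch. How a given compiler lowers these fences on RISC-V, and whether the C11 model matches LKMM for the litmus test, are outside what this sketch shows.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for arch_spinlock_t; not kernel code. */
struct sketch_spinlock {
    atomic_int lock;    /* 0 = free, 1 = held */
};

/* Relaxed swap, then an acquire fence (cf. "amoswap.w" + "fence r , rw"). */
static int sketch_spin_trylock(struct sketch_spinlock *l)
{
    int busy = atomic_exchange_explicit(&l->lock, 1, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return !busy;
}

/* Test-and-test-and-set loop, same shape as arch_spin_lock() in the patch. */
static void sketch_spin_lock(struct sketch_spinlock *l)
{
    for (;;) {
        if (atomic_load_explicit(&l->lock, memory_order_relaxed))
            continue;
        if (sketch_spin_trylock(l))
            break;
    }
}

/* Release fence, then a relaxed store (cf. "fence rw, w" + store of 0). */
static void sketch_spin_unlock(struct sketch_spinlock *l)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->lock, 0, memory_order_relaxed);
}

/* Single-threaded check of the state transitions only; returns the lock word. */
static int sketch_spin_demo(void)
{
    struct sketch_spinlock l;

    atomic_init(&l.lock, 0);
    assert(sketch_spin_trylock(&l) == 1);   /* free -> held */
    assert(sketch_spin_trylock(&l) == 0);   /* already held */
    sketch_spin_unlock(&l);
    sketch_spin_lock(&l);                   /* free again, so this returns */
    sketch_spin_unlock(&l);
    return atomic_load_explicit(&l.lock, memory_order_relaxed);
}
```

The demo exercises only the lock word transitions; it says nothing about ordering, which is what the litmus test is for.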

[1] https://marc.info/?l=linux-kernel&m=151930201102853&w=2
https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
https://marc.info/?l=linux-kernel&m=151633436614259&w=2

Signed-off-by: Andrea Parri <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/riscv/include/asm/fence.h | 12 ++++++++++++
arch/riscv/include/asm/spinlock.h | 29 +++++++++++++++--------------
2 files changed, 27 insertions(+), 14 deletions(-)
create mode 100644 arch/riscv/include/asm/fence.h

diff --git a/arch/riscv/include/asm/fence.h b/arch/riscv/include/asm/fence.h
new file mode 100644
index 0000000000000..2b443a3a487f3
--- /dev/null
+++ b/arch/riscv/include/asm/fence.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_RISCV_FENCE_H
+#define _ASM_RISCV_FENCE_H
+
+#ifdef CONFIG_SMP
+#define RISCV_ACQUIRE_BARRIER "\tfence r , rw\n"
+#define RISCV_RELEASE_BARRIER "\tfence rw, w\n"
+#else
+#define RISCV_ACQUIRE_BARRIER
+#define RISCV_RELEASE_BARRIER
+#endif
+
+#endif /* _ASM_RISCV_FENCE_H */
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 2fd27e8ef1fd6..8eb26d1ede819 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -17,6 +17,7 @@

#include <linux/kernel.h>
#include <asm/current.h>
+#include <asm/fence.h>

/*
* Simple spin lock operations. These provide no fairness guarantees.
@@ -28,10 +29,7 @@

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
- __asm__ __volatile__ (
- "amoswap.w.rl x0, x0, %0"
- : "=A" (lock->lock)
- :: "memory");
+ smp_store_release(&lock->lock, 0);
}

static inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -39,7 +37,8 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
int tmp = 1, busy;

__asm__ __volatile__ (
- "amoswap.w.aq %0, %2, %1"
+ " amoswap.w %0, %2, %1\n"
+ RISCV_ACQUIRE_BARRIER
: "=r" (busy), "+A" (lock->lock)
: "r" (tmp)
: "memory");
@@ -68,8 +67,9 @@ static inline void arch_read_lock(arch_rwlock_t *lock)
"1: lr.w %1, %0\n"
" bltz %1, 1b\n"
" addi %1, %1, 1\n"
- " sc.w.aq %1, %1, %0\n"
+ " sc.w %1, %1, %0\n"
" bnez %1, 1b\n"
+ RISCV_ACQUIRE_BARRIER
: "+A" (lock->lock), "=&r" (tmp)
:: "memory");
}
@@ -82,8 +82,9 @@ static inline void arch_write_lock(arch_rwlock_t *lock)
"1: lr.w %1, %0\n"
" bnez %1, 1b\n"
" li %1, -1\n"
- " sc.w.aq %1, %1, %0\n"
+ " sc.w %1, %1, %0\n"
" bnez %1, 1b\n"
+ RISCV_ACQUIRE_BARRIER
: "+A" (lock->lock), "=&r" (tmp)
:: "memory");
}
@@ -96,8 +97,9 @@ static inline int arch_read_trylock(arch_rwlock_t *lock)
"1: lr.w %1, %0\n"
" bltz %1, 1f\n"
" addi %1, %1, 1\n"
- " sc.w.aq %1, %1, %0\n"
+ " sc.w %1, %1, %0\n"
" bnez %1, 1b\n"
+ RISCV_ACQUIRE_BARRIER
"1:\n"
: "+A" (lock->lock), "=&r" (busy)
:: "memory");
@@ -113,8 +115,9 @@ static inline int arch_write_trylock(arch_rwlock_t *lock)
"1: lr.w %1, %0\n"
" bnez %1, 1f\n"
" li %1, -1\n"
- " sc.w.aq %1, %1, %0\n"
+ " sc.w %1, %1, %0\n"
" bnez %1, 1b\n"
+ RISCV_ACQUIRE_BARRIER
"1:\n"
: "+A" (lock->lock), "=&r" (busy)
:: "memory");
@@ -125,7 +128,8 @@ static inline int arch_write_trylock(arch_rwlock_t *lock)
static inline void arch_read_unlock(arch_rwlock_t *lock)
{
__asm__ __volatile__(
- "amoadd.w.rl x0, %1, %0"
+ RISCV_RELEASE_BARRIER
+ " amoadd.w x0, %1, %0\n"
: "+A" (lock->lock)
: "r" (-1)
: "memory");
@@ -133,10 +137,7 @@ static inline void arch_read_unlock(arch_rwlock_t *lock)

static inline void arch_write_unlock(arch_rwlock_t *lock)
{
- __asm__ __volatile__ (
- "amoswap.w.rl x0, x0, %0"
- : "=A" (lock->lock)
- :: "memory");
+ smp_store_release(&lock->lock, 0);
}

#endif /* _ASM_RISCV_SPINLOCK_H */
--
2.7.4
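
The rwlock hunks above follow one pattern: the lock-side LR/SC loop drops its .aq annotation and is followed by an acquire fence once it succeeds, and the unlock side puts a release fence before a plain AMO. A user-space sketch of that pattern in C11 atomics, with a compare-and-swap loop standing in for LR/SC, might look as follows. The `sketch_*` names are hypothetical and this is an assumption-laden analogue, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for arch_rwlock_t: >0 readers, -1 writer, 0 free. */
struct sketch_rwlock {
    atomic_int lock;
};

/* CAS loop in place of lr.w/sc.w; acquire fence only on the success path. */
static int sketch_read_trylock(struct sketch_rwlock *l)
{
    int old = atomic_load_explicit(&l->lock, memory_order_relaxed);

    do {
        if (old < 0)
            return 0;   /* a writer holds the lock */
    } while (!atomic_compare_exchange_weak_explicit(&l->lock, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    atomic_thread_fence(memory_order_acquire);
    return 1;
}

static int sketch_write_trylock(struct sketch_rwlock *l)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong_explicit(&l->lock, &expected, -1,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed))
        return 0;       /* readers or a writer present */

    atomic_thread_fence(memory_order_acquire);
    return 1;
}

/* Release fence, then a relaxed decrement (cf. "fence rw, w" + amoadd.w). */
static void sketch_read_unlock(struct sketch_rwlock *l)
{
    atomic_thread_fence(memory_order_release);
    atomic_fetch_sub_explicit(&l->lock, 1, memory_order_relaxed);
}

/* Release fence, then a relaxed store of 0. */
static void sketch_write_unlock(struct sketch_rwlock *l)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->lock, 0, memory_order_relaxed);
}

/* Single-threaded check of the state transitions; returns the lock word. */
static int sketch_rwlock_demo(void)
{
    struct sketch_rwlock l;

    atomic_init(&l.lock, 0);
    assert(sketch_read_trylock(&l) == 1);   /* first reader */
    assert(sketch_read_trylock(&l) == 1);   /* second reader */
    assert(sketch_write_trylock(&l) == 0);  /* blocked by readers */
    sketch_read_unlock(&l);
    sketch_read_unlock(&l);
    assert(sketch_write_trylock(&l) == 1);  /* now free */
    assert(sketch_read_trylock(&l) == 0);   /* blocked by writer */
    sketch_write_unlock(&l);
    return atomic_load_explicit(&l.lock, memory_order_relaxed);
}
```

Placing the acquire fence after the loop rather than inside it keeps the failure paths of the trylock variants fence-free, which matches where RISCV_ACQUIRE_BARRIER sits relative to the "1:" exit label in the patch.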



2018-03-07 02:04:06

by Palmer Dabbelt

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Mon, 05 Mar 2018 10:24:09 PST (-0800), [email protected] wrote:
> Current implementations map locking operations using .rl and .aq
> annotations. However, this mapping is unsound w.r.t. the kernel
> memory consistency model (LKMM) [1]:
>
> Referring to the "unlock-lock-read-ordering" test reported below,
> Daniel wrote:
>
> "I think an RCpc interpretation of .aq and .rl would in fact
> allow the two normal loads in P1 to be reordered [...]
>
> The intuition would be that the amoswap.w.aq can forward from
> the amoswap.w.rl while that's still in the store buffer, and
> then the lw x3,0(x4) can also perform while the amoswap.w.rl
> is still in the store buffer, all before the lw x1,0(x2)
> executes. That's not forbidden unless the amoswaps are RCsc,
> unless I'm missing something.
>
> Likewise even if the unlock()/lock() is between two stores.
> A control dependency might originate from the load part of
> the amoswap.w.aq, but there still would have to be something
> to ensure that this load part in fact performs after the store
> part of the amoswap.w.rl performs globally, and that's not
> automatic under RCpc."
>
> Simulation of the RISC-V memory consistency model confirmed this
> expectation.
>
> In order to "synchronize" LKMM and RISC-V's implementation, this
> commit strengthens the implementations of the locking operations
> by replacing .rl and .aq with the use of ("lightweight") fences,
> resp., "fence rw, w" and "fence r , rw".
>
> C unlock-lock-read-ordering
>
> {}
> /* s initially owned by P1 */
>
> P0(int *x, int *y)
> {
> WRITE_ONCE(*x, 1);
> smp_wmb();
> WRITE_ONCE(*y, 1);
> }
>
> P1(int *x, int *y, spinlock_t *s)
> {
> int r0;
> int r1;
>
> r0 = READ_ONCE(*y);
> spin_unlock(s);
> spin_lock(s);
> r1 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 1:r1=0)
>
> [1] https://marc.info/?l=linux-kernel&m=151930201102853&w=2
> https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
> https://marc.info/?l=linux-kernel&m=151633436614259&w=2
>
> Signed-off-by: Andrea Parri <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Albert Ou <[email protected]>
> Cc: Daniel Lustig <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Jade Alglave <[email protected]>
> Cc: Luc Maranget <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Akira Yokosawa <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/riscv/include/asm/fence.h | 12 ++++++++++++
> arch/riscv/include/asm/spinlock.h | 29 +++++++++++++++--------------
> 2 files changed, 27 insertions(+), 14 deletions(-)
> create mode 100644 arch/riscv/include/asm/fence.h

Oh, sorry about this -- I thought I'd deleted all this code, but I guess I just
wrote a patch and then forgot about it. Here's my original patch, which I have
marked as a WIP:

commit 39908f1f8b75ae88ce44dc77b8219a94078ad298
Author: Palmer Dabbelt <[email protected]>
Date: Tue Dec 5 16:26:50 2017 -0800

RISC-V: Use generic spin and rw locks

This might not be exactly the right thing to do: we could use LR/SC to
produce slightly better locks by rolling the tests into the LR/SC. I'm
going to defer that until I get a better handle on the new memory model
and just be safe here: after some discussion I'm pretty sure the AMOs
are good, and cmpxchg is safe (by being way too strong).

Since we'd want to rewrite the spinlocks anyway so they queue, I don't
see any reason to keep the old implementations around.

Signed-off-by: Palmer Dabbelt <[email protected]>

diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index 2fd27e8ef1fd..9b166ea81fe5 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -15,128 +15,7 @@
#ifndef _ASM_RISCV_SPINLOCK_H
#define _ASM_RISCV_SPINLOCK_H

-#include <linux/kernel.h>
-#include <asm/current.h>
-
-/*
- * Simple spin lock operations. These provide no fairness guarantees.
- */
-
-/* FIXME: Replace this with a ticket lock, like MIPS. */
-
-#define arch_spin_is_locked(x) (READ_ONCE((x)->lock) != 0)
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
- __asm__ __volatile__ (
- "amoswap.w.rl x0, x0, %0"
- : "=A" (lock->lock)
- :: "memory");
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
- int tmp = 1, busy;
-
- __asm__ __volatile__ (
- "amoswap.w.aq %0, %2, %1"
- : "=r" (busy), "+A" (lock->lock)
- : "r" (tmp)
- : "memory");
-
- return !busy;
-}
-
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
- while (1) {
- if (arch_spin_is_locked(lock))
- continue;
-
- if (arch_spin_trylock(lock))
- break;
- }
-}
-
-/***********************************************************/
-
-static inline void arch_read_lock(arch_rwlock_t *lock)
-{
- int tmp;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bltz %1, 1b\n"
- " addi %1, %1, 1\n"
- " sc.w.aq %1, %1, %0\n"
- " bnez %1, 1b\n"
- : "+A" (lock->lock), "=&r" (tmp)
- :: "memory");
-}
-
-static inline void arch_write_lock(arch_rwlock_t *lock)
-{
- int tmp;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bnez %1, 1b\n"
- " li %1, -1\n"
- " sc.w.aq %1, %1, %0\n"
- " bnez %1, 1b\n"
- : "+A" (lock->lock), "=&r" (tmp)
- :: "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *lock)
-{
- int busy;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bltz %1, 1f\n"
- " addi %1, %1, 1\n"
- " sc.w.aq %1, %1, %0\n"
- " bnez %1, 1b\n"
- "1:\n"
- : "+A" (lock->lock), "=&r" (busy)
- :: "memory");
-
- return !busy;
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *lock)
-{
- int busy;
-
- __asm__ __volatile__(
- "1: lr.w %1, %0\n"
- " bnez %1, 1f\n"
- " li %1, -1\n"
- " sc.w.aq %1, %1, %0\n"
- " bnez %1, 1b\n"
- "1:\n"
- : "+A" (lock->lock), "=&r" (busy)
- :: "memory");
-
- return !busy;
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *lock)
-{
- __asm__ __volatile__(
- "amoadd.w.rl x0, %1, %0"
- : "+A" (lock->lock)
- : "r" (-1)
- : "memory");
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *lock)
-{
- __asm__ __volatile__ (
- "amoswap.w.rl x0, x0, %0"
- : "=A" (lock->lock)
- :: "memory");
-}
+#include <asm-generic/qspinlock.h>
+#include <asm-generic/qrwlock.h>

#endif /* _ASM_RISCV_SPINLOCK_H */
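
For readers unfamiliar with what "so they queue" buys over the test-and-set lock being removed, the simplest queueing design is a ticket lock, which is what the FIXME in the deleted code mentions. The sketch below is a user-space ticket lock in C11 atomics. It is deliberately not the kernel's qspinlock, which is a different and more elaborate design, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock: waiters are served in the order they arrived. */
struct sketch_ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

/* Take a ticket, then spin until it is being served; returns the ticket. */
static unsigned int sketch_ticket_acquire(struct sketch_ticket_lock *l)
{
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;               /* busy-wait for our turn */
    return me;
}

/* Only the holder writes owner, so a plain load plus release store suffices. */
static void sketch_ticket_release(struct sketch_ticket_lock *l)
{
    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}

/* Single-threaded check: tickets are handed out and served in order. */
static int sketch_ticket_demo(void)
{
    struct sketch_ticket_lock l;

    atomic_init(&l.next, 0);
    atomic_init(&l.owner, 0);
    assert(sketch_ticket_acquire(&l) == 0);
    sketch_ticket_release(&l);
    assert(sketch_ticket_acquire(&l) == 1);
    sketch_ticket_release(&l);
    return (int)atomic_load_explicit(&l.owner, memory_order_relaxed);
}
```

Unlike the test-and-set lock, a late arrival cannot overtake an earlier waiter here, which is the fairness property the removed code's comment says it does not provide.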


2018-03-07 10:56:06

by Andrea Parri

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Tue, Mar 06, 2018 at 06:02:28PM -0800, Palmer Dabbelt wrote:
> On Mon, 05 Mar 2018 10:24:09 PST (-0800), [email protected] wrote:
> >Current implementations map locking operations using .rl and .aq
> >annotations. However, this mapping is unsound w.r.t. the kernel
> >memory consistency model (LKMM) [1]:
> >
> >Referring to the "unlock-lock-read-ordering" test reported below,
> >Daniel wrote:
> >
> > "I think an RCpc interpretation of .aq and .rl would in fact
> > allow the two normal loads in P1 to be reordered [...]
> >
> > The intuition would be that the amoswap.w.aq can forward from
> > the amoswap.w.rl while that's still in the store buffer, and
> > then the lw x3,0(x4) can also perform while the amoswap.w.rl
> > is still in the store buffer, all before the lw x1,0(x2)
> > executes. That's not forbidden unless the amoswaps are RCsc,
> > unless I'm missing something.
> >
> > Likewise even if the unlock()/lock() is between two stores.
> > A control dependency might originate from the load part of
> > the amoswap.w.aq, but there still would have to be something
> > to ensure that this load part in fact performs after the store
> > part of the amoswap.w.rl performs globally, and that's not
> > automatic under RCpc."
> >
> >Simulation of the RISC-V memory consistency model confirmed this
> >expectation.
> >
> >In order to "synchronize" LKMM and RISC-V's implementation, this
> >commit strengthens the implementations of the locking operations
> >by replacing .rl and .aq with the use of ("lightweight") fences,
> >resp., "fence rw, w" and "fence r , rw".
> >
> >C unlock-lock-read-ordering
> >
> >{}
> >/* s initially owned by P1 */
> >
> >P0(int *x, int *y)
> >{
> > WRITE_ONCE(*x, 1);
> > smp_wmb();
> > WRITE_ONCE(*y, 1);
> >}
> >
> >P1(int *x, int *y, spinlock_t *s)
> >{
> > int r0;
> > int r1;
> >
> > r0 = READ_ONCE(*y);
> > spin_unlock(s);
> > spin_lock(s);
> > r1 = READ_ONCE(*x);
> >}
> >
> >exists (1:r0=1 /\ 1:r1=0)
> >
> >[1] https://marc.info/?l=linux-kernel&m=151930201102853&w=2
> > https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
> > https://marc.info/?l=linux-kernel&m=151633436614259&w=2
> >
> >Signed-off-by: Andrea Parri <[email protected]>
> >Cc: Palmer Dabbelt <[email protected]>
> >Cc: Albert Ou <[email protected]>
> >Cc: Daniel Lustig <[email protected]>
> >Cc: Alan Stern <[email protected]>
> >Cc: Will Deacon <[email protected]>
> >Cc: Peter Zijlstra <[email protected]>
> >Cc: Boqun Feng <[email protected]>
> >Cc: Nicholas Piggin <[email protected]>
> >Cc: David Howells <[email protected]>
> >Cc: Jade Alglave <[email protected]>
> >Cc: Luc Maranget <[email protected]>
> >Cc: "Paul E. McKenney" <[email protected]>
> >Cc: Akira Yokosawa <[email protected]>
> >Cc: Ingo Molnar <[email protected]>
> >Cc: Linus Torvalds <[email protected]>
> >Cc: [email protected]
> >Cc: [email protected]
> >---
> > arch/riscv/include/asm/fence.h | 12 ++++++++++++
> > arch/riscv/include/asm/spinlock.h | 29 +++++++++++++++--------------
> > 2 files changed, 27 insertions(+), 14 deletions(-)
> > create mode 100644 arch/riscv/include/asm/fence.h
>
> Oh, sorry about this -- I thought I'd deleted all this code, but I guess I
> just wrote a patch and then forgot about it. Here's my original patch,
> which I have marked as a WIP:

No problem.


>
> commit 39908f1f8b75ae88ce44dc77b8219a94078ad298
> Author: Palmer Dabbelt <[email protected]>
> Date: Tue Dec 5 16:26:50 2017 -0800
>
> RISC-V: Use generic spin and rw locks
>
> This might not be exactly the right thing to do: we could use LR/SC to
> produce slightly better locks by rolling the tests into the LR/SC. I'm
> going to defer that until I get a better handle on the new memory model
> and just be safe here: after some discussion I'm pretty sure the AMOs
> are good, and cmpxchg is safe (by being way too strong).

I'm pretty sure you lost me (and a few other people) here.

IIUC, this says: "what we've been discussing within the last few weeks is
going to change", but not much else...

Or am I misunderstanding? You mean cmpxchg, ... as in my patch 2/2?


>
> Since we'd want to rewrite the spinlocks anyway so they queue, I don't
> see any reason to keep the old implementations around.

Keep in mind that queued locks were written and optimized for x86. arm64
only recently adopted qrwlocks:

087133ac90763cd339b6b67f2998f87dcc136c52
("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")

This certainly needs further testing and reviewing. (Nit: your patch does
not compile on any of the "riscv" branches I'm currently tracking...)

Andrea



2018-03-07 18:40:46

by Palmer Dabbelt

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Wed, 07 Mar 2018 02:52:42 PST (-0800), [email protected] wrote:
> On Tue, Mar 06, 2018 at 06:02:28PM -0800, Palmer Dabbelt wrote:
>> On Mon, 05 Mar 2018 10:24:09 PST (-0800), [email protected] wrote:
>> >Current implementations map locking operations using .rl and .aq
>> >annotations. However, this mapping is unsound w.r.t. the kernel
>> >memory consistency model (LKMM) [1]:
>> >
>> >Referring to the "unlock-lock-read-ordering" test reported below,
>> >Daniel wrote:
>> >
>> > "I think an RCpc interpretation of .aq and .rl would in fact
>> > allow the two normal loads in P1 to be reordered [...]
>> >
>> > The intuition would be that the amoswap.w.aq can forward from
>> > the amoswap.w.rl while that's still in the store buffer, and
>> > then the lw x3,0(x4) can also perform while the amoswap.w.rl
>> > is still in the store buffer, all before the lw x1,0(x2)
>> > executes. That's not forbidden unless the amoswaps are RCsc,
>> > unless I'm missing something.
>> >
>> > Likewise even if the unlock()/lock() is between two stores.
>> > A control dependency might originate from the load part of
>> > the amoswap.w.aq, but there still would have to be something
>> > to ensure that this load part in fact performs after the store
>> > part of the amoswap.w.rl performs globally, and that's not
>> > automatic under RCpc."
>> >
>> >Simulation of the RISC-V memory consistency model confirmed this
>> >expectation.
>> >
>> >In order to "synchronize" LKMM and RISC-V's implementation, this
>> >commit strengthens the implementations of the locking operations
>> >by replacing .rl and .aq with the use of ("lightweight") fences,
>> >resp., "fence rw, w" and "fence r , rw".
>> >
>> >C unlock-lock-read-ordering
>> >
>> >{}
>> >/* s initially owned by P1 */
>> >
>> >P0(int *x, int *y)
>> >{
>> > WRITE_ONCE(*x, 1);
>> > smp_wmb();
>> > WRITE_ONCE(*y, 1);
>> >}
>> >
>> >P1(int *x, int *y, spinlock_t *s)
>> >{
>> > int r0;
>> > int r1;
>> >
>> > r0 = READ_ONCE(*y);
>> > spin_unlock(s);
>> > spin_lock(s);
>> > r1 = READ_ONCE(*x);
>> >}
>> >
>> >exists (1:r0=1 /\ 1:r1=0)
>> >
>> >[1] https://marc.info/?l=linux-kernel&m=151930201102853&w=2
>> > https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM
>> > https://marc.info/?l=linux-kernel&m=151633436614259&w=2
>> >
>> >Signed-off-by: Andrea Parri <[email protected]>
>> >Cc: Palmer Dabbelt <[email protected]>
>> >Cc: Albert Ou <[email protected]>
>> >Cc: Daniel Lustig <[email protected]>
>> >Cc: Alan Stern <[email protected]>
>> >Cc: Will Deacon <[email protected]>
>> >Cc: Peter Zijlstra <[email protected]>
>> >Cc: Boqun Feng <[email protected]>
>> >Cc: Nicholas Piggin <[email protected]>
>> >Cc: David Howells <[email protected]>
>> >Cc: Jade Alglave <[email protected]>
>> >Cc: Luc Maranget <[email protected]>
>> >Cc: "Paul E. McKenney" <[email protected]>
>> >Cc: Akira Yokosawa <[email protected]>
>> >Cc: Ingo Molnar <[email protected]>
>> >Cc: Linus Torvalds <[email protected]>
>> >Cc: [email protected]
>> >Cc: [email protected]
>> >---
>> > arch/riscv/include/asm/fence.h | 12 ++++++++++++
>> > arch/riscv/include/asm/spinlock.h | 29 +++++++++++++++--------------
>> > 2 files changed, 27 insertions(+), 14 deletions(-)
>> > create mode 100644 arch/riscv/include/asm/fence.h
>>
>> Oh, sorry about this -- I thought I'd deleted all this code, but I guess I
>> just wrote a patch and then forgot about it. Here's my original patch,
>> which I have marked as a WIP:
>
> No problem.
>
>
>>
>> commit 39908f1f8b75ae88ce44dc77b8219a94078ad298
>> Author: Palmer Dabbelt <[email protected]>
>> Date: Tue Dec 5 16:26:50 2017 -0800
>>
>> RISC-V: Use generic spin and rw locks
>>
>> This might not be exactly the right thing to do: we could use LR/SC to
>> produce slightly better locks by rolling the tests into the LR/SC. I'm
>> going to defer that until I get a better handle on the new memory model
>> and just be safe here: after some discussion I'm pretty sure the AMOs
>> are good, and cmpxchg is safe (by being way too strong).
>
> I'm pretty sure you lost me (and a few other people) here.
>
> IIUC, this says: "what we've been discussing within the last few weeks is
> going to change", but not much else...
>
> Or am I misunderstanding? You mean cmpxchg, ... as in my patch 2/2?

Well, it was what we were discussing for the past few weeks before Dec 5th (as
that's when I wrote the patch). It's more of a note for myself than a proper
commit message, and I've also forgotten what I was talking about.

>>
>> Since we'd want to rewrite the spinlocks anyway so they queue, I don't
>> see any reason to keep the old implementations around.
>
> Keep in mind that queued locks were written and optimized for x86. arm64
> only recently adopted qrwlocks:
>
> 087133ac90763cd339b6b67f2998f87dcc136c52
> ("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")
>
> This certainly needs further testing and reviewing. (Nit: your patch does
> not compile on any of the "riscv" branches I'm currently tracking...)

That's probably why it was just floating around and not sent out :). I went
and talked to Andrew and we think there's actually a reasonable argument for
some spinlocks that are similar to what we currently have. The ISA manual
describes some canonical spinlock code, which has the advantage of being
smaller and being defined as a target for microarchitectural pattern matching.

I'm going to go produce a new set of spinlocks, I think it'll be a bit more
coherent then.

I'm keeping your other patch in my queue for now, it generally looks good but I
haven't looked closely yet.

Thanks!


2018-03-08 21:04:31

by Andrea Parri

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Wed, Mar 07, 2018 at 10:33:49AM -0800, Palmer Dabbelt wrote:

[...]

> I'm going to go produce a new set of spinlocks, I think it'll be a bit more
> coherent then.
>
> I'm keeping your other patch in my queue for now, it generally looks good
> but I haven't looked closely yet.

Patches 1 and 2 address the same issue ("release-to-acquire"); this is also
expressed, more or less explicitly, in the corresponding commit messages:
it might make sense to "queue" them together, and to build the new locks
on top of these (even if this meant "rewrite all of/a large portion of
spinlock.h"...).

Andrea



2018-03-08 22:35:25

by Palmer Dabbelt

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Thu, 08 Mar 2018 13:03:03 PST (-0800), [email protected] wrote:
> On Wed, Mar 07, 2018 at 10:33:49AM -0800, Palmer Dabbelt wrote:
>
> [...]
>
>> I'm going to go produce a new set of spinlocks, I think it'll be a bit more
>> coherent then.
>>
>> I'm keeping your other patch in my queue for now, it generally looks good
>> but I haven't looked closely yet.
>
> Patches 1 and 2 address the same issue ("release-to-acquire"); this is also
> expressed, more or less explicitly, in the corresponding commit messages:
> it might make sense to "queue" them together, and to build the new locks
> on top of these (even if this meant "rewrite all of/a large portion of
> spinlock.h"...).

I agree. IIRC you had a fixup to the first pair of patches, can you submit a
v2?

2018-03-09 12:18:00

by Andrea Parri

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Thu, Mar 08, 2018 at 02:11:12PM -0800, Palmer Dabbelt wrote:
> On Thu, 08 Mar 2018 13:03:03 PST (-0800), [email protected] wrote:
> >On Wed, Mar 07, 2018 at 10:33:49AM -0800, Palmer Dabbelt wrote:
> >
> >[...]
> >
> >>I'm going to go produce a new set of spinlocks, I think it'll be a bit more
> >>coherent then.
> >>
> >>I'm keeping your other patch in my queue for now, it generally looks good
> >>but I haven't looked closely yet.
> >
> >Patches 1 and 2 address the same issue ("release-to-acquire"); this is also
> >expressed, more or less explicitly, in the corresponding commit messages:
> >it might make sense to "queue" them together, and to build the new locks
> >on top of these (even if this meant "rewrite all of/a large portion of
> >spinlock.h"...).
>
> I agree. IIRC you had a fixup to the first pair of patches, can you submit
> a v2?

I've just sent it (with updated changelog).

Andrea

2018-03-09 18:10:06

by Palmer Dabbelt

Subject: Re: [RFC PATCH 1/2] riscv/spinlock: Strengthen implementations with fences

On Fri, 09 Mar 2018 04:16:43 PST (-0800), [email protected] wrote:
> On Thu, Mar 08, 2018 at 02:11:12PM -0800, Palmer Dabbelt wrote:
>> On Thu, 08 Mar 2018 13:03:03 PST (-0800), [email protected] wrote:
>> >On Wed, Mar 07, 2018 at 10:33:49AM -0800, Palmer Dabbelt wrote:
>> >
>> >[...]
>> >
>> >>I'm going to go produce a new set of spinlocks, I think it'll be a bit more
>> >>coherent then.
>> >>
>> >>I'm keeping your other patch in my queue for now, it generally looks good
>> >>but I haven't looked closely yet.
>> >
>> >Patches 1 and 2 address the same issue ("release-to-acquire"); this is also
>> >expressed, more or less explicitly, in the corresponding commit messages:
>> >it might make sense to "queue" them together, and to build the new locks
>> >on top of these (even if this meant "rewrite all of/a large portion of
>> >spinlock.h"...).
>>
>> I agree. IIRC you had a fixup to the first pair of patches, can you submit
>> a v2?
>
> I've just sent it (with updated changelog).

Thanks!