Commit 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
allowed only 4- and 8-byte smp_load_acquire()/smp_store_release(), so the
1- and 2-byte cases were never implemented on arm64.
Later, commit 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
and smp_store_release()") permitted 1- and 2-byte accesses by adjusting the
definition of __native_word(), but the arm64 implementation was still missing
the 1- and 2-byte cases.

Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
started using smp_load_acquire() to load the 2-byte csd->flags, which
crashes the arm64 kernel during boot.

Implement the 1- and 2-byte cases in arm64's smp_load_acquire() and
smp_store_release() to fix this.
Signed-off-by: Andrey Ryabinin <[email protected]>
---
arch/arm64/include/asm/barrier.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
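A note on why the missing cases matter, with a stand-alone userspace sketch
(illustration only, not kernel code: the macro name, the __atomic_load()
builtin and the test fields are stand-ins chosen so it builds outside the
kernel). With a size switch that handles only 4- and 8-byte types, a 2-byte
access matches no case, nothing is loaded at all, and the caller reads an
indeterminate value; that is the shape of failure a u16 access runs into once
__native_word() lets it past the compile-time assertion.

/*
 * Illustration only, not kernel code: sketch_load_acquire() mimics the
 * shape of the arm64 macro, a switch on sizeof(*p) that handles only
 * 4- and 8-byte types.  The generic __atomic_load() builtin stands in
 * for the LDAR inline asm.
 */
#include <stdint.h>
#include <stdio.h>

#define sketch_load_acquire(p) ({					\
	__typeof__(*(p)) ___p1;						\
	switch (sizeof(*(p))) {						\
	case 4:								\
	case 8:								\
		__atomic_load((p), &___p1, __ATOMIC_ACQUIRE);		\
		break;							\
	/* no case 1 or 2: nothing at all is loaded for those sizes */	\
	}								\
	___p1;								\
})

int main(void)
{
	uint32_t ok = 4;
	uint16_t flags = 2;	/* plays the role of the old u16 csd->flags */

	printf("%u\n", (unsigned)sketch_load_acquire(&ok));	/* prints 4 */
	/* Indeterminate: ___p1 is never assigned for a 2-byte type. */
	printf("%u\n", (unsigned)sketch_load_acquire(&flags));
	return 0;
}

With the patch below applied, sizes 1 and 2 get their own ldarb/ldarh (and
stlrb/stlrh) cases, so the switch always emits a real acquire load or release
store for every size that __native_word() accepts.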
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index a5abb00..71f19c4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -65,6 +65,14 @@ do { \
do { \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
+ case 1: \
+ asm volatile ("stlrb %w1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
+ case 2: \
+ asm volatile ("stlrh %w1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
case 4: \
asm volatile ("stlr %w1, %0" \
: "=Q" (*p) : "r" (v) : "memory"); \
@@ -81,6 +89,14 @@ do { \
typeof(*p) ___p1; \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
+ case 1: \
+ asm volatile ("ldarb %w0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
+ case 2: \
+ asm volatile ("ldarh %w0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
case 4: \
asm volatile ("ldar %w0, %1" \
: "=r" (___p1) : "Q" (*p) : "memory"); \
--
2.3.5
Hi Andrey,
On Mon, Apr 20, 2015 at 04:45:53PM +0100, Andrey Ryabinin wrote:
> Commit 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
> allowed only 4- and 8-byte smp_load_acquire()/smp_store_release(), so the
> 1- and 2-byte cases were never implemented on arm64.
> Later, commit 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
> and smp_store_release()") permitted 1- and 2-byte accesses by adjusting the
> definition of __native_word(), but the arm64 implementation was still missing
> the 1- and 2-byte cases.
>
> Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
> started using smp_load_acquire() to load the 2-byte csd->flags, which
> crashes the arm64 kernel during boot.
>
> Implement the 1- and 2-byte cases in arm64's smp_load_acquire() and
> smp_store_release() to fix this.
>
> Signed-off-by: Andrey Ryabinin <[email protected]>
I already have an equivalent patch queued in the arm64/fixes branch[1]. I'll
send a pull shortly.
Will
[1]
https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=fixes/core
> ---
> arch/arm64/include/asm/barrier.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index a5abb00..71f19c4 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -65,6 +65,14 @@ do { \
> do { \
> compiletime_assert_atomic_type(*p); \
> switch (sizeof(*p)) { \
> + case 1: \
> + asm volatile ("stlrb %w1, %0" \
> + : "=Q" (*p) : "r" (v) : "memory"); \
> + break; \
> + case 2: \
> + asm volatile ("stlrh %w1, %0" \
> + : "=Q" (*p) : "r" (v) : "memory"); \
> + break; \
> case 4: \
> asm volatile ("stlr %w1, %0" \
> : "=Q" (*p) : "r" (v) : "memory"); \
> @@ -81,6 +89,14 @@ do { \
> typeof(*p) ___p1; \
> compiletime_assert_atomic_type(*p); \
> switch (sizeof(*p)) { \
> + case 1: \
> + asm volatile ("ldarb %w0, %1" \
> + : "=r" (___p1) : "Q" (*p) : "memory"); \
> + break; \
> + case 2: \
> + asm volatile ("ldarh %w0, %1" \
> + : "=r" (___p1) : "Q" (*p) : "memory"); \
> + break; \
> case 4: \
> asm volatile ("ldar %w0, %1" \
> : "=r" (___p1) : "Q" (*p) : "memory"); \
> --
> 2.3.5
>
On Mon, Apr 20, 2015 at 8:45 AM, Andrey Ryabinin <[email protected]> wrote:
>
> Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
> started using smp_load_acquire() to load the 2-byte csd->flags, which
> crashes the arm64 kernel during boot.
Ahh. I don't disagree with your patch, but we should just make
csd->flags be an "int" regardless of this issue.
There is absolutely no reason to try to use 16-bit words. It doesn't
save us any space (alignment), and it's just a *bad* idea to depend on
any kind of atomicity of 16-bit entities. Bytes, yes. Words, yes. But
16-bit half-words? There's no point.
So I'll make it "unsigned int" instead of "u16".
But as said, the ARM64 patch definitely is a good idea regardless.
Linus
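A quick userspace illustration of the alignment point (the field layout below
is an approximation for sizeof purposes, not the real struct call_single_data
definition): on an LP64 ABI the trailing padding means widening the u16 member
to unsigned int does not grow the structure at all.

#include <stdint.h>
#include <stdio.h>

/* Approximation of the csd layout: three pointer-sized members plus flags. */
struct csd_u16 {
	void *llist;			/* stand-in for struct llist_node */
	void (*func)(void *info);
	void *info;
	uint16_t flags;			/* 2 bytes used, 6 bytes of padding */
};

struct csd_uint {
	void *llist;
	void (*func)(void *info);
	void *info;
	unsigned int flags;		/* 4 bytes used, 4 bytes of padding */
};

int main(void)
{
	/* Both typically print 32 on LP64: the u16 never saved any space. */
	printf("%zu %zu\n", sizeof(struct csd_u16), sizeof(struct csd_uint));
	return 0;
}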
On Mon, Apr 20, 2015 at 06:45:53PM +0300, Andrey Ryabinin wrote:
> Commit 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
> allowed only 4- and 8-byte smp_load_acquire()/smp_store_release(), so the
> 1- and 2-byte cases were never implemented on arm64.
> Later, commit 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
> and smp_store_release()") permitted 1- and 2-byte accesses by adjusting the
> definition of __native_word(), but the arm64 implementation was still missing
> the 1- and 2-byte cases.
>
> Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
> started using smp_load_acquire() to load the 2-byte csd->flags, which
> crashes the arm64 kernel during boot.
>
> Implement the 1- and 2-byte cases in arm64's smp_load_acquire() and
> smp_store_release() to fix this.
>
> Signed-off-by: Andrey Ryabinin <[email protected]>
I am introducing a similar smp_load_acquire() case in rcutorture to
replace use of explicit memory barriers, so thank you! ;-)
Reviewed-by: Paul E. McKenney <[email protected]>
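For readers following along, the conversion Paul describes is the standard
acquire/release pairing. Below is a self-contained userspace sketch of that
pairing (C11 atomics stand in for smp_store_release()/smp_load_acquire(), and
all of the names are invented): the release store publishes the payload, and
the paired acquire load guarantees the reader observes it. On arm64 a 2-byte
flag like this one typically compiles down to the stlrh/ldarh instructions
this patch adds.

/*
 * Userspace sketch of acquire/release pairing.  C11 atomics stand in
 * for the kernel's smp_store_release()/smp_load_acquire(); all names
 * are made up.  Build with: cc -O2 -pthread pairing.c
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int payload;
static _Atomic unsigned short ready;	/* a 2-byte flag, like the old u16 csd->flags */

static void *writer(void *arg)
{
	(void)arg;
	payload = 42;	/* plain store, published by the release below */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *reader(void *arg)
{
	(void)arg;
	/* The acquire load orders everything after it behind the flag read. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	printf("payload = %d\n", payload);	/* guaranteed to print 42 */
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}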
> ---
> arch/arm64/include/asm/barrier.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index a5abb00..71f19c4 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -65,6 +65,14 @@ do { \
> do { \
> compiletime_assert_atomic_type(*p); \
> switch (sizeof(*p)) { \
> + case 1: \
> + asm volatile ("stlrb %w1, %0" \
> + : "=Q" (*p) : "r" (v) : "memory"); \
> + break; \
> + case 2: \
> + asm volatile ("stlrh %w1, %0" \
> + : "=Q" (*p) : "r" (v) : "memory"); \
> + break; \
> case 4: \
> asm volatile ("stlr %w1, %0" \
> : "=Q" (*p) : "r" (v) : "memory"); \
> @@ -81,6 +89,14 @@ do { \
> typeof(*p) ___p1; \
> compiletime_assert_atomic_type(*p); \
> switch (sizeof(*p)) { \
> + case 1: \
> + asm volatile ("ldarb %w0, %1" \
> + : "=r" (___p1) : "Q" (*p) : "memory"); \
> + break; \
> + case 2: \
> + asm volatile ("ldarh %w0, %1" \
> + : "=r" (___p1) : "Q" (*p) : "memory"); \
> + break; \
> case 4: \
> asm volatile ("ldar %w0, %1" \
> : "=r" (___p1) : "Q" (*p) : "memory"); \
> --
> 2.3.5
>
On Mon, Apr 20, 2015 at 04:48:24PM +0100, Will Deacon wrote:
> Hi Andrey,
>
> On Mon, Apr 20, 2015 at 04:45:53PM +0100, Andrey Ryabinin wrote:
> > Commit 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
> > allowed only 4- and 8-byte smp_load_acquire()/smp_store_release(), so the
> > 1- and 2-byte cases were never implemented on arm64.
> > Later, commit 536fa402221f ("compiler: Allow 1- and 2-byte smp_load_acquire()
> > and smp_store_release()") permitted 1- and 2-byte accesses by adjusting the
> > definition of __native_word(), but the arm64 implementation was still missing
> > the 1- and 2-byte cases.
> >
> > Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
> > started using smp_load_acquire() to load the 2-byte csd->flags, which
> > crashes the arm64 kernel during boot.
> >
> > Implement the 1- and 2-byte cases in arm64's smp_load_acquire() and
> > smp_store_release() to fix this.
> >
> > Signed-off-by: Andrey Ryabinin <[email protected]>
>
> I already have an equivalent patch queued in the arm64/fixes branch[1]. I'll
> send a pull shortly.
Even better! ;-)
Thanx, Paul
> Will
>
> [1]
> https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=fixes/core
>
> > ---
> > arch/arm64/include/asm/barrier.h | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> > index a5abb00..71f19c4 100644
> > --- a/arch/arm64/include/asm/barrier.h
> > +++ b/arch/arm64/include/asm/barrier.h
> > @@ -65,6 +65,14 @@ do { \
> > do { \
> > compiletime_assert_atomic_type(*p); \
> > switch (sizeof(*p)) { \
> > + case 1: \
> > + asm volatile ("stlrb %w1, %0" \
> > + : "=Q" (*p) : "r" (v) : "memory"); \
> > + break; \
> > + case 2: \
> > + asm volatile ("stlrh %w1, %0" \
> > + : "=Q" (*p) : "r" (v) : "memory"); \
> > + break; \
> > case 4: \
> > asm volatile ("stlr %w1, %0" \
> > : "=Q" (*p) : "r" (v) : "memory"); \
> > @@ -81,6 +89,14 @@ do { \
> > typeof(*p) ___p1; \
> > compiletime_assert_atomic_type(*p); \
> > switch (sizeof(*p)) { \
> > + case 1: \
> > + asm volatile ("ldarb %w0, %1" \
> > + : "=r" (___p1) : "Q" (*p) : "memory"); \
> > + break; \
> > + case 2: \
> > + asm volatile ("ldarh %w0, %1" \
> > + : "=r" (___p1) : "Q" (*p) : "memory"); \
> > + break; \
> > case 4: \
> > asm volatile ("ldar %w0, %1" \
> > : "=r" (___p1) : "Q" (*p) : "memory"); \
> > --
> > 2.3.5
> >
>
On Mon, Apr 20, 2015 at 05:08:37PM +0100, Linus Torvalds wrote:
> On Mon, Apr 20, 2015 at 8:45 AM, Andrey Ryabinin <[email protected]> wrote:
> >
> > Commit 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking")
> > started using smp_load_acquire() to load the 2-byte csd->flags, which
> > crashes the arm64 kernel during boot.
>
> Ahh. I don't disagree with your patch, but we should just make
> csd->flags be an "int" regardless of this issue.
>
> There is absolutely no reason to try to use 16-bit words. It doesn't
> save us any space (alignment), and it's just a *bad* idea to depend on
> any kind of atomicity of 16-bit entities. Bytes, yes. Words, yes. But
> 16-bit half-words? There's no point.
>
> So I'll make it "unsigned int" instead of "u16".
>
> But as said, the ARM64 patch definitely is a good idea regardless.
Sounds good to me and means I can leave the pull for a day or two to see
if any other arm64 fixes queue up.
Will