2019-03-07 09:17:53

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On 32-bit ARM, I got a link failure in futex_init() when building
with clang in some random configurations:

kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'

As far as I can tell, the problem is that a branch is over 16MB
apart in those configurations, but only if it branches back to
the init text.

Marking the futex_detect_cmpxchg() function as noinline and
not __init avoids the problem for me.

Signed-off-by: Arnd Bergmann <[email protected]>
---
kernel/futex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c3b73b0311bc..dda77ed9f445 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3849,7 +3849,7 @@ SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
}
#endif /* CONFIG_COMPAT_32BIT_TIME */

-static void __init futex_detect_cmpxchg(void)
+static noinline void futex_detect_cmpxchg(void)
{
#ifndef CONFIG_HAVE_FUTEX_CMPXCHG
u32 curval;
--
2.20.0



2019-03-07 09:18:33

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

Passing registers containing zero as both the address (NULL pointer)
and data into cmpxchg_futex_value_locked() leads clang to assign
the same register for both inputs on ARM, which triggers a warning
explaining that this instruction has unpredictable behavior on ARMv5.

/tmp/futex-7e740e.s: Assembler messages:
/tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
as Mikael wrote:
"One way of fixing this is to make uaddr an input/output register, since
"that prevents it from overlapping any other input or output."

but then withdrawn as the warning was determined to be harmless, and it
apparently never showed up again with later gcc versions.

Now the same problem is back when compiling with clang, and we are trying
to get clang to build the kernel without warnings, as gcc normally does.

Cc: Mikael Pettersson <[email protected]>
Cc: Mikael Pettersson <[email protected]>
Cc: Dave Martin <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm/include/asm/futex.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 0a46676b4245..79790912974e 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
preempt_disable();
__ua_flags = uaccess_save_and_enable();
__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
- "1: " TUSER(ldr) " %1, [%4]\n"
- " teq %1, %2\n"
+ "1: " TUSER(ldr) " %1, [%2]\n"
+ " teq %1, %3\n"
" it eq @ explicit IT needed for the 2b label\n"
- "2: " TUSER(streq) " %3, [%4]\n"
+ "2: " TUSER(streq) " %4, [%2]\n"
__futex_atomic_ex_table("%5")
- : "+r" (ret), "=&r" (val)
- : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
+ : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
+ : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
: "cc", "memory");
uaccess_restore(__ua_flags);

--
2.20.0


2019-03-07 17:21:47

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, 2019-03-07 at 10:14 +0100, Arnd Bergmann wrote:
> On 32-bit ARM, I got a link failure in futex_init() when building
> with clang in some random configurations:
>
> kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
>
> As far as I can tell, the problem is that a branch is over 16MB
> apart in those configurations, but only if it branches back to
> the init text.
>
> Marking the futex_detect_cmpxchg() function as noinline and
> not __init avoids the problem for me.

Perhaps the __init and __exit #defines should be noinline
to allow discarding of the code.

These defines are already marked with __section so I'd've
expected these to be noinline anyway.

---
include/linux/init.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index 5255069f5a9f..806215a74064 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -47,7 +47,7 @@

/* These are for everybody (although not all archs will actually
discard it in modules) */
-#define __init __section(.init.text) __cold __latent_entropy __noinitretpoline
+#define __init __section(.init.text) __cold __latent_entropy __noinitretpoline noinline
#define __initdata __section(.init.data)
#define __initconst __section(.init.rodata)
#define __exitdata __section(.exit.data)
@@ -80,7 +80,7 @@
#define __exitused __used
#endif

-#define __exit __section(.exit.text) __exitused __cold notrace
+#define __exit __section(.exit.text) __exitused __cold notrace noinline

/* Used for MEMORY_HOTPLUG */
#define __meminit __section(.meminit.text) __cold notrace \




2019-03-07 17:26:49

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 07, 2019 at 09:19:04AM -0800, Joe Perches wrote:
> On Thu, 2019-03-07 at 10:14 +0100, Arnd Bergmann wrote:
> > On 32-bit ARM, I got a link failure in futex_init() when building
> > with clang in some random configurations:
> >
> > kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
> >
> > As far as I can tell, the problem is that a branch is over 16MB
> > apart in those configurations, but only if it branches back to
> > the init text.
> >
> > Marking the futex_detect_cmpxchg() function as noinline and
> > not __init avoids the problem for me.
>
> Perhaps the __init and __exit #defines should be noinline
> to allow discarding of the code.

How does that help this case? The problem is futex_detect_cmpxchg()
being placed in .init.text which is too far from .text.fixup.
Whether it is inlined or not is immaterial since both
futex_detect_cmpxchg() and its only caller futex_init() are both __init
functions.

It seems to me to be completely sane to have:

static void __init foo(...)
{
}

static int __init foo_init(...)
{
foo();
}

and have the expectation that the compiler _can_, if it desires, inline
foo() into foo_init().

Let me re-iterate: the inlining of futex_detect_cmpxchg() into
futex_init() is not the issue here.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-07 17:44:09

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, 2019-03-07 at 17:25 +0000, Russell King - ARM Linux admin wrote:
> On Thu, Mar 07, 2019 at 09:19:04AM -0800, Joe Perches wrote:
> > On Thu, 2019-03-07 at 10:14 +0100, Arnd Bergmann wrote:
> > > On 32-bit ARM, I got a link failure in futex_init() when building
> > > with clang in some random configurations:
> > >
> > > kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
> > >
> > > As far as I can tell, the problem is that a branch is over 16MB
> > > apart in those configurations, but only if it branches back to
> > > the init text.
> > >
> > > Marking the futex_detect_cmpxchg() function as noinline and
> > > not __init avoids the problem for me.
> >
> > Perhaps the __init and __exit #defines should be noinline
> > to allow discarding of the code.
>
> How does that help this case?

It doesn't particularly.
It does help any other case that might arise.

> It seems to me to be completely sane to have:
>
> static void __init foo(...)
> {
> }
>
> static int __init foo_init(...)
> {
> foo();
> }
>
> and have the expectation that the compiler _can_, if it desires, inline
> foo() into foo_init().

True, but init function performance requirements are
generally trivial and isolating the possibility of
defects like this seems a reasonable trade-off to me.



2019-03-07 18:09:07

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 07, 2019 at 09:42:40AM -0800, Joe Perches wrote:
> On Thu, 2019-03-07 at 17:25 +0000, Russell King - ARM Linux admin wrote:
> > On Thu, Mar 07, 2019 at 09:19:04AM -0800, Joe Perches wrote:
> > > On Thu, 2019-03-07 at 10:14 +0100, Arnd Bergmann wrote:
> > > > On 32-bit ARM, I got a link failure in futex_init() when building
> > > > with clang in some random configurations:
> > > >
> > > > kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
> > > >
> > > > As far as I can tell, the problem is that a branch is over 16MB
> > > > apart in those configurations, but only if it branches back to
> > > > the init text.
> > > >
> > > > Marking the futex_detect_cmpxchg() function as noinline and
> > > > not __init avoids the problem for me.
> > >
> > > Perhaps the __init and __exit #defines should be noinline
> > > to allow discarding of the code.
> >
> > How does that help this case?
>
> It doesn't particularly.
> It does help any other case that might arise.

Please describe what "any other case" you are thinking of - as already
stated, your patch would have no beneficial effect for the observed
issue.

In fact, it _could_ do exactly the opposite. Where functions can be
inlined, they may become smaller due to no need for function prologue
and epilogue, and other optimisations. Forcing the compiler to avoid
inlining them could result in larger code, which means a higher chance
of triggering the "relocation truncated to fit" problem.

The converse is also a possibility too - it all depends on the inlining
behaviour of the compiler towards static functions, and the resulting
size of the code.

What may be right for one architecture and compiler may not be right
for another.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-07 18:13:04

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
>
> On 32-bit ARM, I got a link failure in futex_init() when building
> with clang in some random configurations:
>
> kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'

Do we know what function from the fixup text section is calling
futex_detect_cmpxchg? I'm curious if this is maybe another case of
-Wsection where some function may be in the wrong section?

>
> As far as I can tell, the problem is that a branch is over 16MB
> apart in those configurations, but only if it branches back to
> the init text.
>
> Marking the futex_detect_cmpxchg() function as noinline and
> not __init avoids the problem for me.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> kernel/futex.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index c3b73b0311bc..dda77ed9f445 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3849,7 +3849,7 @@ SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
> }
> #endif /* CONFIG_COMPAT_32BIT_TIME */
>
> -static void __init futex_detect_cmpxchg(void)
> +static noinline void futex_detect_cmpxchg(void)
> {
> #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
> u32 curval;
> --
> 2.20.0
>


--
Thanks,
~Nick Desaulniers

2019-03-07 18:22:22

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 07, 2019 at 10:12:11AM -0800, Nick Desaulniers wrote:
> On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> >
> > On 32-bit ARM, I got a link failure in futex_init() when building
> > with clang in some random configurations:
> >
> > kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
>
> Do we know what function from the fixup text section is calling
> futex_detect_cmpxchg? I'm curious if this is maybe another case of
> -Wsection where some function may be in the wrong section?
>

Looks like this is the call stack:

futex_init ->
futex_detect_cmpxchg ->
cmpxchg_futex_value_locked ->
futex_atomic_cmpxchg_inatomic

This is the same issue I reported: https://github.com/ClangBuiltLinux/linux/issues/325

Marking arm's futex_atomic_cmpxchg_inatomic as noinline also fixes this
so maybe that's it?

Cheers,
Nathan

> >
> > As far as I can tell, the problem is that a branch is over 16MB
> > apart in those configurations, but only if it branches back to
> > the init text.
> >
> > Marking the futex_detect_cmpxchg() function as noinline and
> > not __init avoids the problem for me.
> >
> > Signed-off-by: Arnd Bergmann <[email protected]>
> > ---
> > kernel/futex.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index c3b73b0311bc..dda77ed9f445 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -3849,7 +3849,7 @@ SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
> > }
> > #endif /* CONFIG_COMPAT_32BIT_TIME */
> >
> > -static void __init futex_detect_cmpxchg(void)
> > +static noinline void futex_detect_cmpxchg(void)
> > {
> > #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
> > u32 curval;
> > --
> > 2.20.0
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers

2019-03-07 19:40:51

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
>
> Passing registers containing zero as both the address (NULL pointer)
> and data into cmpxchg_futex_value_locked() leads clang to assign
> the same register for both inputs on ARM, which triggers a warning
> explaining that this instruction has unpredictable behavior on ARMv5.
>
> /tmp/futex-7e740e.s: Assembler messages:
> /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
>
> This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> as Mikael wrote:
> "One way of fixing this is to make uaddr an input/output register, since
> "that prevents it from overlapping any other input or output."
>
> but then withdrawn as the warning was determined to be harmless, and it
> apparently never showed up again with later gcc versions.
>
> Now the same problem is back when compiling with clang, and we are trying
> to get clang to build the kernel without warnings, as gcc normally does.
>
> Cc: Mikael Pettersson <[email protected]>
> Cc: Mikael Pettersson <[email protected]>
> Cc: Dave Martin <[email protected]>
> Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> arch/arm/include/asm/futex.h | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> index 0a46676b4245..79790912974e 100644
> --- a/arch/arm/include/asm/futex.h
> +++ b/arch/arm/include/asm/futex.h
> @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> preempt_disable();
> __ua_flags = uaccess_save_and_enable();
> __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> - "1: " TUSER(ldr) " %1, [%4]\n"
> - " teq %1, %2\n"
> + "1: " TUSER(ldr) " %1, [%2]\n"
> + " teq %1, %3\n"
> " it eq @ explicit IT needed for the 2b label\n"
> - "2: " TUSER(streq) " %3, [%4]\n"
> + "2: " TUSER(streq) " %4, [%2]\n"
> __futex_atomic_ex_table("%5")
> - : "+r" (ret), "=&r" (val)
> - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> : "cc", "memory");
> uaccess_restore(__ua_flags);

Underspecification of constraints to extended inline assembly is a
common issue exposed by other compilers (and possibly but in-effect
infrequently compiler upgrades).
So the reordering of the constraints means the in the assembly (notes
for other reviewers):
%2 -> %3
%3 -> %4
%4 -> %2
Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
Reviewed-by: Nick Desaulniers <[email protected]>

I think it would be good to further credit Mikael with reported by and
suggested by tags, but not sure which email address is preferred?

--
Thanks,
~Nick Desaulniers

2019-03-07 22:25:19

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 7, 2019 at 7:21 PM Nathan Chancellor
<[email protected]> wrote:
>
> On Thu, Mar 07, 2019 at 10:12:11AM -0800, Nick Desaulniers wrote:
> > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > >
> > > On 32-bit ARM, I got a link failure in futex_init() when building
> > > with clang in some random configurations:
> > >
> > > kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
> >
> > Do we know what function from the fixup text section is calling
> > futex_detect_cmpxchg? I'm curious if this is maybe another case of
> > -Wsection where some function may be in the wrong section?
> >
>
> Looks like this is the call stack:
>
> futex_init ->
> futex_detect_cmpxchg ->
> cmpxchg_futex_value_locked ->
> futex_atomic_cmpxchg_inatomic
>
> This is the same issue I reported: https://github.com/ClangBuiltLinux/linux/issues/325
>
> Marking arm's futex_atomic_cmpxchg_inatomic as noinline also fixes this
> so maybe that's it?

I think part of the problem is the linker sometimes failing to generate
veneers for long calls. I've seen this in the past with relocations in
non-executable sections, but it's possible that the problem here is
having a relocation to a section that doesn't start with ".text".

I have not analyzed the linker behavior in more detail, but have
experimentally shown that the problem doesn't happen if the relocation
is from the .text.fixup segment to .text rather than .init.text. It's
also possible that this is simply a result of the relative distance of
the locations in memory as Russell said, so it could come back
if the kernel grows further.

Arnd

2019-03-07 23:51:13

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> >
> > Passing registers containing zero as both the address (NULL pointer)
> > and data into cmpxchg_futex_value_locked() leads clang to assign
> > the same register for both inputs on ARM, which triggers a warning
> > explaining that this instruction has unpredictable behavior on ARMv5.
> >
> > /tmp/futex-7e740e.s: Assembler messages:
> > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> >
> > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > as Mikael wrote:
> > "One way of fixing this is to make uaddr an input/output register, since
> > "that prevents it from overlapping any other input or output."
> >
> > but then withdrawn as the warning was determined to be harmless, and it
> > apparently never showed up again with later gcc versions.
> >
> > Now the same problem is back when compiling with clang, and we are trying
> > to get clang to build the kernel without warnings, as gcc normally does.
> >
> > Cc: Mikael Pettersson <[email protected]>
> > Cc: Mikael Pettersson <[email protected]>
> > Cc: Dave Martin <[email protected]>
> > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > Signed-off-by: Arnd Bergmann <[email protected]>
> > ---
> > arch/arm/include/asm/futex.h | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > index 0a46676b4245..79790912974e 100644
> > --- a/arch/arm/include/asm/futex.h
> > +++ b/arch/arm/include/asm/futex.h
> > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > preempt_disable();
> > __ua_flags = uaccess_save_and_enable();
> > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > - "1: " TUSER(ldr) " %1, [%4]\n"
> > - " teq %1, %2\n"
> > + "1: " TUSER(ldr) " %1, [%2]\n"
> > + " teq %1, %3\n"
> > " it eq @ explicit IT needed for the 2b label\n"
> > - "2: " TUSER(streq) " %3, [%4]\n"
> > + "2: " TUSER(streq) " %4, [%2]\n"
> > __futex_atomic_ex_table("%5")
> > - : "+r" (ret), "=&r" (val)
> > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > : "cc", "memory");
> > uaccess_restore(__ua_flags);
>
> Underspecification of constraints to extended inline assembly is a
> common issue exposed by other compilers (and possibly but in-effect
> infrequently compiler upgrades).
> So the reordering of the constraints means the in the assembly (notes
> for other reviewers):
> %2 -> %3
> %3 -> %4
> %4 -> %2
> Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

I don't see what is "underspecified" in the original constraints.
Please explain.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 00:06:54

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > Underspecification of constraints to extended inline assembly is a
> > common issue exposed by other compilers (and possibly but in-effect
> > infrequently compiler upgrades).
>
> I don't see what is "underspecified" in the original constraints.
> Please explain.

From the link:

The problem is that in the T(streq) insn, %3 and %4 MUST be different registers,
but nothing in the asm() constrains them to be different.

> > > "One way of fixing this is to make uaddr an input/output register, since
> > > "that prevents it from overlapping any other input or output."

--
Thanks,
~Nick Desaulniers

2019-03-08 08:58:42

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > >
> > > Passing registers containing zero as both the address (NULL pointer)
> > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > the same register for both inputs on ARM, which triggers a warning
> > > explaining that this instruction has unpredictable behavior on ARMv5.
> > >
> > > /tmp/futex-7e740e.s: Assembler messages:
> > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > >
> > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > as Mikael wrote:
> > > "One way of fixing this is to make uaddr an input/output register, since
> > > "that prevents it from overlapping any other input or output."
> > >
> > > but then withdrawn as the warning was determined to be harmless, and it
> > > apparently never showed up again with later gcc versions.
> > >
> > > Now the same problem is back when compiling with clang, and we are trying
> > > to get clang to build the kernel without warnings, as gcc normally does.
> > >
> > > Cc: Mikael Pettersson <[email protected]>
> > > Cc: Mikael Pettersson <[email protected]>
> > > Cc: Dave Martin <[email protected]>
> > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > ---
> > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > index 0a46676b4245..79790912974e 100644
> > > --- a/arch/arm/include/asm/futex.h
> > > +++ b/arch/arm/include/asm/futex.h
> > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > preempt_disable();
> > > __ua_flags = uaccess_save_and_enable();
> > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > - " teq %1, %2\n"
> > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > + " teq %1, %3\n"
> > > " it eq @ explicit IT needed for the 2b label\n"
> > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > __futex_atomic_ex_table("%5")
> > > - : "+r" (ret), "=&r" (val)
> > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > : "cc", "memory");
> > > uaccess_restore(__ua_flags);
> >
> > Underspecification of constraints to extended inline assembly is a
> > common issue exposed by other compilers (and possibly but in-effect
> > infrequently compiler upgrades).
> > So the reordering of the constraints means the in the assembly (notes
> > for other reviewers):
> > %2 -> %3
> > %3 -> %4
> > %4 -> %2
> > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
>
> I don't see what is "underspecified" in the original constraints.
> Please explain.
>

I agree that that statement makes little sense.

As Russell points out in the referenced thread, there is nothing wrong
with the generated assembly, given that the UNPREDICTABLE opcode is
unreachable in practice. Unfortunately, we have no way to flag this
diagnostic as a known false positive, and AFAICT, there is no reason
we couldn't end up with the same diagnostic popping up for GCC builds
in the future, considering that the register assignment matches the
constraints. (We have seen somewhat similar issues where constant
folded function clones are emitted with a constant argument that could
never occur in reality [0])

Given the above, the only meaningful way to invoke this function is
with different registers assigned to %3 and %4, and so tightening the
constraints to guarantee that does not actually result in worse code
(except maybe for the instantiations that we won't ever call in the
first place). So I think we should fix this.

I wonder if just adding

BUG_ON(__builtin_constant_p(uaddr));

at the beginning makes any difference - this shouldn't result in any
object code differences since the conditional will always evaluate to
false at build time for instantiations we care about.


[0] https://lore.kernel.org/lkml/[email protected]/

2019-03-08 09:54:15

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > >
> > > > Passing registers containing zero as both the address (NULL pointer)
> > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > the same register for both inputs on ARM, which triggers a warning
> > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > >
> > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > >
> > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > as Mikael wrote:
> > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > "that prevents it from overlapping any other input or output."
> > > >
> > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > apparently never showed up again with later gcc versions.
> > > >
> > > > Now the same problem is back when compiling with clang, and we are trying
> > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > >
> > > > Cc: Mikael Pettersson <[email protected]>
> > > > Cc: Mikael Pettersson <[email protected]>
> > > > Cc: Dave Martin <[email protected]>
> > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > ---
> > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > index 0a46676b4245..79790912974e 100644
> > > > --- a/arch/arm/include/asm/futex.h
> > > > +++ b/arch/arm/include/asm/futex.h
> > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > preempt_disable();
> > > > __ua_flags = uaccess_save_and_enable();
> > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > - " teq %1, %2\n"
> > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > + " teq %1, %3\n"
> > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > __futex_atomic_ex_table("%5")
> > > > - : "+r" (ret), "=&r" (val)
> > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > : "cc", "memory");
> > > > uaccess_restore(__ua_flags);
> > >
> > > Underspecification of constraints to extended inline assembly is a
> > > common issue exposed by other compilers (and possibly but in-effect
> > > infrequently compiler upgrades).
> > > So the reordering of the constraints means the in the assembly (notes
> > > for other reviewers):
> > > %2 -> %3
> > > %3 -> %4
> > > %4 -> %2
> > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> >
> > I don't see what is "underspecified" in the original constraints.
> > Please explain.
> >
>
> I agree that that statement makes little sense.
>
> As Russell points out in the referenced thread, there is nothing wrong
> with the generated assembly, given that the UNPREDICTABLE opcode is
> unreachable in practice. Unfortunately, we have no way to flag this
> diagnostic as a known false positive, and AFAICT, there is no reason
> we couldn't end up with the same diagnostic popping up for GCC builds
> in the future, considering that the register assignment matches the
> constraints. (We have seen somewhat similar issues where constant
> folded function clones are emitted with a constant argument that could
> never occur in reality [0])
>
> Given the above, the only meaningful way to invoke this function is
> with different registers assigned to %3 and %4, and so tightening the
> constraints to guarantee that does not actually result in worse code
> (except maybe for the instantiations that we won't ever call in the
> first place). So I think we should fix this.
>
> I wonder if just adding
>
> BUG_ON(__builtin_constant_p(uaddr));
>
> at the beginning makes any difference - this shouldn't result in any
> object code differences since the conditional will always evaluate to
> false at build time for instantiations we care about.
>
>
> [0] https://lore.kernel.org/lkml/[email protected]/

What I'm actually asking is:

The GCC manual says that input operands _may_ overlap output operands
since GCC assumes that input operands are consumed before output
operands are written. This is an explicit statement.

The GCC manual does not say that input operands may overlap with each
other, and the behaviour of GCC thus far (apart from one version,
presumably caused by a bug) has been that input operands are unique.

Clang appears to be different: it allows input operands that are
registers, and contain the same constant value to be the same physical
register.

The assertion is that the constraints are under-specified. I am
questioning that assertion.

If the constraints are under-specified, I would have expected gcc-4.4's
behaviour to have persisted, and we would've been told by gcc's
developers to fix our code. That didn't happen, and instead gcc seems
to have been fixed. So, my conclusion is that it is intentional that
input operands to asm() do not overlap with themselves.

It seems to me that the work-around for clang is to change every input
operand to be an output operand with a "+&r" contraint - an operand
that is both read and written by the "instruction", and that the operand
is "earlyclobber". For something that is really only read, that seems
strange.

Also, reading GCC's manual, it would appear that "+&" is wrong.

`+'
Means that this operand is both read and written by the
instruction.

When the compiler fixes up the operands to satisfy the constraints,
it needs to know which operands are inputs to the instruction and
which are outputs from it. `=' identifies an output; `+'
identifies an operand that is both input and output; all other
^^^^^^^^^^^^^^^^^^^^^
operands are assumed to be input only.

`&'
Means (in a particular alternative) that this operand is an
"earlyclobber" operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may
not lie in a register that is used as an input operand or as part
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
of any memory address.

So "+" says that this operand is an input but "&" says that it must not
be in a register that is used as an input. That's contradictory, and I
think we can expect GCC to barf or at least end up doing strange stuff,
if not with existing versions, then with future versions.

Hence, I'm asking for clarification why it is thought that the existing
code underspecifies the asm constraints, and I'm trying to get some more
thought about what the constraints should be, in case there is a need to
use "better" constraints.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 09:55:14

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Thu, Mar 07, 2019 at 04:04:23PM -0800, Nick Desaulniers wrote:
> On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > Underspecification of constraints to extended inline assembly is a
> > > common issue exposed by other compilers (and possibly but in-effect
> > > infrequently compiler upgrades).
> >
> > I don't see what is "underspecified" in the original constraints.
> > Please explain.
>
> From the link:
>
> The problem is that in the T(streq) insn, %3 and %4 MUST be different registers,
> but nothing in the asm() constrains them to be different.

Thanks, but that is not what I'm asking. See my reply to Ard.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 10:09:30

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > <[email protected]> wrote:
> > >
> > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > >
> > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > >
> > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > >
> > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > as Mikael wrote:
> > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > "that prevents it from overlapping any other input or output."
> > > > >
> > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > apparently never showed up again with later gcc versions.
> > > > >
> > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > >
> > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > Cc: Dave Martin <[email protected]>
> > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > ---
> > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > index 0a46676b4245..79790912974e 100644
> > > > > --- a/arch/arm/include/asm/futex.h
> > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > preempt_disable();
> > > > > __ua_flags = uaccess_save_and_enable();
> > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > - " teq %1, %2\n"
> > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > + " teq %1, %3\n"
> > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > __futex_atomic_ex_table("%5")
> > > > > - : "+r" (ret), "=&r" (val)
> > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > : "cc", "memory");
> > > > > uaccess_restore(__ua_flags);
> > > >
> > > > Underspecification of constraints to extended inline assembly is a
> > > > common issue exposed by other compilers (and possibly but in-effect
> > > > infrequently compiler upgrades).
> > > > So the reordering of the constraints means the in the assembly (notes
> > > > for other reviewers):
> > > > %2 -> %3
> > > > %3 -> %4
> > > > %4 -> %2
> > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > >
> > > I don't see what is "underspecified" in the original constraints.
> > > Please explain.
> > >
> >
> > I agree that that statement makes little sense.
> >
> > As Russell points out in the referenced thread, there is nothing wrong
> > with the generated assembly, given that the UNPREDICTABLE opcode is
> > unreachable in practice. Unfortunately, we have no way to flag this
> > diagnostic as a known false positive, and AFAICT, there is no reason
> > we couldn't end up with the same diagnostic popping up for GCC builds
> > in the future, considering that the register assignment matches the
> > constraints. (We have seen somewhat similar issues where constant
> > folded function clones are emitted with a constant argument that could
> > never occur in reality [0])
> >
> > Given the above, the only meaningful way to invoke this function is
> > with different registers assigned to %3 and %4, and so tightening the
> > constraints to guarantee that does not actually result in worse code
> > (except maybe for the instantiations that we won't ever call in the
> > first place). So I think we should fix this.
> >
> > I wonder if just adding
> >
> > BUG_ON(__builtin_constant_p(uaddr));
> >
> > at the beginning makes any difference - this shouldn't result in any
> > object code differences since the conditional will always evaluate to
> > false at build time for instantiations we care about.
> >
> >
> > [0] https://lore.kernel.org/lkml/[email protected]/
>
> What I'm actually asking is:
>
> The GCC manual says that input operands _may_ overlap output operands
> since GCC assumes that input operands are consumed before output
> operands are written. This is an explicit statement.
>
> The GCC manual does not say that input operands may overlap with each
> other, and the behaviour of GCC thus far (apart from one version,
> presumably caused by a bug) has been that input operands are unique.
>

Not entirely. I have run into issues where GCC assumes that registers
that are only used for input operands are left untouched by the asm
code. I.e., if you put an asm() block in a loop and modify an input
register, your code may break on the next pass, even if the input
register does not overlap with an output register.

To me, that seems to suggest that whether or not inputs may overlap is
irrelevant, since they are not expected to be modified.

> Clang appears to be different: it allows input operands that are
> registers, and contain the same constant value to be the same physical
> register.
>
> The assertion is that the constraints are under-specified. I am
> questioning that assertion.
>
> If the constraints are under-specified, I would have expected gcc-4.4's
> behaviour to have persisted, and we would've been told by gcc's
> developers to fix our code. That didn't happen, and instead gcc seems
> to have been fixed. So, my conclusion is that it is intentional that
> input operands to asm() do not overlap with themselves.
>

Whether we hit the error or not is not deterministic. Like in the
ilog2() case I quoted, GCC may decide to instantiate a constant folded
['curried', if you will] clone of a function, and so even if any calls
to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
and uaddr are compiled, it does not mean they occur like that in the C
code.

> It seems to me that the work-around for clang is to change every input
> operand to be an output operand with a "+&r" contraint - an operand
> that is both read and written by the "instruction", and that the operand
> is "earlyclobber". For something that is really only read, that seems
> strange.
>
> Also, reading GCC's manual, it would appear that "+&" is wrong.
>
> `+'
> Means that this operand is both read and written by the
> instruction.
>
> When the compiler fixes up the operands to satisfy the constraints,
> it needs to know which operands are inputs to the instruction and
> which are outputs from it. `=' identifies an output; `+'
> identifies an operand that is both input and output; all other
> ^^^^^^^^^^^^^^^^^^^^^
> operands are assumed to be input only.
>
> `&'
> Means (in a particular alternative) that this operand is an
> "earlyclobber" operand, which is modified before the instruction is
> finished using the input operands. Therefore, this operand may
> not lie in a register that is used as an input operand or as part
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> of any memory address.
>
> So "+" says that this operand is an input but "&" says that it must not
> be in a register that is used as an input. That's contradictory, and I
> think we can expect GCC to barf or at least end up doing strange stuff,
> if not with existing versions, then with future versions.
>

I wondered about the same thing: given that the asm itself is a black
box to the compiler, it can never reuse an in/output register for
output, so when it is clobbered is irrelevant.

> Hence, I'm asking for clarification why it is thought that the existing
> code underspecifies the asm constraints, and I'm trying to get some more
> thought about what the constraints should be, in case there is a need to
> use "better" constraints.
>

I think the constraints are correct, but as I argued before,
tightening the constraints to ensure that uaddr and newval are not
mapped onto the same register should not result in any object code
changes, except for the case where the compiler instantiated a
constprop clone that is bogus to begin with.

2019-03-08 10:17:54

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, 8 Mar 2019 at 11:08, Ard Biesheuvel <[email protected]> wrote:
>
> On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > >
> > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > >
> > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > >
> > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > as Mikael wrote:
> > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > "that prevents it from overlapping any other input or output."
> > > > > >
> > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > apparently never showed up again with later gcc versions.
> > > > > >
> > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > >
> > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > Cc: Dave Martin <[email protected]>
> > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > ---
> > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > preempt_disable();
> > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > - " teq %1, %2\n"
> > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > + " teq %1, %3\n"
> > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > __futex_atomic_ex_table("%5")
> > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > : "cc", "memory");
> > > > > > uaccess_restore(__ua_flags);
> > > > >
> > > > > Underspecification of constraints to extended inline assembly is a
> > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > infrequently compiler upgrades).
> > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > for other reviewers):
> > > > > %2 -> %3
> > > > > %3 -> %4
> > > > > %4 -> %2
> > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > >
> > > > I don't see what is "underspecified" in the original constraints.
> > > > Please explain.
> > > >
> > >
> > > I agree that that statement makes little sense.
> > >
> > > As Russell points out in the referenced thread, there is nothing wrong
> > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > unreachable in practice. Unfortunately, we have no way to flag this
> > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > in the future, considering that the register assignment matches the
> > > constraints. (We have seen somewhat similar issues where constant
> > > folded function clones are emitted with a constant argument that could
> > > never occur in reality [0])
> > >
> > > Given the above, the only meaningful way to invoke this function is
> > > with different registers assigned to %3 and %4, and so tightening the
> > > constraints to guarantee that does not actually result in worse code
> > > (except maybe for the instantiations that we won't ever call in the
> > > first place). So I think we should fix this.
> > >
> > > I wonder if just adding
> > >
> > > BUG_ON(__builtin_constant_p(uaddr));
> > >
> > > at the beginning makes any difference - this shouldn't result in any
> > > object code differences since the conditional will always evaluate to
> > > false at build time for instantiations we care about.
> > >
> > >
> > > [0] https://lore.kernel.org/lkml/[email protected]/
> >
> > What I'm actually asking is:
> >
> > The GCC manual says that input operands _may_ overlap output operands
> > since GCC assumes that input operands are consumed before output
> > operands are written. This is an explicit statement.
> >
> > The GCC manual does not say that input operands may overlap with each
> > other, and the behaviour of GCC thus far (apart from one version,
> > presumably caused by a bug) has been that input operands are unique.
> >
>
> Not entirely. I have run into issues where GCC assumes that registers
> that are only used for input operands are left untouched by the asm
> code. I.e., if you put an asm() block in a loop and modify an input
> register, your code may break on the next pass, even if the input
> register does not overlap with an output register.
>
> To me, that seems to suggest that whether or not inputs may overlap is
> irrelevant, since they are not expected to be modified.
>
> > Clang appears to be different: it allows input operands that are
> > registers, and contain the same constant value to be the same physical
> > register.
> >
> > The assertion is that the constraints are under-specified. I am
> > questioning that assertion.
> >
> > If the constraints are under-specified, I would have expected gcc-4.4's
> > behaviour to have persisted, and we would've been told by gcc's
> > developers to fix our code. That didn't happen, and instead gcc seems
> > to have been fixed. So, my conclusion is that it is intentional that
> > input operands to asm() do not overlap with themselves.
> >
>
> Whether we hit the error or not is not deterministic. Like in the
> ilog2() case I quoted, GCC may decide to instantiate a constant folded
> ['curried', if you will] clone of a function, and so even if any calls
> to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
> and uaddr are compiled, it does not mean they occur like that in the C
> code.
>
> > It seems to me that the work-around for clang is to change every input
> > operand to be an output operand with a "+&r" contraint - an operand
> > that is both read and written by the "instruction", and that the operand
> > is "earlyclobber". For something that is really only read, that seems
> > strange.
> >
> > Also, reading GCC's manual, it would appear that "+&" is wrong.
> >
> > `+'
> > Means that this operand is both read and written by the
> > instruction.
> >
> > When the compiler fixes up the operands to satisfy the constraints,
> > it needs to know which operands are inputs to the instruction and
> > which are outputs from it. `=' identifies an output; `+'
> > identifies an operand that is both input and output; all other
> > ^^^^^^^^^^^^^^^^^^^^^
> > operands are assumed to be input only.
> >
> > `&'
> > Means (in a particular alternative) that this operand is an
> > "earlyclobber" operand, which is modified before the instruction is
> > finished using the input operands. Therefore, this operand may
> > not lie in a register that is used as an input operand or as part
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > of any memory address.
> >
> > So "+" says that this operand is an input but "&" says that it must not
> > be in a register that is used as an input. That's contradictory, and I
> > think we can expect GCC to barf or at least end up doing strange stuff,
> > if not with existing versions, then with future versions.
> >
>
> I wondered about the same thing: given that the asm itself is a black
> box to the compiler, it can never reuse an in/output register for
> output, so when it is clobbered is irrelevant.
>
> > Hence, I'm asking for clarification why it is thought that the existing
> > code underspecifies the asm constraints, and I'm trying to get some more
> > thought about what the constraints should be, in case there is a need to
> > use "better" constraints.
> >
>
> I think the constraints are correct, but as I argued before,
> tightening the constraints to ensure that uaddr and newval are not
> mapped onto the same register should not result in any object code
> changes, except for the case where the compiler instantiated a
> constprop clone that is bogus to begin with.

Compiling the following code

"""
#include <stdio.h>

static void foo(void *a, int b)
{
asm("str %0, [%1]" :: "r"(a), "r"(b));
}

int main(void)
{
foo(NULL, 0);
}
"""

with GCC 6.3 (at -O2) gives me

.arch armv7-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "futex.c"
.section .text.startup,"ax",%progbits
.align 1
.p2align 2,,3
.global main
.syntax unified
.thumb
.thumb_func
.fpu vfpv3-d16
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movs r0, #0
.syntax unified
@ 6 "/tmp/futex.c" 1
str r0, [r0]
@ 0 "" 2
.thumb
.syntax unified
bx lr
.size main, .-main
.ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
.section .note.GNU-stack,"",%progbits

and so GCC definitely behaves similar in this regard.

2019-03-08 10:35:38

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > >
> > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > >
> > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > >
> > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > as Mikael wrote:
> > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > "that prevents it from overlapping any other input or output."
> > > > > >
> > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > apparently never showed up again with later gcc versions.
> > > > > >
> > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > >
> > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > Cc: Dave Martin <[email protected]>
> > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > ---
> > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > preempt_disable();
> > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > - " teq %1, %2\n"
> > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > + " teq %1, %3\n"
> > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > __futex_atomic_ex_table("%5")
> > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > : "cc", "memory");
> > > > > > uaccess_restore(__ua_flags);
> > > > >
> > > > > Underspecification of constraints to extended inline assembly is a
> > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > infrequently compiler upgrades).
> > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > for other reviewers):
> > > > > %2 -> %3
> > > > > %3 -> %4
> > > > > %4 -> %2
> > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > >
> > > > I don't see what is "underspecified" in the original constraints.
> > > > Please explain.
> > > >
> > >
> > > I agree that that statement makes little sense.
> > >
> > > As Russell points out in the referenced thread, there is nothing wrong
> > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > unreachable in practice. Unfortunately, we have no way to flag this
> > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > in the future, considering that the register assignment matches the
> > > constraints. (We have seen somewhat similar issues where constant
> > > folded function clones are emitted with a constant argument that could
> > > never occur in reality [0])
> > >
> > > Given the above, the only meaningful way to invoke this function is
> > > with different registers assigned to %3 and %4, and so tightening the
> > > constraints to guarantee that does not actually result in worse code
> > > (except maybe for the instantiations that we won't ever call in the
> > > first place). So I think we should fix this.
> > >
> > > I wonder if just adding
> > >
> > > BUG_ON(__builtin_constant_p(uaddr));
> > >
> > > at the beginning makes any difference - this shouldn't result in any
> > > object code differences since the conditional will always evaluate to
> > > false at build time for instantiations we care about.
> > >
> > >
> > > [0] https://lore.kernel.org/lkml/[email protected]/
> >
> > What I'm actually asking is:
> >
> > The GCC manual says that input operands _may_ overlap output operands
> > since GCC assumes that input operands are consumed before output
> > operands are written. This is an explicit statement.
> >
> > The GCC manual does not say that input operands may overlap with each
> > other, and the behaviour of GCC thus far (apart from one version,
> > presumably caused by a bug) has been that input operands are unique.
> >
>
> Not entirely. I have run into issues where GCC assumes that registers
> that are only used for input operands are left untouched by the asm
> code. I.e., if you put an asm() block in a loop and modify an input
> register, your code may break on the next pass, even if the input
> register does not overlap with an output register.

GCC has had the expectation for decades that _input_ operands are not
changed in value by the code in the assembly. This isn't quite the
same thing as the uniqueness of the register allocation for input
operands.

> To me, that seems to suggest that whether or not inputs may overlap is
> irrelevant, since they are not expected to be modified.

How is:

stmfd sp!, {r0-r3, ip, lr}
bl foo
ldmfd sp!, {r0-r3, ip, lr}

where r1 may be an input operand (to pass an argument to foo) any
different from:

ldrt r0, [r1]

as far as whether r1 is modified in both cases? In both cases, the
value of r1 is read and written by both instructions, but in both
cases the value of r1 remains the same no matter what the value of r1
was.

The "input operands should not be modified" is entirely orthogonal to
the input operand register allocation.

> > Clang appears to be different: it allows input operands that are
> > registers, and contain the same constant value to be the same physical
> > register.
> >
> > The assertion is that the constraints are under-specified. I am
> > questioning that assertion.
> >
> > If the constraints are under-specified, I would have expected gcc-4.4's
> > behaviour to have persisted, and we would've been told by gcc's
> > developers to fix our code. That didn't happen, and instead gcc seems
> > to have been fixed. So, my conclusion is that it is intentional that
> > input operands to asm() do not overlap with themselves.
> >
>
> Whether we hit the error or not is not deterministic. Like in the
> ilog2() case I quoted, GCC may decide to instantiate a constant folded
> ['curried', if you will] clone of a function, and so even if any calls
> to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
> and uaddr are compiled, it does not mean they occur like that in the C
> code.

Again, I think this is different: gcc knows what the C code is doing and
can optimise it. GCC doesn't have any idea what the code in an asm() is
doing beyond what the constraints are telling it, and the rules for
those constraints set out in the GCC manual.

Given that we are explicitly talking about the register allocation for
input operands, I'm not sure how the ilog2() case you mention applies.

> > It seems to me that the work-around for clang is to change every input
> > operand to be an output operand with a "+&r" contraint - an operand
> > that is both read and written by the "instruction", and that the operand
> > is "earlyclobber". For something that is really only read, that seems
> > strange.
> >
> > Also, reading GCC's manual, it would appear that "+&" is wrong.
> >
> > `+'
> > Means that this operand is both read and written by the
> > instruction.
> >
> > When the compiler fixes up the operands to satisfy the constraints,
> > it needs to know which operands are inputs to the instruction and
> > which are outputs from it. `=' identifies an output; `+'
> > identifies an operand that is both input and output; all other
> > ^^^^^^^^^^^^^^^^^^^^^
> > operands are assumed to be input only.
> >
> > `&'
> > Means (in a particular alternative) that this operand is an
> > "earlyclobber" operand, which is modified before the instruction is
> > finished using the input operands. Therefore, this operand may
> > not lie in a register that is used as an input operand or as part
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > of any memory address.
> >
> > So "+" says that this operand is an input but "&" says that it must not
> > be in a register that is used as an input. That's contradictory, and I
> > think we can expect GCC to barf or at least end up doing strange stuff,
> > if not with existing versions, then with future versions.
> >
>
> I wondered about the same thing: given that the asm itself is a black
> box to the compiler, it can never reuse an in/output register for
> output, so when it is clobbered is irrelevant.

Let me try again - you seem to have completely missed my point.

+ specifies that the operand is an input.
& specifies that the operand is not an input.

+ and & are contradictory.

GCC is at liberty to not assign a value to an operand with a +&
modifier, or error out such a construction.

>
> > Hence, I'm asking for clarification why it is thought that the existing
> > code underspecifies the asm constraints, and I'm trying to get some more
> > thought about what the constraints should be, in case there is a need to
> > use "better" constraints.
>
> I think the constraints are correct, but as I argued before,
> tightening the constraints to ensure that uaddr and newval are not
> mapped onto the same register should not result in any object code
> changes, except for the case where the compiler instantiated a
> constprop clone that is bogus to begin with.

... by tightening it to an undefined combination of constraint modifiers
that just happens to seem to do the right thing. No, this is not proper
"engineering". This is bodging.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 10:47:04

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> > <[email protected]> wrote:
> > >
> > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > > >
> > > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > > >
> > > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > > >
> > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > > as Mikael wrote:
> > > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > > "that prevents it from overlapping any other input or output."
> > > > > > >
> > > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > > apparently never showed up again with later gcc versions.
> > > > > > >
> > > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > > >
> > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > Cc: Dave Martin <[email protected]>
> > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > > ---
> > > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > > preempt_disable();
> > > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > > - " teq %1, %2\n"
> > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > > + " teq %1, %3\n"
> > > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > > __futex_atomic_ex_table("%5")
> > > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > > : "cc", "memory");
> > > > > > > uaccess_restore(__ua_flags);
> > > > > >
> > > > > > Underspecification of constraints to extended inline assembly is a
> > > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > > infrequently compiler upgrades).
> > > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > > for other reviewers):
> > > > > > %2 -> %3
> > > > > > %3 -> %4
> > > > > > %4 -> %2
> > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > > >
> > > > > I don't see what is "underspecified" in the original constraints.
> > > > > Please explain.
> > > > >
> > > >
> > > > I agree that that statement makes little sense.
> > > >
> > > > As Russell points out in the referenced thread, there is nothing wrong
> > > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > > unreachable in practice. Unfortunately, we have no way to flag this
> > > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > > in the future, considering that the register assignment matches the
> > > > constraints. (We have seen somewhat similar issues where constant
> > > > folded function clones are emitted with a constant argument that could
> > > > never occur in reality [0])
> > > >
> > > > Given the above, the only meaningful way to invoke this function is
> > > > with different registers assigned to %3 and %4, and so tightening the
> > > > constraints to guarantee that does not actually result in worse code
> > > > (except maybe for the instantiations that we won't ever call in the
> > > > first place). So I think we should fix this.
> > > >
> > > > I wonder if just adding
> > > >
> > > > BUG_ON(__builtin_constant_p(uaddr));
> > > >
> > > > at the beginning makes any difference - this shouldn't result in any
> > > > object code differences since the conditional will always evaluate to
> > > > false at build time for instantiations we care about.
> > > >
> > > >
> > > > [0] https://lore.kernel.org/lkml/[email protected]/
> > >
> > > What I'm actually asking is:
> > >
> > > The GCC manual says that input operands _may_ overlap output operands
> > > since GCC assumes that input operands are consumed before output
> > > operands are written. This is an explicit statement.
> > >
> > > The GCC manual does not say that input operands may overlap with each
> > > other, and the behaviour of GCC thus far (apart from one version,
> > > presumably caused by a bug) has been that input operands are unique.
> > >
> >
> > Not entirely. I have run into issues where GCC assumes that registers
> > that are only used for input operands are left untouched by the asm
> > code. I.e., if you put an asm() block in a loop and modify an input
> > register, your code may break on the next pass, even if the input
> > register does not overlap with an output register.
>
> GCC has had the expectation for decades that _input_ operands are not
> changed in value by the code in the assembly. This isn't quite the
> same thing as the uniqueness of the register allocation for input
> operands.
>
> > To me, that seems to suggest that whether or not inputs may overlap is
> > irrelevant, since they are not expected to be modified.
>
> How is:
>
> stmfd sp!, {r0-r3, ip, lr}
> bl foo
> ldmfd sp!, {r0-r3, ip, lr}
>
> where r1 may be an input operand (to pass an argument to foo) any
> different from:
>
> ldrt r0, [r1]
>
> as far as whether r1 is modified in both cases? In both cases, the
> value of r1 is read and written by both instructions, but in both
> cases the value of r1 remains the same no matter what the value of r1
> was.
>
> The "input operands should not be modified" is entirely orthogonal to
> the input operand register allocation.
>

The question is whether it is reasonable for GCC to use the same
register for input operands that have the same value. From the
assumption that GCC makes that the asm will not modified follows
directly that we can use the same register for different operands.

And in fact, since that asm code (when built in ARM mode) does modify
the register, uaddr should not be an input operand to begin with. In
other words, there is an actual bug here, and this patch fixes it.



> > > Clang appears to be different: it allows input operands that are
> > > registers, and contain the same constant value to be the same physical
> > > register.
> > >
> > > The assertion is that the constraints are under-specified. I am
> > > questioning that assertion.
> > >
> > > If the constraints are under-specified, I would have expected gcc-4.4's
> > > behaviour to have persisted, and we would've been told by gcc's
> > > developers to fix our code. That didn't happen, and instead gcc seems
> > > to have been fixed. So, my conclusion is that it is intentional that
> > > input operands to asm() do not overlap with themselves.
> > >
> >
> > Whether we hit the error or not is not deterministic. Like in the
> > ilog2() case I quoted, GCC may decide to instantiate a constant folded
> > ['curried', if you will] clone of a function, and so even if any calls
> > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
> > and uaddr are compiled, it does not mean they occur like that in the C
> > code.
>
> Again, I think this is different: gcc knows what the C code is doing and
> can optimise it. GCC doesn't have any idea what the code in an asm() is
> doing beyond what the constraints are telling it, and the rules for
> those constraints set out in the GCC manual.
>
> Given that we are explicitly talking about the register allocation for
> input operands, I'm not sure how the ilog2() case you mention applies.
>

The relevance of the ilog2() case is that we are dealing with an
invocation of the function that never actually occurs in the code. The
compiler emits it as part of an optimization step, and this is how we
end up with constant operands for newval and uaddr.

> > > It seems to me that the work-around for clang is to change every input
> > > operand to be an output operand with a "+&r" contraint - an operand
> > > that is both read and written by the "instruction", and that the operand
> > > is "earlyclobber". For something that is really only read, that seems
> > > strange.
> > >
> > > Also, reading GCC's manual, it would appear that "+&" is wrong.
> > >
> > > `+'
> > > Means that this operand is both read and written by the
> > > instruction.
> > >
> > > When the compiler fixes up the operands to satisfy the constraints,
> > > it needs to know which operands are inputs to the instruction and
> > > which are outputs from it. `=' identifies an output; `+'
> > > identifies an operand that is both input and output; all other
> > > ^^^^^^^^^^^^^^^^^^^^^
> > > operands are assumed to be input only.
> > >
> > > `&'
> > > Means (in a particular alternative) that this operand is an
> > > "earlyclobber" operand, which is modified before the instruction is
> > > finished using the input operands. Therefore, this operand may
> > > not lie in a register that is used as an input operand or as part
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > of any memory address.
> > >
> > > So "+" says that this operand is an input but "&" says that it must not
> > > be in a register that is used as an input. That's contradictory, and I
> > > think we can expect GCC to barf or at least end up doing strange stuff,
> > > if not with existing versions, then with future versions.
> > >
> >
> > I wondered about the same thing: given that the asm itself is a black
> > box to the compiler, it can never reuse an in/output register for
> > output, so when it is clobbered is irrelevant.
>
> Let me try again - you seem to have completely missed my point.
>
> + specifies that the operand is an input.
> & specifies that the operand is not an input.
>
> + and & are contradictory.
>
> GCC is at liberty to not assign a value to an operand with a +&
> modifier, or error out such a construction.
>

I agree that the +& does not make sense.

> >
> > > Hence, I'm asking for clarification why it is thought that the existing
> > > code underspecifies the asm constraints, and I'm trying to get some more
> > > thought about what the constraints should be, in case there is a need to
> > > use "better" constraints.
> >
> > I think the constraints are correct, but as I argued before,
> > tightening the constraints to ensure that uaddr and newval are not
> > mapped onto the same register should not result in any object code
> > changes, except for the case where the compiler instantiated a
> > constprop clone that is bogus to begin with.
>
> ... by tightening it to an undefined combination of constraint modifiers
> that just happens to seem to do the right thing. No, this is not proper
> "engineering". This is bodging.
>

As I argued above, using an input operand for uaddr is incorrect (in
ARM mode) since the instruction does modify the register. So modulo
the +&, I think the patch is an improvement.

2019-03-08 11:00:13

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, Mar 08, 2019 at 11:16:47AM +0100, Ard Biesheuvel wrote:
> Compiling the following code
>
> """
> #include <stdio.h>
>
> static void foo(void *a, int b)
> {
> asm("str %0, [%1]" :: "r"(a), "r"(b));
> }
>
> int main(void)
> {
> foo(NULL, 0);
> }
> """
>
> with GCC 6.3 (at -O2) gives me
>
> .arch armv7-a
> .eabi_attribute 28, 1
> .eabi_attribute 20, 1
> .eabi_attribute 21, 1
> .eabi_attribute 23, 3
> .eabi_attribute 24, 1
> .eabi_attribute 25, 1
> .eabi_attribute 26, 2
> .eabi_attribute 30, 2
> .eabi_attribute 34, 1
> .eabi_attribute 18, 4
> .file "futex.c"
> .section .text.startup,"ax",%progbits
> .align 1
> .p2align 2,,3
> .global main
> .syntax unified
> .thumb
> .thumb_func
> .fpu vfpv3-d16
> .type main, %function
> main:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> movs r0, #0
> .syntax unified
> @ 6 "/tmp/futex.c" 1
> str r0, [r0]
> @ 0 "" 2
> .thumb
> .syntax unified
> bx lr
> .size main, .-main
> .ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
> .section .note.GNU-stack,"",%progbits
>
> and so GCC definitely behaves similar in this regard.

Let's take this further - a volatile is required for these cases to
avoid gcc eliminating the asm() due to the output not being used:

#define NULL ((void *)0)
static void foo(void *a, int b)
{
asm volatile("str %1, [%0]" : "=&r" (a) : "0" (a), "r" (b));
}
int main(void)
{
foo(NULL, 0);
}

produces:

mov r3, #0
mov r2, r3
str r2, [r2]

which looks to me to be incorrect to the GCC manual - the '&' on the
output operand should mean that it does not conflict with other input
operands, but clearly 'r2' has ended up being 'b' as well. I suspect
this is a bug, or if not, is completely counter-intuitive from the
description in the GCC manual.

Using "+r" (a) : "r" (b) also results in:

mov r3, #0
str r3, [r3]

It seems that only using "+&r" (a) : "r" (b) avoids a and b being in
the same register, but I question whether we are stepping into
undefined compiler behaviour with that.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 11:00:59

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> > > <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > > > >
> > > > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > > > >
> > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > > > as Mikael wrote:
> > > > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > > > "that prevents it from overlapping any other input or output."
> > > > > > > >
> > > > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > > > apparently never showed up again with later gcc versions.
> > > > > > > >
> > > > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > > > >
> > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > Cc: Dave Martin <[email protected]>
> > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > > > preempt_disable();
> > > > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > > > - " teq %1, %2\n"
> > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > > > + " teq %1, %3\n"
> > > > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > > > __futex_atomic_ex_table("%5")
> > > > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > > > : "cc", "memory");
> > > > > > > > uaccess_restore(__ua_flags);
> > > > > > >
> > > > > > > Underspecification of constraints to extended inline assembly is a
> > > > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > > > infrequently compiler upgrades).
> > > > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > > > for other reviewers):
> > > > > > > %2 -> %3
> > > > > > > %3 -> %4
> > > > > > > %4 -> %2
> > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > > > >
> > > > > > I don't see what is "underspecified" in the original constraints.
> > > > > > Please explain.
> > > > > >
> > > > >
> > > > > I agree that that statement makes little sense.
> > > > >
> > > > > As Russell points out in the referenced thread, there is nothing wrong
> > > > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > > > unreachable in practice. Unfortunately, we have no way to flag this
> > > > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > > > in the future, considering that the register assignment matches the
> > > > > constraints. (We have seen somewhat similar issues where constant
> > > > > folded function clones are emitted with a constant argument that could
> > > > > never occur in reality [0])
> > > > >
> > > > > Given the above, the only meaningful way to invoke this function is
> > > > > with different registers assigned to %3 and %4, and so tightening the
> > > > > constraints to guarantee that does not actually result in worse code
> > > > > (except maybe for the instantiations that we won't ever call in the
> > > > > first place). So I think we should fix this.
> > > > >
> > > > > I wonder if just adding
> > > > >
> > > > > BUG_ON(__builtin_constant_p(uaddr));
> > > > >
> > > > > at the beginning makes any difference - this shouldn't result in any
> > > > > object code differences since the conditional will always evaluate to
> > > > > false at build time for instantiations we care about.
> > > > >
> > > > >
> > > > > [0] https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > What I'm actually asking is:
> > > >
> > > > The GCC manual says that input operands _may_ overlap output operands
> > > > since GCC assumes that input operands are consumed before output
> > > > operands are written. This is an explicit statement.
> > > >
> > > > The GCC manual does not say that input operands may overlap with each
> > > > other, and the behaviour of GCC thus far (apart from one version,
> > > > presumably caused by a bug) has been that input operands are unique.
> > > >
> > >
> > > Not entirely. I have run into issues where GCC assumes that registers
> > > that are only used for input operands are left untouched by the asm
> > > code. I.e., if you put an asm() block in a loop and modify an input
> > > register, your code may break on the next pass, even if the input
> > > register does not overlap with an output register.
> >
> > GCC has had the expectation for decades that _input_ operands are not
> > changed in value by the code in the assembly. This isn't quite the
> > same thing as the uniqueness of the register allocation for input
> > operands.
> >
> > > To me, that seems to suggest that whether or not inputs may overlap is
> > > irrelevant, since they are not expected to be modified.
> >
> > How is:
> >
> > stmfd sp!, {r0-r3, ip, lr}
> > bl foo
> > ldmfd sp!, {r0-r3, ip, lr}
> >
> > where r1 may be an input operand (to pass an argument to foo) any
> > different from:
> >
> > ldrt r0, [r1]
> >
> > as far as whether r1 is modified in both cases? In both cases, the
> > value of r1 is read and written by both instructions, but in both
> > cases the value of r1 remains the same no matter what the value of r1
> > was.
> >
> > The "input operands should not be modified" is entirely orthogonal to
> > the input operand register allocation.
> >
>
> The question is whether it is reasonable for GCC to use the same
> register for input operands that have the same value. From the
> assumption that GCC makes that the asm will not modified follows
> directly that we can use the same register for different operands.
>
> And in fact, since that asm code (when built in ARM mode) does modify
> the register, uaddr should not be an input operand to begin with. In
> other words, there is an actual bug here, and this patch fixes it.

Again, you miss my point.

> > > > Clang appears to be different: it allows input operands that are
> > > > registers, and contain the same constant value to be the same physical
> > > > register.
> > > >
> > > > The assertion is that the constraints are under-specified. I am
> > > > questioning that assertion.
> > > >
> > > > If the constraints are under-specified, I would have expected gcc-4.4's
> > > > behaviour to have persisted, and we would've been told by gcc's
> > > > developers to fix our code. That didn't happen, and instead gcc seems
> > > > to have been fixed. So, my conclusion is that it is intentional that
> > > > input operands to asm() do not overlap with themselves.
> > > >
> > >
> > > Whether we hit the error or not is not deterministic. Like in the
> > > ilog2() case I quoted, GCC may decide to instantiate a constant folded
> > > ['curried', if you will] clone of a function, and so even if any calls
> > > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
> > > and uaddr are compiled, it does not mean they occur like that in the C
> > > code.
> >
> > Again, I think this is different: gcc knows what the C code is doing and
> > can optimise it. GCC doesn't have any idea what the code in an asm() is
> > doing beyond what the constraints are telling it, and the rules for
> > those constraints set out in the GCC manual.
> >
> > Given that we are explicitly talking about the register allocation for
> > input operands, I'm not sure how the ilog2() case you mention applies.
> >
>
> The relevance of the ilog2() case is that we are dealing with an
> invocation of the function that never actually occurs in the code. The
> compiler emits it as part of an optimization step, and this is how we
> end up with constant operands for newval and uaddr.
>
> > > > It seems to me that the work-around for clang is to change every input
> > > > operand to be an output operand with a "+&r" contraint - an operand
> > > > that is both read and written by the "instruction", and that the operand
> > > > is "earlyclobber". For something that is really only read, that seems
> > > > strange.
> > > >
> > > > Also, reading GCC's manual, it would appear that "+&" is wrong.
> > > >
> > > > `+'
> > > > Means that this operand is both read and written by the
> > > > instruction.
> > > >
> > > > When the compiler fixes up the operands to satisfy the constraints,
> > > > it needs to know which operands are inputs to the instruction and
> > > > which are outputs from it. `=' identifies an output; `+'
> > > > identifies an operand that is both input and output; all other
> > > > ^^^^^^^^^^^^^^^^^^^^^
> > > > operands are assumed to be input only.
> > > >
> > > > `&'
> > > > Means (in a particular alternative) that this operand is an
> > > > "earlyclobber" operand, which is modified before the instruction is
> > > > finished using the input operands. Therefore, this operand may
> > > > not lie in a register that is used as an input operand or as part
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > of any memory address.
> > > >
> > > > So "+" says that this operand is an input but "&" says that it must not
> > > > be in a register that is used as an input. That's contradictory, and I
> > > > think we can expect GCC to barf or at least end up doing strange stuff,
> > > > if not with existing versions, then with future versions.
> > > >
> > >
> > > I wondered about the same thing: given that the asm itself is a black
> > > box to the compiler, it can never reuse an in/output register for
> > > output, so when it is clobbered is irrelevant.
> >
> > Let me try again - you seem to have completely missed my point.
> >
> > + specifies that the operand is an input.
> > & specifies that the operand is not an input.
> >
> > + and & are contradictory.
> >
> > GCC is at liberty to not assign a value to an operand with a +&
> > modifier, or error out such a construction.
> >
>
> I agree that the +& does not make sense.
>
> > >
> > > > Hence, I'm asking for clarification why it is thought that the existing
> > > > code underspecifies the asm constraints, and I'm trying to get some more
> > > > thought about what the constraints should be, in case there is a need to
> > > > use "better" constraints.
> > >
> > > I think the constraints are correct, but as I argued before,
> > > tightening the constraints to ensure that uaddr and newval are not
> > > mapped onto the same register should not result in any object code
> > > changes, except for the case where the compiler instantiated a
> > > constprop clone that is bogus to begin with.
> >
> > ... by tightening it to an undefined combination of constraint modifiers
> > that just happens to seem to do the right thing. No, this is not proper
> > "engineering". This is bodging.
> >
>
> As I argued above, using an input operand for uaddr is incorrect (in
> ARM mode) since the instruction does modify the register. So modulo
> the +&, I think the patch is an improvement.
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2019-03-08 11:56:38

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

oN fRI, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> > > <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > > > >
> > > > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > > > >
> > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > > > as Mikael wrote:
> > > > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > > > "that prevents it from overlapping any other input or output."
> > > > > > > >
> > > > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > > > apparently never showed up again with later gcc versions.
> > > > > > > >
> > > > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > > > >
> > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > Cc: Dave Martin <[email protected]>
> > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > > > preempt_disable();
> > > > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > > > - " teq %1, %2\n"
> > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > > > + " teq %1, %3\n"
> > > > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > > > __futex_atomic_ex_table("%5")
> > > > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > > > : "cc", "memory");
> > > > > > > > uaccess_restore(__ua_flags);
> > > > > > >
> > > > > > > Underspecification of constraints to extended inline assembly is a
> > > > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > > > infrequently compiler upgrades).
> > > > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > > > for other reviewers):
> > > > > > > %2 -> %3
> > > > > > > %3 -> %4
> > > > > > > %4 -> %2
> > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > > > >
> > > > > > I don't see what is "underspecified" in the original constraints.
> > > > > > Please explain.
> > > > > >
> > > > >
> > > > > I agree that that statement makes little sense.
> > > > >
> > > > > As Russell points out in the referenced thread, there is nothing wrong
> > > > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > > > unreachable in practice. Unfortunately, we have no way to flag this
> > > > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > > > in the future, considering that the register assignment matches the
> > > > > constraints. (We have seen somewhat similar issues where constant
> > > > > folded function clones are emitted with a constant argument that could
> > > > > never occur in reality [0])
> > > > >
> > > > > Given the above, the only meaningful way to invoke this function is
> > > > > with different registers assigned to %3 and %4, and so tightening the
> > > > > constraints to guarantee that does not actually result in worse code
> > > > > (except maybe for the instantiations that we won't ever call in the
> > > > > first place). So I think we should fix this.
> > > > >
> > > > > I wonder if just adding
> > > > >
> > > > > BUG_ON(__builtin_constant_p(uaddr));
> > > > >
> > > > > at the beginning makes any difference - this shouldn't result in any
> > > > > object code differences since the conditional will always evaluate to
> > > > > false at build time for instantiations we care about.
> > > > >
> > > > >
> > > > > [0] https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > What I'm actually asking is:
> > > >
> > > > The GCC manual says that input operands _may_ overlap output operands
> > > > since GCC assumes that input operands are consumed before output
> > > > operands are written. This is an explicit statement.
> > > >
> > > > The GCC manual does not say that input operands may overlap with each
> > > > other, and the behaviour of GCC thus far (apart from one version,
> > > > presumably caused by a bug) has been that input operands are unique.
> > > >
> > >
> > > Not entirely. I have run into issues where GCC assumes that registers
> > > that are only used for input operands are left untouched by the asm
> > > code. I.e., if you put an asm() block in a loop and modify an input
> > > register, your code may break on the next pass, even if the input
> > > register does not overlap with an output register.
> >
> > GCC has had the expectation for decades that _input_ operands are not
> > changed in value by the code in the assembly. This isn't quite the
> > same thing as the uniqueness of the register allocation for input
> > operands.
> >
> > > To me, that seems to suggest that whether or not inputs may overlap is
> > > irrelevant, since they are not expected to be modified.
> >
> > How is:
> >
> > stmfd sp!, {r0-r3, ip, lr}
> > bl foo
> > ldmfd sp!, {r0-r3, ip, lr}
> >
> > where r1 may be an input operand (to pass an argument to foo) any
> > different from:
> >
> > ldrt r0, [r1]
> >
> > as far as whether r1 is modified in both cases? In both cases, the
> > value of r1 is read and written by both instructions, but in both
> > cases the value of r1 remains the same no matter what the value of r1
> > was.
> >
> > The "input operands should not be modified" is entirely orthogonal to
> > the input operand register allocation.
> >
>
> The question is whether it is reasonable for GCC to use the same
> register for input operands that have the same value. From the
> assumption that GCC makes that the asm will not modified follows
> directly that we can use the same register for different operands.

Whether "reasonable" or not, GCC does it. And I don't think this is new
behaviour...

int f(void)
{
int res;

asm ("ADD %0, %0, %0" : "=r" (res) : "r" (77), "r" (77));

return res;
}

->

00000000 <f>:
0: e3a0004d mov r0, #77 ; 0x4d
4: e0800000 add r0, r0, r0
8: e12fff1e bx lr

> And in fact, since that asm code (when built in ARM mode) does modify
> the register, uaddr should not be an input operand to begin with. In
> other words, there is an actual bug here, and this patch fixes it.

Does the old code modify the register?

As I read it, the register is written (in the ARM case) by the
underlying STRT instruction, but since the post-index offset it 0, the
value written back is the same as the value originally read.

In ARMv7-A,

strt r0, [r0], #imm

will store the original (unmodified) value of r0 if I read the
pseudocode correctly. I can't remember the history, but I think that
older architecture versions provided a choice about whether the
unmodified or modified value was stored. So gas probably just checks
whether the registers are the same and emits a warning to be on the safe
side. If #imm is 0 (as in the existing futex code here) then it may
make no difference in practice though.

So, I'm not absolutely convinced there's a bug here, unless this is
truly specified as UNPREDICATABLE in older arch versions.

But the warning it at least annoying and use of "&" to prevent gas
allocating things to the same register is already widespread for "+"
asm arguments, even for arm.

If the _value_ of the affected operand is not changed by the asm, I
think we don't strictly need "+", but we are using "&" here for its
register allocation side-effects, and "&" (for its original purpose at
least) is only applicable to output ("=" or "+") operands.

So I think the patch probably makes sense.

IMHO the gas documentation is misleading (or at least unhelpful).

Cheers
---Dave

2019-03-08 11:57:02

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> > On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin
> > <[email protected]> wrote:
> > >
> > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> > > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> > > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> > > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Passing registers containing zero as both the address (NULL pointer)
> > > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign
> > > > > > > > > the same register for both inputs on ARM, which triggers a warning
> > > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.
> > > > > > > > >
> > > > > > > > > /tmp/futex-7e740e.s: Assembler messages:
> > > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base
> > > > > > > > >
> > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
> > > > > > > > > as Mikael wrote:
> > > > > > > > > "One way of fixing this is to make uaddr an input/output register, since
> > > > > > > > > "that prevents it from overlapping any other input or output."
> > > > > > > > >
> > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it
> > > > > > > > > apparently never showed up again with later gcc versions.
> > > > > > > > >
> > > > > > > > > Now the same problem is back when compiling with clang, and we are trying
> > > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.
> > > > > > > > >
> > > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > > Cc: Mikael Pettersson <[email protected]>
> > > > > > > > > Cc: Dave Martin <[email protected]>
> > > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > > > > > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > > > > > > > ---
> > > > > > > > > arch/arm/include/asm/futex.h | 10 +++++-----
> > > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> > > > > > > > > index 0a46676b4245..79790912974e 100644
> > > > > > > > > --- a/arch/arm/include/asm/futex.h
> > > > > > > > > +++ b/arch/arm/include/asm/futex.h
> > > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> > > > > > > > > preempt_disable();
> > > > > > > > > __ua_flags = uaccess_save_and_enable();
> > > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> > > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n"
> > > > > > > > > - " teq %1, %2\n"
> > > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n"
> > > > > > > > > + " teq %1, %3\n"
> > > > > > > > > " it eq @ explicit IT needed for the 2b label\n"
> > > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n"
> > > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n"
> > > > > > > > > __futex_atomic_ex_table("%5")
> > > > > > > > > - : "+r" (ret), "=&r" (val)
> > > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
> > > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr)
> > > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> > > > > > > > > : "cc", "memory");
> > > > > > > > > uaccess_restore(__ua_flags);
> > > > > > > >
> > > > > > > > Underspecification of constraints to extended inline assembly is a
> > > > > > > > common issue exposed by other compilers (and possibly but in-effect
> > > > > > > > infrequently compiler upgrades).
> > > > > > > > So the reordering of the constraints means the in the assembly (notes
> > > > > > > > for other reviewers):
> > > > > > > > %2 -> %3
> > > > > > > > %3 -> %4
> > > > > > > > %4 -> %2
> > > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
> > > > > > >
> > > > > > > I don't see what is "underspecified" in the original constraints.
> > > > > > > Please explain.
> > > > > > >
> > > > > >
> > > > > > I agree that that statement makes little sense.
> > > > > >
> > > > > > As Russell points out in the referenced thread, there is nothing wrong
> > > > > > with the generated assembly, given that the UNPREDICTABLE opcode is
> > > > > > unreachable in practice. Unfortunately, we have no way to flag this
> > > > > > diagnostic as a known false positive, and AFAICT, there is no reason
> > > > > > we couldn't end up with the same diagnostic popping up for GCC builds
> > > > > > in the future, considering that the register assignment matches the
> > > > > > constraints. (We have seen somewhat similar issues where constant
> > > > > > folded function clones are emitted with a constant argument that could
> > > > > > never occur in reality [0])
> > > > > >
> > > > > > Given the above, the only meaningful way to invoke this function is
> > > > > > with different registers assigned to %3 and %4, and so tightening the
> > > > > > constraints to guarantee that does not actually result in worse code
> > > > > > (except maybe for the instantiations that we won't ever call in the
> > > > > > first place). So I think we should fix this.
> > > > > >
> > > > > > I wonder if just adding
> > > > > >
> > > > > > BUG_ON(__builtin_constant_p(uaddr));
> > > > > >
> > > > > > at the beginning makes any difference - this shouldn't result in any
> > > > > > object code differences since the conditional will always evaluate to
> > > > > > false at build time for instantiations we care about.
> > > > > >
> > > > > >
> > > > > > [0] https://lore.kernel.org/lkml/[email protected]/
> > > > >
> > > > > What I'm actually asking is:
> > > > >
> > > > > The GCC manual says that input operands _may_ overlap output operands
> > > > > since GCC assumes that input operands are consumed before output
> > > > > operands are written. This is an explicit statement.
> > > > >
> > > > > The GCC manual does not say that input operands may overlap with each
> > > > > other, and the behaviour of GCC thus far (apart from one version,
> > > > > presumably caused by a bug) has been that input operands are unique.
> > > > >
> > > >
> > > > Not entirely. I have run into issues where GCC assumes that registers
> > > > that are only used for input operands are left untouched by the asm
> > > > code. I.e., if you put an asm() block in a loop and modify an input
> > > > register, your code may break on the next pass, even if the input
> > > > register does not overlap with an output register.
> > >
> > > GCC has had the expectation for decades that _input_ operands are not
> > > changed in value by the code in the assembly. This isn't quite the
> > > same thing as the uniqueness of the register allocation for input
> > > operands.
> > >
> > > > To me, that seems to suggest that whether or not inputs may overlap is
> > > > irrelevant, since they are not expected to be modified.
> > >
> > > How is:
> > >
> > > stmfd sp!, {r0-r3, ip, lr}
> > > bl foo
> > > ldmfd sp!, {r0-r3, ip, lr}
> > >
> > > where r1 may be an input operand (to pass an argument to foo) any
> > > different from:
> > >
> > > ldrt r0, [r1]
> > >
> > > as far as whether r1 is modified in both cases? In both cases, the
> > > value of r1 is read and written by both instructions, but in both
> > > cases the value of r1 remains the same no matter what the value of r1
> > > was.
> > >
> > > The "input operands should not be modified" is entirely orthogonal to
> > > the input operand register allocation.
> > >
> >
> > The question is whether it is reasonable for GCC to use the same
> > register for input operands that have the same value. From the
> > assumption that GCC makes that the asm will not modified follows
> > directly that we can use the same register for different operands.
> >
> > And in fact, since that asm code (when built in ARM mode) does modify
> > the register, uaddr should not be an input operand to begin with. In
> > other words, there is an actual bug here, and this patch fixes it.
>
> Again, you miss my point.
>

Perhaps. So let me summarize what I do understand.

1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*
with the same compile time constant value of 0x0 for newval and uaddr,
we end up with an opcode for the STRT instruction that is CONSTRAINED
UNPREDICTABLE, but we will never execute it since the preceding load
will fault and enter the fixup handler.
2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to
occur in the code, but may be instantiated by the compiler when doing
constant propagation (like in the ilog2() case I quoted), but these
instantiations will never be called
3) both clang and gcc may map different inline asm input operands onto
the same register if the value is guaranteed to be the same (i.e.,
they are both compile time constants)

My statement about uaddr was slightly misguided, in the sense that our
invocation of STRT does use the post-index variant, but with an
increment of zero, so the value written back to the register equals
the original value. But it does explain why this opcode is CONSTRAINED
UNPREDICTABLE in the first place.

Given 2) above, I wonder if anyone could confirm whether adding
'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.

2019-03-11 14:35:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
<[email protected]> wrote:
> On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <[email protected]> wrote:
> > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

>
> Perhaps. So let me summarize what I do understand.
>
> 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*
> with the same compile time constant value of 0x0 for newval and uaddr,
> we end up with an opcode for the STRT instruction that is CONSTRAINED
> UNPREDICTABLE, but we will never execute it since the preceding load
> will fault and enter the fixup handler.
> 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to
> occur in the code, but may be instantiated by the compiler when doing
> constant propagation (like in the ilog2() case I quoted), but these
> instantiations will never be called
> 3) both clang and gcc may map different inline asm input operands onto
> the same register if the value is guaranteed to be the same (i.e.,
> they are both compile time constants)
>
> My statement about uaddr was slightly misguided, in the sense that our
> invocation of STRT does use the post-index variant, but with an
> increment of zero, so the value written back to the register equals
> the original value. But it does explain why this opcode is CONSTRAINED
> UNPREDICTABLE in the first place.
>
> Given 2) above, I wonder if anyone could confirm whether adding
> 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.

Like this?

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 0a46676b4245..e6e9b403cd61 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
/* Prefetching cannot fault */
prefetchw(uaddr);
__ua_flags = uaccess_save_and_enable();
+ BUG_ON(__builtin_constant_p(uaddr) || !uaddr);
__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
"1: ldrex %1, [%4]\n"
" teq %1, %2\n"

This had no effect here.

My first attempt (before finding the original patch from Mikael Pettersson)
was to change the probe to pass '1' as the value instead of '0', that
worked fine.

Arnd

2019-03-11 14:37:18

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <[email protected]> wrote:
>
> On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
> <[email protected]> wrote:
> > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <[email protected]> wrote:
> > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
>
> >
> > Perhaps. So let me summarize what I do understand.
> >
> > 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*
> > with the same compile time constant value of 0x0 for newval and uaddr,
> > we end up with an opcode for the STRT instruction that is CONSTRAINED
> > UNPREDICTABLE, but we will never execute it since the preceding load
> > will fault and enter the fixup handler.
> > 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to
> > occur in the code, but may be instantiated by the compiler when doing
> > constant propagation (like in the ilog2() case I quoted), but these
> > instantiations will never be called
> > 3) both clang and gcc may map different inline asm input operands onto
> > the same register if the value is guaranteed to be the same (i.e.,
> > they are both compile time constants)
> >
> > My statement about uaddr was slightly misguided, in the sense that our
> > invocation of STRT does use the post-index variant, but with an
> > increment of zero, so the value written back to the register equals
> > the original value. But it does explain why this opcode is CONSTRAINED
> > UNPREDICTABLE in the first place.
> >
> > Given 2) above, I wonder if anyone could confirm whether adding
> > 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.
>
> Like this?
>
> diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
> index 0a46676b4245..e6e9b403cd61 100644
> --- a/arch/arm/include/asm/futex.h
> +++ b/arch/arm/include/asm/futex.h
> @@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> /* Prefetching cannot fault */
> prefetchw(uaddr);
> __ua_flags = uaccess_save_and_enable();
> + BUG_ON(__builtin_constant_p(uaddr) || !uaddr);
> __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
> "1: ldrex %1, [%4]\n"
> " teq %1, %2\n"
>
> This had no effect here.
>
> My first attempt (before finding the original patch from Mikael Pettersson)
> was to change the probe to pass '1' as the value instead of '0', that
> worked fine.
>

Which probe is that?

2019-03-11 16:30:59

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel
<[email protected]> wrote:
> On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <[email protected]> wrote:
> > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
> > <[email protected]> wrote:
> > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <[email protected]> wrote:
> > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> >
> > My first attempt (before finding the original patch from Mikael Pettersson)
> > was to change the probe to pass '1' as the value instead of '0', that
> > worked fine.
> >
>
> Which probe is that?

diff --git a/kernel/futex.c b/kernel/futex.c
index c3b73b0311bc..19615ad3c4f7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)
* implementation, the non-functional ones will return
* -ENOSYS.
*/
- if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
+ if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)
futex_cmpxchg_enabled = 1;
#endif
}

Arnd

2019-03-11 16:37:11

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <[email protected]> wrote:
>
> On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel
> <[email protected]> wrote:
> > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <[email protected]> wrote:
> > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
> > > <[email protected]> wrote:
> > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <[email protected]> wrote:
> > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> > >
> > > My first attempt (before finding the original patch from Mikael Pettersson)
> > > was to change the probe to pass '1' as the value instead of '0', that
> > > worked fine.
> > >
> >
> > Which probe is that?
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index c3b73b0311bc..19615ad3c4f7 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)
> * implementation, the non-functional ones will return
> * -ENOSYS.
> */
> - if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
> + if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)
> futex_cmpxchg_enabled = 1;
> #endif
> }
>

Ah ok.

That explains a lot.

Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check?

Note that PA-RISC has

/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
* our gateway page, and causes no end of trouble...
*/
if (uaccess_kernel() && !uaddr)
return -EFAULT;

2019-03-11 20:59:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: futex: make futex_detect_cmpxchg more reliable

On Mon, Mar 11, 2019 at 5:36 PM Ard Biesheuvel
<[email protected]> wrote:
>
> On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <[email protected]> wrote:
> >
> > On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel
> > <[email protected]> wrote:
> > > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <[email protected]> wrote:
> > > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
> > > > <[email protected]> wrote:
> > > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <[email protected]> wrote:
> > > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> > > >
> > > > My first attempt (before finding the original patch from Mikael Pettersson)
> > > > was to change the probe to pass '1' as the value instead of '0', that
> > > > worked fine.
> > > >
> > >
> > > Which probe is that?
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index c3b73b0311bc..19615ad3c4f7 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)
> > * implementation, the non-functional ones will return
> > * -ENOSYS.
> > */
> > - if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
> > + if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)
> > futex_cmpxchg_enabled = 1;
> > #endif
> > }
> >
>
> Ah ok.
>
> That explains a lot.
>
> Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check?

I think that would work here, it would just create a tiny overhead
for each call to futex_atomic_cmpxchg_inatomic().

Semi-related side note:
After I looked at access_ok() for a bit too long, I tried replacing it with

#define access_ok(addr, size) \
(((u64)(uintptr_t)addr + (u64)(size_t)size) >=
current_thread_info()->addr_limit)

which interestingly seemed to improve the output with clang (it lets it
combine multiple access_ok() checks and schedule the instructions better,
compared to our inline asm implementation), but it unfortunately creates
horrible code with gcc.

Arnd

2020-12-13 16:56:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Sat, Dec 12 2020 at 13:26, Marco Elver wrote:
> On Thu, Mar 07, 2019 at 10:14AM +0100, Arnd Bergmann wrote:
>> -static void __init futex_detect_cmpxchg(void)
>> +static noinline void futex_detect_cmpxchg(void)
>> {
>> #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
>> u32 curval;
>
> What ever happened to this patch?

It obviously fell through the cracks.

> I'm seeing this again with the attached config + next-20201211 (for
> testing https://bugs.llvm.org/show_bug.cgi?id=48492). Had to apply this
> patch to build the kernel.

What really bothers me is to remove the __init from a function which is
clearly only used during init. And looking deeper it's simply a hack.

This function is only needed when an architecture has to runtime
discover whether the CPU supports it or not. ARM has unconditional
support for this, so the obvious thing to do is the below.

Thanks,

tglx
---
arch/arm/Kconfig | 1 +
1 file changed, 1 insertion(+)

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -86,6 +86,7 @@ config ARM
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
select HAVE_FUNCTION_TRACER if !XIP_KERNEL
+ select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
select HAVE_IDE if PCI || ISA || PCMCIA


2020-12-13 23:31:37

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Mar 07, 2019 at 10:14AM +0100, Arnd Bergmann wrote:
> On 32-bit ARM, I got a link failure in futex_init() when building
> with clang in some random configurations:
>
> kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text'
>
> As far as I can tell, the problem is that a branch is over 16MB
> apart in those configurations, but only if it branches back to
> the init text.
>
> Marking the futex_detect_cmpxchg() function as noinline and
> not __init avoids the problem for me.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> kernel/futex.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index c3b73b0311bc..dda77ed9f445 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3849,7 +3849,7 @@ SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
> }
> #endif /* CONFIG_COMPAT_32BIT_TIME */
>
> -static void __init futex_detect_cmpxchg(void)
> +static noinline void futex_detect_cmpxchg(void)
> {
> #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
> u32 curval;

What ever happened to this patch?

I'm seeing this again with the attached config + next-20201211 (for
testing https://bugs.llvm.org/show_bug.cgi?id=48492). Had to apply this
patch to build the kernel.

Thanks,
-- Marco


Attachments:
(No filename) (1.33 kB)
.config (244.59 kB)
Download all attachments

2020-12-15 06:14:24

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

Hi Arnd,

On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
>
> On Sat, Dec 12, 2020 at 9:01 PM Thomas Gleixner <[email protected]> wrote:
> >
> > On Sat, Dec 12 2020 at 13:26, Marco Elver wrote:
> > > On Thu, Mar 07, 2019 at 10:14AM +0100, Arnd Bergmann wrote:
> > >> -static void __init futex_detect_cmpxchg(void)
> > >> +static noinline void futex_detect_cmpxchg(void)
> > >> {
> > >> #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
> > >> u32 curval;
> > >
> > > What ever happened to this patch?
> >
> > It obviously fell through the cracks.
> >
> > > I'm seeing this again with the attached config + next-20201211 (for
> > > testing https://bugs.llvm.org/show_bug.cgi?id=48492). Had to apply this
> > > patch to build the kernel.
> >
> > What really bothers me is to remove the __init from a function which is
> > clearly only used during init. And looking deeper it's simply a hack.
> >
> > This function is only needed when an architecture has to runtime
> > discover whether the CPU supports it or not. ARM has unconditional
> > support for this, so the obvious thing to do is the below.
> >
>
> Ah perfect, that is clearly the right solution here.
>
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -86,6 +86,7 @@ config ARM
> > select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
> > select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
> > select HAVE_FUNCTION_TRACER if !XIP_KERNEL
> > + select HAVE_FUTEX_CMPXCHG if FUTEX
> > select HAVE_GCC_PLUGINS
> > select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
> > select HAVE_IDE if PCI || ISA || PCMCIA
>
> I had a look at what other architectures always implement
> futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> and I found that it's pretty much all of them, the odd ones being just sparc32
> and csky, which use asm-generic/futex.h but do have an SMP option,
> as well as xtensa
>
> I would guess that for csky, this is a mistake, as the architecture is fairly
> new and should be able to implement it. Not sure about sparc32.

The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
with asm-generic's implementation.
For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
implementation, so futex_cmpxchg_enabled = 0.

Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
and still use asm-generic for non-smp CPUs:

diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index a2189c0..e968c58 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -49,6 +49,7 @@ config CSKY
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
+ select HAVE_FUTEX_CMPXCHG if FUTEX && SMP
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZO
diff --git a/arch/csky/include/asm/futex.h b/arch/csky/include/asm/futex.h
new file mode 100644
index 00000000..29275e8
--- /dev/null
+++ b/arch/csky/include/asm/futex.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_CSKY_FUTEX_H
+#define __ASM_CSKY_FUTEX_H
+
+#ifndef CONFIG_SMP
+#include <asm-generic/futex.h>
+#else
+#include <linux/futex.h>
+#include <linux/uaccess.h>
+#include <linux/errno.h>
+
+static inline int
+arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
+{
+ int oldval = 0, ret = 0;
+
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
+
+ <...>
+
+ return ret;
+}
+
+static inline int
+futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
+ u32 oldval, u32 newval)
+{
+ int ret = 0;
+ u32 val;
+ uintptr_t tmp;
+
+ if (!access_ok(uaddr, sizeof(u32)))
+ return -EFAULT;
+
+ <...>
+
+ return ret;
+}
+#endif
+#endif /* __ASM_CSKY_FUTEX_H */
--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2020-12-15 11:30:22

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Tue, Dec 15, 2020 at 7:09 AM Guo Ren <[email protected]> wrote:
> On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
> > I had a look at what other architectures always implement
> > futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> > and I found that it's pretty much all of them, the odd ones being just sparc32
> > and csky, which use asm-generic/futex.h but do have an SMP option,
> > as well as xtensa
> >
> > I would guess that for csky, this is a mistake, as the architecture is fairly
> > new and should be able to implement it. Not sure about sparc32.
>
> The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
> with asm-generic's implementation.
> For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
> implementation, so futex_cmpxchg_enabled = 0.
>
> Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
> and still use asm-generic for non-smp CPUs.

Sounds good to me.

With that, I would suggest we actually remove the -ENOSYS fallback
for arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic()
in asm-generic/futex.h as well as the HAVE_FUTEX_CMPXCHG Kconfig
symbol, plus these additional fixups:

- for xtensa and mips configurations without ll/sc, fall back to the
asm-generic version. These are all uniprocessor, while the
corresponding SMP machines have a working
arch_futex_atomic_op_inuser().

- Disable SMP support for sun4m/sun4d. From the historic git
tree, it's unclear how well this ever worked, and very few machines
of this class ever existed

- Mark SMP for LEON as temporarily broken. As I see in the LEON
patch set, they have changes to enable compare-and-swap-atomic
instructions unconditionally, as all SMP Leons have those and
seem to require this support already for other things.

Arnd

2020-12-15 19:45:49

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

Hi Arnd,

On Tue, Dec 15, 2020 at 12:26:10PM +0100, Arnd Bergmann wrote:
> On Tue, Dec 15, 2020 at 7:09 AM Guo Ren <[email protected]> wrote:
> > On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
> > > I had a look at what other architectures always implement
> > > futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> > > and I found that it's pretty much all of them, the odd ones being just sparc32
> > > and csky, which use asm-generic/futex.h but do have an SMP option,
> > > as well as xtensa
> > >
> > > I would guess that for csky, this is a mistake, as the architecture is fairly
> > > new and should be able to implement it. Not sure about sparc32.
> >
> > The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
> > with asm-generic's implementation.
> > For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
> > implementation, so futex_cmpxchg_enabled = 0.
> >
> > Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
> > and still use asm-generic for non-smp CPUs.
>
> Sounds good to me.
>
> With that, I would suggest we actually remove the -ENOSYS fallback
> for arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic()
> in asm-generic/futex.h as well as the HAVE_FUTEX_CMPXCHG Kconfig
> symbol, plus these additional fixups:
>
> - for xtensa and mips configurations without ll/sc, fall back to the
> asm-generic version. These are all uniprocessor, while the
> corresponding SMP machines have a working
> arch_futex_atomic_op_inuser().
>
> - Disable SMP support for sun4m/sun4d. From the historic git
> tree, it's unclear how well this ever worked, and very few machines
> of this class ever existed
Yeah, I have collection of sparc32 machines that I played around with
once. Including one sun4d that I brought from a friendly Linux fellow in
the UK. But somehow I lost interest as this is all very nice machines
but not useful for anything real work.

I think we would be better served dropping support for sun4m and sun4d
from the kernel.

Last I suggested deleting sun4m/sun4d the argument to keep sun4m was that
QEMU supports sun4m - which is a good argument for sun4m. I dunno what
would be needed to migrate QEMU to LEON, see below.

>
> - Mark SMP for LEON as temporarily broken. As I see in the LEON
> patch set, they have changes to enable compare-and-swap-atomic
> instructions unconditionally, as all SMP Leons have those and
> seem to require this support already for other things.
LEON on the other hand could have some nice future. They are right now
stuck on an older kernel and someone that was motivated should be able
to get LEON4 running on latest upstream.
We had it working in the past - but is was around the time I lost my
sparc interest and no-one jumped in to move it much more forward.

So in other words - no complains for the plan you outline.

Sam

2020-12-16 00:10:57

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Tue, Dec 15, 2020 at 8:38 PM Sam Ravnborg <[email protected]> wrote:
> On Tue, Dec 15, 2020 at 12:26:10PM +0100, Arnd Bergmann wrote:
> >
> > - Disable SMP support for sun4m/sun4d. From the historic git
> > tree, it's unclear how well this ever worked, and very few machines
> > of this class ever existed
> Yeah, I have collection of sparc32 machines that I played around with
> once. Including one sun4d that I brought from a friendly Linux fellow in
> the UK. But somehow I lost interest as this is all very nice machines
> but not useful for anything real work.
>
> I think we would be better served dropping support for sun4m and sun4d
> from the kernel.

This seems appropriate as well to me.

> Last I suggested deleting sun4m/sun4d the argument to keep sun4m was that
> QEMU supports sun4m - which is a good argument for sun4m. I dunno what
> would be needed to migrate QEMU to LEON, see below.

"qemu-system-sparc -M help" shows a "leon3_generic" platform, apparently
added in 2013. Do you think that would be sufficient?

> > - Mark SMP for LEON as temporarily broken. As I see in the LEON
> > patch set, they have changes to enable compare-and-swap-atomic
> > instructions unconditionally, as all SMP Leons have those and
> > seem to require this support already for other things.
> LEON on the other hand could have some nice future. They are right now
> stuck on an older kernel and someone that was motivated should be able
> to get LEON4 running on latest upstream.
> We had it working in the past - but is was around the time I lost my
> sparc interest and no-one jumped in to move it much more forward.

My best guess from the public information I could find on LEON is that
it keeps shifting away from Linux on LEON to other OSs, and to
and to Linux on NOEL-V.

So even though the CPU itself will likely have a long life ahead of it
with LEON5 only a year old, it's unclear how many more updates
we'll see to the kernel from the current 4.9 based release.

> So in other words - no complains for the plan you outline.

Thanks. I'd probably queue up a patch in my asm-generic tree for
v5.12 to disable SMP on all SPARC32, add the helpers for C-Sky
once Guo Ren has tested a patch, and clean up the futex code based
on this. I guess we want the one-line fix for Arm that Thomas suggested
for v5.10 and backports anyway, The sun4m/sun4d removal should
clearly be discussed separately and go through the sparc tree, to see
if anyone has objections, or if we want to remove other obsolete
platforms (sun3?) along with it.

Arnd

2020-12-17 15:43:12

by Andreas Larsson

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On 2020-12-16 00:24, Arnd Bergmann wrote:
> On Tue, Dec 15, 2020 at 8:38 PM Sam Ravnborg <[email protected]> wrote:
>> On Tue, Dec 15, 2020 at 12:26:10PM +0100, Arnd Bergmann wrote:
>>>
>>> - Disable SMP support for sun4m/sun4d. From the historic git
>>> tree, it's unclear how well this ever worked, and very few machines
>>> of this class ever existed
>> Yeah, I have collection of sparc32 machines that I played around with
>> once. Including one sun4d that I brought from a friendly Linux fellow in
>> the UK. But somehow I lost interest as this is all very nice machines
>> but not useful for anything real work.
>>
>> I think we would be better served dropping support for sun4m and sun4d
>> from the kernel.
>
> This seems appropriate as well to me.
>
>> Last I suggested deleting sun4m/sun4d the argument to keep sun4m was that
>> QEMU supports sun4m - which is a good argument for sun4m. I dunno what
>> would be needed to migrate QEMU to LEON, see below.
>
> "qemu-system-sparc -M help" shows a "leon3_generic" platform, apparently
> added in 2013. Do you think that would be sufficient?

We have only use QEMU for LEON3 on limited and simpler system
setups. For example the Zephyr SPARC arch + LEON3 BSP port we recently
submitted to the Linux Foundation Zephyr project run their test-suite
successfully on QEMU+LEON3. We will have to look into the QEMU for LEON
and Linux situation.

We mainly use TSIM that is our own high fidelity simulator.


>>> - Mark SMP for LEON as temporarily broken. As I see in the LEON
>>> patch set, they have changes to enable compare-and-swap-atomic
>>> instructions unconditionally, as all SMP Leons have those and
>>> seem to require this support already for other things.
Regarding unconditional compare-and-swap-atomic instructions (CASA) for
LEON. I tried to get those into mainline under the LEON configuration
option but did not get OK for that at that time, with the feedback to do
it dynamically instead. I haven't come around to try to get an updated
patch doing probing instead into mainline yet. If the thought now is to
drop support for SMP for sparc32, maybe we can have the CASA
unconditionally on SMP configured kernels instead. Still we'd like to
have CASA usage even for non-SMP kernels for LEON systems that supports
it.


>> LEON on the other hand could have some nice future. They are right now
>> stuck on an older kernel and someone that was motivated should be able
>> to get LEON4 running on latest upstream.
>> We had it working in the past - but is was around the time I lost my
>> sparc interest and no-one jumped in to move it much more forward.

I would not say that we are stuck on an old kernel. It is rather that
for our official releases we stay on long term support kernels. We still
aim for kernels based on master to work on LEON. Right now we do have a
bit of backlog of things to get into upstream.


> My best guess from the public information I could find on LEON is that
> it keeps shifting away from Linux on LEON to other OSs, and to
> and to Linux on NOEL-V.

On the contrary. Our LEON users shows an increased interest in running
Linux, now that LEON-based systems gains in number of processors,
processor performance and memory. We are adding NOEL as a new processor
line. With NOEL we have a 64-bit architecture. We are not shifting from
LEON to NOEL. LEON is not going away.

>
> So even though the CPU itself will likely have a long life ahead of it
> with LEON5 only a year old, it's unclear how many more updates
> we'll see to the kernel from the current 4.9 based release.

Our latest release was indeed based on 4.9, but we have no plans to stay
there forever. We just tend to select long term support kernels for our
official releases. The next step will be to go to 5.4 if not 5.10.

We recently released a new Linux 4.9 toolchain where we stepped up to
GLIBC 2.31 and GCC 10.2. So we do not consider us stuck at old versions
of any of the involved software.

In addition, non-LTS users are running other mainline kernel versions
as well.


Best regards,

Andreas Larsson
Software Engineer
Cobham Gaisler

2020-12-17 16:46:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Dec 17, 2020 at 4:32 PM Andreas Larsson <[email protected]> wrote:
> On 2020-12-16 00:24, Arnd Bergmann wrote:
> > On Tue, Dec 15, 2020 at 8:38 PM Sam Ravnborg <[email protected]> wrote:
> >> On Tue, Dec 15, 2020 at 12:26:10PM +0100, Arnd Bergmann wrote:
> >>>
> >>> - Disable SMP support for sun4m/sun4d. From the historic git
> >>> tree, it's unclear how well this ever worked, and very few machines
> >>> of this class ever existed
> >> Yeah, I have collection of sparc32 machines that I played around with
> >> once. Including one sun4d that I brought from a friendly Linux fellow in
> >> the UK. But somehow I lost interest as this is all very nice machines
> >> but not useful for anything real work.
> >>
> >> I think we would be better served dropping support for sun4m and sun4d
> >> from the kernel.
> >
> > This seems appropriate as well to me.
> >
> >> Last I suggested deleting sun4m/sun4d the argument to keep sun4m was that
> >> QEMU supports sun4m - which is a good argument for sun4m. I dunno what
> >> would be needed to migrate QEMU to LEON, see below.
> >
> > "qemu-system-sparc -M help" shows a "leon3_generic" platform, apparently
> > added in 2013. Do you think that would be sufficient?
>
> We have only use QEMU for LEON3 on limited and simpler system
> setups. For example the Zephyr SPARC arch + LEON3 BSP port we recently
> submitted to the Linux Foundation Zephyr project run their test-suite
> successfully on QEMU+LEON3. We will have to look into the QEMU for LEON
> and Linux situation.
>
> We mainly use TSIM that is our own high fidelity simulator.

I don't think there is a need to have many features supported in qemu,
as long as you have enough RAM, block and network devices (which
are trivial if you have PCI or USB).

> >>> - Mark SMP for LEON as temporarily broken. As I see in the LEON
> >>> patch set, they have changes to enable compare-and-swap-atomic
> >>> instructions unconditionally, as all SMP Leons have those and
> >>> seem to require this support already for other things.
>
> Regarding unconditional compare-and-swap-atomic instructions (CASA) for
> LEON. I tried to get those into mainline under the LEON configuration
> option but did not get OK for that at that time, with the feedback to do
> it dynamically instead. I haven't come around to try to get an updated
> patch doing probing instead into mainline yet. If the thought now is to
> drop support for SMP for sparc32, maybe we can have the CASA
> unconditionally on SMP configured kernels instead. Still we'd like to
> have CASA usage even for non-SMP kernels for LEON systems that
> supports it.

It does make sense to require that a single kernel can work on all
possible hardware. So if we remove sun4m/sun4d support, all that
is left is LEON, and you likely wouldn't need to worry about other
CPUs any more.

However, there is still the question whether a single kernel needs
to work on LEON both with and without CASA. Do you still care
about Linux users on LEON cores that do not support CASA, or is
widespread enough that you just make it unconditional for both
SMP and non-SMP?

> >> LEON on the other hand could have some nice future. They are right now
> >> stuck on an older kernel and someone that was motivated should be able
> >> to get LEON4 running on latest upstream.
> >> We had it working in the past - but is was around the time I lost my
> >> sparc interest and no-one jumped in to move it much more forward.
>
> I would not say that we are stuck on an old kernel. It is rather that
> for our official releases we stay on long term support kernels. We still
> aim for kernels based on master to work on LEON. Right now we do have a
> bit of backlog of things to get into upstream.

Ok, good to hear. There were a couple of old bugs that got found
on mainline that made me wonder whether mainline sparc32 was
used on any hardware at all these days.

I looked at your v4.9 patches and didn't see anything too scary
there, in particular once sparc32 becomes leon-only, you no longer
have to worry about making it work across different CPUs.

> > My best guess from the public information I could find on LEON is that
> > it keeps shifting away from Linux on LEON to other OSs, and to
> > and to Linux on NOEL-V.
>
> On the contrary. Our LEON users shows an increased interest in running
> Linux, now that LEON-based systems gains in number of processors,
> processor performance and memory. We are adding NOEL as a new processor
> line. With NOEL we have a 64-bit architecture. We are not shifting from
> LEON to NOEL. LEON is not going away.

Ok.

> > So even though the CPU itself will likely have a long life ahead of it
> > with LEON5 only a year old, it's unclear how many more updates
> > we'll see to the kernel from the current 4.9 based release.
>
> Our latest release was indeed based on 4.9, but we have no plans to stay
> there forever. We just tend to select long term support kernels for our
> official releases. The next step will be to go to 5.4 if not 5.10.

I hope that you can make it to 5.10 then, as this contains the work
I did for 64-bit time_t, which is required if you have users that want to
run systems after 2038.

> We recently released a new Linux 4.9 toolchain where we stepped up to
> GLIBC 2.31 and GCC 10.2. So we do not consider us stuck at old versions
> of any of the involved software.
>
> In addition, non-LTS users are running other mainline kernel versions
> as well.

FWIW, glibc-2.31 does not have support for 64-bit time_t yet, but I
know there was interest in adding sparc support to the musl libc, which
does support 64-bit time_t.

Arnd

2020-12-17 20:04:30

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

Hi Arnd,

> > I think we would be better served dropping support for sun4m and sun4d
> > from the kernel.
>
> This seems appropriate as well to me.

I did a quick hack:
20 files changed, 40 insertions(+), 3051 deletions(-)

All the leon stuff is kept and there is room for more cleaning up.
The kernel can build a sparc32_defconfig with the Gaisler toochain.
This was a one hour quick hack that I cannot commit - it needs to be
split up to allow some resemble of review.

And it touched only arch/sparc/ - no sparc32 specific drivers were
dropped.

If we decide to chase this and we can drop sun4m/sun4d then maintaining
leon will be simpler as a nice side effect.

Sam

2020-12-18 11:10:20

by Andreas Larsson

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On 2020-12-17 17:43, Arnd Bergmann wrote:
> It does make sense to require that a single kernel can work on all
> possible hardware. So if we remove sun4m/sun4d support, all that
> is left is LEON, and you likely wouldn't need to worry about other
> CPUs any more.
>
> However, there is still the question whether a single kernel needs
> to work on LEON both with and without CASA. Do you still care
> about Linux users on LEON cores that do not support CASA, or is
> widespread enough that you just make it unconditional for both
> SMP and non-SMP?

We are fine with unconditional CASA for both SMP and non-SMP for LEON.


> I hope that you can make it to 5.10 then, as this contains the work
> I did for 64-bit time_t, which is required if you have users that want to
> run systems after 2038.

That is a good point! Thank you!


> FWIW, glibc-2.31 does not have support for 64-bit time_t yet, but I
> know there was interest in adding sparc support to the musl libc, which
> does support 64-bit time_t.

Yes, we will have to follow the developments regarding 64-bit time_t
in GLIBC as well.

--
Andreas

2020-12-20 15:47:56

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

Hi Arnd,

On Tue, Dec 15, 2020 at 7:26 PM Arnd Bergmann <[email protected]> wrote:
>
> On Tue, Dec 15, 2020 at 7:09 AM Guo Ren <[email protected]> wrote:
> > On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
> > > I had a look at what other architectures always implement
> > > futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> > > and I found that it's pretty much all of them, the odd ones being just sparc32
> > > and csky, which use asm-generic/futex.h but do have an SMP option,
> > > as well as xtensa
> > >
> > > I would guess that for csky, this is a mistake, as the architecture is fairly
> > > new and should be able to implement it. Not sure about sparc32.
> >
> > The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
> > with asm-generic's implementation.
> > For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
> > implementation, so futex_cmpxchg_enabled = 0.
> >
> > Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
> > and still use asm-generic for non-smp CPUs.
>
> Sounds good to me.
Done: https://lore.kernel.org/linux-csky/[email protected]/T/#u

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2020-12-20 17:52:31

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Sun, Dec 20, 2020 at 4:46 PM Guo Ren <[email protected]> wrote:
> On Tue, Dec 15, 2020 at 7:26 PM Arnd Bergmann <[email protected]> wrote:
> >
> > On Tue, Dec 15, 2020 at 7:09 AM Guo Ren <[email protected]> wrote:
> > > On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
> > > > I had a look at what other architectures always implement
> > > > futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> > > > and I found that it's pretty much all of them, the odd ones being just sparc32
> > > > and csky, which use asm-generic/futex.h but do have an SMP option,
> > > > as well as xtensa
> > > >
> > > > I would guess that for csky, this is a mistake, as the architecture is fairly
> > > > new and should be able to implement it. Not sure about sparc32.
> > >
> > > The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
> > > with asm-generic's implementation.
> > > For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
> > > implementation, so futex_cmpxchg_enabled = 0.
> > >
> > > Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
> > > and still use asm-generic for non-smp CPUs.
> >
> > Sounds good to me.
> Done: https://lore.kernel.org/linux-csky/[email protected]/T/#u

Thanks!

Can you clarify if there are any dependencies on the other patches in
that series?

I'd like to take the futex patch through the asm-generic tree along with the
patches for the other architectures.

Arnd

2020-12-21 03:00:39

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

Hi Arnd,

On Mon, Dec 21, 2020 at 1:49 AM Arnd Bergmann <[email protected]> wrote:
>
> On Sun, Dec 20, 2020 at 4:46 PM Guo Ren <[email protected]> wrote:
> > On Tue, Dec 15, 2020 at 7:26 PM Arnd Bergmann <[email protected]> wrote:
> > >
> > > On Tue, Dec 15, 2020 at 7:09 AM Guo Ren <[email protected]> wrote:
> > > > On Mon, Dec 14, 2020 at 9:15 PM Arnd Bergmann <[email protected]> wrote:
> > > > > I had a look at what other architectures always implement
> > > > > futex_atomic_cmpxchg_inatomic() or can use the asm-generic non-SMP version,
> > > > > and I found that it's pretty much all of them, the odd ones being just sparc32
> > > > > and csky, which use asm-generic/futex.h but do have an SMP option,
> > > > > as well as xtensa
> > > > >
> > > > > I would guess that for csky, this is a mistake, as the architecture is fairly
> > > > > new and should be able to implement it. Not sure about sparc32.
> > > >
> > > > The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
> > > > with asm-generic's implementation.
> > > > For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
> > > > implementation, so futex_cmpxchg_enabled = 0.
> > > >
> > > > Thx for point it out, we'll implement cmpxchg_inatomic/inuser for C860
> > > > and still use asm-generic for non-smp CPUs.
> > >
> > > Sounds good to me.
> > Done: https://lore.kernel.org/linux-csky/[email protected]/T/#u
>
> Thanks!
>
> Can you clarify if there are any dependencies on the other patches in
> that series?
No dependency.

>
> I'd like to take the futex patch through the asm-generic tree along with the
> patches for the other architectures.
You take the futex patch and I'll remove it from my tree.

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-07-22 20:08:44

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Sat, Dec 12, 2020 at 09:01:34PM +0100, Thomas Gleixner wrote:
> On Sat, Dec 12 2020 at 13:26, Marco Elver wrote:
> > On Thu, Mar 07, 2019 at 10:14AM +0100, Arnd Bergmann wrote:
> >> -static void __init futex_detect_cmpxchg(void)
> >> +static noinline void futex_detect_cmpxchg(void)
> >> {
> >> #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
> >> u32 curval;
> >
> > What ever happened to this patch?
>
> It obviously fell through the cracks.
>
> > I'm seeing this again with the attached config + next-20201211 (for
> > testing https://bugs.llvm.org/show_bug.cgi?id=48492). Had to apply this
> > patch to build the kernel.
>
> What really bothers me is to remove the __init from a function which is
> clearly only used during init. And looking deeper it's simply a hack.
>
> This function is only needed when an architecture has to runtime
> discover whether the CPU supports it or not. ARM has unconditional
> support for this, so the obvious thing to do is the below.
>
> Thanks,
>
> tglx
> ---
> arch/arm/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -86,6 +86,7 @@ config ARM
> select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
> select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
> select HAVE_FUNCTION_TRACER if !XIP_KERNEL
> + select HAVE_FUTEX_CMPXCHG if FUTEX
> select HAVE_GCC_PLUGINS
> select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
> select HAVE_IDE if PCI || ISA || PCMCIA
>
>

Hi Thomas,

Did this ever get sent along as a formal patch? I just ran into another
issue that seems to be similar to the one Arnd sent the initial patch in
this thread for and it is resolved by this change.

Cheers,
Nathan

2021-10-25 14:30:25

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] futex: mark futex_detect_cmpxchg() as 'noinline'

On Thu, Jul 22, 2021 at 10:05 PM Nathan Chancellor <[email protected]> wrote:
> On Sat, Dec 12, 2020 at 09:01:34PM +0100, Thomas Gleixner wrote:
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -86,6 +86,7 @@ config ARM
> > select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
> > select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
> > select HAVE_FUNCTION_TRACER if !XIP_KERNEL
> > + select HAVE_FUTEX_CMPXCHG if FUTEX
> > select HAVE_GCC_PLUGINS
> > select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
> > select HAVE_IDE if PCI || ISA || PCMCIA
>
> Did this ever get sent along as a formal patch? I just ran into another
> issue that seems to be similar to the one Arnd sent the initial patch in
> this thread for and it is resolved by this change.

Nick sent this patch in September, and Russell applied it as commit
9d417cbe36ee ("ARM: 9122/1: select HAVE_FUTEX_CMPXCHG").
This addresses the problem for arm, but I think we should really
just remove HAVE_FUTEX_CMPXCHG entirely and require it to
be there for SMP.

I have a patch for that, but that needs a separate fix for sparc32,
which I think nobody is interested in working on. I can post it anyway
to get discussion moving again.

Arnd