2013-09-26 17:38:48

by Chris Metcalf

Subject: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

It turns out the kernel relies on barrier() to force a reload of the
percpu offset value. Since we can't easily modify the definition of
barrier() to include "tp" as an output register, we instead provide a
definition of __my_cpu_offset as extended assembly that includes a fake
stack read to hazard against barrier(), forcing gcc to know that it
must reread "tp" and recompute anything based on "tp" after a barrier.

This fixes observed hangs in the slub allocator when we are looping
on a percpu cmpxchg_double.

A similar fix for ARMv7 was made in June in change 509eb76ebf97.

Cc: [email protected]
Signed-off-by: Chris Metcalf <[email protected]>
---
Ben Herrenschmidt offered to look into extending barrier() in some
per-architecture way, but for now this patch is the minimal thing to
fix the bug in 3.12 and earlier, so I'm planning to push it as-is.
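
To spell out the failure mode: with only a global register variable, gcc
treats "tp" like any other register, and barrier()'s memory clobber never
touches it. A minimal sketch of the broken pattern (illustrative only,
assuming the generic barrier() definition and a tile toolchain; the
wait_for_flag() helper is hypothetical):

#define barrier() __asm__ __volatile__("" : : : "memory")

register unsigned long tp_reg asm("tp");	/* percpu offset lives in "tp" */

static int wait_for_flag(int *pcpu_flag)
{
	for (;;) {
		/*
		 * gcc may compute this address once and hoist it out of
		 * the loop: the memory clobber forces *flag itself to be
		 * reread, but nothing tells gcc that "tp" can change at
		 * a preemption point, so a stale tp-derived address can
		 * be reused after the task migrates to another cpu.
		 */
		int *flag = (int *)((unsigned long)pcpu_flag + tp_reg);
		if (*flag)
			return *flag;
		barrier();
	}
}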

arch/tile/include/asm/percpu.h | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h
index 63294f5..de121d3 100644
--- a/arch/tile/include/asm/percpu.h
+++ b/arch/tile/include/asm/percpu.h
@@ -15,9 +15,36 @@
#ifndef _ASM_TILE_PERCPU_H
#define _ASM_TILE_PERCPU_H

-register unsigned long __my_cpu_offset __asm__("tp");
-#define __my_cpu_offset __my_cpu_offset
-#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp))
+register unsigned long my_cpu_offset_reg asm("tp");
+
+#ifdef CONFIG_PREEMPT
+/*
+ * For full preemption, we can't just use the register variable
+ * directly, since we need barrier() to hazard against it, causing the
+ * compiler to reload anything computed from a previous "tp" value.
+ * But we also don't want to use volatile asm, since we'd like the
+ * compiler to be able to cache the value across multiple percpu reads.
+ * So we use a fake stack read as a hazard against barrier().
+ */
+static inline unsigned long __my_cpu_offset(void)
+{
+	unsigned long tp;
+	register unsigned long *sp asm("sp");
+	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
+	return tp;
+}
+#define __my_cpu_offset __my_cpu_offset()
+#else
+/*
+ * We don't need to hazard against barrier() since "tp" doesn't ever
+ * change with PREEMPT_NONE, and with PREEMPT_VOLUNTARY it only
+ * changes at function call points, at which we are already re-reading
+ * the value of "tp" due to "my_cpu_offset_reg" being a global variable.
+ */
+#define __my_cpu_offset my_cpu_offset_reg
+#endif
+
+#define set_my_cpu_offset(tp) (my_cpu_offset_reg = (tp))

#include <asm-generic/percpu.h>

--
1.8.3.1
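
To illustrate the comment in the patch about avoiding volatile asm: the
"m" (*sp) input lets gcc cache the asm's result between barriers, while
barrier()'s memory clobber still forces the asm to be re-executed
afterwards. A sketch of the intended behavior (hypothetical fragment):

	unsigned long a = __my_cpu_offset;	/* asm executes, reads tp */
	unsigned long b = __my_cpu_offset;	/* not volatile: gcc may CSE this with 'a' */
	barrier();				/* memory clobber invalidates the "m" (*sp) input */
	unsigned long c = __my_cpu_offset;	/* asm executes again, rereading tp */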


2013-09-26 17:58:19

by Will Deacon

Subject: Re: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

Hi Chris,

On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote:
> It turns out the kernel relies on barrier() to force a reload of the
> percpu offset value. Since we can't easily modify the definition of
> barrier() to include "tp" as an output register, we instead provide a
> definition of __my_cpu_offset as extended assembly that includes a fake
> stack read to hazard against barrier(), forcing gcc to know that it
> must reread "tp" and recompute anything based on "tp" after a barrier.
>
> This fixes observed hangs in the slub allocator when we are looping
> on a percpu cmpxchg_double.
>
> A similar fix for ARMv7 was made in June in change 509eb76ebf97.
>
> Cc: [email protected]
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> Ben Herrenschmidt offered to look into extending barrier() in some
> per-architecture way, but for now this patch is the minimal thing to
> fix the bug in 3.12 and earlier, so I'm planning to push it as-is.
>
> arch/tile/include/asm/percpu.h | 33 ++++++++++++++++++++++++++++++---
> 1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h
> index 63294f5..de121d3 100644
> --- a/arch/tile/include/asm/percpu.h
> +++ b/arch/tile/include/asm/percpu.h
> @@ -15,9 +15,36 @@
> #ifndef _ASM_TILE_PERCPU_H
> #define _ASM_TILE_PERCPU_H
>
> -register unsigned long __my_cpu_offset __asm__("tp");
> -#define __my_cpu_offset __my_cpu_offset
> -#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp))
> +register unsigned long my_cpu_offset_reg asm("tp");
> +
> +#ifdef CONFIG_PREEMPT
> +/*
> + * For full preemption, we can't just use the register variable
> + * directly, since we need barrier() to hazard against it, causing the
> + * compiler to reload anything computed from a previous "tp" value.
> + * But we also don't want to use volatile asm, since we'd like the
> + * compiler to be able to cache the value across multiple percpu reads.
> + * So we use a fake stack read as a hazard against barrier().
> + */
> +static inline unsigned long __my_cpu_offset(void)
> +{
> +	unsigned long tp;
> +	register unsigned long *sp asm("sp");
> +	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
> +	return tp;
> +}

Hehe, nice to see this hack working out for you too. One thing to check is
whether you have any funky addressing modes (things like writeback or
post-increment), since the "m" constraint can bite you if you don't actually
use it in the asm.

Will

2013-09-27 20:13:19

by Chris Metcalf

Subject: Re: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

On 9/26/2013 1:57 PM, Will Deacon wrote:
> Hi Chris,
>
> On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote:
>> [...]
>> +static inline unsigned long __my_cpu_offset(void)
>> +{
>> +	unsigned long tp;
>> +	register unsigned long *sp asm("sp");
>> +	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
>> +	return tp;
>> +}
> Hehe, nice to see this hack working out for you too. One thing to check is
> whether you have any funky addressing modes (things like writeback or
> post-increment), since the "m" constraint can bite you if you don't actually
> use it in the asm.

Well, we do have post-increments, though I don't see why this is a problem here.
We define a target-specific constraint "U" that excludes post-increments, but
again I don't see why "m" would cause trouble here. What was your experience?

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

2013-09-30 09:45:23

by Will Deacon

Subject: Re: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

On Fri, Sep 27, 2013 at 09:13:15PM +0100, Chris Metcalf wrote:
> On 9/26/2013 1:57 PM, Will Deacon wrote:
> > Hi Chris,
> >
> > On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote:
> >> [...]
> >> +static inline unsigned long __my_cpu_offset(void)
> >> +{
> >> +	unsigned long tp;
> >> +	register unsigned long *sp asm("sp");
> >> +	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
> >> +	return tp;
> >> +}
> > Hehe, nice to see this hack working out for you too. One thing to check is
> > whether you have any funky addressing modes (things like writeback or
> > post-increment), since the "m" constraint can bite you if you don't actually
> > use it in the asm.
>
> Well, we do have post-increments, though I don't see why this is a problem here.
> We define a target-specific constraint "U" that excludes post-increments, but
> again I don't see why "m" would cause trouble here. What was your experience?

GCC assumes that each "m" operand is used *exactly once* in the asm, so if
it decides to generate a post-increment/writeback addressing mode, you can
end up with a pointer that is off by a word if you don't actually use the
operand in the asm. You can try using "o", but GCC sometimes decides that's
impossible during reload, so we end up combining it with the (undocumented,
ARM-specific) "Q" constraint, which is simply a [rN] addressing mode.
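
To make that concrete, an ARM-flavoured sketch of the two variants
(hypothetical fragment; "off" and "sp" are locals as in the quoted code,
and the mrc reads the TPIDRPRW per-cpu register):

	/*
	 * Risky: operand 1 is never referenced in the template.  If gcc
	 * picks a writeback/post-increment addressing mode for it, e.g.
	 *	ldr rM, [rN], #4
	 * the pointer update still happens even though the asm never
	 * used it, and rN ends up off by a word.
	 */
	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "m" (*sp));

	/*
	 * Safer: "Q" restricts operand 1 to a plain [rN] addressing
	 * mode, so there is no addressing side effect to lose.
	 */
	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "Q" (*sp));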

Will

2013-09-30 18:01:16

by Chris Metcalf

Subject: Re: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

On 9/30/2013 5:45 AM, Will Deacon wrote:
> On Fri, Sep 27, 2013 at 09:13:15PM +0100, Chris Metcalf wrote:
>> On 9/26/2013 1:57 PM, Will Deacon wrote:
>>> Hi Chris,
>>>
>>> On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote:
>>>> [...]
>>>> +static inline unsigned long __my_cpu_offset(void)
>>>> +{
>>>> +	unsigned long tp;
>>>> +	register unsigned long *sp asm("sp");
>>>> +	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
>>>> +	return tp;
>>>> +}
>>> Hehe, nice to see this hack working out for you too. One thing to check is
>>> whether you have any funky addressing modes (things like writeback or
>>> post-increment), since the "m" constraint can bite you if you don't actually
>>> use it in the asm.
>> Well, we do have post-increments, though I don't see why this is a problem here.
>> We define a target-specific constraint "U" that excludes post-increments, but
>> again I don't see why "m" would cause trouble here. What was your experience?
> GCC assumes that each "m" operand is used *exactly once* in the asm, so if
> it decides to generate a post-increment/writeback addressing mode, you can
> end up with a pointer that is off by a word if you don't actually use the
> operand in the asm. You can try using "o", but GCC sometimes decides that's
> impossible during reload, so we end up combining it with the (undocumented,
> ARM-specific) "Q" constraint, which is simply a [rN] addressing mode.

OK, makes sense. I've changed the code to use a "U" constraint (pushed up to the linux-tile tree).
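
(That change presumably just swaps the constraint on the memory input; a
sketch, assuming "U" excludes post-increment addressing as described
above:)

	asm("move %0, tp" : "=r" (tp) : "U" (*sp));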

Thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com