2013-03-29 10:34:08

by Vineet Gupta

Subject: [PATCH] timer: Fix possible issues with non-serialized timer_pending()

When stress testing ARC Linux on 3.9-rc3, we hit a serialization
issue where mod_timer() races with itself. This is on an FPGA board, and
the kernel .config among other things has !SMP and !PREEMPT_COUNT.

The issue happens in mod_timer() because the timer_pending() based early
exit check is NOT done inside the timer base spinlock - as a networking
optimization.

The value used in there, timer->entry.next, is also used further down the
call chain (all inlines though) for the actual list manipulation. However,
if the register containing this pointer remains live across the spinlock
(in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload
it), a stale next pointer causes incorrect list manipulation, observed
with the following sequence in our tests:

(0). tv1[x] <----> t1 <---> t2
(1). mod_timer(t1) interrupted after it calls timer_pending()
(2). mod_timer(t2) completes
(3). mod_timer(t1) resumes but messes up the list.
(4). __run_timers() uses a bogus timer_list entry / crashes in
timer->function
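
To make the race concrete, here is a minimal C sketch of the pattern
(fake_lock()/fake_unlock() are hypothetical stand-ins for a UP
!PREEMPT_COUNT spinlock, i.e. no compiler barrier at all; this is an
illustration, not the actual kernel code):

struct list_head { struct list_head *next, *prev; };
struct timer_list { struct list_head entry; };

static inline void fake_lock(void) { }          /* no barrier(), like UP locks */
static inline void fake_unlock(void) { }

int mod_timer_sketch(struct timer_list *t)
{
        struct list_head *next = t->entry.next; /* timer_pending() load */

        if (!next)
                return 0;                       /* early exit, outside the lock */

        fake_lock();
        /*
         * Nothing tells gcc that t->entry.next may have changed under an
         * interrupt, so it is free to keep using the 'next' register here,
         * even though step (2) above has already re-linked the list.
         */
        if (next) {                             /* detach_if_pending() re-check */
                next->prev = t->entry.prev;     /* __list_del() on a stale node */
                t->entry.prev->next = next;
        }
        fake_unlock();
        return 1;
}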

The simplest fix is to NOT rely on the spinlock's implicit compiler
barrier but to add an explicit one in timer_pending().

FWIW, the relevant ARCompact disassembly of mod_timer which clearly
shows the issue due to register reuse is:

mod_timer:
push_s blink
mov_s r13,r0 # timer, timer

...
###### timer_pending( )
ld_s r3,[r13] # <------ <variable>.entry.next LOADED
brne r3, 0, @.L163

.L163:
....
###### spin_lock_irq( )
lr r5, [status32] # flags
bic r4, r5, 6 # temp, flags,
and.f 0, r5, 6 # flags,
flag.nz r4

###### detach_if_pending( ) begins

tst_s r3,r3 <--------------
# timer_pending( ) checks timer->entry.next
# r3 is NOT reloaded by gcc, using stale value
beq.d @.L169
mov.eq r0,0

# detach_timer( ): __list_del( )

ld r4,[r13,4] # <variable>.entry.prev, D.31439
st r4,[r3,4] # <variable>.prev, D.31439
st r3,[r4] # <variable>.next, D.30246

Signed-off-by: Vineet Gupta <[email protected]>
Reported-by: Christian Ruppert <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Christian Ruppert <[email protected]>
Cc: Pierrick Hascoet <[email protected]>
Cc: [email protected]
---
include/linux/timer.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..1537104 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
*/
static inline int timer_pending(const struct timer_list * timer)
{
- return timer->entry.next != NULL;
+ int pending = timer->entry.next != NULL;
+
+ /*
+ * The check above enables timer fast path - early exit.
+ * However most of the call sites are not protected by timer->base
+ * spinlock. If the caller (say mod_timer) races with itself, it
+ * can use the stale "next" pointer. See commit log for details.
+ */
+ barrier();
+ return pending;
}

extern void add_timer_on(struct timer_list *timer, int cpu);
--
1.7.10.4


2013-04-03 07:20:17

by Vineet Gupta

Subject: Re: [PATCH] timer: Fix possible issues with non-serialized timer_pending()

Hi Thomas,

Did you get a chance to look at this one?
It fixes a real problem for the ARC platform - w/o it my stress test setup falls over
in ~20 mins.

Thx,
-Vineet

On 03/29/2013 04:03 PM, Vineet Gupta wrote:
> When stress testing ARC Linux on 3.9-rc3, we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board, and
> the kernel .config among other things has !SMP and !PREEMPT_COUNT.
>
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
>
> The value used in there, timer->entry.next, is also used further down the
> call chain (all inlines though) for the actual list manipulation. However,
> if the register containing this pointer remains live across the spinlock
> (in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload
> it), a stale next pointer causes incorrect list manipulation, observed
> with the following sequence in our tests:
>
> (0). tv1[x] <----> t1 <---> t2
> (1). mod_timer(t1) interrupted after it calls timer_pending()
> (2). mod_timer(t2) completes
> (3). mod_timer(t1) resumes but messes up the list.
> (4). __run_timers() uses a bogus timer_list entry / crashes in
> timer->function
>
> The simplest fix is to NOT rely on the spinlock's implicit compiler
> barrier but to add an explicit one in timer_pending().
>
> FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> shows the issue due to register reuse is:
>
> mod_timer:
> push_s blink
> mov_s r13,r0 # timer, timer
>
> ...
> ###### timer_pending( )
> ld_s r3,[r13] # <------ <variable>.entry.next LOADED
> brne r3, 0, @.L163
>
> .L163:
> ....
> ###### spin_lock_irq( )
> lr r5, [status32] # flags
> bic r4, r5, 6 # temp, flags,
> and.f 0, r5, 6 # flags,
> flag.nz r4
>
> ###### detach_if_pending( ) begins
>
> tst_s r3,r3 <--------------
> # timer_pending( ) checks timer->entry.next
> # r3 is NOT reloaded by gcc, using stale value
> beq.d @.L169
> mov.eq r0,0
>
> # detach_timer( ): __list_del( )
>
> ld r4,[r13,4] # <variable>.entry.prev, D.31439
> st r4,[r3,4] # <variable>.prev, D.31439
> st r3,[r4] # <variable>.next, D.30246
>
> Signed-off-by: Vineet Gupta <[email protected]>
> Reported-by: Christian Ruppert <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Christian Ruppert <[email protected]>
> Cc: Pierrick Hascoet <[email protected]>
> Cc: [email protected]
> ---
> include/linux/timer.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
> */
> static inline int timer_pending(const struct timer_list * timer)
> {
> - return timer->entry.next != NULL;
> + int pending = timer->entry.next != NULL;
> +
> + /*
> + * The check above enables timer fast path - early exit.
> + * However most of the call sites are not protected by timer->base
> + * spinlock. If the caller (say mod_timer) races with itself, it
> + * can use the stale "next" pointer. See commit log for details.
> + */
> + barrier();
> + return pending;
> }
>
> extern void add_timer_on(struct timer_list *timer, int cpu);

2013-04-03 09:07:38

by Christian Ruppert

Subject: Re: [PATCH] timer: Fix possible issues with non-serialized timer_pending()

We have tested the patch in several configurations and boards using
tests which previously crashed the kernel in less than an hour. The
crashes in question could not be reproduced in long-term tests (3 days)
and nightly tests with the patch. I can thus confirm that this fixes a
real issue.

On Fri, Mar 29, 2013 at 04:03:38PM +0530, Vineet Gupta wrote:
> When stress testing ARC Linux on 3.9-rc3, we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board, and
> the kernel .config among other things has !SMP and !PREEMPT_COUNT.
>
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
>
> The value used in there, timer->entry.next, is also used further down the
> call chain (all inlines though) for the actual list manipulation. However,
> if the register containing this pointer remains live across the spinlock
> (in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload
> it), a stale next pointer causes incorrect list manipulation, observed
> with the following sequence in our tests:
>
> [...]
>
> Signed-off-by: Vineet Gupta <[email protected]>
> Reported-by: Christian Ruppert <[email protected]>
Tested-by: Christian Ruppert <[email protected]>

> Cc: Thomas Gleixner <[email protected]>
> Cc: Christian Ruppert <[email protected]>
> Cc: Pierrick Hascoet <[email protected]>
> Cc: [email protected]
> ---
> include/linux/timer.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
> */
> static inline int timer_pending(const struct timer_list * timer)
> {
> - return timer->entry.next != NULL;
> + int pending = timer->entry.next != NULL;
> +
> + /*
> + * The check above enables timer fast path - early exit.
> + * However most of the call sites are not protected by timer->base
> + * spinlock. If the caller (say mod_timer) races with itself, it
> + * can use the stale "next" pointer. See commit log for details.
> + */
> + barrier();
> + return pending;
> }
>
> extern void add_timer_on(struct timer_list *timer, int cpu);
> --
> 1.7.10.4
>

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pré-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-04-03 12:37:07

by Thomas Gleixner

Subject: Re: [PATCH] timer: Fix possible issues with non-serialized timer_pending()

Vineet,

On Fri, 29 Mar 2013, Vineet Gupta wrote:

> When stress testing ARC Linux on 3.9-rc3, we hit a serialization
> issue where mod_timer() races with itself. This is on an FPGA board, and
> the kernel .config among other things has !SMP and !PREEMPT_COUNT.
>
> The issue happens in mod_timer() because the timer_pending() based early
> exit check is NOT done inside the timer base spinlock - as a networking
> optimization.
>
> The value used in there, timer->entry.next, is also used further down the
> call chain (all inlines though) for the actual list manipulation. However,
> if the register containing this pointer remains live across the spinlock
> (in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload
> it), a stale next pointer causes incorrect list manipulation, observed
> with the following sequence in our tests:
>
> (0). tv1[x] <----> t1 <---> t2
> (1). mod_timer(t1) interrupted after it calls timer_pending()
> (2). mod_timer(t2) completes
> (3). mod_timer(t1) resumes but messes up the list.
> (4). __run_timers() uses a bogus timer_list entry / crashes in
> timer->function
>
> The simplest fix is to NOT rely on the spinlock's implicit compiler
> barrier but to add an explicit one in timer_pending().

That's simple, but dangerous. There is other code which relies on the
implicit barriers of spinlocks, so I think we need to add the barrier
to the !PREEMPT_COUNT implementation of preempt_*() macros.
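
For reference, on UP the spinlock ops boil down to exactly those macros -
roughly (a simplified sketch along the lines of
include/linux/spinlock_api_up.h, not the verbatim source):

/*
 * UP "spinlocks" compile away; the only compiler barrier they can
 * provide is whatever preempt_disable()/preempt_enable() contain.
 */
#define __LOCK(lock) \
        do { preempt_disable(); (void)(lock); } while (0)

#define __UNLOCK(lock) \
        do { preempt_enable(); (void)(lock); } while (0)

With !PREEMPT_COUNT those preempt macros currently expand to nothing -
hence no barrier.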

Thanks,

tglx

> FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> shows the issue due to register reuse is:
>
> mod_timer:
> push_s blink
> mov_s r13,r0 # timer, timer
>
> ...
> ###### timer_pending( )
> ld_s r3,[r13] # <------ <variable>.entry.next LOADED
> brne r3, 0, @.L163
>
> .L163:
> ....
> ###### spin_lock_irq( )
> lr r5, [status32] # flags
> bic r4, r5, 6 # temp, flags,
> and.f 0, r5, 6 # flags,
> flag.nz r4
>
> ###### detach_if_pending( ) begins
>
> tst_s r3,r3 <--------------
> # timer_pending( ) checks timer->entry.next
> # r3 is NOT reloaded by gcc, using stale value
> beq.d @.L169
> mov.eq r0,0
>
> # detach_timer( ): __list_del( )
>
> ld r4,[r13,4] # <variable>.entry.prev, D.31439
> st r4,[r3,4] # <variable>.prev, D.31439
> st r3,[r4] # <variable>.next, D.30246
>
> Signed-off-by: Vineet Gupta <[email protected]>
> Reported-by: Christian Ruppert <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Christian Ruppert <[email protected]>
> Cc: Pierrick Hascoet <[email protected]>
> Cc: [email protected]
> ---
> include/linux/timer.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..1537104 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
> */
> static inline int timer_pending(const struct timer_list * timer)
> {
> - return timer->entry.next != NULL;
> + int pending = timer->entry.next != NULL;
> +
> + /*
> + * The check above enables timer fast path - early exit.
> + * However most of the call sites are not protected by timer->base
> + * spinlock. If the caller (say mod_timer) races with itself, it
> + * can use the stale "next" pointer. See commit log for details.
> + */
> + barrier();
> + return pending;
> }
>
> extern void add_timer_on(struct timer_list *timer, int cpu);
> --
> 1.7.10.4
>
>

2013-04-03 13:05:07

by Christian Ruppert

Subject: Re: [PATCH] timer: Fix possible issues with non-serialized timer_pending()

Hello Vineet,

Below is a patch for a seemingly unrelated problem (RB-trees in the
hrtimer implementation) which seems to fix the original problem at the
same time. For the moment, this patch is insufficiently tested, but I am
posting it here (preliminarily) since it seems to be in line with
Thomas' comment: the implementations of irqsave/restore in other
architectures (MIPS, ARM) imply memory barriers, and adding this implicit
barrier to the ARC code as well seems to stabilise some of our remaining
crashes.
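
For comparison, the ARM version looks roughly like this (paraphrased from
arch/arm/include/asm/irqflags.h from memory, so treat it as a sketch) -
note the "memory" clobber, which is what makes it a compiler barrier:

static inline unsigned long arch_local_irq_save(void)
{
        unsigned long flags;

        asm volatile(
                "       mrs     %0, cpsr        @ save IRQ state\n"
                "       cpsid   i               @ disable IRQs"
                : "=r" (flags) : : "memory", "cc");
        return flags;
}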

Greetings,
Christian

On Wed, Apr 03, 2013 at 02:36:59PM +0200, Thomas Gleixner wrote:
> Vineet,
>
> On Fri, 29 Mar 2013, Vineet Gupta wrote:
>
> > When stress testing ARC Linux on 3.9-rc3, we hit a serialization
> > issue where mod_timer() races with itself. This is on an FPGA board, and
> > the kernel .config among other things has !SMP and !PREEMPT_COUNT.
> >
> > The issue happens in mod_timer() because the timer_pending() based early
> > exit check is NOT done inside the timer base spinlock - as a networking
> > optimization.
> >
> > The value used in there, timer->entry.next, is also used further down the
> > call chain (all inlines though) for the actual list manipulation. However,
> > if the register containing this pointer remains live across the spinlock
> > (in a UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload
> > it), a stale next pointer causes incorrect list manipulation, observed
> > with the following sequence in our tests:
> >
> > (0). tv1[x] <----> t1 <---> t2
> > (1). mod_timer(t1) interrupted after it calls timer_pending()
> > (2). mod_timer(t2) completes
> > (3). mod_timer(t1) resumes but messes up the list.
> > (4). __run_timers() uses a bogus timer_list entry / crashes in
> > timer->function
> >
> > The simplest fix is to NOT rely on the spinlock's implicit compiler
> > barrier but to add an explicit one in timer_pending().
>
> That's simple, but dangerous. There is other code which relies on the
> implicit barriers of spinlocks, so I think we need to add the barrier
> to the !PREEMPT_COUNT implementation of preempt_*() macros.
>
> Thanks,
>
> tglx
>
> > FWIW, the relevant ARCompact disassembly of mod_timer which clearly
> > shows the issue due to register reuse is:
> >
> > mod_timer:
> > push_s blink
> > mov_s r13,r0 # timer, timer
> >
> > ...
> > ###### timer_pending( )
> > ld_s r3,[r13] # <------ <variable>.entry.next LOADED
> > brne r3, 0, @.L163
> >
> > .L163:
> > ....
> > ###### spin_lock_irq( )
> > lr r5, [status32] # flags
> > bic r4, r5, 6 # temp, flags,
> > and.f 0, r5, 6 # flags,
> > flag.nz r4
> >
> > ###### detach_if_pending( ) begins
> >
> > tst_s r3,r3 <--------------
> > # timer_pending( ) checks timer->entry.next
> > # r3 is NOT reloaded by gcc, using stale value
> > beq.d @.L169
> > mov.eq r0,0
> >
> > # detach_timer( ): __list_del( )
> >
> > ld r4,[r13,4] # <variable>.entry.prev, D.31439
> > st r4,[r3,4] # <variable>.prev, D.31439
> > st r3,[r4] # <variable>.next, D.30246
> >
> > Signed-off-by: Vineet Gupta <[email protected]>
> > Reported-by: Christian Ruppert <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Christian Ruppert <[email protected]>
> > Cc: Pierrick Hascoet <[email protected]>
> > Cc: [email protected]
> > ---
> > include/linux/timer.h | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > index 8c5a197..1537104 100644
> > --- a/include/linux/timer.h
> > +++ b/include/linux/timer.h
> > @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
> > */
> > static inline int timer_pending(const struct timer_list * timer)
> > {
> > - return timer->entry.next != NULL;
> > + int pending = timer->entry.next != NULL;
> > +
> > + /*
> > + * The check above enables timer fast path - early exit.
> > + * However most of the call sites are not protected by timer->base
> > + * spinlock. If the caller (say mod_timer) races with itself, it
> > + * can use the stale "next" pointer. See commit log for details.
> > + */
> > + barrier();
> > + return pending;
> > }
> >
> > extern void add_timer_on(struct timer_list *timer, int cpu);
> > --
> > 1.7.10.4
> >
> >

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pré-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-04-03 13:11:29

by Christian Ruppert

Subject: [RFC] Add implicit barriers to irqsave/restore class of functions

This patch adds implicit memory barriers to irqsave/restore functions of
the ARC architecture port in line with what is done in other architectures.

It seems to fix several seemingly unrelated issues on our platform but for
the moment it is insufficiently tested (and might even be incomplete).
Please comment.

Signed-off-by: Christian Ruppert <[email protected]>
---
arch/arc/include/asm/irqflags.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
index ccd8480..c8147d1 100644
--- a/arch/arc/include/asm/irqflags.h
+++ b/arch/arc/include/asm/irqflags.h
@@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
" flag.nz %0 \n"
: "=r"(temp), "=r"(flags)
: "n"((STATUS_E1_MASK | STATUS_E2_MASK))
- : "cc");
+ : "memory", "cc");

return flags;
}
@@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
__asm__ __volatile__(
" flag %0 \n"
:
- : "r"(flags));
+ : "r"(flags) : "memory");
}

/*
@@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
" and %0, %0, %1 \n"
" flag %0 \n"
: "=&r"(temp)
- : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
+ : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
}

/*
@@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)

__asm__ __volatile__(
" lr %0, [status32] \n"
- : "=&r"(temp));
+ : "=&r"(temp) : : "memory");

return temp;
}
--
1.7.1

2013-04-03 13:29:51

by Vineet Gupta

Subject: Re: [RFC] Add implicit barriers to irqsave/restore class of functions

Hi Christian,

On 04/03/2013 06:40 PM, Christian Ruppert wrote:
> This patch adds implicit memory barriers to irqsave/restore functions of
> the ARC architecture port in line with what is done in other architectures.
>
> It seems to fix several seemingly unrelated issues on our platform but for
> the moment it is insufficiently tested (and might even be incomplete).
> Please comment.

I think this is not needed. Semantically we need the barrier for spinlocks, not irq
save/restore - although spinlocks are one of the primary users of the irq
save/restore API.

So repeating what Thomas already said: the barriers already exist for
PREEMPT_COUNT; we need to make sure they are present for !PREEMPT_COUNT too.
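
For reference, the PREEMPT_COUNT variants already look roughly like this
(simplified from include/linux/preempt.h):

#define preempt_disable() \
do { \
        inc_preempt_count(); \
        barrier(); \
} while (0)

#define preempt_enable_no_resched() \
do { \
        barrier(); \
        dec_preempt_count(); \
} while (0)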

-Vineet

> Signed-off-by: Christian Ruppert <[email protected]>
> ---
> arch/arc/include/asm/irqflags.h | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
> index ccd8480..c8147d1 100644
> --- a/arch/arc/include/asm/irqflags.h
> +++ b/arch/arc/include/asm/irqflags.h
> @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
> " flag.nz %0 \n"
> : "=r"(temp), "=r"(flags)
> : "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> - : "cc");
> + : "memory", "cc");
>
> return flags;
> }
> @@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
> __asm__ __volatile__(
> " flag %0 \n"
> :
> - : "r"(flags));
> + : "r"(flags) : "memory");
> }
>
> /*
> @@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
> " and %0, %0, %1 \n"
> " flag %0 \n"
> : "=&r"(temp)
> - : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
> + : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
> }
>
> /*
> @@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)
>
> __asm__ __volatile__(
> " lr %0, [status32] \n"
> - : "=&r"(temp));
> + : "=&r"(temp) : : "memory");
>
> return temp;
> }

2013-04-03 14:13:46

by Vineet Gupta

Subject: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Spinlocks built in a !PREEMPT_COUNT config don't have the compiler
barrier provided by the preempt_* routines. This can break a lot of code
which relies on barrier semantics.

This manifested as random crashes in timer code when stress testing
ARC Linux (3.9-rc3): !SMP && !PREEMPT_COUNT

Here's the exact sequence which caused this:
(0). tv1[x] <----> t1 <---> t2
(1). mod_timer(t1) interrupted after it calls timer_pending()
(2). mod_timer(t2) completes
(3). mod_timer(t1) resumes but messes up the list.
(4). __run_timers() uses a bogus timer_list entry / crashes in
timer->function

When mod_timer() races against itself, the spinlock rightly serializes
the tv1[] timer linked list; however, timer_pending() called outside the
spinlock accesses the timer's list element, cached in a register.
With low register pressure (and a deep register file), there's nothing
forcing gcc to reload the element across the spinlock, leaving a stale
value in the register in case of a race - ensuing list corruption.

And the ARCompact disassembly which shows the culprit generated code:

mod_timer:
push_s blink
mov_s r13,r0 # timer, timer

..
###### timer_pending( )
ld_s r3,[r13] # <------ <variable>.entry.next LOADED
brne r3, 0, @.L163

.L163:
..
###### spin_lock_irq( )
lr r5, [status32] # flags
bic r4, r5, 6 # temp, flags,
and.f 0, r5, 6 # flags,
flag.nz r4

###### detach_if_pending( ) begins

tst_s r3,r3 <--------------
# timer_pending( ) checks timer->entry.next
# r3 is NOT reloaded by gcc, using stale value
beq.d @.L169
mov.eq r0,0

##### detach_timer( ): __list_del( )

ld r4,[r13,4] # <variable>.entry.prev, D.31439
st r4,[r3,4] # <variable>.prev, D.31439

st r3,[r4] # <variable>.next, D.30246

Signed-off-by: Vineet Gupta <[email protected]>
Reported-by: Christian Ruppert <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Christian Ruppert <[email protected]>
Cc: Pierrick Hascoet <[email protected]>
Cc: Robert Love <[email protected]>
Cc: [email protected]
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
---
include/linux/preempt.h | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5a710b9..354d6e3 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -93,14 +93,19 @@ do { \

#else /* !CONFIG_PREEMPT_COUNT */

-#define preempt_disable() do { } while (0)
-#define sched_preempt_enable_no_resched() do { } while (0)
-#define preempt_enable_no_resched() do { } while (0)
-#define preempt_enable() do { } while (0)
-
-#define preempt_disable_notrace() do { } while (0)
-#define preempt_enable_no_resched_notrace() do { } while (0)
-#define preempt_enable_notrace() do { } while (0)
+/*
+ * compiler barrier needed to ensure that spinlocks provide the barrier
+ * semantics despite !CONFIG_PREEMPT_COUNT.
+ * See commit log for actual bug which forced this change
+ */
+#define preempt_disable() do { barrier(); } while (0)
+#define sched_preempt_enable_no_resched() do { barrier(); } while (0)
+#define preempt_enable_no_resched() do { barrier(); } while (0)
+#define preempt_enable() do { barrier(); } while (0)
+
+#define preempt_disable_notrace() do { barrier(); } while (0)
+#define preempt_enable_no_resched_notrace() do { barrier(); } while (0)
+#define preempt_enable_notrace() do { barrier(); } while (0)

#endif /* CONFIG_PREEMPT_COUNT */

--
1.7.10.4

2013-04-04 08:27:29

by Christian Ruppert

Subject: Re: [RFC] Add implicit barriers to irqsave/restore class of functions

Hi Vineet,

Just a short message to inform you that the test campaign on our RFC
patch has run through successfully and we consider the patch stable
enough for release to our customers. The main reason the barriers were
added in this particular place is that other architectures do the
same.

I agree that the patch adding barriers in preempt_* is more in line with
Thomas' comment than our RFC (which predates that comment) and we have
also started a test campaign for that patch yesterday. I'll give an update
directly in reply to the patch when that campaign has run through.

Greetings,
Christian

On Wed, Apr 03, 2013 at 06:59:36PM +0530, Vineet Gupta wrote:
> Hi Christian,
>
> On 04/03/2013 06:40 PM, Christian Ruppert wrote:
> > This patch adds implicit memory barriers to irqsave/restore functions of
> > the ARC architecture port in line with what is done in other architectures.
> >
> > It seems to fix several seemingly unrelated issues on our platform but for
> > the moment it is insufficiently tested (and might even be incomplete).
> > Please comment.
>
> I think this is not needed. Semantically we need the barrier for spinlocks, not irq
> save/restore - although spinlocks are one of the primary users of the irq
> save/restore API.
>
> So repeating what Thomas already said: the barriers already exist for
> PREEMPT_COUNT; we need to make sure they are present for !PREEMPT_COUNT too.
>
> -Vineet
>
> > Signed-off-by: Christian Ruppert <[email protected]>
> > ---
> > arch/arc/include/asm/irqflags.h | 8 ++++----
> > 1 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
> > index ccd8480..c8147d1 100644
> > --- a/arch/arc/include/asm/irqflags.h
> > +++ b/arch/arc/include/asm/irqflags.h
> > @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
> > " flag.nz %0 \n"
> > : "=r"(temp), "=r"(flags)
> > : "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> > - : "cc");
> > + : "memory", "cc");
> >
> > return flags;
> > }
> > @@ -53,7 +53,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
> > __asm__ __volatile__(
> > " flag %0 \n"
> > :
> > - : "r"(flags));
> > + : "r"(flags) : "memory");
> > }
> >
> > /*
> > @@ -73,7 +73,7 @@ static inline void arch_local_irq_disable(void)
> > " and %0, %0, %1 \n"
> > " flag %0 \n"
> > : "=&r"(temp)
> > - : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)));
> > + : "n"(~(STATUS_E1_MASK | STATUS_E2_MASK)) : "memory");
> > }
> >
> > /*
> > @@ -85,7 +85,7 @@ static inline long arch_local_save_flags(void)
> >
> > __asm__ __volatile__(
> > " lr %0, [status32] \n"
> > - : "=&r"(temp));
> > + : "=&r"(temp) : : "memory");
> >
> > return temp;
> > }
>

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pré-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-04-04 15:29:12

by Christian Ruppert

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Hi Vineet,

Our stress testing campaign has just successfully completed on this
patch. It seems to solve several issues we have seen in unpatched
versions, amongst others the original timer issue, a crash in hrtimer
rb-tree manipulation, etc.

Greetings,
Christian

On Wed, Apr 03, 2013 at 07:41:22PM +0530, Vineet Gupta wrote:
> Spinlocks built in a !PREEMPT_COUNT config don't have the compiler
> barrier provided by the preempt_* routines. This can break a lot of code
> which relies on barrier semantics.
>
> This manifested as random crashes in timer code when stress testing
> ARC Linux (3.9-rc3): !SMP && !PREEMPT_COUNT
>
> [...]
>
> Signed-off-by: Vineet Gupta <[email protected]>
> Reported-by: Christian Ruppert <[email protected]>
Tested-by: Christian Ruppert <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Christian Ruppert <[email protected]>
> Cc: Pierrick Hascoet <[email protected]>
> Cc: Robert Love <[email protected]>
> Cc: [email protected]
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> [...]
--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pré-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-04-04 16:13:09

by Peter Zijlstra

Subject: Re: [RFC] Add implicit barriers to irqsave/restore class of functions

On Wed, 2013-04-03 at 15:10 +0200, Christian Ruppert wrote:
> This patch adds implicit memory barriers to irqsave/restore functions of
> the ARC architecture port in line with what is done in other architectures.

> diff --git a/arch/arc/include/asm/irqflags.h b/arch/arc/include/asm/irqflags.h
> index ccd8480..c8147d1 100644
> --- a/arch/arc/include/asm/irqflags.h
> +++ b/arch/arc/include/asm/irqflags.h
> @@ -39,7 +39,7 @@ static inline long arch_local_irq_save(void)
> " flag.nz %0 \n"
> : "=r"(temp), "=r"(flags)
> : "n"((STATUS_E1_MASK | STATUS_E2_MASK))
> - : "cc");
> + : "memory", "cc");

That's not a memory barrier, that's a memory clobber, aka a compiler
barrier.
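
To spell out the difference (a sketch; the second macro and its name are
illustrative, not from the patch):

/* Compiler barrier: gcc must not cache or reorder memory accesses
 * across this point; no instruction is emitted. */
#define barrier()       __asm__ __volatile__("" : : : "memory")

/* CPU memory barrier, e.g. on ARMv7: a real instruction ("dmb") that
 * additionally orders the accesses as seen by other observers. */
#define cpu_mb_sketch() __asm__ __volatile__("dmb" : : : "memory")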

2013-04-05 04:27:29

by Vineet Gupta

Subject: Re: [RFC] Add implicit barriers to irqsave/restore class of functions

Hi Peter,

On 04/04/2013 09:43 PM, Peter Zijlstra wrote:
> - : "cc");
> + : "memory", "cc");
> That's not a memory barrier, that's a memory clobber, aka a compiler
> barrier.

For the problem under consideration we indeed want a compiler barrier, because the
error shows up due to a stale register which is live across a spinlock in a
!PREEMPT_COUNT config.

However, IMO doing this in the irq save/restore macros is semantically incorrect,
since those macros might be used in places that don't need the compiler
reload-register semantics. Further, per tglx's suggestion, fixing the preempt_*
macros for the !PREEMPT_COUNT case would fix this independently of what the arch is doing.

A patch to that effect was already posted to lists:
http://www.spinics.net/lists/kernel/msg1510885.html
Please let us know what you think.

Thx,
-Vineet

2013-04-05 04:36:29

by Vineet Gupta

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Hi Thomas,

On 04/04/2013 08:58 PM, Christian Ruppert wrote:
> Hi Vineet,
>
> Our stress testing campaign has just successfully completed on this
> patch. It seems to solve several issues we have seen in unpatched
> versions, amongst others the original timer issue, a crash in hrtimer
> rb-tree manipulation etc.
>
> Greetings,
> Christian

Given that we are closing in on the 3.9 release, and that one or more of these patches fix a
real issue for us - can you please consider my earlier patch to fix
timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]?
This will be a localized / low-risk change for this late in the cycle.

For 3.10 - assuming the preempt_* change is blessed, we can revert this one and add
that fuller/better fix.

What do you think?

Thx,
-Vineet

2013-04-06 13:34:36

by Vineet Gupta

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
> Hi Thomas,
>
> Given that we are closing in on the 3.9 release, and that one or more of these patches fix a
> real issue for us - can you please consider my earlier patch to fix
> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]?
> This will be a localized / low-risk change for this late in the cycle.
>
> For 3.10 - assuming the preempt_* change is blessed, we can revert this one and add
> that fuller/better fix.
>
> What do you think?
>
> Thx,
> -Vineet
>

Ping! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.

Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html

Thx,
-Vineet

2013-04-06 16:13:55

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

This is all *COMPLETELY* wrong.

Neither the normal preempt macros, nor the plain spinlocks, should
protect anything at all against interrupts.

The real protection should come from the spin_lock_irqsave() in
lock_timer_base(), not from spinlocks, and not from preemption.

It sounds like ARC is completely buggered, and hasn't made the irq
disable be a compiler barrier. That's an ARC bug, and it's a big one,
and can affect a lot more than just the timers.

So the real fix is to add a "memory" clobber to
arch_local_irq_save/restore() and friends, so that the compiler
doesn't get to cache memory state from the irq-enabled region into the
irq-disabled one.

Fix ARC, don't try to blame generic code. You should have asked
yourself why only ARC saw this bug, when the code apparently works
fine for everybody else!

Linus

On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <[email protected]> wrote:
>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>> Hi Thomas,
>>
>> Given that we are closing in on the 3.9 release, and that one or more of these patches fix a
>> real issue for us - can you please consider my earlier patch to fix
>> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]?
>> This will be a localized / low-risk change for this late in the cycle.
>>
>> For 3.10 - assuming the preempt_* change is blessed, we can revert this one and add
>> that fuller/better fix.
>>
>> What do you think?
>>
>> Thx,
>> -Vineet
>>
>
> Ping! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>
> Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>
> Thx,
> -Vineet

2013-04-06 18:01:26

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Looking around, it looks like c6x has the same bug.

Some other architectures (tile) have such subtle implementations
(where is __insn_mtspr() defined?) that I have a hard time judging.
And maybe I missed something, but the rest seem ok.

Linus

On Sat, Apr 6, 2013 at 9:13 AM, Linus Torvalds
<[email protected]> wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
> Linus
>
> On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <[email protected]> wrote:
>>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>>> Hi Thomas,
>>>
>>> Given that we are closing in on the 3.9 release, and that one or more of these patches fix a
>>> real issue for us - can you please consider my earlier patch to fix
>>> timer_pending() only for 3.9 [http://www.spinics.net/lists/kernel/msg1508224.html]?
>>> This will be a localized / low-risk change for this late in the cycle.
>>>
>>> For 3.10 - assuming the preempt_* change is blessed, we can revert this one and add
>>> that fuller/better fix.
>>>
>>> What do you think?
>>>
>>> Thx,
>>> -Vineet
>>>
>>
>> Ping! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>>
>> Simple localized fix: http://www.spinics.net/lists/kernel/msg1508224.html
>> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>>
>> Thx,
>> -Vineet

2013-04-06 19:55:34

by Jacquiot, Aurelien

Subject: RE: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Agreed. We will fix it.

Aurelien


Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 12.654.784

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Linus Torvalds
Sent: Saturday, April 06, 2013 8:01 PM
To: Vineet Gupta; Mark Salter; Jacquiot, Aurelien
Cc: Thomas Gleixner; Christian Ruppert; Pierrick Hascoet; Frederic Weisbecker; Steven Rostedt; Peter Zijlstra; Ingo Molnar; Linux Kernel Mailing List; [email protected]; [email protected]
Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Looking around, it looks like c6x has the same bug.

Some other architectures (tile) have such subtle implementations (where is __insn_mtspr() defined?) that I have a hard time judging.
And maybe I missed something, but the rest seem ok.

Linus

On Sat, Apr 6, 2013 at 9:13 AM, Linus Torvalds <[email protected]> wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
> Linus
>
> On Sat, Apr 6, 2013 at 6:34 AM, Vineet Gupta <[email protected]> wrote:
>>> On 04/05/2013 10:06 AM, Vineet Gupta wrote:
>>> Hi Thomas,
>>>
>>> Given that we are closing in on the 3.9 release, and that one or more of these
>>> patches fix a real issue for us - can you please consider my earlier
>>> patch to fix timer_pending() only for 3.9
>>> [http://www.spinics.net/lists/kernel/msg1508224.html]?
>>> This will be a localized / low-risk change for this late in the cycle.
>>>
>>> For 3.10 - assuming the preempt_* change is blessed, we can revert this
>>> one and add that fuller/better fix.
>>>
>>> What do you think?
>>>
>>> Thx,
>>> -Vineet
>>>
>>
>> Ping! Sorry for pestering, but one of the fixes is needed before 3.9 goes out.
>>
>> Simple localized fix:
>> http://www.spinics.net/lists/kernel/msg1508224.html
>> Better but risky: http://www.spinics.net/lists/kernel/msg1510885.html
>>
>> Thx,
>> -Vineet


2013-04-08 04:26:57

by Vineet Gupta

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

Hi Linus,

On 04/06/2013 09:43 PM, Linus Torvalds wrote:
> This is all *COMPLETELY* wrong.
>
> Neither the normal preempt macros, nor the plain spinlocks, should
> protect anything at all against interrupts.
>
> The real protection should come from the spin_lock_irqsave() in
> lock_timer_base(), not from spinlocks, and not from preemption.
>
> It sounds like ARC is completely buggered, and hasn't made the irq
> disable be a compiler barrier. That's an ARC bug, and it's a big one,
> and can affect a lot more than just the timers.
>
> So the real fix is to add a "memory" clobber to
> arch_local_irq_save/restore() and friends, so that the compiler
> doesn't get to cache memory state from the irq-enabled region into the
> irq-disabled one.
>
> Fix ARC, don't try to blame generic code. You should have asked
> yourself why only ARC saw this bug, when the code apparently works
> fine for everybody else!
>
> Linus

Christian had already proposed that change - only I was reluctant to take it, as
local_irq_* is used heavily in a configuration of ARC700 Linux where the (!SMP) CPU
doesn't support load-locked/store-conditional instructions - hence atomic R-M-W
sequences need those routines. Adding a mem clobber would lead to pathetic
generated code, hence it was likely removed by me at some point in time.
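
(For reference, the kind of sequence meant here - a sketch along the lines
of the generic !SMP fallback in include/asm-generic/atomic.h, not the
actual ARC code:)

static inline void atomic_add(int i, atomic_t *v)
{
        unsigned long flags;

        local_irq_save(flags);  /* with a "memory" clobber here, gcc must
                                   spill/reload v->counter around every
                                   such critical section */
        v->counter += i;
        local_irq_restore(flags);
}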

Also, I'd thought that given that barriers are already present in the PREEMPT_COUNT
variant of the preempt_* macros, we might as well add them to the !PREEMPT_COUNT ones.

But thinking about it more - you are right (as you always are) - the situation I
saw with the timer code could very well show up with vanilla local_irq_save/restore
when a piece of code races with itself (especially for our mostly !SMP configs).
Would you be OK if I send the single patch to ARC by email (for 3.9-rc7), or would
you rather have a pull request?

Thx,
-Vineet

2013-04-08 04:48:24

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Sun, Apr 7, 2013 at 9:20 PM, Vineet Gupta <[email protected]> wrote:
>
> Christian had already proposed that change - only I was reluctant to take it, as
> local_irq_* is used heavily in a configuration of ARC700 Linux where the (!SMP) CPU
> doesn't support load-locked/store-conditional instructions - hence atomic R-M-W
> sequences need those routines. Adding a mem clobber would lead to pathetic
> generated code, hence it was likely removed by me at some point in time.

Umm. The irq instructions absolutely *HAVE* to have the barriers. And
yes, the R-M-W instructions (if they are protected by them) obviously
need to have the load and the store inside the spinlocked region.

> Also, I'd thought that given that barriers are already present in the PREEMPT_COUNT
> variant of the preempt_* macros, we might as well add them to the !PREEMPT_COUNT ones.

That's absolutely insane. It's insane for two reasons:

- it's not SUFFICIENT. Any user of irq-safety needs the loads and
stores to stay inside the irq-safe region, and the preempt macros have
absolutely nothing to do with that.

- it's POINTLESS and WRONG. If you run on UP, and have preemption
disabled, then there is no reason for a barrier in the preempt code,
since moving code around them doesn't matter - nothing is ever going
to preempt that.

The thing is, the irq disable/enable sequence absolutely has to have a
memory barrier in it, because if the compiler moves a load outside of
the irq-protected region, then that is a bug. Now, if you want to play
games with your atomics and do them with handcoded asm, go right
ahead, but that really isn't an argument for getting everything else
wrong.

That said, thinking about barriers and preemption made me realize that
we do have a potential issue between: (a) non-preemption UP kernel
(with no barriers in the preempt_enable/disable()) and (b)
architectures that use inline asm without a memory clobber for
get_user/put_user(). That includes x86.

The reason is that I could imagine code like

if (get_user(val, addr))
        return -EFAULT;
preempt_disable();
... do something percpu ..
preempt_enable();

and it turns out that for non-preempt UP, we don't tell the compiler
anywhere that it can't move the get_user past the preempt_disable. But
the get_user() can cause a preemption point because of a page fault,
obviously.

I suspect that the best way to fix this ends up relying on the gcc
"asm volatile" rules, and make the rule be that:
- preempt_disable/enable have to generate an asm volatile() string
(preferably just an asm comment saying "preempt disable/enable")
- get_user/put_user doesn't need to add a memory clobber, but needs
to be done with an asm volatile too.

Then the gcc "no reordering of volatile asms" should make the above be
ok, without having to add an actual compiler memory barrier.
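
Roughly, as a sketch of that rule (illustrative only, not a tested patch):

/* preempt_disable/enable become empty volatile asms ... */
#define preempt_disable()       asm volatile("# preempt disable")
#define preempt_enable()        asm volatile("# preempt enable")

/* ... and since get_user()/put_user() are volatile asms as well, gcc
 * must not reorder them against the markers above. */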

Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
but it does seem to be a bug.. Comments?

But the irq disable/enable sequences definitely need the memory
clobber. Because caching memory that could be changed by an interrupt
in registers is not good.

Linus

2013-04-08 04:49:31

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Sun, Apr 7, 2013 at 9:20 PM, Vineet Gupta <[email protected]> wrote:
>
> Would you be OK if I send the single patch to ARC by email (for 3.9-rc7), or would
> you rather have a pull request?

I got distracted by thinking about user-accesses vs preemption, but
yes, sending the ARC patch to fix things by email as a plain patch is
fine.

Linus

2013-04-08 13:38:08

by Peter Zijlstra

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Sun, 2013-04-07 at 21:48 -0700, Linus Torvalds wrote:
> That said, thinking about barriers and preemption made me realize that
> we do have a potential issue between: (a) non-preemption UP kernel
> (with no barriers in the preempt_enable/disable()) and (b)
> architectures that use inline asm without a memory clobber for
> get_user/put_user(). That includes x86.
>
> The reason is that I could imagine code like
>
> if (get_user(val, addr))
>         return -EFAULT;
> preempt_disable();
> ... do something percpu ..
> preempt_enable();
>
> and it turns out that for non-preempt UP, we don't tell the compiler
> anywhere that it can't move the get_user past the preempt_disable. But
> the get_user() can cause a preemption point because of a page fault,
> obviously.
>
> I suspect that the best way to fix this ends up relying on the gcc
> "asm volatile" rules, and make the rule be that:
> - preempt_disable/enable have to generate an asm volatile() string
> (preferably just an asm comment saying "preempt disable/enable")
> - get_user/put_user doesn't need to add a memory clobber, but needs
> to be done with an asm volatile too.
>
> Then the gcc "no reordering of volatile asms" should make the above be
> ok, without having to add an actual compiler memory barrier.
>
> Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
> but it does seem to be a bug.. Comments?

Right, stuff between preempt_disable() and preempt_enable() is supposed
to appear atomic wrt scheduling contexts; allowing any schedule to
happen in between would violate this.

I'm not seeing how this would be UP only though, I can see the same
thing happening on SMP+no-preempt.

Also, is {get,put}_user() the only construct that can do this? If so,
using the "asm volatile" rules as described might be the best way;
otherwise, making the PREEMPT_COUNT=n operations be compiler barriers
seems like the safer option.

That said, I can't remember ever having seen a BUG like this, even
though !PREEMPT is (or at least was) the most popular distro setting.

2013-04-08 14:05:52

by Steven Rostedt

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Sun, 2013-04-07 at 21:48 -0700, Linus Torvalds wrote:

> Ingo? Peter? I'm not sure anybody really uses UP+no-preempt on x86,
> but it does seem to be a bug.. Comments?

I believe a lot of people still use no-preempt. Well, at least voluntary
preemption, which would have the same bug.

I'm thinking that we may have just been lucky that gcc didn't move the
get_user() into a place that would cause issues.

Sounds like a bug to me.

-- Steve

2013-04-08 14:31:47

by Steven Rostedt

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Mon, 2013-04-08 at 15:37 +0200, Peter Zijlstra wrote:

> That said, I can't remember ever having seen a BUG like this, even
> though !PREEMPT is (or at least was) the most popular distro setting.

It requires gcc reordering the code such that a preempt can happen inside
preempt_disable(). And the reordered code also has to land in a position
where the preempt_disable() protection actually matters.

Then if gcc does this, we need a page fault to occur with a get_user()
operation, which in practice seldom happens as most get_user() operations
are done on freshly modified memory.

And then, it would require the page fault to cause a schedule. This is
the most likely of the things needed to occur, but itself is not a
problem.

Then, the schedule would have to cause the data that is being protected by
the preempt_disable() to be corrupted. Either by scheduling in another
process that monkeys with the data. Or if it protects per-cpu data,
scheduling to another CPU (for the SMP case only).

If any of the above does not occur, then you won't see a bug. This is
highly unlikely to happen, but that's no excuse to not fix it. But it
probably explains why we never saw a bug report. Heck, it may have
happened, but it would be hard to reproduce, and just forgotten about.

-- Steve


2013-04-08 14:50:26

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Mon, Apr 8, 2013 at 7:31 AM, Steven Rostedt <[email protected]> wrote:
>
> It requires gcc reordering the code such that a preempt can happen inside
> preempt_disable(). And the reordered code also has to land in a position
> where the preempt_disable() protection actually matters.
>
> Then if gcc does this, we need a page fault to occur with a get_user()
> operation, which in practice seldom happens as most get_user() operations
> are done on freshly modified memory.
>
> And then, it would require the page fault to cause a schedule. This is
> the most likely of the things needed to occur, but itself is not a
> problem.
>
> Then, the schedule would have to cause the data that is being protected by
> the preempt_disable() to be corrupted. Either by scheduling in another
> process that monkeys with the data. Or if it protects per-cpu data,
> scheduling to another CPU (for the SMP case only).
>
> If any of the above does not occur, then you won't see a bug.

Right. The gcc reordering is also hard to actually notice if it does
happen, so even testing the fix for this looks nontrivial.

Something like the appended (whitespace-damaged and TOTALLY UNTESTED)
might be sufficient, but..

Comments?

Linus

---
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 5a710b9c578e..465df1c13386 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -93,14 +93,17 @@ do { \

#else /* !CONFIG_PREEMPT_COUNT */

-#define preempt_disable() do { } while (0)
-#define sched_preempt_enable_no_resched() do { } while (0)
-#define preempt_enable_no_resched() do { } while (0)
-#define preempt_enable() do { } while (0)
-
-#define preempt_disable_notrace() do { } while (0)
-#define preempt_enable_no_resched_notrace() do { } while (0)
-#define preempt_enable_notrace() do { } while (0)
+/* This is only a barrier to other asms. Notably get_user/put_user */
+#define asm_barrier() asm volatile("")
+
+#define preempt_disable() asm_barrier()
+#define sched_preempt_enable_no_resched() asm_barrier()
+#define preempt_enable_no_resched() asm_barrier()
+#define preempt_enable() asm_barrier()
+
+#define preempt_disable_notrace() asm_barrier()
+#define preempt_enable_no_resched_notrace() asm_barrier()
+#define preempt_enable_notrace() asm_barrier()

#endif /* CONFIG_PREEMPT_COUNT */

2013-04-08 14:59:24

by Steven Rostedt

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Mon, 2013-04-08 at 07:50 -0700, Linus Torvalds wrote:

> ---
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 5a710b9c578e..465df1c13386 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -93,14 +93,17 @@ do { \
>
> #else /* !CONFIG_PREEMPT_COUNT */
>
> -#define preempt_disable() do { } while (0)
> -#define sched_preempt_enable_no_resched() do { } while (0)
> -#define preempt_enable_no_resched() do { } while (0)
> -#define preempt_enable() do { } while (0)
> -
> -#define preempt_disable_notrace() do { } while (0)
> -#define preempt_enable_no_resched_notrace() do { } while (0)
> -#define preempt_enable_notrace() do { } while (0)
> +/* This is only a barrier to other asms. Notably get_user/put_user */

Probably should add in the comment:

" or anything else that can cause a hidden schedule. "

-- Steve

> +#define asm_barrier() asm volatile("")
> +
> +#define preempt_disable() asm_barrier()
> +#define sched_preempt_enable_no_resched() asm_barrier()
> +#define preempt_enable_no_resched() asm_barrier()
> +#define preempt_enable() asm_barrier()
> +
> +#define preempt_disable_notrace() asm_barrier()
> +#define preempt_enable_no_resched_notrace() asm_barrier()
> +#define preempt_enable_notrace() asm_barrier()
>
> #endif /* CONFIG_PREEMPT_COUNT */

2013-04-08 15:07:46

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Mon, Apr 8, 2013 at 7:59 AM, Steven Rostedt <[email protected]> wrote:
>> +/* This is only a barrier to other asms. Notably get_user/put_user */
>
> Probably should add in the comment:
>
> " or anything else that can cause a hidden schedule. "
>

Fair enough. And I just remembered why I thought UP was special - we
need to do the same thing about spinlocks, for the same reasons.

So that "asm_barrier()" should probably be in <linux/compiler.h> along
with the "normal" barrier() definition.

*AND* somebody should re-check the gcc documentation on "volatile
asm". I'm pretty sure it used to say "not moved significantly,
including against each other" or something like that, but the gcc asm
docs have changed over time.

I'd hate to have to add a memory clobber to the get_user/put_user
asms, because that would *really* hurt. But maybe we could add some
other clobber ("cc" or similar) to make sure they can't be re-ordered
if the "volatile" isn't sufficient to make sure asms don't get
re-ordered wrt each other.

Linus

2013-04-09 14:32:45

by Linus Torvalds

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Mon, Apr 8, 2013 at 8:07 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Apr 8, 2013 at 7:59 AM, Steven Rostedt <[email protected]> wrote:
>>> +/* This is only a barrier to other asms. Notably get_user/put_user */
>>
>> Probably should add in the comment:
>>
>> " or anything else that can cause a hidden schedule. "
>>
>
> Fair enough. And I just remembered why I thought UP was special - we
> need to do the same thing about spinlocks, for the same reasons.
>
> So that "asm_barrier()" should probably be in <linux/compiler.h> along
> with the "normal" barrier() definition.
>

I'm a moron.

Yes, "asm_barrier()" is a valid barrier for asms. But without the
"memory" clobber, it doesn't actually end up being a barrier to any
normal C loads and stores from memory, so it doesn't actually help.

So I suspect we need to just make UP spinlocks and preemption
enable/disable be full compiler barriers after all.
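
In other words, sketching the difference with an illustrative example:

static int shared;

int example(void)
{
        int x = shared;                 /* plain C load */

        asm volatile("");               /* orders volatile asms only; gcc may
                                           still keep 'shared' cached in a
                                           register across this */

        asm volatile("" ::: "memory");  /* full compiler barrier: gcc must
                                           assume 'shared' changed and re-load
                                           it below */

        return x + shared;
}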

Something like the attached (still untested, although it seems to at
least compile) patch. Comments?

Linus


Attachments:
patch.diff (4.17 kB)

2013-04-09 16:38:17

by Chris Metcalf

Subject: [PATCH] tile: comment assumption about __insn_mtspr for <asm/irqflags.h>

The arch_local_irq_save(), etc., routines are required to function
as compiler barriers. They do, but it's subtle and requires knowing
that the gcc builtin __insn_mtspr() is marked as a memory clobber.
Provide a comment explaining the assumption.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/tile/include/asm/irqflags.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

Linus wrote:
> Some other architectures (tile) have such subtle implementations
> (where is __insn_mtspr() defined?) that I have a hard time judging.

diff --git a/arch/tile/include/asm/irqflags.h b/arch/tile/include/asm/irqflags.h
index 241c0bb..c96f9bb 100644
--- a/arch/tile/include/asm/irqflags.h
+++ b/arch/tile/include/asm/irqflags.h
@@ -40,7 +40,15 @@
#include <asm/percpu.h>
#include <arch/spr_def.h>

-/* Set and clear kernel interrupt masks. */
+/*
+ * Set and clear kernel interrupt masks.
+ *
+ * NOTE: __insn_mtspr() is a compiler builtin marked as a memory
+ * clobber. We rely on it being equivalent to a compiler barrier in
+ * this code since arch_local_irq_save() and friends must act as
+ * compiler barriers. This compiler semantic is baked into enough
+ * places that the compiler will maintain it going forward.
+ */
#if CHIP_HAS_SPLIT_INTR_MASK()
#if INT_PERF_COUNT < 32 || INT_AUX_PERF_COUNT < 32 || INT_MEM_ERROR >= 32
# error Fix assumptions about which word various interrupts are in
--
1.7.10.3

2013-04-10 07:12:41

by Peter Zijlstra

Subject: Re: [PATCH] Guarantee spinlocks' implicit barrier for !PREEMPT_COUNT

On Tue, 2013-04-09 at 07:32 -0700, Linus Torvalds wrote:
> Something like the attached (still untested, although it seems to at
> least compile) patch. Comments?

Yes, makes me feel far more comfortable than this "asm volatile"
business (which as you noted has holes in it).