2005-01-10 00:37:44

by Thomas Gleixner

Subject: [PATCH 2.6.10-mm2] Fix preemption race [1/3] (Core)

The idle-thread-preemption-fix.patch introduced a race which is not
critical, but might give us an extra turn through the scheduler. When
interrupts are re-enabled in entry.S and an interrupt occurs before we
reach the add_preempt_count() in preempt_schedule(), we get rescheduled
again in the return-from-interrupt path.

The patch prevents this by leaving preemption disabled (preempt_count
set to 1) across the window and providing a separate function,
entry_preempt_schedule().
This was done by moving the inner loop of preempt_schedule() into a
separate function, do_preempt_schedule(), which takes an increment
argument. The function is called from preempt_schedule() with
PREEMPT_ACTIVE and from entry_preempt_schedule() with
PREEMPT_ACTIVE - 1 to account for the 1 already present in
preempt_count.

This split allows different plausibility checks for entry code and
kernel callers, and makes it simpler to adjust other platforms,
especially ARM, to the new code.

Signed-off-by: Thomas Gleixner <[email protected]>

---
sched.c | 44 ++++++++++++++++++++++++++++++++++----------
entry.S | 3 ++-
2 files changed, 36 insertions(+), 11 deletions(-)
---
Index: 2.6.10-mm1/kernel/sched.c
===================================================================
--- 2.6.10-mm1/kernel/sched.c (revision 141)
+++ 2.6.10-mm1/kernel/sched.c (working copy)
@@ -2836,22 +2836,15 @@
* off of preempt_enable. Kernel preemptions off return from interrupt
* occur there and call schedule directly.
*/
-asmlinkage void __sched preempt_schedule(void)
+static void __sched do_preempt_schedule(int incr)
{
- struct thread_info *ti = current_thread_info();
#ifdef CONFIG_PREEMPT_BKL
struct task_struct *task = current;
int saved_lock_depth;
#endif
- /*
- * If there is a non-zero preempt_count or interrupts are disabled,
- * we do not want to preempt the current task. Just return..
- */
- if (unlikely(ti->preempt_count || irqs_disabled()))
- return;

need_resched:
- add_preempt_count(PREEMPT_ACTIVE);
+ add_preempt_count(incr);
/*
* We keep the big kernel semaphore locked, but we
* clear ->lock_depth so that schedule() doesnt
@@ -2865,15 +2858,46 @@
#ifdef CONFIG_PREEMPT_BKL
task->lock_depth = saved_lock_depth;
#endif
- sub_preempt_count(PREEMPT_ACTIVE);
+ sub_preempt_count(incr);

/* we could miss a preemption opportunity between schedule and now */
barrier();
if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
goto need_resched;
}
+/*
+ * this is the entry point to schedule() from in-kernel preemption
+ * off of preempt_enable. Kernel preemptions off return from interrupt
+ * occur there and call schedule directly.
+ */
+asmlinkage void __sched preempt_schedule(void)
+{
+ struct thread_info *ti = current_thread_info();
+ /*
+ * If there is a non-zero preempt_count or interrupts are disabled,
+ * we do not want to preempt the current task. Just return..
+ */
+ if (unlikely(ti->preempt_count || irqs_disabled()))
+ return;
+ do_preempt_schedule(PREEMPT_ACTIVE);
+}

EXPORT_SYMBOL(preempt_schedule);
+
+asmlinkage void __sched entry_preempt_schedule(void)
+{
+ struct thread_info *ti = current_thread_info();
+ /*
+ * If preempt_count != 1 or interrupts are disabled
+ * the calling code is broken.
+ */
+ BUG_ON((ti->preempt_count != 1 || irqs_disabled()));
+
+ do_preempt_schedule(PREEMPT_ACTIVE - 1);
+}
+
+EXPORT_SYMBOL(entry_preempt_schedule);
+
#endif /* CONFIG_PREEMPT */

int default_wake_function(wait_queue_t *curr, unsigned mode, int sync, void *key)
Index: 2.6.10-mm1/arch/i386/kernel/entry.S
===================================================================
--- 2.6.10-mm1/arch/i386/kernel/entry.S (revision 141)
+++ 2.6.10-mm1/arch/i386/kernel/entry.S (working copy)
@@ -197,8 +197,9 @@
jz restore_all
testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
+ movl $1,TI_preempt_count(%ebp)
sti
- call preempt_schedule
+ call entry_preempt_schedule
cli
movl $0,TI_preempt_count(%ebp)
jmp need_resched
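
The arithmetic of the fix is worth spelling out. The entry code stores
1 into TI_preempt_count before re-enabling interrupts, so an interrupt
arriving in the window sees a non-zero preempt_count and does not
reschedule again on its way out. entry_preempt_schedule() then adds
PREEMPT_ACTIVE - 1, so schedule() still runs with preempt_count at
exactly PREEMPT_ACTIVE. A minimal userspace model of this bookkeeping
(illustration only, not kernel code; the PREEMPT_ACTIVE value is the
i386 one):

#include <assert.h>

#define PREEMPT_ACTIVE 0x10000000

static int preempt_count; /* models current_thread_info()->preempt_count */

/* entry.S: movl $1,TI_preempt_count(%ebp), then sti */
static void entry_opens_window(void) { preempt_count = 1; }

/* an IRQ arriving in the window sees preempt_count != 0 and
   therefore does not enter the preemption path a second time */
static int irq_would_preempt(void) { return preempt_count == 0; }

static void entry_preempt_schedule_model(void)
{
        assert(preempt_count == 1);          /* the BUG_ON in the patch */
        preempt_count += PREEMPT_ACTIVE - 1; /* now exactly PREEMPT_ACTIVE */
        /* schedule() would run here */
        preempt_count -= PREEMPT_ACTIVE - 1; /* back to 1 */
}

int main(void)
{
        entry_opens_window();
        assert(!irq_would_preempt());        /* the race window is closed */
        entry_preempt_schedule_model();
        preempt_count = 0;                   /* entry.S: cli, then clear */
        return 0;
}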


2005-01-10 00:53:32

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3]

This patch adjusts the ARM entry code to use the fixed up
preempt_schedule() handling in 2.6.10-mm2

Signed-off-by: Benedikt Spranger <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

---
entry-armv.S | 4 +---
1 files changed, 1 insertion(+), 3 deletions(-)
---
Index: 2.6.10-mm1/arch/ppc/kernel/entry.S
===================================================================
--- 2.6.10-mm1/arch/ppc/kernel/entry.S (revision 141)
+++ 2.6.10-mm1/arch/ppc/kernel/entry.S (working copy)
@@ -624,12 +624,12 @@
beq+ restore
andi. r0,r3,MSR_EE /* interrupts off? */
beq restore /* don't schedule if so */
-1: lis r0,PREEMPT_ACTIVE@h
+1: li r0,1
stw r0,TI_PREEMPT(r9)
ori r10,r10,MSR_EE
SYNC
MTMSRD(r10) /* hard-enable interrupts */
- bl schedule
+ bl entry_preempt_schedule
LOAD_MSR_KERNEL(r10,MSR_KERNEL)
SYNC
MTMSRD(r10) /* disable interrupts */


2005-01-10 01:00:11

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [3/3]

This patch adjusts the PPC entry code to use the fixed up
preempt_schedule() handling in 2.6.10-mm2

Signed-off-by: Thomas Gleixner <[email protected]>

---
entry.S | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
---
Index: 2.6.10-mm1/arch/ppc/kernel/entry.S
===================================================================
--- 2.6.10-mm1/arch/ppc/kernel/entry.S (revision 141)
+++ 2.6.10-mm1/arch/ppc/kernel/entry.S (working copy)
@@ -624,12 +624,12 @@
beq+ restore
andi. r0,r3,MSR_EE /* interrupts off? */
beq restore /* don't schedule if so */
-1: lis r0,PREEMPT_ACTIVE@h
+1: li r0,1
stw r0,TI_PREEMPT(r9)
ori r10,r10,MSR_EE
SYNC
MTMSRD(r10) /* hard-enable interrupts */
- bl schedule
+ bl entry_preempt_schedule
LOAD_MSR_KERNEL(r10,MSR_KERNEL)
SYNC
MTMSRD(r10) /* disable interrupts */


2005-01-10 01:06:22

by Russell King

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3]

On Mon, Jan 10, 2005 at 01:53:26AM +0100, Thomas Gleixner wrote:
> This patch adjusts the ARM entry code to use the fixed up
> preempt_schedule() handling in 2.6.10-mm2

Looks PPCish to me.

> Signed-off-by: Benedikt Spranger <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> ---
> entry-armv.S | 4 +---
> 1 files changed, 1 insertion(+), 3 deletions(-)
> ---
> Index: 2.6.10-mm1/arch/ppc/kernel/entry.S
> ===================================================================
> --- 2.6.10-mm1/arch/ppc/kernel/entry.S (revision 141)
> +++ 2.6.10-mm1/arch/ppc/kernel/entry.S (working copy)
> @@ -624,12 +624,12 @@
> beq+ restore
> andi. r0,r3,MSR_EE /* interrupts off? */
> beq restore /* don't schedule if so */
> -1: lis r0,PREEMPT_ACTIVE@h
> +1: li r0,1
> stw r0,TI_PREEMPT(r9)
> ori r10,r10,MSR_EE
> SYNC
> MTMSRD(r10) /* hard-enable interrupts */
> - bl schedule
> + bl entry_preempt_schedule
> LOAD_MSR_KERNEL(r10,MSR_KERNEL)
> SYNC
> MTMSRD(r10) /* disable interrupts */
>
>

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2005-01-10 01:18:41

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

On Mon, 2005-01-10 at 01:06 +0000, Russell King wrote:
> On Mon, Jan 10, 2005 at 01:53:26AM +0100, Thomas Gleixner wrote:
> > This patch adjusts the ARM entry code to use the fixed up
> > preempt_schedule() handling in 2.6.10-mm2
>
> Looks PPCish to me.

Sorry, I messed that up. Call me the idiot of the day.

This patch adjusts the ARM entry code to use the fixed up
preempt_schedule() handling in 2.6.10-mm2

Signed-off-by: Benedikt Spranger <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

---
entry-armv.S | 4 +---
1 files changed, 1 insertion(+), 3 deletions(-)
---
Index: 2.6.10-mm1/arch/arm/kernel/entry.S
===================================================================
--- 2.6.10-mm1/arch/arm/kernel/entry.S (revision 141)
+++ 2.6.10-mm1/arch/arm/kernel/entry.S (working copy)
@@ -136,10 +136,8 @@
ldr r1, [r6, #8] @ local_bh_count
adds r0, r0, r1
movne pc, lr
- mov r7, #PREEMPT_ACTIVE
- str r7, [r8, #TI_PREEMPT] @ set PREEMPT_ACTIVE
1: enable_irq r2 @ enable IRQs
- bl schedule
+ bl entry_preempt_schedule
disable_irq r0 @ disable IRQs
ldr r0, [r8, #TI_FLAGS] @ get new tasks TI_FLAGS
tst r0, #_TIF_NEED_RESCHED


2005-01-10 07:34:23

by Andrew Morton

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

Thomas Gleixner <[email protected]> wrote:
>
> This patch adjusts the ARM entry code to use the fixed up
> preempt_schedule() handling in 2.6.10-mm2
>
> ...
> Index: 2.6.10-mm1/arch/arm/kernel/entry.S

There's no such file. I assumed you meant entry-armv.S and ended up with
the below.

--- 25/arch/arm/kernel/entry-armv.S~use-the-new-preemption-code-arm 2005-01-09 23:30:34.794573320 -0800
+++ 25-akpm/arch/arm/kernel/entry-armv.S 2005-01-09 23:30:34.797572864 -0800
@@ -136,10 +136,8 @@ svc_preempt: teq r9, #0 @ was preempt
ldr r1, [r6, #8] @ local_bh_count
adds r0, r0, r1
movne pc, lr
- mov r7, #PREEMPT_ACTIVE
- str r7, [r8, #TI_PREEMPT] @ set PREEMPT_ACTIVE
1: enable_irq r2 @ enable IRQs
- bl schedule
+ bl entry_preempt_schedule
disable_irq r0 @ disable IRQs
ldr r0, [r8, #TI_FLAGS] @ get new tasks TI_FLAGS
tst r0, #_TIF_NEED_RESCHED
_

2005-01-10 09:16:18

by Ingo Molnar

Subject: Re: [PATCH 2.6.10-mm2] Fix preemption race [1/3] (Core)


* [email protected] <[email protected]> wrote:

> The idle-thread-preemption-fix.patch introduced a race which is not
> critical, but might give us an extra turn through the scheduler. When
> interrupts are re-enabled in entry.S and an interrupt occurs before we
> reach the add_preempt_count() in preempt_schedule(), we get
> rescheduled again in the return-from-interrupt path.

I agree that there's a race. I solved this in the -RT patchset a couple
of weeks ago, but in a different way: I introduced the
preempt_schedule_irq() function, which solves the problem by keeping
the whole IRQ-preemption path irqs-off. This has the advantage that if
an IRQ signals preemption of a task and the kernel is immediately
preemptible, then we are able to hit that task atomically without
re-enabling IRQs. I'll split out this patch - can you see any
problems with the preempt_schedule_irq() approach?

Ingo
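
Based on this description, preempt_schedule_irq() would look roughly
like the sketch below (the actual -RT code may differ; the BKL handling
that preempt_schedule() carries is omitted). The key property is that
PREEMPT_ACTIVE is raised before interrupts are re-enabled, so an IRQ
arriving on the way into schedule() cannot trigger a second pass
through the preemption path:

/*
 * Sketch only, following Ingo's description. Callers arrive from the
 * return-from-interrupt path with irqs off and preempt_count == 0.
 */
asmlinkage void __sched preempt_schedule_irq(void)
{
        struct thread_info *ti = current_thread_info();

        /* catch callers which need to be fixed */
        BUG_ON(ti->preempt_count || !irqs_disabled());

need_resched:
        add_preempt_count(PREEMPT_ACTIVE);
        /* PREEMPT_ACTIVE is set, so re-enabling IRQs is safe here */
        local_irq_enable();
        schedule();
        local_irq_disable();
        sub_preempt_count(PREEMPT_ACTIVE);

        /* we could miss a preemption opportunity between schedule and now */
        barrier();
        if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
                goto need_resched;
}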

2005-01-10 09:46:31

by Russell King

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

On Mon, Jan 10, 2005 at 02:18:35AM +0100, Thomas Gleixner wrote:
> On Mon, 2005-01-10 at 01:06 +0000, Russell King wrote:
> > On Mon, Jan 10, 2005 at 01:53:26AM +0100, Thomas Gleixner wrote:
> > > This patch adjusts the ARM entry code to use the fixed up
> > > preempt_schedule() handling in 2.6.10-mm2
> >
> > Looks PPCish to me.
>
> Sorry, I messed that up. Call me the idiot of the day.
>
> This patch adjusts the ARM entry code to use the fixed up
> preempt_schedule() handling in 2.6.10-mm2

Are you sure ARM suffers from this race condition? It sets preempt count
before enabling IRQs and doesn't use preempt_schedule().

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2005-01-10 10:13:13

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

On Mon, 2005-01-10 at 10:46, Russell King wrote:
> On Mon, Jan 10, 2005 at 02:18:35AM +0100, Thomas Gleixner wrote:
> > On Mon, 2005-01-10 at 01:06 +0000, Russell King wrote:
> > > On Mon, Jan 10, 2005 at 01:53:26AM +0100, Thomas Gleixner wrote:
> > > > This patch adjusts the ARM entry code to use the fixed up
> > > > preempt_schedule() handling in 2.6.10-mm2
> > >
> > > Looks PPCish to me.
> >
> > Sorry, I messed that up. Call me the idiot of the day.
> >
> > This patch adjusts the ARM entry code to use the fixed up
> > preempt_schedule() handling in 2.6.10-mm2
>
> Are you sure ARM suffers from this race condition? It sets preempt count
> before enabling IRQs and doesn't use preempt_schedule().

There is no race on ARM, but using the preempt_schedule() interface is
the approach Ingo suggested for common usage. His x86 implementation
was racy, so I fixed that first before modifying ARM to use the
interface. Ingo pointed out that he will change it to
preempt_schedule_irq(), but I'm not religious about the name.

tglx


2005-01-10 10:14:50

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Fix preemption race [1/3] (Core)

On Mon, 2005-01-10 at 10:15, Ingo Molnar wrote:
> * [email protected] <[email protected]> wrote:
>
> > The idle-thread-preemption-fix.patch introduced a race which is not
> > critical, but might give us an extra turn through the scheduler. When
> > interrupts are re-enabled in entry.S and an interrupt occurs before we
> > reach the add_preempt_count() in preempt_schedule(), we get
> > rescheduled again in the return-from-interrupt path.
>
> I agree that there's a race. I solved this in the -RT patchset a couple
> of weeks ago, but in a different way: I introduced the
> preempt_schedule_irq() function, which solves the problem by keeping
> the whole IRQ-preemption path irqs-off. This has the advantage that if
> an IRQ signals preemption of a task and the kernel is immediately
> preemptible, then we are able to hit that task atomically without
> re-enabling IRQs. I'll split out this patch - can you see any
> problems with the preempt_schedule_irq() approach?

No.
I did not look into your RT patch for this, but please have a look at
RMK's ARM code, as he does some sanity checking on
thread_info->preempt_count down there.

tglx


2005-01-10 10:57:56

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

On Mon, 2005-01-10 at 08:32, Andrew Morton wrote:
> Thomas Gleixner <[email protected]> wrote:
> >
> > This patch adjusts the ARM entry code to use the fixed up
> > preempt_schedule() handling in 2.6.10-mm2
> >
> > ...
> > Index: 2.6.10-mm1/arch/arm/kernel/entry.S
>
> There's no such file. I assumed you meant entry-armv.S and ended up with
> the below.

Oops, I messed that up twice. Sure, it was entry-armv.S.
Sorry. I'm just a perfect example of Murphy's law.

tglx

> --- 25/arch/arm/kernel/entry-armv.S~use-the-new-preemption-code-arm 2005-01-09 23:30:34.794573320 -0800
> +++ 25-akpm/arch/arm/kernel/entry-armv.S 2005-01-09 23:30:34.797572864 -0800
> @@ -136,10 +136,8 @@ svc_preempt: teq r9, #0 @ was preempt
> ldr r1, [r6, #8] @ local_bh_count
> adds r0, r0, r1
> movne pc, lr
> - mov r7, #PREEMPT_ACTIVE
> - str r7, [r8, #TI_PREEMPT] @ set PREEMPT_ACTIVE
> 1: enable_irq r2 @ enable IRQs
> - bl schedule
> + bl entry_preempt_schedule
> disable_irq r0 @ disable IRQs
> ldr r0, [r8, #TI_FLAGS] @ get new tasks TI_FLAGS
> tst r0, #_TIF_NEED_RESCHED
> _
>

2005-01-10 11:03:07

by Ingo Molnar

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend


* Thomas Gleixner <[email protected]> wrote:

> > Are you sure ARM suffers from this race condition? It sets preempt count
> > before enabling IRQs and doesn't use preempt_schedule().
>
> There is no race on ARM, but using the preempt_schedule() interface
> is the approach Ingo suggested for common usage. His x86
> implementation was racy, so I fixed that first before modifying ARM to
> use the interface. Ingo pointed out that he will change it to
> preempt_schedule_irq(), but I'm not religious about the name.

I wouldn't raise this issue if it were the name only, but there's more
to preempt_schedule_irq() than its name: it gets called with irqs off,
and the scheduler returns with irqs off and with a guarantee that there
is no (irq-generated) pending preemption request for this task right
now. I.e. the checks for need_resched can be skipped, and interrupts
don't have to be disabled to do a safe return-to-usermode (as done on
some architectures).

As far as I can see, do_preempt_schedule() doesn't have these
properties: what it guarantees is that it avoids the preemption
recursion via the lowlevel code doing the PREEMPT_ACTIVE setting.

Let's agree upon a single, common approach. I went for splitting up
preempt_schedule() into two variants: the 'synchronous' one (called
preempt_schedule()) is only called from syscall level and has no
repeat-preemption and hence stack-recursion worries. The 'asynchronous'
one (called preempt_schedule_irq()) is called from asynchronous contexts
(hardirq events) and is fully ready to deal with all the reentrancy
situations that may occur. It's careful about not re-enabling
interrupts, etc.

Ingo
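
To make the two call paths concrete: the 'synchronous' variant is
reached from preempt_enable() in process context, roughly as in the
2.6 include/linux/preempt.h macros below (lightly abridged), while the
'asynchronous' variant is called directly from the architecture's
return-from-interrupt code with irqs off:

#define preempt_enable_no_resched() \
do { \
        barrier(); \
        dec_preempt_count(); \
} while (0)

#define preempt_check_resched() \
do { \
        if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
                preempt_schedule(); \
} while (0)

/* drop the count, then honour any reschedule that became pending */
#define preempt_enable() \
do { \
        preempt_enable_no_resched(); \
        preempt_check_resched(); \
} while (0)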

2005-01-10 13:46:31

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [2/3] Resend

On Mon, 2005-01-10 at 12:02, Ingo Molnar wrote:
> I wouldn't raise this issue if it were the name only, but there's more
> to preempt_schedule_irq() than its name: it gets called with irqs off,
> and the scheduler returns with irqs off and with a guarantee that there
> is no (irq-generated) pending preemption request for this task right
> now. I.e. the checks for need_resched can be skipped, and interrupts
> don't have to be disabled to do a safe return-to-usermode (as done on
> some architectures).
>
> As far as I can see, do_preempt_schedule() doesn't have these
> properties: what it guarantees is that it avoids the preemption
> recursion via the lowlevel code doing the PREEMPT_ACTIVE setting.
>
> Let's agree upon a single, common approach. I went for splitting up
> preempt_schedule() into two variants: the 'synchronous' one (called
> preempt_schedule()) is only called from syscall level and has no
> repeat-preemption and hence stack-recursion worries. The 'asynchronous'
> one (called preempt_schedule_irq()) is called from asynchronous
> contexts (hardirq events) and is fully ready to deal with all the
> reentrancy situations that may occur. It's careful about not
> re-enabling interrupts, etc.

Sure, I guessed from your short description that it implies more than
the separation I did. I have no objection to your approach at all.

tglx


2005-01-10 14:56:31

by Tom Rini

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [3/3]

On Mon, Jan 10, 2005 at 02:00:04AM +0100, Thomas Gleixner wrote:

> This patch adjusts the PPC entry code to use the fixed up
> preempt_schedule() handling in 2.6.10-mm2
>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> ---
> entry.S | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
> ---
> Index: 2.6.10-mm1/arch/ppc/kernel/entry.S
> ===================================================================
> --- 2.6.10-mm1/arch/ppc/kernel/entry.S (revision 141)
> +++ 2.6.10-mm1/arch/ppc/kernel/entry.S (working copy)
> @@ -624,12 +624,12 @@
> beq+ restore
> andi. r0,r3,MSR_EE /* interrupts off? */
> beq restore /* don't schedule if so */
> -1: lis r0,PREEMPT_ACTIVE@h
> +1: li r0,1

Perhaps I just don't have enough context, but is there a good reason
to use a magic constant instead of a define?

--
Tom Rini
http://gate.crashing.org/~trini/
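
For illustration, the literal could be given a name shared between the
asm and C sides so the entry code and the BUG_ON check cannot drift
apart. PREEMPT_ENTRY_OFFSET is a hypothetical name here, not one
defined by any patch in this thread:

/* hypothetical define, visible to C and (via asm-offsets) to asm */
#define PREEMPT_ENTRY_OFFSET 1

/* entry_preempt_schedule() would then check: */
BUG_ON(ti->preempt_count != PREEMPT_ENTRY_OFFSET || irqs_disabled());

/*
 * and the PPC entry code would load the define instead of a bare 1:
 *      li      r0,PREEMPT_ENTRY_OFFSET
 *      stw     r0,TI_PREEMPT(r9)
 */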

2005-01-10 15:51:15

by Thomas Gleixner

Subject: Re: [PATCH 2.6.10-mm2] Use the new preemption code [3/3]

> On Mon, Jan 10, 2005 at 02:00:04AM +0100, Thomas Gleixner wrote:
>
>> This patch adjusts the PPC entry code to use the fixed up
>> preempt_schedule() handling in 2.6.10-mm2
>>
>> Signed-off-by: Thomas Gleixner <[email protected]>
>>
>> ---
>> entry.S | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>> ---
>> Index: 2.6.10-mm1/arch/ppc/kernel/entry.S
>> ===================================================================
>> --- 2.6.10-mm1/arch/ppc/kernel/entry.S (revision 141)
>> +++ 2.6.10-mm1/arch/ppc/kernel/entry.S (working copy)
>> @@ -624,12 +624,12 @@
>> beq+ restore
>> andi. r0,r3,MSR_EE /* interrupts off? */
>> beq restore /* don't schedule if so */
>> -1: lis r0,PREEMPT_ACTIVE@h
>> +1: li r0,1
>
> Perhaps I just don't have enough context, but is there a good reason
> to use a magic constant instead of a define?
>

True. I will wait for Ingo's final solution and fix this properly.

tglx