2003-08-29 20:56:48

by Hugh Dickins

Subject: [PATCH] 4G/4G preempt on vstack

Repeated -j3 kernel builds, run in tandem on dual PIII, have been
collapsing recently on -mm with 4G/4G split, SMP and preemption.
Typically 'make' fails with Error 139 because 'as' or another tool
got SIGSEGV; maybe within ten minutes, maybe after ten hours.

This patch seems to fix that (ran successfully overnight on test4-mm1,
will run over the weekend on test4-mm3-1). Please cast a critical eye
over it, I expect Ingo or someone else will find it can be improved.

The problem is that a task may be preempted just after it has entered
kernelspace, while still using the transitional "virtual stack", i.e. with
%esp pointing to the high per-cpu kmap of the kernel stack. If the task
resumes on another cpu, that %esp needs to be repointed into the new cpu's kmap.

The corresponding returns to userspace look okay to me: interrupts are
disabled over the critical points. And in general no copy is taken of
%esp while on the virtual stack, e.g. setting the pointer to pt_regs is,
and must be, done after switching to the real stack. But there's one place
in __SWITCH_KERNELSPACE itself where we need to check, and repeat if moved.
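
To make that concrete, here is a rough C rendition of what __SWITCH_KERNELSPACE
does with the added check. It is only a sketch, not part of the patch: it assumes
8K stacks (THREAD_SIZE), the real_stack field that TI_real_stack refers to, and a
hypothetical current_esp() helper standing in for a re-read of the live %esp.

/* Assumed context, declared here only for the sketch: */
#define THREAD_SIZE 8192				/* 8K kernel stacks */
struct thread_info { unsigned long real_stack; };	/* as per TI_real_stack */
extern unsigned long current_esp(void);			/* hypothetical: read live %esp */

unsigned long virtual_to_real_stack(void)
{
	struct thread_info *ti;
	unsigned long vesp, resp;

	vesp = current_esp();		/* %esp, currently in this cpu's kmap */
	do {
		/* thread_info sits at the base of the stack mapping */
		ti = (struct thread_info *)(vesp & ~(THREAD_SIZE - 1));
		/* keep the offset within the stack, swap in the real base */
		resp = (vesp & (THREAD_SIZE - 1)) | ti->real_stack;
		/*
		 * If we were preempted and migrated meanwhile, __switch_to
		 * has repointed %esp into the new cpu's kmap, so the base
		 * derived above was stale: recompute from the new %esp.
		 */
		vesp = current_esp();
	} while (((unsigned long)ti ^ vesp) & ~(THREAD_SIZE - 1));

	return resp;
}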

Hugh

--- 2.6.0-test4-mm3-1/arch/i386/kernel/entry.S Fri Aug 29 16:31:30 2003
+++ linux/arch/i386/kernel/entry.S Fri Aug 29 20:53:33 2003
@@ -103,6 +103,20 @@

#ifdef CONFIG_X86_SWITCH_PAGETABLES

+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)
+/*
+ * If task is preempted in __SWITCH_KERNELSPACE, and moved to another cpu,
+ * __switch_to repoints %esp to the appropriate virtual stack; but %ebp is
+ * left stale, so we must check whether to repeat the real stack calculation.
+ */
+#define repeat_if_esp_changed \
+ xorl %esp, %ebp; \
+ testl $0xffffe000, %ebp; \
+ jnz 0b
+#else
+#define repeat_if_esp_changed
+#endif
+
/* clobbers ebx, edx and ebp */

#define __SWITCH_KERNELSPACE \
@@ -117,12 +131,13 @@
movl $swapper_pg_dir-__PAGE_OFFSET, %edx; \
\
/* GET_THREAD_INFO(%ebp) intermixed */ \
- \
+0: \
movl %esp, %ebp; \
movl %esp, %ebx; \
andl $0xffffe000, %ebp; \
andl $0x00001fff, %ebx; \
orl TI_real_stack(%ebp), %ebx; \
+ repeat_if_esp_changed; \
\
movl %edx, %cr3; \
movl %ebx, %esp; \
--- 2.6.0-test4-mm3-1/arch/i386/kernel/process.c Fri Aug 29 16:31:30 2003
+++ linux/arch/i386/kernel/process.c Fri Aug 29 20:53:33 2003
@@ -479,13 +479,27 @@
__kmap_atomic(next->stack_page1, KM_VSTACK1);

/*
- * Reload esp0:
- */
- /*
* NOTE: here we rely on the task being the stack as well
*/
next_p->thread_info->virtual_stack = (void *)__kmap_atomic_vaddr(KM_VSTACK0);
+
+#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)
+ /*
+ * If next was preempted on entry from userspace to kernel,
+ * and now it's on a different cpu, we need to adjust %esp.
+ * This assumes that entry.S does not copy %esp while on the
+ * virtual stack (with interrupts enabled): which is so,
+ * except within __SWITCH_KERNELSPACE itself.
+ */
+ if (unlikely(next->esp >= TASK_SIZE)) {
+ next->esp &= THREAD_SIZE - 1;
+ next->esp |= (unsigned long) next_p->thread_info->virtual_stack;
+ }
+#endif
#endif
+ /*
+ * Reload esp0:
+ */
load_esp0(tss, virtual_esp0(next_p));

/*


2003-08-29 21:00:37

by Ingo Molnar

Subject: Re: [PATCH] 4G/4G preempt on vstack


On Fri, 29 Aug 2003, Hugh Dickins wrote:

> This patch seems to fix that (ran successfully overnight on test4-mm1,
> will run over the weekend on test4-mm3-1). Please cast a critical eye
> over it, I expect Ingo or someone else will find it can be improved.

good catch - patch looks good.

Ingo

2003-08-29 21:53:14

by Mika Penttilä

Subject: Re: [PATCH] 4G/4G preempt on vstack

hmm. you make the window smaller, but can't the preempt happen even
inside repeat_if_esp_changed, after xorl? Disabling interrupts seems the
only solution.

---Mika


2003-08-29 23:34:59

by Hugh Dickins

Subject: Re: [PATCH] 4G/4G preempt on vstack

On Sat, 30 Aug 2003, Mika Penttila wrote:
> hmm. you make the window smaller, but can't the preempt happen even
> inside repeat_if_esp_changed, after xorl? Disabling interrupts seems the
> only solution.

Your conclusion may very well be right.

Certainly the preempt can happen anywhere there, and I was going
to argue that it doesn't matter. All we need is to be looking up
TI_real_stack(%ebp) at a time when %ebp is sure to be pointing at
our task's virtual stack. That real stack will be valid whatever
cpu we're on, but we need to be sure we were looking through the
right window at the moment when we looked it up.

But my error is to say "the preempt": there could be more than one.
Then you can imagine that cpu and %esp at the end are the same as
cpu and %esp at the beginning; but in between, when looking up
TI_real_stack(%ebp), it was actually on another cpu and %ebp was
the wrong pointer. Very unlikely, but possible.

I was trying to avoid pushfl, cli, popfl, being ignorant of how
much overhead they add (into what I'm sure Ingo would like to keep
as fast a path as possible). Perhaps there's still a neat way they
can be avoided, perhaps I'm just being silly to try to avoid them.
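
For comparison, the pushfl/cli/popfl alternative would, in C-level terms, look
roughly like the sketch below (same assumed THREAD_SIZE, real_stack and
current_esp() as in the earlier sketch). Since the real stack address is valid
on whichever cpu we end up on, interrupts only need to be off from deriving the
thread_info base to loading real_stack.

/* The same assumed declarations as in the earlier sketch apply here. */
unsigned long virtual_to_real_stack_irqsafe(void)
{
	struct thread_info *ti;
	unsigned long flags, vesp, resp;

	local_irq_save(flags);			/* pushfl; cli */
	vesp = current_esp();			/* %esp, in this cpu's kmap */
	ti = (struct thread_info *)(vesp & ~(THREAD_SIZE - 1));
	resp = (vesp & (THREAD_SIZE - 1)) | ti->real_stack;
	local_irq_restore(flags);		/* popfl */
	/*
	 * Even if we are preempted and migrated after this point, resp is
	 * still correct: the real stack is the same on every cpu.
	 */
	return resp;
}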

Sorry, I'm sleepy, perhaps someone else will fix this up now?

Hugh