2008-03-11 01:24:49

by Andi Kleen

[permalink] [raw]
Subject: [PATCH REPOST] Run IST traps from user mode preemptive on process stack

Run IST traps from user mode preemptive on process stack

[Repost. Please ack or nack. -Andi]

x86-64 has a few exceptions which run on special architecture
supported IST exception stacks: these are nmi, double fault, stack fault,
int 3, debug.

Previously they would check for a scheduling event on returning
if the original CPU state was user mode and then switch to a
process stack to schedule.

But the actual trap handler would still run on the IST stack
with preemption disabled.

This patch changes these traps instead to always switch
to the process stack when the trap originated from user mode.
For kernel traps it keeps running non preemptive on the IST stack
because that is much safer (e.g. to still get nmi watchdog events
out even when the process stack is corrupted)

Then the actual trap handlers can run with preemption enabled
or schedule as needed (e.g. to take locks)

This fixes a regression I added earlier with print_vma_addr()
executing down() in these trap handlers from user space.

Strictly the change would have been only needed for debug
and int3, but since they share this code as macros it was
cleanest to just change all.

Cc: [email protected]
Cc: [email protected]

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/kernel/entry_64.S | 17 ++++++++++-------
arch/x86/kernel/traps_64.c | 17 ++++++++++-------
2 files changed, 20 insertions(+), 14 deletions(-)

Index: linux/arch/x86/kernel/entry_64.S
===================================================================
--- linux.orig/arch/x86/kernel/entry_64.S
+++ linux/arch/x86/kernel/entry_64.S
@@ -770,12 +770,18 @@ END(spurious_interrupt)
.if \ist
movq %gs:pda_data_offset, %rbp
.endif
- movq %rsp,%rdi
movq ORIG_RAX(%rsp),%rsi
movq $-1,ORIG_RAX(%rsp)
.if \ist
subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
.endif
+ testl $3,CS(%rsp)
+ jz 2f /* in kernel? stay on exception stack for now */
+ movq %rsp,%rdi
+ call sync_regs /* Move all state over to process stack */
+ movq %rax,%rsp /* switch stack to process stack */
+2:
+ movq %rsp,%rdi
call \sym
.if \ist
addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
@@ -801,16 +807,16 @@ END(spurious_interrupt)
.macro paranoidexit trace=1
/* ebx: no swapgs flag */
paranoid_exit\trace:
- testl %ebx,%ebx /* swapgs needed? */
- jnz paranoid_restore\trace
testl $3,CS(%rsp)
jnz paranoid_userspace\trace
paranoid_swapgs\trace:
+ testl %ebx,%ebx /* swapgs needed? */
+ jnz 1f
.if \trace
TRACE_IRQS_IRETQ 0
.endif
SWAPGS_UNSAFE_STACK
-paranoid_restore\trace:
+1:
RESTORE_ALL 8
jmp irq_return
paranoid_userspace\trace:
@@ -818,9 +824,6 @@ paranoid_userspace\trace:
movl threadinfo_flags(%rcx),%ebx
andl $_TIF_WORK_MASK,%ebx
jz paranoid_swapgs\trace
- movq %rsp,%rdi /* &pt_regs */
- call sync_regs
- movq %rax,%rsp /* switch stack for scheduling */
testl $_TIF_NEED_RESCHED,%ebx
jnz paranoid_schedule\trace
movl %ebx,%edx /* arg3: thread flags */
Index: linux/arch/x86/kernel/traps_64.c
===================================================================
--- linux.orig/arch/x86/kernel/traps_64.c
+++ linux/arch/x86/kernel/traps_64.c
@@ -84,7 +84,8 @@ static inline void conditional_sti(struc

static inline void preempt_conditional_sti(struct pt_regs *regs)
{
- inc_preempt_count();
+ if (!user_mode(regs))
+ inc_preempt_count();
if (regs->flags & X86_EFLAGS_IF)
local_irq_enable();
}
@@ -95,7 +96,8 @@ static inline void preempt_conditional_c
local_irq_disable();
/* Make sure to not schedule here because we could be running
on an exception stack. */
- dec_preempt_count();
+ if (!user_mode(regs))
+ dec_preempt_count();
}

int kstack_depth_to_print = 12;
@@ -855,7 +857,7 @@ asmlinkage __kprobes void default_do_nmi
io_check_error(reason, regs);
}

-/* runs on IST stack. */
+/* May run on IST stack. */
asmlinkage void __kprobes do_int3(struct pt_regs * regs, long error_code)
{
trace_hardirqs_fixup();
@@ -868,9 +870,10 @@ asmlinkage void __kprobes do_int3(struct
preempt_conditional_cli(regs);
}

-/* Help handler running on IST stack to switch back to user stack
- for scheduling or signal handling. The actual stack switch is done in
- entry.S */
+/*
+ * Help handler running on IST stack to switch back to user stack.
+ * The actual stack switch is done in entry.S
+ */
asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
{
struct pt_regs *regs = eregs;
@@ -889,7 +892,7 @@ asmlinkage __kprobes struct pt_regs *syn
return regs;
}

-/* runs on IST stack. */
+/* May run on IST stack. */
asmlinkage void __kprobes do_debug(struct pt_regs * regs,
unsigned long error_code)
{


2008-03-11 12:03:56

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: [PATCH REPOST] Run IST traps from user mode preemptive on process stack

2008/3/11, Andi Kleen <[email protected]>:
> Run IST traps from user mode preemptive on process stack
>
> [Repost. Please ack or nack. -Andi]
>
> x86-64 has a few exceptions which run on special architecture
> supported IST exception stacks: these are nmi, double fault, stack fault,
> int 3, debug.
>
> Previously they would check for a scheduling event on returning
> if the original CPU state was user mode and then switch to a
> process stack to schedule.
>
> But the actual trap handler would still run on the IST stack
> with preemption disabled.
>

Hi Andi

In my post where I've been showing BUG trace:

http://lkml.org/lkml/2008/3/10/157

I've mentioned my problem with busy looping in Qemu.

I'm not sure whether this is good or bad sign - but with this patch
you have posted,
I do not have to wait in qemu for minutes to get the 'busy-loop' - I
get the exact same loop almost instantly when I start disk read of LV
partition and running my simple module insertion testcase.

> This patch changes these traps instead to always switch
> to the process stack when the trap originated from user mode.
> For kernel traps it keeps running non preemptive on the IST stack
> because that is much safer (e.g. to still get nmi watchdog events
> out even when the process stack is corrupted)
>

So what should I check to get fixed my problem in qemu ?

Zdenek

2008-03-11 13:24:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH REPOST] Run IST traps from user mode preemptive on process stack

> I'm not sure whether this is good or bad sign - but with this patch
> you have posted,
> I do not have to wait in qemu for minutes to get the 'busy-loop' - I
> get the exact same loop almost instantly when I start disk read of LV
> partition and running my simple module insertion testcase.

Hmm, my fix was intended to fix a gdb problem. Or rather the gdb
problem was already fixed by disabling some functionality and with
this patch the functionality would be reenabled again.

If it fixed a qemu problem too that's great but was unintended
on my part. In fact it is a little worrying because there should
be no user visible change. Can you double check by unapply/test/reapply/test
the patch really makes a difference?

>
> > This patch changes these traps instead to always switch
> > to the process stack when the trap originated from user mode.
> > For kernel traps it keeps running non preemptive on the IST stack
> > because that is much safer (e.g. to still get nmi watchdog events
> > out even when the process stack is corrupted)
> >
>
> So what should I check to get fixed my problem in qemu ?

I don't know. Someone has to debug it.

-Andi

2008-03-11 13:58:42

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: [PATCH REPOST] Run IST traps from user mode preemptive on process stack

2008/3/11, Andi Kleen <[email protected]>:
> > I'm not sure whether this is good or bad sign - but with this patch
> > you have posted,
> > I do not have to wait in qemu for minutes to get the 'busy-loop' - I
> > get the exact same loop almost instantly when I start disk read of LV
> > partition and running my simple module insertion testcase.
>
>
> Hmm, my fix was intended to fix a gdb problem. Or rather the gdb
> problem was already fixed by disabling some functionality and with
> this patch the functionality would be reenabled again.
>
> If it fixed a qemu problem too that's great but was unintended
> on my part. In fact it is a little worrying because there should
> be no user visible change. Can you double check by unapply/test/reapply/test
> the patch really makes a difference?

I've not said it has 'fixed' my problem - I said it has exposed the
problem much faster.
But now when I've made a double test - I think I was wrong - I've been
probably judging too fast and its also probably depending on the
sunlight position, but now I can see that even with recent git kernel
without your patch I get the busy loop pretty quick :(.

> > > This patch changes these traps instead to always switch
> > > to the process stack when the trap originated from user mode.
> > > For kernel traps it keeps running non preemptive on the IST stack
> > > because that is much safer (e.g. to still get nmi watchdog events
> > > out even when the process stack is corrupted)
> > >
> >
> > So what should I check to get fixed my problem in qemu ?
>
>
> I don't know. Someone has to debug it.

But who :) ?

Zdenek