2015-07-08 19:24:15

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths

This is a review version of the 32-bit asm-to-C migration. I think
it works, but it's not yet well enough tested. I'm a lot more
familiar with the 64-bit asm than the 32-bit asm.

Al is cc'd, because some of this partially reverts some of his old
changes.

The vm86 stuff especially needs much more careful testing. Brian,
since you're playing with vm86 now, can you take a look?

NB: Even if this code turns out to be perfect, patches 3 and 4 need
to be squashed together. Patch 3 is a hack to temporarily add an
assertion that the existing asm works the way I think it does. The
assertion doesn't apprear to fire but, again, I need to pound on it
harder.

Andy Lutomirski (7):
x86/entry/32: Remove 32-bit syscall audit optimizations
x86/entry/32: Fix an incorrect comment for work_notifysig_v86
[TEMPORARY] x86/entry/32: Sanity check for work_notifysig
x86/entry/32: Finish removing bogus kernel-mode check
x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath
x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF
masks

arch/x86/entry/common.c | 57 ----------------
arch/x86/entry/entry_32.S | 130 ++++---------------------------------
arch/x86/include/asm/ptrace.h | 1 -
arch/x86/include/asm/signal.h | 1 -
arch/x86/include/asm/thread_info.h | 16 -----
arch/x86/kernel/traps.c | 12 ++++
arch/x86/kernel/vm86_32.c | 8 +--
7 files changed, 26 insertions(+), 199 deletions(-)

--
2.4.3


2015-07-08 19:26:47

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations

The asm audit optimizations are ugly and obfuscate the code too
much. Remove them.

This will regress performance if syscall auditing is enabled on
32-bit kernels and sysenter is in use. If this becomes a problem,
interested parties are encouraged to implement the equivalent of the
64-bit opportunistic sysret optimization.

Alternatively, a case could be made that, on 32-bit kernels, a less
messy asm audit optimization could be done. 32-bit kernels don't have
the complicated partial register saving tricks that 64-bit kernels
have, so the sysenter post-syscall path could just call the audit
hooks directly. Any reimplementation of this ought to demonstrate
that it only calls the audit hook once per syscall, though, which does
not currently appear to be true. Someone would have to make the case
that doing so would be better than implementing opportunistic sysexit,
though.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 48 ++---------------------------------------------
1 file changed, 2 insertions(+), 46 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 21dc60a60b5f..90f9e7f6c15e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -45,16 +45,6 @@
#include <asm/asm.h>
#include <asm/smap.h>

-/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
-#include <linux/elf-em.h>
-#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
-#define __AUDIT_ARCH_LE 0x40000000
-
-#ifndef CONFIG_AUDITSYSCALL
-# define sysenter_audit syscall_trace_entry
-# define sysexit_audit syscall_exit_work
-#endif
-
.section .entry.text, "ax"

/*
@@ -339,7 +329,7 @@ sysenter_past_esp:
GET_THREAD_INFO(%ebp)

testl $_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
- jnz sysenter_audit
+ jnz syscall_trace_entry
sysenter_do_call:
cmpl $(NR_syscalls), %eax
jae sysenter_badsys
@@ -351,7 +341,7 @@ sysenter_after_call:
TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx
testl $_TIF_ALLWORK_MASK, %ecx
- jnz sysexit_audit
+ jnz syscall_exit_work
sysenter_exit:
/* if something modifies registers it must also disable sysexit */
movl PT_EIP(%esp), %edx
@@ -362,40 +352,6 @@ sysenter_exit:
PTGS_TO_GS
ENABLE_INTERRUPTS_SYSEXIT

-#ifdef CONFIG_AUDITSYSCALL
-sysenter_audit:
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), TI_flags(%ebp)
- jnz syscall_trace_entry
- /* movl PT_EAX(%esp), %eax already set, syscall number: 1st arg to audit */
- movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
- /* movl PT_ECX(%esp), %ecx already set, a1: 3nd arg to audit */
- pushl PT_ESI(%esp) /* a3: 5th arg */
- pushl PT_EDX+4(%esp) /* a2: 4th arg */
- call __audit_syscall_entry
- popl %ecx /* get that remapped edx off the stack */
- popl %ecx /* get that remapped esi off the stack */
- movl PT_EAX(%esp), %eax /* reload syscall number */
- jmp sysenter_do_call
-
-sysexit_audit:
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
- jnz syscall_exit_work
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_ANY)
- movl %eax, %edx /* second arg, syscall return value */
- cmpl $-MAX_ERRNO, %eax /* is it an error ? */
- setbe %al /* 1 if so, 0 if not */
- movzbl %al, %eax /* zero-extend that */
- call __audit_syscall_exit
- DISABLE_INTERRUPTS(CLBR_ANY)
- TRACE_IRQS_OFF
- movl TI_flags(%ebp), %ecx
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
- jnz syscall_exit_work
- movl PT_EAX(%esp), %eax /* reload syscall return value */
- jmp sysenter_exit
-#endif
-
.pushsection .fixup, "ax"
2: movl $0, PT_FS(%esp)
jmp 1b
--
2.4.3

2015-07-08 19:24:31

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 2/7] x86/entry/32: Fix an incorrect comment for work_notifysig_v86

This code path is for returns to user mode only.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 90f9e7f6c15e..a36d5df6a749 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -484,8 +484,7 @@ work_notifysig: # deal with pending signals and
#ifdef CONFIG_VM86
testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
movl %esp, %eax
- jnz work_notifysig_v86 # returning to kernel-space or
- # vm86-space
+ jnz work_notifysig_v86 # special case for v86
1:
#else
movl %esp, %eax
--
2.4.3

2015-07-08 19:25:18

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig

44fbbb3dc687c added an unnecessary check. Add a temporary assertion
to confirm that it's unnecessary.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a36d5df6a749..d36afad80ad1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -491,10 +491,13 @@ work_notifysig: # deal with pending signals and
#endif
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
+ /* Temporary sanity check */
movb PT_CS(%esp), %bl
andb $SEGMENT_RPL_MASK, %bl
cmpb $USER_RPL, %bl
- jb resume_kernel
+ jnb 2f
+ ud2
+2:
xorl %edx, %edx
call do_notify_resume
jmp resume_userspace
--
2.4.3

2015-07-08 19:25:13

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 7 -------
1 file changed, 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d36afad80ad1..66ff9c4055d7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -491,13 +491,6 @@ work_notifysig: # deal with pending signals and
#endif
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- /* Temporary sanity check */
- movb PT_CS(%esp), %bl
- andb $SEGMENT_RPL_MASK, %bl
- cmpb $USER_RPL, %bl
- jnb 2f
- ud2
-2:
xorl %edx, %edx
call do_notify_resume
jmp resume_userspace
--
2.4.3

2015-07-08 19:25:01

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly

The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
vm86 mode was completely broken under ptrace, for example, because
we'd never make it to v8086 mode.

This code is still a huge, scary mess, but at least it's no longer
tangled with the exit-to-userspace loop.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 17 -----------------
arch/x86/kernel/traps.c | 12 ++++++++++++
arch/x86/kernel/vm86_32.c | 8 ++------
3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 66ff9c4055d7..3afd201bce55 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -481,29 +481,12 @@ work_resched:

work_notifysig: # deal with pending signals and
# notify-resume requests
-#ifdef CONFIG_VM86
- testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
- movl %esp, %eax
- jnz work_notifysig_v86 # special case for v86
-1:
-#else
movl %esp, %eax
-#endif
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
xorl %edx, %edx
call do_notify_resume
jmp resume_userspace
-
-#ifdef CONFIG_VM86
- ALIGN
-work_notifysig_v86:
- pushl %ecx # save ti_flags for do_notify_resume
- call save_v86_state # %eax contains pt_regs pointer
- popl %ecx
- movl %eax, %esp
- jmp 1b
-#endif
END(work_pending)

# perform syscall exit tracing
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8e65d8a9b8db..f86172d4f5db 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -190,6 +190,13 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
* On nmi (interrupt 2), do_trap should not be called.
*/
if (trapnr < X86_TRAP_UD) {
+ /*
+ * handle_vm86_trap may not return. If that
+ * happens, then there is no debug stack counter
+ * (it's a 32-bit kernel) and handle_vm86_trap
+ * will clear the preempt counter.
+ */
+
if (!handle_vm86_trap((struct kernel_vm86_regs *) regs,
error_code, trapnr))
return 0;
@@ -650,6 +657,11 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
preempt_conditional_sti(regs);

if (v8086_mode(regs)) {
+ /*
+ * handle_vm86_trap may not return. If that happens, then
+ * there is no debug stack counter (it's a 32-bit kernel)
+ * and handle_vm86_trap will clear the preempt counter.
+ */
handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
X86_TRAP_DB);
preempt_conditional_cli(regs);
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index fc9db6ef2a95..c526ee34c22d 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -351,6 +351,7 @@ static inline void return_to_32bit(struct kernel_vm86_regs *regs16, int retval)
{
struct pt_regs *regs32;

+ preempt_count_set(0);
regs32 = save_v86_state(regs16);
regs32->ax = retval;
__asm__ __volatile__("movl %0,%%esp\n\t"
@@ -549,12 +550,7 @@ int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)
{
if (VMPI.is_vm86pus) {
if ((trapno == 3) || (trapno == 1)) {
- KVM86->regs32->ax = VM86_TRAP + (trapno << 8);
- /* setting this flag forces the code in entry_32.S to
- the path where we call save_v86_state() and change
- the stack pointer to KVM86->regs32 */
- set_thread_flag(TIF_NOTIFY_RESUME);
- return 0;
+ return_to_32bit(regs, VM86_TRAP + (trapno << 8));
}
do_int(regs, trapno, (unsigned char __user *) (regs->pt.ss << 4), SP(regs));
return 0;
--
2.4.3

2015-07-08 19:24:55

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 6/7] x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath

This removes the hybrid asm-and-C implementation of exit work.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/entry_32.S | 62 +++++++++--------------------------------------
1 file changed, 11 insertions(+), 51 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3afd201bce55..b2909bf8cf70 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -256,14 +256,10 @@ ret_from_intr:

ENTRY(resume_userspace)
LOCKDEP_SYS_EXIT
- DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
- # setting need_resched or sigpending
- # between sampling and the iret
+ DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
- movl TI_flags(%ebp), %ecx
- andl $_TIF_WORK_MASK, %ecx # is there any work to be done on
- # int/exception return?
- jne work_pending
+ movl %esp, %eax
+ call prepare_exit_to_usermode
jmp restore_all
END(ret_from_exception)

@@ -341,7 +337,7 @@ sysenter_after_call:
TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx
testl $_TIF_ALLWORK_MASK, %ecx
- jnz syscall_exit_work
+ jnz syscall_exit_work_irqs_off
sysenter_exit:
/* if something modifies registers it must also disable sysexit */
movl PT_EIP(%esp), %edx
@@ -377,13 +373,7 @@ syscall_after_call:
movl %eax, PT_EAX(%esp) # store the return value
syscall_exit:
LOCKDEP_SYS_EXIT
- DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
- # setting need_resched or sigpending
- # between sampling and the iret
- TRACE_IRQS_OFF
- movl TI_flags(%ebp), %ecx
- testl $_TIF_ALLWORK_MASK, %ecx # current->work
- jnz syscall_exit_work
+ jmp syscall_exit_work

restore_all:
TRACE_IRQS_IRET
@@ -460,35 +450,6 @@ ldt_ss:
#endif
ENDPROC(entry_INT80_32)

- # perform work that needs to be done immediately before resumption
- ALIGN
-work_pending:
- testb $_TIF_NEED_RESCHED, %cl
- jz work_notifysig
-work_resched:
- call schedule
- LOCKDEP_SYS_EXIT
- DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
- # setting need_resched or sigpending
- # between sampling and the iret
- TRACE_IRQS_OFF
- movl TI_flags(%ebp), %ecx
- andl $_TIF_WORK_MASK, %ecx # is there any work to be done other
- # than syscall tracing?
- jz restore_all
- testb $_TIF_NEED_RESCHED, %cl
- jnz work_resched
-
-work_notifysig: # deal with pending signals and
- # notify-resume requests
- movl %esp, %eax
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_NONE)
- xorl %edx, %edx
- call do_notify_resume
- jmp resume_userspace
-END(work_pending)
-
# perform syscall exit tracing
ALIGN
syscall_trace_entry:
@@ -503,15 +464,14 @@ END(syscall_trace_entry)

# perform syscall exit tracing
ALIGN
-syscall_exit_work:
- testl $_TIF_WORK_SYSCALL_EXIT, %ecx
- jz work_pending
+syscall_exit_work_irqs_off:
TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_ANY) # could let syscall_trace_leave() call
- # schedule() instead
+ ENABLE_INTERRUPTS(CLBR_ANY)
+
+syscall_exit_work:
movl %esp, %eax
- call syscall_trace_leave
- jmp resume_userspace
+ call syscall_return_slowpath
+ jmp restore_all
END(syscall_exit_work)

syscall_fault:
--
2.4.3

2015-07-08 19:24:44

by Andy Lutomirski

[permalink] [raw]
Subject: [RFC/PATCH 7/7] x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF masks

They are no longer used. Good riddance!

Deleting the TIF_ macros is really nice. It was never clear why
there were so many variants.

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/common.c | 57 --------------------------------------
arch/x86/include/asm/ptrace.h | 1 -
arch/x86/include/asm/signal.h | 1 -
arch/x86/include/asm/thread_info.h | 16 -----------
4 files changed, 75 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index febc53086a69..1e8a2901fa00 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -207,37 +207,6 @@ long syscall_trace_enter(struct pt_regs *regs)
return syscall_trace_enter_phase2(regs, arch, phase1_result);
}

-/* Deprecated. */
-void syscall_trace_leave(struct pt_regs *regs)
-{
- bool step;
-
- /*
- * We may come here right after calling schedule_user()
- * or do_notify_resume(), in which case we can be in RCU
- * user mode.
- */
- user_exit();
-
- audit_syscall_exit(regs);
-
- if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
- trace_sys_exit(regs, regs->ax);
-
- /*
- * If TIF_SYSCALL_EMU is set, we only get here because of
- * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
- * We already reported this syscall instruction in
- * syscall_trace_enter().
- */
- step = unlikely(test_thread_flag(TIF_SINGLESTEP)) &&
- !test_thread_flag(TIF_SYSCALL_EMU);
- if (step || test_thread_flag(TIF_SYSCALL_TRACE))
- tracehook_report_syscall_exit(regs, step);
-
- user_enter();
-}
-
static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
{
unsigned long top_of_stack =
@@ -346,29 +315,3 @@ __visible void syscall_return_slowpath(struct pt_regs *regs)
local_irq_disable();
prepare_exit_to_usermode(regs);
}
-
-/*
- * Deprecated notification of userspace execution resumption
- * - triggered by the TIF_WORK_MASK flags
- */
-__visible void
-do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
-{
- user_exit();
-
- if (thread_info_flags & _TIF_UPROBE)
- uprobe_notify_resume(regs);
-
- /* deal with pending signal delivery */
- if (thread_info_flags & _TIF_SIGPENDING)
- do_signal(regs);
-
- if (thread_info_flags & _TIF_NOTIFY_RESUME) {
- clear_thread_flag(TIF_NOTIFY_RESUME);
- tracehook_notify_resume(regs);
- }
- if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
- fire_user_return_notifiers();
-
- user_enter();
-}
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 5fabf1362942..6271281f947d 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -88,7 +88,6 @@ extern long syscall_trace_enter_phase2(struct pt_regs *, u32 arch,
unsigned long phase1_result);

extern long syscall_trace_enter(struct pt_regs *);
-extern void syscall_trace_leave(struct pt_regs *);

static inline unsigned long regs_return_value(struct pt_regs *regs)
{
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index b42408bcf6b5..c481be78fcf1 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -31,7 +31,6 @@ typedef sigset_t compat_sigset_t;
#include <uapi/asm/signal.h>
#ifndef __ASSEMBLY__
extern void do_signal(struct pt_regs *regs);
-extern void do_notify_resume(struct pt_regs *, void *, __u32);

#define __ARCH_HAS_SA_RESTORER

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 225ee545e1a0..aa12c5bbf192 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -140,27 +140,11 @@ struct thread_info {
_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
_TIF_NOHZ)

-/* work to do in syscall_trace_leave() */
-#define _TIF_WORK_SYSCALL_EXIT \
- (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \
- _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
-
-/* work to do on interrupt/exception return */
-#define _TIF_WORK_MASK \
- (0x0000FFFF & \
- ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT| \
- _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
-
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK \
((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \
_TIF_NOHZ)

-/* Only used for 64 bit */
-#define _TIF_DO_NOTIFY_MASK \
- (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | \
- _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE)
-
/* flags to check in __switch_to() */
#define _TIF_WORK_CTXSW \
(_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP)
--
2.4.3

2015-07-08 19:26:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig

[cc: Al -- saying I'm cc-ing you in the patch 0 description doesn't make it so.]

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <[email protected]> wrote:
> 44fbbb3dc687c added an unnecessary check. Add a temporary assertion
> to confirm that it's unnecessary.
>
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
> arch/x86/entry/entry_32.S | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a36d5df6a749..d36afad80ad1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -491,10 +491,13 @@ work_notifysig: # deal with pending signals and
> #endif
> TRACE_IRQS_ON
> ENABLE_INTERRUPTS(CLBR_NONE)
> + /* Temporary sanity check */
> movb PT_CS(%esp), %bl
> andb $SEGMENT_RPL_MASK, %bl
> cmpb $USER_RPL, %bl
> - jb resume_kernel
> + jnb 2f
> + ud2
> +2:
> xorl %edx, %edx
> call do_notify_resume
> jmp resume_userspace
> --
> 2.4.3
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-07-08 19:26:13

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check

[cc: Al -- saying I'm cc-ing you in the patch 0 description doesn't make it so.]

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <[email protected]> wrote:
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
> arch/x86/entry/entry_32.S | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index d36afad80ad1..66ff9c4055d7 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -491,13 +491,6 @@ work_notifysig: # deal with pending signals and
> #endif
> TRACE_IRQS_ON
> ENABLE_INTERRUPTS(CLBR_NONE)
> - /* Temporary sanity check */
> - movb PT_CS(%esp), %bl
> - andb $SEGMENT_RPL_MASK, %bl
> - cmpb $USER_RPL, %bl
> - jnb 2f
> - ud2
> -2:
> xorl %edx, %edx
> call do_notify_resume
> jmp resume_userspace
> --
> 2.4.3
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-07-09 22:42:23

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <[email protected]> wrote:
> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
> vm86 mode was completely broken under ptrace, for example, because
> we'd never make it to v8086 mode.
>
> This code is still a huge, scary mess, but at least it's no longer
> tangled with the exit-to-userspace loop.

This patch is incorrect. Brian, what's the ETA for your vm86 cleanup?
If it's very soon, then I'll see if I can rely on it. If not, I'll
have to come up with a way to fix this patch.

Grr. The kernel state when handle_vm86_trap is called is absurd right
now. Somehow we're supposed to survive do_trap, send a signal
corresponding to the outside-vm86 state, and exit vm86 cleanly (with
ax = 0), all before returning to user mode. I doubt these semantics
are even intentional.

This code sucks.

--Andy

2015-07-10 01:34:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly

On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <[email protected]> wrote:
>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>> vm86 mode was completely broken under ptrace, for example, because
>> we'd never make it to v8086 mode.
>>
>> This code is still a huge, scary mess, but at least it's no longer
>> tangled with the exit-to-userspace loop.
>
> This patch is incorrect. Brian, what's the ETA for your vm86 cleanup?
> If it's very soon, then I'll see if I can rely on it. If not, I'll
> have to come up with a way to fix this patch.
>
> Grr. The kernel state when handle_vm86_trap is called is absurd right
> now. Somehow we're supposed to survive do_trap, send a signal
> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
> ax = 0), all before returning to user mode. I doubt these semantics
> are even intentional.
>
> This code sucks.

OK, I have a version that seems to work. It comes with a much better
selftest, too. I'll send it shortly.

Brian, would it make sense to base your work on top of it?

Now that I've looked at this stuff, if I were designing Linux support
for v8086 mode, I'd do it very differently. There wouldn't be a vm86
syscall at all. Instead you'd call sigaltstack, then raise a signal,
set X86_EFLAGS_VM, and return.

The kernel would handle X86_EFLAGS_VM being set by setting TIF_V8086
and adjusting sp0. On entry, TIF_V8086 would move the segment
registers from the hardware frame into pt_regs and, on exit, TIF_V8086
would move them back. Clearing X86_EFLAGS_VM (via ptrace, signal
delivery, or sigreturn) would sanitize the segment registers.

SYSENTER would be safe, so the SYSENTER_CS hack wouldn't be needed.
Of course, we'd lose the CPU state, so the user would have to be
careful.

And that's it. There wouldn't be any emulation -- user code could
emulate syscalls all by itself in a signal handler. Exiting v8086
mode would be straightforward -- just do anything that would raise a
signal.

Of course, this isn't at all ABI-compatible with the current turd, and
v8086 mode isn't really that useful, so this is just idle retroactive
speculation. But the TIF_V8086 trick would still be useful to let us
get rid of all the awful hacks in the trap and exit code.

--Andy

2015-07-10 15:27:28

by Brian Gerst

[permalink] [raw]
Subject: Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly

On Thu, Jul 9, 2015 at 9:33 PM, Andy Lutomirski <[email protected]> wrote:
> On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <[email protected]> wrote:
>>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>>> vm86 mode was completely broken under ptrace, for example, because
>>> we'd never make it to v8086 mode.
>>>
>>> This code is still a huge, scary mess, but at least it's no longer
>>> tangled with the exit-to-userspace loop.
>>
>> This patch is incorrect. Brian, what's the ETA for your vm86 cleanup?
>> If it's very soon, then I'll see if I can rely on it. If not, I'll
>> have to come up with a way to fix this patch.
>>
>> Grr. The kernel state when handle_vm86_trap is called is absurd right
>> now. Somehow we're supposed to survive do_trap, send a signal
>> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
>> ax = 0), all before returning to user mode. I doubt these semantics
>> are even intentional.
>>
>> This code sucks.
>
> OK, I have a version that seems to work. It comes with a much better
> selftest, too. I'll send it shortly.
>
> Brian, would it make sense to base your work on top of it?
>
> Now that I've looked at this stuff, if I were designing Linux support
> for v8086 mode, I'd do it very differently. There wouldn't be a vm86
> syscall at all. Instead you'd call sigaltstack, then raise a signal,
> set X86_EFLAGS_VM, and return.
>
> The kernel would handle X86_EFLAGS_VM being set by setting TIF_V8086
> and adjusting sp0. On entry, TIF_V8086 would move the segment
> registers from the hardware frame into pt_regs and, on exit, TIF_V8086
> would move them back. Clearing X86_EFLAGS_VM (via ptrace, signal
> delivery, or sigreturn) would sanitize the segment registers.
>
> SYSENTER would be safe, so the SYSENTER_CS hack wouldn't be needed.
> Of course, we'd lose the CPU state, so the user would have to be
> careful.
>
> And that's it. There wouldn't be any emulation -- user code could
> emulate syscalls all by itself in a signal handler. Exiting v8086
> mode would be straightforward -- just do anything that would raise a
> signal.
>
> Of course, this isn't at all ABI-compatible with the current turd, and
> v8086 mode isn't really that useful, so this is just idle retroactive
> speculation. But the TIF_V8086 trick would still be useful to let us
> get rid of all the awful hacks in the trap and exit code.

I'll post my patches tonight when I get home. It would probably make
more sense for you to base off mine, since it should eliminate the
need for you to touch any vm86 code.

I fixed the signal issue by checking if the VM flag is set in
handle_signal(), and swap the register state there before pushing the
signal frame, but that is only possible after removing the need to
jump back into the exit asm routines. work_notifysig_v86 is gone too
as a result.

--
Brian Gerst