2015-07-20 01:09:18

by Brian Gerst

Subject: [PATCH v3] x86: vm86 cleanups

The goal of this set of patches is to change vm86 support to return to
userspace with the normal exit paths instead of leaving data on the kernel
stack and jumping directly into the exit asm routines. This fixes issues
like ptrace and syscall auditing not working with vm86, and makes possible
cleanups in the syscall exit work code.

Changes from v2:
- Use gs slot of regs32 (present but unused in lazy mode)
- Add access_ok() checks before get_user_try/put_user_try

Changes from v1:
- Added first two patches
- Changed userspace access to copy each field explicitly instead of relying
on the structures keeping their members in the same order.


2015-07-20 01:11:04

by Brian Gerst

Subject: [PATCH 1/7] x86/vm86: Clean up saved_fs/gs

There is no need to save FS and non-lazy GS outside the 32-bit regs. Lazy GS
still needs to be saved because it wasn't saved on syscall entry. Save it in
the gs slot of regs32, which is present but unused.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/processor.h | 2 --
arch/x86/kernel/vm86_32.c | 6 ++----
2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 43e6519..f4e4e3f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -410,8 +410,6 @@ struct thread_struct {
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;
- unsigned int saved_fs;
- unsigned int saved_gs;
#endif
/* IO permissions: */
unsigned long *io_bitmap_ptr;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index fc9db6e..761a2f9 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -159,8 +159,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)

ret = KVM86->regs32;

- ret->fs = current->thread.saved_fs;
- set_user_gs(ret, current->thread.saved_gs);
+ lazy_load_gs(ret->gs);

return ret;
}
@@ -315,8 +314,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
*/
info->regs32->ax = VM86_SIGNAL;
tsk->thread.saved_sp0 = tsk->thread.sp0;
- tsk->thread.saved_fs = info->regs32->fs;
- tsk->thread.saved_gs = get_user_gs(info->regs32);
+ lazy_save_gs(info->regs32->gs);

tss = &per_cpu(cpu_tss, get_cpu());
tsk->thread.sp0 = (unsigned long) &info->VM86_TSS_ESP0;
--
2.4.3

2015-07-20 01:09:21

by Brian Gerst

Subject: [PATCH 2/7] x86/vm86: Preserve orig_ax

There is no legitimate reason for usermode to modify the orig_ax field on
entry to vm86 mode, so copy it from the 32-bit regs.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/kernel/vm86_32.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 761a2f9..9a2dc80 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -294,6 +294,8 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
info->regs.pt.flags |= info->regs32->flags & ~SAFE_MASK;
info->regs.pt.flags |= X86_VM_MASK;

+ info->regs.pt.orig_ax = info->regs32->orig_ax;
+
switch (info->cpu_type) {
case CPU_286:
tsk->thread.v86mask = 0;
--
2.4.3

2015-07-20 01:10:47

by Brian Gerst

Subject: [PATCH 3/7] x86/vm86: Move userspace accesses to do_sys_vm86()

Move the userspace accesses down into the common function in preparation for
the next set of patches. Also switch to copying each field explicitly instead
of assuming that pt_regs and the kernel data structures keep their members in
a fixed order.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/processor.h | 2 +-
arch/x86/kernel/vm86_32.c | 189 +++++++++++++++++++++------------------
2 files changed, 102 insertions(+), 89 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f4e4e3f..35ad554 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -405,7 +405,7 @@ struct thread_struct {
unsigned long error_code;
#ifdef CONFIG_X86_32
/* Virtual 86 mode info */
- struct vm86_struct __user *vm86_info;
+ struct vm86plus_struct __user *vm86_info;
unsigned long screen_bitmap;
unsigned long v86flags;
unsigned long v86mask;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 9a2dc80..e6c2b47 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -90,46 +90,13 @@
#define SAFE_MASK (0xDD5)
#define RETURN_MASK (0xDFF)

-/* convert kernel_vm86_regs to vm86_regs */
-static int copy_vm86_regs_to_user(struct vm86_regs __user *user,
- const struct kernel_vm86_regs *regs)
-{
- int ret = 0;
-
- /*
- * kernel_vm86_regs is missing gs, so copy everything up to
- * (but not including) orig_eax, and then rest including orig_eax.
- */
- ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.orig_ax));
- ret += copy_to_user(&user->orig_eax, &regs->pt.orig_ax,
- sizeof(struct kernel_vm86_regs) -
- offsetof(struct kernel_vm86_regs, pt.orig_ax));
-
- return ret;
-}
-
-/* convert vm86_regs to kernel_vm86_regs */
-static int copy_vm86_regs_from_user(struct kernel_vm86_regs *regs,
- const struct vm86_regs __user *user,
- unsigned extra)
-{
- int ret = 0;
-
- /* copy ax-fs inclusive */
- ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.orig_ax));
- /* copy orig_ax-__gsh+extra */
- ret += copy_from_user(&regs->pt.orig_ax, &user->orig_eax,
- sizeof(struct kernel_vm86_regs) -
- offsetof(struct kernel_vm86_regs, pt.orig_ax) +
- extra);
- return ret;
-}
-
struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
{
struct tss_struct *tss;
struct pt_regs *ret;
- unsigned long tmp;
+ struct task_struct *tsk = current;
+ struct vm86plus_struct __user *user;
+ long err = 0;

/*
* This gets called from entry.S with interrupts disabled, but
@@ -138,23 +105,50 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
*/
local_irq_enable();

- if (!current->thread.vm86_info) {
+ if (!tsk->thread.vm86_info) {
pr_alert("no vm86_info: BAD\n");
do_exit(SIGSEGV);
}
- set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | current->thread.v86mask);
- tmp = copy_vm86_regs_to_user(&current->thread.vm86_info->regs, regs);
- tmp += put_user(current->thread.screen_bitmap, &current->thread.vm86_info->screen_bitmap);
- if (tmp) {
+ set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | tsk->thread.v86mask);
+ user = tsk->thread.vm86_info;
+
+ if (!access_ok(VERIFY_WRITE, user, VMPI.is_vm86pus ?
+ sizeof(struct vm86plus_struct) :
+ sizeof(struct vm86_struct))) {
+ pr_alert("could not access userspace vm86_info\n");
+ do_exit(SIGSEGV);
+ }
+
+ put_user_try {
+ put_user_ex(regs->pt.bx, &user->regs.ebx);
+ put_user_ex(regs->pt.cx, &user->regs.ecx);
+ put_user_ex(regs->pt.dx, &user->regs.edx);
+ put_user_ex(regs->pt.si, &user->regs.esi);
+ put_user_ex(regs->pt.di, &user->regs.edi);
+ put_user_ex(regs->pt.bp, &user->regs.ebp);
+ put_user_ex(regs->pt.ax, &user->regs.eax);
+ put_user_ex(regs->pt.ip, &user->regs.eip);
+ put_user_ex(regs->pt.cs, &user->regs.cs);
+ put_user_ex(regs->pt.flags, &user->regs.eflags);
+ put_user_ex(regs->pt.sp, &user->regs.esp);
+ put_user_ex(regs->pt.ss, &user->regs.ss);
+ put_user_ex(regs->es, &user->regs.es);
+ put_user_ex(regs->ds, &user->regs.ds);
+ put_user_ex(regs->fs, &user->regs.fs);
+ put_user_ex(regs->gs, &user->regs.gs);
+
+ put_user_ex(tsk->thread.screen_bitmap, &user->screen_bitmap);
+ } put_user_catch(err);
+ if (err) {
pr_alert("could not access userspace vm86_info\n");
do_exit(SIGSEGV);
}

tss = &per_cpu(cpu_tss, get_cpu());
- current->thread.sp0 = current->thread.saved_sp0;
- current->thread.sysenter_cs = __KERNEL_CS;
- load_sp0(tss, &current->thread);
- current->thread.saved_sp0 = 0;
+ tsk->thread.sp0 = tsk->thread.saved_sp0;
+ tsk->thread.sysenter_cs = __KERNEL_CS;
+ load_sp0(tss, &tsk->thread);
+ tsk->thread.saved_sp0 = 0;
put_cpu();

ret = KVM86->regs32;
@@ -199,7 +193,8 @@ out:


static int do_vm86_irq_handling(int subfunction, int irqnumber);
-static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk);
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
+ struct kernel_vm86_struct *info);

SYSCALL_DEFINE1(vm86old, struct vm86_struct __user *, v86)
{
@@ -208,21 +203,8 @@ SYSCALL_DEFINE1(vm86old, struct vm86_struct __user *, v86)
* This remains on the stack until we
* return to 32 bit user space.
*/
- struct task_struct *tsk = current;
- int tmp;

- if (tsk->thread.saved_sp0)
- return -EPERM;
- tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
- offsetof(struct kernel_vm86_struct, vm86plus) -
- sizeof(info.regs));
- if (tmp)
- return -EFAULT;
- memset(&info.vm86plus, 0, (int)&info.regs32 - (int)&info.vm86plus);
- info.regs32 = current_pt_regs();
- tsk->thread.vm86_info = v86;
- do_sys_vm86(&info, tsk);
- return 0; /* we never return here */
+ return do_sys_vm86((struct vm86plus_struct __user *) v86, false, &info);
}


@@ -233,11 +215,7 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
* This remains on the stack until we
* return to 32 bit user space.
*/
- struct task_struct *tsk;
- int tmp;
- struct vm86plus_struct __user *v86;

- tsk = current;
switch (cmd) {
case VM86_REQUEST_IRQ:
case VM86_FREE_IRQ:
@@ -255,34 +233,69 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
}

/* we come here only for functions VM86_ENTER, VM86_ENTER_NO_BYPASS */
- if (tsk->thread.saved_sp0)
- return -EPERM;
- v86 = (struct vm86plus_struct __user *)arg;
- tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
- offsetof(struct kernel_vm86_struct, regs32) -
- sizeof(info.regs));
- if (tmp)
- return -EFAULT;
- info.regs32 = current_pt_regs();
- info.vm86plus.is_vm86pus = 1;
- tsk->thread.vm86_info = (struct vm86_struct __user *)v86;
- do_sys_vm86(&info, tsk);
- return 0; /* we never return here */
+ return do_sys_vm86((struct vm86plus_struct __user *) arg, true, &info);
}


-static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
+ struct kernel_vm86_struct *info)
{
struct tss_struct *tss;
-/*
- * make sure the vm86() system call doesn't try to do anything silly
- */
- info->regs.pt.ds = 0;
- info->regs.pt.es = 0;
- info->regs.pt.fs = 0;
-#ifndef CONFIG_X86_32_LAZY_GS
- info->regs.pt.gs = 0;
-#endif
+ struct task_struct *tsk = current;
+ unsigned long err = 0;
+
+ if (tsk->thread.saved_sp0)
+ return -EPERM;
+
+ if (!access_ok(VERIFY_READ, v86, plus ?
+ sizeof(struct vm86plus_struct) :
+ sizeof(struct vm86_struct)))
+ return -EFAULT;
+
+ memset(info, 0, sizeof(*info));
+ get_user_try {
+ unsigned short seg;
+ get_user_ex(info->regs.pt.bx, &v86->regs.ebx);
+ get_user_ex(info->regs.pt.cx, &v86->regs.ecx);
+ get_user_ex(info->regs.pt.dx, &v86->regs.edx);
+ get_user_ex(info->regs.pt.si, &v86->regs.esi);
+ get_user_ex(info->regs.pt.di, &v86->regs.edi);
+ get_user_ex(info->regs.pt.bp, &v86->regs.ebp);
+ get_user_ex(info->regs.pt.ax, &v86->regs.eax);
+ get_user_ex(info->regs.pt.ip, &v86->regs.eip);
+ get_user_ex(seg, &v86->regs.cs);
+ info->regs.pt.cs = seg;
+ get_user_ex(info->regs.pt.flags, &v86->regs.eflags);
+ get_user_ex(info->regs.pt.sp, &v86->regs.esp);
+ get_user_ex(seg, &v86->regs.ss);
+ info->regs.pt.ss = seg;
+ get_user_ex(info->regs.es, &v86->regs.es);
+ get_user_ex(info->regs.ds, &v86->regs.ds);
+ get_user_ex(info->regs.fs, &v86->regs.fs);
+ get_user_ex(info->regs.gs, &v86->regs.gs);
+
+ get_user_ex(info->flags, &v86->flags);
+ get_user_ex(info->screen_bitmap, &v86->screen_bitmap);
+ get_user_ex(info->cpu_type, &v86->cpu_type);
+ } get_user_catch(err);
+ if (err)
+ return err;
+
+ if (copy_from_user(&info->int_revectored, &v86->int_revectored,
+ sizeof(struct revectored_struct)))
+ return -EFAULT;
+ if (copy_from_user(&info->int21_revectored, &v86->int21_revectored,
+ sizeof(struct revectored_struct)))
+ return -EFAULT;
+ if (plus) {
+ if (copy_from_user(&info->vm86plus, &v86->vm86plus,
+ sizeof(struct vm86plus_info_struct)))
+ return -EFAULT;
+ info->vm86plus.is_vm86pus = 1;
+ }
+
+ info->regs32 = current_pt_regs();
+ tsk->thread.vm86_info = v86;

/*
* The flags register is also special: we cannot trust that the user
@@ -344,7 +357,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
"jmp resume_userspace"
: /* no outputs */
:"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
- /* we never return here */
+ unreachable(); /* we never return here */
}

static inline void return_to_32bit(struct kernel_vm86_regs *regs16, int retval)
--
2.4.3

2015-07-20 01:10:06

by Brian Gerst

Subject: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct

Allocate a separate structure for the vm86 fields.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/processor.h | 9 ++------
arch/x86/include/asm/vm86.h | 8 +++++++
arch/x86/kernel/process.c | 7 ++++++
arch/x86/kernel/vm86_32.c | 47 ++++++++++++++++++++++++----------------
arch/x86/mm/fault.c | 4 ++--
5 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 35ad554..d636fb8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -403,13 +403,9 @@ struct thread_struct {
unsigned long cr2;
unsigned long trap_nr;
unsigned long error_code;
-#ifdef CONFIG_X86_32
+#ifdef CONFIG_VM86
/* Virtual 86 mode info */
- struct vm86plus_struct __user *vm86_info;
- unsigned long screen_bitmap;
- unsigned long v86flags;
- unsigned long v86mask;
- unsigned long saved_sp0;
+ struct kernel_vm86_info *vm86;
#endif
/* IO permissions: */
unsigned long *io_bitmap_ptr;
@@ -714,7 +710,6 @@ static inline void spin_lock_prefetch(const void *x)

#define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
- .vm86_info = NULL, \
.sysenter_cs = __KERNEL_CS, \
.io_bitmap_ptr = NULL, \
}
diff --git a/arch/x86/include/asm/vm86.h b/arch/x86/include/asm/vm86.h
index 1d8de3f..a02967d 100644
--- a/arch/x86/include/asm/vm86.h
+++ b/arch/x86/include/asm/vm86.h
@@ -58,6 +58,14 @@ struct kernel_vm86_struct {
*/
};

+struct kernel_vm86_info {
+ struct vm86plus_struct __user *vm86_info;
+ unsigned long screen_bitmap;
+ unsigned long v86flags;
+ unsigned long v86mask;
+ unsigned long saved_sp0;
+};
+
#ifdef CONFIG_VM86

void handle_vm86_fault(struct kernel_vm86_regs *, long);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9cad694..5dcd037 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -110,6 +110,13 @@ void exit_thread(void)
kfree(bp);
}

+#ifdef CONFIG_VM86
+ if (t->vm86) {
+ kfree(t->vm86);
+ t->vm86 = NULL;
+ }
+#endif
+
fpu__drop(fpu);
}

diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index e6c2b47..dce0a1c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -44,6 +44,7 @@
#include <linux/ptrace.h>
#include <linux/audit.h>
#include <linux/stddef.h>
+#include <linux/slab.h>

#include <asm/uaccess.h>
#include <asm/io.h>
@@ -81,8 +82,8 @@
/*
* virtual flags (16 and 32-bit versions)
*/
-#define VFLAGS (*(unsigned short *)&(current->thread.v86flags))
-#define VEFLAGS (current->thread.v86flags)
+#define VFLAGS (*(unsigned short *)&(current->thread.vm86->v86flags))
+#define VEFLAGS (current->thread.vm86->v86flags)

#define set_flags(X, new, mask) \
((X) = ((X) & ~(mask)) | ((new) & (mask)))
@@ -96,6 +97,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
struct pt_regs *ret;
struct task_struct *tsk = current;
struct vm86plus_struct __user *user;
+ struct kernel_vm86_info *vm86 = current->thread.vm86;
long err = 0;

/*
@@ -105,12 +107,12 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
*/
local_irq_enable();

- if (!tsk->thread.vm86_info) {
+ if (!vm86 || !vm86->vm86_info) {
pr_alert("no vm86_info: BAD\n");
do_exit(SIGSEGV);
}
- set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | tsk->thread.v86mask);
- user = tsk->thread.vm86_info;
+ set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->v86mask);
+ user = vm86->vm86_info;

if (!access_ok(VERIFY_WRITE, user, VMPI.is_vm86pus ?
sizeof(struct vm86plus_struct) :
@@ -137,7 +139,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
put_user_ex(regs->fs, &user->regs.fs);
put_user_ex(regs->gs, &user->regs.gs);

- put_user_ex(tsk->thread.screen_bitmap, &user->screen_bitmap);
+ put_user_ex(vm86->screen_bitmap, &user->screen_bitmap);
} put_user_catch(err);
if (err) {
pr_alert("could not access userspace vm86_info\n");
@@ -145,10 +147,10 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
}

tss = &per_cpu(cpu_tss, get_cpu());
- tsk->thread.sp0 = tsk->thread.saved_sp0;
+ tsk->thread.sp0 = vm86->saved_sp0;
tsk->thread.sysenter_cs = __KERNEL_CS;
load_sp0(tss, &tsk->thread);
- tsk->thread.saved_sp0 = 0;
+ vm86->saved_sp0 = 0;
put_cpu();

ret = KVM86->regs32;
@@ -242,9 +244,16 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
{
struct tss_struct *tss;
struct task_struct *tsk = current;
+ struct kernel_vm86_info *vm86 = tsk->thread.vm86;
unsigned long err = 0;

- if (tsk->thread.saved_sp0)
+ if (!vm86)
+ {
+ if (!(vm86 = kzalloc(sizeof(*vm86), GFP_KERNEL)))
+ return -ENOMEM;
+ tsk->thread.vm86 = vm86;
+ }
+ if (vm86->saved_sp0)
return -EPERM;

if (!access_ok(VERIFY_READ, v86, plus ?
@@ -295,7 +304,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
}

info->regs32 = current_pt_regs();
- tsk->thread.vm86_info = v86;
+ vm86->vm86_info = v86;

/*
* The flags register is also special: we cannot trust that the user
@@ -311,16 +320,16 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,

switch (info->cpu_type) {
case CPU_286:
- tsk->thread.v86mask = 0;
+ vm86->v86mask = 0;
break;
case CPU_386:
- tsk->thread.v86mask = X86_EFLAGS_NT | X86_EFLAGS_IOPL;
+ vm86->v86mask = X86_EFLAGS_NT | X86_EFLAGS_IOPL;
break;
case CPU_486:
- tsk->thread.v86mask = X86_EFLAGS_AC | X86_EFLAGS_NT | X86_EFLAGS_IOPL;
+ vm86->v86mask = X86_EFLAGS_AC | X86_EFLAGS_NT | X86_EFLAGS_IOPL;
break;
default:
- tsk->thread.v86mask = X86_EFLAGS_ID | X86_EFLAGS_AC | X86_EFLAGS_NT | X86_EFLAGS_IOPL;
+ vm86->v86mask = X86_EFLAGS_ID | X86_EFLAGS_AC | X86_EFLAGS_NT | X86_EFLAGS_IOPL;
break;
}

@@ -328,7 +337,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
* Save old state, set default return value (%ax) to 0 (VM86_SIGNAL)
*/
info->regs32->ax = VM86_SIGNAL;
- tsk->thread.saved_sp0 = tsk->thread.sp0;
+ vm86->saved_sp0 = tsk->thread.sp0;
lazy_save_gs(info->regs32->gs);

tss = &per_cpu(cpu_tss, get_cpu());
@@ -338,7 +347,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
load_sp0(tss, &tsk->thread);
put_cpu();

- tsk->thread.screen_bitmap = info->screen_bitmap;
+ vm86->screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);

@@ -408,7 +417,7 @@ static inline void clear_AC(struct kernel_vm86_regs *regs)

static inline void set_vflags_long(unsigned long flags, struct kernel_vm86_regs *regs)
{
- set_flags(VEFLAGS, flags, current->thread.v86mask);
+ set_flags(VEFLAGS, flags, current->thread.vm86->v86mask);
set_flags(regs->pt.flags, flags, SAFE_MASK);
if (flags & X86_EFLAGS_IF)
set_IF(regs);
@@ -418,7 +427,7 @@ static inline void set_vflags_long(unsigned long flags, struct kernel_vm86_regs

static inline void set_vflags_short(unsigned short flags, struct kernel_vm86_regs *regs)
{
- set_flags(VFLAGS, flags, current->thread.v86mask);
+ set_flags(VFLAGS, flags, current->thread.vm86->v86mask);
set_flags(regs->pt.flags, flags, SAFE_MASK);
if (flags & X86_EFLAGS_IF)
set_IF(regs);
@@ -433,7 +442,7 @@ static inline unsigned long get_vflags(struct kernel_vm86_regs *regs)
if (VEFLAGS & X86_EFLAGS_VIF)
flags |= X86_EFLAGS_IF;
flags |= X86_EFLAGS_IOPL;
- return flags | (VEFLAGS & current->thread.v86mask);
+ return flags | (VEFLAGS & current->thread.vm86->v86mask);
}

static inline int is_revectored(int nr, struct revectored_struct *bitmap)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 81dcebf..5196ac4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -315,12 +315,12 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
{
unsigned long bit;

- if (!v8086_mode(regs))
+ if (!v8086_mode(regs) || !tsk->thread.vm86)
return;

bit = (address - 0xA0000) >> PAGE_SHIFT;
if (bit < 32)
- tsk->thread.screen_bitmap |= 1 << bit;
+ tsk->thread.vm86->screen_bitmap |= 1 << bit;
}

static bool low_pfn(unsigned long pfn)
--
2.4.3

2015-07-20 01:09:24

by Brian Gerst

Subject: [PATCH 5/7] x86/vm86: Move fields from kernel_vm86_struct

Move the non-regs fields to the off-stack data.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/vm86.h | 16 ++++++++--------
arch/x86/kernel/vm86_32.c | 42 ++++++++++++++++++++++--------------------
2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/vm86.h b/arch/x86/include/asm/vm86.h
index a02967d..ff64c27 100644
--- a/arch/x86/include/asm/vm86.h
+++ b/arch/x86/include/asm/vm86.h
@@ -38,13 +38,7 @@ struct kernel_vm86_struct {
* Therefore, pt_regs in fact points to a complete 'kernel_vm86_struct'
* in kernelspace, hence we need not reget the data from userspace.
*/
-#define VM86_TSS_ESP0 flags
- unsigned long flags;
- unsigned long screen_bitmap;
- unsigned long cpu_type;
- struct revectored_struct int_revectored;
- struct revectored_struct int21_revectored;
- struct vm86plus_info_struct vm86plus;
+#define VM86_TSS_ESP0 regs32
struct pt_regs *regs32; /* here we save the pointer to the old regs */
/*
* The below is not part of the structure, but the stack layout continues
@@ -60,10 +54,16 @@ struct kernel_vm86_struct {

struct kernel_vm86_info {
struct vm86plus_struct __user *vm86_info;
- unsigned long screen_bitmap;
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;
+
+ unsigned long flags;
+ unsigned long screen_bitmap;
+ unsigned long cpu_type;
+ struct revectored_struct int_revectored;
+ struct revectored_struct int21_revectored;
+ struct vm86plus_info_struct vm86plus;
};

#ifdef CONFIG_VM86
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index dce0a1c..bfaa59b 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -68,7 +68,6 @@


#define KVM86 ((struct kernel_vm86_struct *)regs)
-#define VMPI KVM86->vm86plus


/*
@@ -114,7 +113,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | vm86->v86mask);
user = vm86->vm86_info;

- if (!access_ok(VERIFY_WRITE, user, VMPI.is_vm86pus ?
+ if (!access_ok(VERIFY_WRITE, user, vm86->vm86plus.is_vm86pus ?
sizeof(struct vm86plus_struct) :
sizeof(struct vm86_struct))) {
pr_alert("could not access userspace vm86_info\n");
@@ -283,25 +282,27 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
get_user_ex(info->regs.fs, &v86->regs.fs);
get_user_ex(info->regs.gs, &v86->regs.gs);

- get_user_ex(info->flags, &v86->flags);
- get_user_ex(info->screen_bitmap, &v86->screen_bitmap);
- get_user_ex(info->cpu_type, &v86->cpu_type);
+ get_user_ex(vm86->flags, &v86->flags);
+ get_user_ex(vm86->screen_bitmap, &v86->screen_bitmap);
+ get_user_ex(vm86->cpu_type, &v86->cpu_type);
} get_user_catch(err);
if (err)
return err;

- if (copy_from_user(&info->int_revectored, &v86->int_revectored,
+ if (copy_from_user(&vm86->int_revectored, &v86->int_revectored,
sizeof(struct revectored_struct)))
return -EFAULT;
- if (copy_from_user(&info->int21_revectored, &v86->int21_revectored,
+ if (copy_from_user(&vm86->int21_revectored, &v86->int21_revectored,
sizeof(struct revectored_struct)))
return -EFAULT;
if (plus) {
- if (copy_from_user(&info->vm86plus, &v86->vm86plus,
+ if (copy_from_user(&vm86->vm86plus, &v86->vm86plus,
sizeof(struct vm86plus_info_struct)))
return -EFAULT;
- info->vm86plus.is_vm86pus = 1;
- }
+ vm86->vm86plus.is_vm86pus = 1;
+ } else
+ memset(&vm86->vm86plus, 0,
+ sizeof(struct vm86plus_info_struct));

info->regs32 = current_pt_regs();
vm86->vm86_info = v86;
@@ -318,7 +319,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,

info->regs.pt.orig_ax = info->regs32->orig_ax;

- switch (info->cpu_type) {
+ switch (vm86->cpu_type) {
case CPU_286:
vm86->v86mask = 0;
break;
@@ -347,8 +348,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
load_sp0(tss, &tsk->thread);
put_cpu();

- vm86->screen_bitmap = info->screen_bitmap;
- if (info->flags & VM86_SCREEN_BITMAP)
+ if (vm86->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);

/*call __audit_syscall_exit since we do not exit via the normal paths */
@@ -540,12 +540,13 @@ static void do_int(struct kernel_vm86_regs *regs, int i,
{
unsigned long __user *intr_ptr;
unsigned long segoffs;
+ struct kernel_vm86_info *vm86 = current->thread.vm86;

if (regs->pt.cs == BIOSSEG)
goto cannot_handle;
- if (is_revectored(i, &KVM86->int_revectored))
+ if (is_revectored(i, &vm86->int_revectored))
goto cannot_handle;
- if (i == 0x21 && is_revectored(AH(regs), &KVM86->int21_revectored))
+ if (i == 0x21 && is_revectored(AH(regs), &vm86->int21_revectored))
goto cannot_handle;
intr_ptr = (unsigned long __user *) (i << 2);
if (get_user(segoffs, intr_ptr))
@@ -569,7 +570,7 @@ cannot_handle:

int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)
{
- if (VMPI.is_vm86pus) {
+ if (current->thread.vm86->vm86plus.is_vm86pus) {
if ((trapno == 3) || (trapno == 1)) {
KVM86->regs32->ax = VM86_TRAP + (trapno << 8);
/* setting this flag forces the code in entry_32.S to
@@ -596,12 +597,13 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
unsigned char __user *ssp;
unsigned short ip, sp, orig_flags;
int data32, pref_done;
+ struct vm86plus_info_struct *vmpi = &current->thread.vm86->vm86plus;

#define CHECK_IF_IN_TRAP \
- if (VMPI.vm86dbg_active && VMPI.vm86dbg_TFpendig) \
+ if (vmpi->vm86dbg_active && vmpi->vm86dbg_TFpendig) \
newflags |= X86_EFLAGS_TF
#define VM86_FAULT_RETURN do { \
- if (VMPI.force_return_for_pic && (VEFLAGS & (X86_EFLAGS_IF | X86_EFLAGS_VIF))) \
+ if (vmpi->force_return_for_pic && (VEFLAGS & (X86_EFLAGS_IF | X86_EFLAGS_VIF))) \
return_to_32bit(regs, VM86_PICRETURN); \
if (orig_flags & X86_EFLAGS_TF) \
handle_vm86_trap(regs, 0, 1); \
@@ -671,8 +673,8 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
case 0xcd: {
int intno = popb(csp, ip, simulate_sigsegv);
IP(regs) = ip;
- if (VMPI.vm86dbg_active) {
- if ((1 << (intno & 7)) & VMPI.vm86dbg_intxxtab[intno >> 3])
+ if (vmpi->vm86dbg_active) {
+ if ((1 << (intno & 7)) & vmpi->vm86dbg_intxxtab[intno >> 3])
return_to_32bit(regs, VM86_INTx + (intno << 8));
}
do_int(regs, intno, ssp, sp);
--
2.4.3

2015-07-20 01:09:27

by Brian Gerst

Subject: [PATCH 6/7] x86/vm86: Eliminate kernel_vm86_struct

Now there is no vm86-specific data left on the kernel stack while in
userspace, except for the 32-bit regs.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/vm86.h | 25 +-----------
arch/x86/kernel/vm86_32.c | 93 +++++++++++++++++++--------------------------
2 files changed, 41 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/vm86.h b/arch/x86/include/asm/vm86.h
index ff64c27..1740655 100644
--- a/arch/x86/include/asm/vm86.h
+++ b/arch/x86/include/asm/vm86.h
@@ -28,32 +28,9 @@ struct kernel_vm86_regs {
unsigned short gs, __gsh;
};

-struct kernel_vm86_struct {
- struct kernel_vm86_regs regs;
-/*
- * the below part remains on the kernel stack while we are in VM86 mode.
- * 'tss.esp0' then contains the address of VM86_TSS_ESP0 below, and when we
- * get forced back from VM86, the CPU and "SAVE_ALL" will restore the above
- * 'struct kernel_vm86_regs' with the then actual values.
- * Therefore, pt_regs in fact points to a complete 'kernel_vm86_struct'
- * in kernelspace, hence we need not reget the data from userspace.
- */
-#define VM86_TSS_ESP0 regs32
- struct pt_regs *regs32; /* here we save the pointer to the old regs */
-/*
- * The below is not part of the structure, but the stack layout continues
- * this way. In front of 'return-eip' may be some data, depending on
- * compilation, so we don't rely on this and save the pointer to 'oldregs'
- * in 'regs32' above.
- * However, with GCC-2.7.2 and the current CFLAGS you see exactly this:
-
- long return-eip; from call to vm86()
- struct pt_regs oldregs; user space registers as saved by syscall
- */
-};
-
struct kernel_vm86_info {
struct vm86plus_struct __user *vm86_info;
+ struct pt_regs *regs32;
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index bfaa59b..f61c77e 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -67,9 +67,6 @@
*/


-#define KVM86 ((struct kernel_vm86_struct *)regs)
-
-
/*
* 8- and 16-bit register defines..
*/
@@ -152,7 +149,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
vm86->saved_sp0 = 0;
put_cpu();

- ret = KVM86->regs32;
+ ret = vm86->regs32;

lazy_load_gs(ret->gs);

@@ -194,29 +191,16 @@ out:


static int do_vm86_irq_handling(int subfunction, int irqnumber);
-static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
- struct kernel_vm86_struct *info);
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus);

SYSCALL_DEFINE1(vm86old, struct vm86_struct __user *, v86)
{
- struct kernel_vm86_struct info; /* declare this _on top_,
- * this avoids wasting of stack space.
- * This remains on the stack until we
- * return to 32 bit user space.
- */
-
- return do_sys_vm86((struct vm86plus_struct __user *) v86, false, &info);
+ return do_sys_vm86((struct vm86plus_struct __user *) v86, false);
}


SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
{
- struct kernel_vm86_struct info; /* declare this _on top_,
- * this avoids wasting of stack space.
- * This remains on the stack until we
- * return to 32 bit user space.
- */
-
switch (cmd) {
case VM86_REQUEST_IRQ:
case VM86_FREE_IRQ:
@@ -234,16 +218,17 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
}

/* we come here only for functions VM86_ENTER, VM86_ENTER_NO_BYPASS */
- return do_sys_vm86((struct vm86plus_struct __user *) arg, true, &info);
+ return do_sys_vm86((struct vm86plus_struct __user *) arg, true);
}


-static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
- struct kernel_vm86_struct *info)
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
{
struct tss_struct *tss;
struct task_struct *tsk = current;
struct kernel_vm86_info *vm86 = tsk->thread.vm86;
+ struct kernel_vm86_regs vm86regs;
+ struct pt_regs *regs32 = current_pt_regs();
unsigned long err = 0;

if (!vm86)
@@ -260,27 +245,27 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
sizeof(struct vm86plus_struct)))
return -EFAULT;

- memset(info, 0, sizeof(*info));
+ memset(&vm86regs, 0, sizeof(vm86regs));
get_user_try {
unsigned short seg;
- get_user_ex(info->regs.pt.bx, &v86->regs.ebx);
- get_user_ex(info->regs.pt.cx, &v86->regs.ecx);
- get_user_ex(info->regs.pt.dx, &v86->regs.edx);
- get_user_ex(info->regs.pt.si, &v86->regs.esi);
- get_user_ex(info->regs.pt.di, &v86->regs.edi);
- get_user_ex(info->regs.pt.bp, &v86->regs.ebp);
- get_user_ex(info->regs.pt.ax, &v86->regs.eax);
- get_user_ex(info->regs.pt.ip, &v86->regs.eip);
+ get_user_ex(vm86regs.pt.bx, &v86->regs.ebx);
+ get_user_ex(vm86regs.pt.cx, &v86->regs.ecx);
+ get_user_ex(vm86regs.pt.dx, &v86->regs.edx);
+ get_user_ex(vm86regs.pt.si, &v86->regs.esi);
+ get_user_ex(vm86regs.pt.di, &v86->regs.edi);
+ get_user_ex(vm86regs.pt.bp, &v86->regs.ebp);
+ get_user_ex(vm86regs.pt.ax, &v86->regs.eax);
+ get_user_ex(vm86regs.pt.ip, &v86->regs.eip);
get_user_ex(seg, &v86->regs.cs);
- info->regs.pt.cs = seg;
- get_user_ex(info->regs.pt.flags, &v86->regs.eflags);
- get_user_ex(info->regs.pt.sp, &v86->regs.esp);
+ vm86regs.pt.cs = seg;
+ get_user_ex(vm86regs.pt.flags, &v86->regs.eflags);
+ get_user_ex(vm86regs.pt.sp, &v86->regs.esp);
get_user_ex(seg, &v86->regs.ss);
- info->regs.pt.ss = seg;
- get_user_ex(info->regs.es, &v86->regs.es);
- get_user_ex(info->regs.ds, &v86->regs.ds);
- get_user_ex(info->regs.fs, &v86->regs.fs);
- get_user_ex(info->regs.gs, &v86->regs.gs);
+ vm86regs.pt.ss = seg;
+ get_user_ex(vm86regs.es, &v86->regs.es);
+ get_user_ex(vm86regs.ds, &v86->regs.ds);
+ get_user_ex(vm86regs.fs, &v86->regs.fs);
+ get_user_ex(vm86regs.gs, &v86->regs.gs);

get_user_ex(vm86->flags, &v86->flags);
get_user_ex(vm86->screen_bitmap, &v86->screen_bitmap);
@@ -303,8 +288,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
} else
memset(&vm86->vm86plus, 0,
sizeof(struct vm86plus_info_struct));
-
- info->regs32 = current_pt_regs();
+ vm86->regs32 = regs32;
vm86->vm86_info = v86;

/*
@@ -312,12 +296,12 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
* has set it up safely, so this makes sure interrupt etc flags are
* inherited from protected mode.
*/
- VEFLAGS = info->regs.pt.flags;
- info->regs.pt.flags &= SAFE_MASK;
- info->regs.pt.flags |= info->regs32->flags & ~SAFE_MASK;
- info->regs.pt.flags |= X86_VM_MASK;
+ VEFLAGS = vm86regs.pt.flags;
+ vm86regs.pt.flags &= SAFE_MASK;
+ vm86regs.pt.flags |= regs32->flags & ~SAFE_MASK;
+ vm86regs.pt.flags |= X86_VM_MASK;

- info->regs.pt.orig_ax = info->regs32->orig_ax;
+ vm86regs.pt.orig_ax = regs32->orig_ax;

switch (vm86->cpu_type) {
case CPU_286:
@@ -337,12 +321,13 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
/*
* Save old state, set default return value (%ax) to 0 (VM86_SIGNAL)
*/
- info->regs32->ax = VM86_SIGNAL;
+ regs32->ax = VM86_SIGNAL;
vm86->saved_sp0 = tsk->thread.sp0;
- lazy_save_gs(info->regs32->gs);
+ lazy_save_gs(regs32->gs);

tss = &per_cpu(cpu_tss, get_cpu());
- tsk->thread.sp0 = (unsigned long) &info->VM86_TSS_ESP0;
+ /* Set new sp0 right below 32-bit regs */
+ tsk->thread.sp0 = (unsigned long) regs32;
if (cpu_has_sep)
tsk->thread.sysenter_cs = 0;
load_sp0(tss, &tsk->thread);
@@ -365,7 +350,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
#endif
"jmp resume_userspace"
: /* no outputs */
- :"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
+ :"r" (&vm86regs), "r" (task_thread_info(tsk)), "r" (0));
unreachable(); /* we never return here */
}

@@ -570,12 +555,14 @@ cannot_handle:

int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)
{
- if (current->thread.vm86->vm86plus.is_vm86pus) {
+ struct kernel_vm86_info *vm86 = current->thread.vm86;
+
+ if (vm86->vm86plus.is_vm86pus) {
if ((trapno == 3) || (trapno == 1)) {
- KVM86->regs32->ax = VM86_TRAP + (trapno << 8);
+ vm86->regs32->ax = VM86_TRAP + (trapno << 8);
/* setting this flag forces the code in entry_32.S to
the path where we call save_v86_state() and change
- the stack pointer to KVM86->regs32 */
+ the stack pointer to regs32 */
set_thread_flag(TIF_NOTIFY_RESUME);
return 0;
}
--
2.4.3

2015-07-20 01:09:47

by Brian Gerst

Subject: [PATCH 7/7] x86/vm86: Use the normal pt_regs area for vm86

Change vm86 entry and exit to use the normal pt_regs area. This is done by
increasing the padding at the top of the stack to make room for the extra
vm86 segment slots in the IRET frame. Entering vm86 mode saves the 32-bit
regs in the off-stack vm86 data and copies the vm86 regs into the pt_regs
area; exiting back to 32-bit mode does the reverse. This allows removing the
hacks that jumped directly into the exit asm code, which were needed because
the stack pointer had to be changed. Returning normally from the vm86 syscall
and the exception handlers allows things like ptrace and auditing to work
properly.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/entry/entry_32.S | 24 +--------
arch/x86/include/asm/thread_info.h | 11 ++--
arch/x86/include/asm/vm86.h | 6 ++-
arch/x86/kernel/signal.c | 3 ++
arch/x86/kernel/vm86_32.c | 106 +++++++++++++++----------------------
5 files changed, 58 insertions(+), 92 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 21dc60a..f940e24 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -525,34 +525,12 @@ work_resched:

work_notifysig: # deal with pending signals and
# notify-resume requests
-#ifdef CONFIG_VM86
- testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
- movl %esp, %eax
- jnz work_notifysig_v86 # returning to kernel-space or
- # vm86-space
-1:
-#else
- movl %esp, %eax
-#endif
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- movb PT_CS(%esp), %bl
- andb $SEGMENT_RPL_MASK, %bl
- cmpb $USER_RPL, %bl
- jb resume_kernel
+ movl %esp, %eax
xorl %edx, %edx
call do_notify_resume
jmp resume_userspace
-
-#ifdef CONFIG_VM86
- ALIGN
-work_notifysig_v86:
- pushl %ecx # save ti_flags for do_notify_resume
- call save_v86_state # %eax contains pt_regs pointer
- popl %ecx
- movl %eax, %esp
- jmp 1b
-#endif
END(work_pending)

# perform syscall exit tracing
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 225ee54..fdad5c2 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -27,14 +27,17 @@
* Without this offset, that can result in a page fault. (We are
* careful that, in this case, the value we read doesn't matter.)
*
- * In vm86 mode, the hardware frame is much longer still, but we neither
- * access the extra members from NMI context, nor do we write such a
- * frame at sp0 at all.
+ * In vm86 mode, the hardware frame is much longer still, so add 16
+ * bytes to make room for the real-mode segments.
*
* x86_64 has a fixed-length stack frame.
*/
#ifdef CONFIG_X86_32
-# define TOP_OF_KERNEL_STACK_PADDING 8
+# ifdef CONFIG_VM86
+# define TOP_OF_KERNEL_STACK_PADDING 16
+# else
+# define TOP_OF_KERNEL_STACK_PADDING 8
+# endif
#else
# define TOP_OF_KERNEL_STACK_PADDING 0
#endif
diff --git a/arch/x86/include/asm/vm86.h b/arch/x86/include/asm/vm86.h
index 1740655..8cc4a1a 100644
--- a/arch/x86/include/asm/vm86.h
+++ b/arch/x86/include/asm/vm86.h
@@ -30,7 +30,7 @@ struct kernel_vm86_regs {

struct kernel_vm86_info {
struct vm86plus_struct __user *vm86_info;
- struct pt_regs *regs32;
+ struct pt_regs regs32;
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;
@@ -47,7 +47,7 @@ struct kernel_vm86_info {

void handle_vm86_fault(struct kernel_vm86_regs *, long);
int handle_vm86_trap(struct kernel_vm86_regs *, long, int);
-struct pt_regs *save_v86_state(struct kernel_vm86_regs *);
+void save_v86_state(struct kernel_vm86_regs *, int);

struct task_struct;
void release_vm86_irqs(struct task_struct *);
@@ -62,6 +62,8 @@ static inline int handle_vm86_trap(struct kernel_vm86_regs *a, long b, int c)
return 0;
}

+static inline void save_v86_state(struct kernel_vm86_regs *a, int b) { }
+
#endif /* CONFIG_VM86 */

#endif /* _ASM_X86_VM86_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 7e88cc7..bfd736e 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -635,6 +635,9 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
bool stepping, failed;
struct fpu *fpu = &current->thread.fpu;

+ if (v8086_mode(regs))
+ save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
+
/* Are we from a system call? */
if (syscall_get_nr(current, regs) >= 0) {
/* If so, check system call restarting.. */
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index f61c77e..9fd50ea 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -50,6 +50,7 @@
#include <asm/io.h>
#include <asm/tlbflush.h>
#include <asm/irq.h>
+#include <asm/traps.h>

/*
* Known problems:
@@ -87,10 +88,9 @@
#define SAFE_MASK (0xDD5)
#define RETURN_MASK (0xDFF)

-struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
+void save_v86_state(struct kernel_vm86_regs *regs, int retval)
{
struct tss_struct *tss;
- struct pt_regs *ret;
struct task_struct *tsk = current;
struct vm86plus_struct __user *user;
struct kernel_vm86_info *vm86 = current->thread.vm86;
@@ -149,11 +149,11 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
vm86->saved_sp0 = 0;
put_cpu();

- ret = vm86->regs32;
+ memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));

- lazy_load_gs(ret->gs);
+ lazy_load_gs(vm86->regs32.gs);

- return ret;
+ regs->pt.ax = retval;
}

static void mark_screen_rdonly(struct mm_struct *mm)
@@ -228,7 +228,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
struct task_struct *tsk = current;
struct kernel_vm86_info *vm86 = tsk->thread.vm86;
struct kernel_vm86_regs vm86regs;
- struct pt_regs *regs32 = current_pt_regs();
+ struct pt_regs *regs = current_pt_regs();
unsigned long err = 0;

if (!vm86)
@@ -288,7 +288,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
} else
memset(&vm86->vm86plus, 0,
sizeof(struct vm86plus_info_struct));
- vm86->regs32 = regs32;
+
+ memcpy(&vm86->regs32, regs, sizeof(struct pt_regs));
vm86->vm86_info = v86;

/*
@@ -298,10 +299,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
*/
VEFLAGS = vm86regs.pt.flags;
vm86regs.pt.flags &= SAFE_MASK;
- vm86regs.pt.flags |= regs32->flags & ~SAFE_MASK;
+ vm86regs.pt.flags |= regs->flags & ~SAFE_MASK;
vm86regs.pt.flags |= X86_VM_MASK;

- vm86regs.pt.orig_ax = regs32->orig_ax;
+ vm86regs.pt.orig_ax = regs->orig_ax;

switch (vm86->cpu_type) {
case CPU_286:
@@ -319,15 +320,14 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
}

/*
- * Save old state, set default return value (%ax) to 0 (VM86_SIGNAL)
+ * Save old state
*/
- regs32->ax = VM86_SIGNAL;
vm86->saved_sp0 = tsk->thread.sp0;
- lazy_save_gs(regs32->gs);
+ lazy_save_gs(vm86->regs32.gs);

tss = &per_cpu(cpu_tss, get_cpu());
- /* Set new sp0 right below 32-bit regs */
- tsk->thread.sp0 = (unsigned long) regs32;
+ /* make room for real-mode segments */
+ tsk->thread.sp0 += 16;
if (cpu_has_sep)
tsk->thread.sysenter_cs = 0;
load_sp0(tss, &tsk->thread);
@@ -336,41 +336,14 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus)
if (vm86->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);

- /*call __audit_syscall_exit since we do not exit via the normal paths */
-#ifdef CONFIG_AUDITSYSCALL
- if (unlikely(current->audit_context))
- __audit_syscall_exit(1, 0);
-#endif
-
- __asm__ __volatile__(
- "movl %0,%%esp\n\t"
- "movl %1,%%ebp\n\t"
-#ifdef CONFIG_X86_32_LAZY_GS
- "mov %2, %%gs\n\t"
-#endif
- "jmp resume_userspace"
- : /* no outputs */
- :"r" (&vm86regs), "r" (task_thread_info(tsk)), "r" (0));
- unreachable(); /* we never return here */
-}
-
-static inline void return_to_32bit(struct kernel_vm86_regs *regs16, int retval)
-{
- struct pt_regs *regs32;
-
- regs32 = save_v86_state(regs16);
- regs32->ax = retval;
- __asm__ __volatile__("movl %0,%%esp\n\t"
- "movl %1,%%ebp\n\t"
- "jmp resume_userspace"
- : : "r" (regs32), "r" (current_thread_info()));
+ memcpy((struct kernel_vm86_regs *)regs, &vm86regs, sizeof(vm86regs));
+ force_iret();
+ return regs->ax;
}

static inline void set_IF(struct kernel_vm86_regs *regs)
{
VEFLAGS |= X86_EFLAGS_VIF;
- if (VEFLAGS & X86_EFLAGS_VIP)
- return_to_32bit(regs, VM86_STI);
}

static inline void clear_IF(struct kernel_vm86_regs *regs)
@@ -550,7 +523,7 @@ static void do_int(struct kernel_vm86_regs *regs, int i,
return;

cannot_handle:
- return_to_32bit(regs, VM86_INTx + (i << 8));
+ save_v86_state(regs, VM86_INTx + (i << 8));
}

int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)
@@ -559,11 +532,7 @@ int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)

if (vm86->vm86plus.is_vm86pus) {
if ((trapno == 3) || (trapno == 1)) {
- vm86->regs32->ax = VM86_TRAP + (trapno << 8);
- /* setting this flag forces the code in entry_32.S to
- the path where we call save_v86_state() and change
- the stack pointer to regs32 */
- set_thread_flag(TIF_NOTIFY_RESUME);
+ save_v86_state(regs, VM86_TRAP + (trapno << 8));
return 0;
}
do_int(regs, trapno, (unsigned char __user *) (regs->pt.ss << 4), SP(regs));
@@ -589,12 +558,6 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
#define CHECK_IF_IN_TRAP \
if (vmpi->vm86dbg_active && vmpi->vm86dbg_TFpendig) \
newflags |= X86_EFLAGS_TF
-#define VM86_FAULT_RETURN do { \
- if (vmpi->force_return_for_pic && (VEFLAGS & (X86_EFLAGS_IF | X86_EFLAGS_VIF))) \
- return_to_32bit(regs, VM86_PICRETURN); \
- if (orig_flags & X86_EFLAGS_TF) \
- handle_vm86_trap(regs, 0, 1); \
- return; } while (0)

orig_flags = *(unsigned short *)&regs->pt.flags;

@@ -633,7 +596,7 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
SP(regs) -= 2;
}
IP(regs) = ip;
- VM86_FAULT_RETURN;
+ goto vm86_fault_return;

/* popf */
case 0x9d:
@@ -653,7 +616,7 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
else
set_vflags_short(newflags, regs);

- VM86_FAULT_RETURN;
+ goto check_vip;
}

/* int xx */
@@ -661,8 +624,10 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
int intno = popb(csp, ip, simulate_sigsegv);
IP(regs) = ip;
if (vmpi->vm86dbg_active) {
- if ((1 << (intno & 7)) & vmpi->vm86dbg_intxxtab[intno >> 3])
- return_to_32bit(regs, VM86_INTx + (intno << 8));
+ if ((1 << (intno & 7)) & vmpi->vm86dbg_intxxtab[intno >> 3]) {
+ save_v86_state(regs, VM86_INTx + (intno << 8));
+ return;
+ }
}
do_int(regs, intno, ssp, sp);
return;
@@ -693,14 +658,14 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
} else {
set_vflags_short(newflags, regs);
}
- VM86_FAULT_RETURN;
+ goto check_vip;
}

/* cli */
case 0xfa:
IP(regs) = ip;
clear_IF(regs);
- VM86_FAULT_RETURN;
+ goto vm86_fault_return;

/* sti */
/*
@@ -712,12 +677,27 @@ void handle_vm86_fault(struct kernel_vm86_regs *regs, long error_code)
case 0xfb:
IP(regs) = ip;
set_IF(regs);
- VM86_FAULT_RETURN;
+ goto check_vip;

default:
- return_to_32bit(regs, VM86_UNKNOWN);
+ save_v86_state(regs, VM86_UNKNOWN);
+ }
+
+ return;
+
+check_vip:
+ if (VEFLAGS & X86_EFLAGS_VIP) {
+ save_v86_state(regs, VM86_STI);
+ return;
}

+vm86_fault_return:
+ if (vmpi->force_return_for_pic && (VEFLAGS & (X86_EFLAGS_IF | X86_EFLAGS_VIF))) {
+ save_v86_state(regs, VM86_PICRETURN);
+ return;
+ }
+ if (orig_flags & X86_EFLAGS_TF)
+ handle_vm86_trap(regs, 0, X86_TRAP_DB);
return;

simulate_sigsegv:
@@ -731,7 +711,7 @@ simulate_sigsegv:
* should be a mixture of the two, but how do we
* get the information? [KD]
*/
- return_to_32bit(regs, VM86_UNKNOWN);
+ save_v86_state(regs, VM86_UNKNOWN);
}

/* ---------------- vm86 special IRQ passing stuff ----------------- */
--
2.4.3

2015-07-21 06:29:08

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Brian Gerst <[email protected]> wrote:

> Allocate a separate structure for the vm86 fields.

Why is this allocated dynamically? This structure is not very large, and a hole in
thread_struct isn't that big of an issue - compared to additional fragility
introduced by the (mostly untested by normal apps) dynamic allocation here ...

I don't mind the introduction of the sub-structure in itself, but please embed it
in thread_struct.

Thanks,

Ingo

2015-07-21 06:31:10

by Andy Lutomirski

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct

On Mon, Jul 20, 2015 at 11:28 PM, Ingo Molnar <[email protected]> wrote:
>
> * Brian Gerst <[email protected]> wrote:
>
>> Allocate a separate structure for the vm86 fields.
>
> Why is this allocated dynamically? This structure is not very large, and a hole in
> thread_struct isn't that big of an issue - compared to additional fragility
> introduced by the (mostly untested by normal apps) dynamic allocation here ...
>
> I don't mind the introduction of the sub-structure in itself, but please embed it
> in thread_struct.
>

This ends up being several hundred bytes, I think, due to including an
entire struct pt_regs. Do we really want to enlarge thread_struct
that much for something that's essentially never used?

--Andy

2015-07-21 06:32:28

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Ingo Molnar <[email protected]> wrote:

> * Brian Gerst <[email protected]> wrote:
>
> > Allocate a separate structure for the vm86 fields.
>
> Why is this allocated dynamically? This structure is not very large, and a hole
> in thread_struct isn't that big of an issue - compared to additional fragility
> introduced by the (mostly untested by normal apps) dynamic allocation here ...
>
> I don't mind the introduction of the sub-structure in itself, but please embed
> it in thread_struct.

Otherwise I have no objections to the rest of the series, nice fixes!

Thanks,

Ingo

2015-07-21 06:35:16

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Andy Lutomirski <[email protected]> wrote:

> On Mon, Jul 20, 2015 at 11:28 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Brian Gerst <[email protected]> wrote:
> >
> >> Allocate a separate structure for the vm86 fields.
> >
> > Why is this allocated dynamically? This structure is not very large, and a
> > hole in thread_struct isn't that big of an issue - compared to additional
> > fragility introduced by the (mostly untested by normal apps) dynamic
> > allocation here ...
> >
> > I don't mind the introduction of the sub-structure in itself, but please
> > embed it in thread_struct.
>
> This ends up being several hundred bytes, I think, due to including an entire
> struct pt_regs. Do we really want to enlarge thread_struct that much for
> something that's essentially never used?

Ok, I only judged by the first patch, I did not realize it becomes that much
larger in later patches.

I've extended the changelog to explain this properly.

Thanks,

Ingo

2015-07-21 07:12:03

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Brian Gerst <[email protected]> wrote:

> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -110,6 +110,13 @@ void exit_thread(void)
> kfree(bp);
> }
>
> +#ifdef CONFIG_VM86
> + if (t->vm86) {
> + kfree(t->vm86);
> + t->vm86 = NULL;
> + }
> +#endif

This should be a helper:

vm86__free(t->vm86)

or so.

> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
> index e6c2b47..dce0a1c 100644
> --- a/arch/x86/kernel/vm86_32.c
> +++ b/arch/x86/kernel/vm86_32.c

> @@ -242,9 +244,16 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
> {
> struct tss_struct *tss;
> struct task_struct *tsk = current;
> + struct kernel_vm86_info *vm86 = tsk->thread.vm86;
> unsigned long err = 0;
>
> - if (tsk->thread.saved_sp0)
> + if (!vm86)
> + {

Non-standard style.

> + if (!(vm86 = kzalloc(sizeof(*vm86), GFP_KERNEL)))
> + return -ENOMEM;
> + tsk->thread.vm86 = vm86;
> + }
> + if (vm86->saved_sp0)
> return -EPERM;


Btw., the variable names here are crazy. I had to look twice to realize that we
have 'v86' and 'vm86' which are two different things.

Also, vm86plus_struct variables and fields are named wildly inconsistently:
sometimes it's 'vm86.vm86_info', sometimes it's 'v86', sometimes 'user'. Ugh.

Other fields have naming inconsistencies as well: for example we have
thread.vm86->vm86plus.vm86dbg_active. 'vm86' is repeated _three_ times in that
name, for no good reason.

So please clean up the naming to make this all easier to read. Only the highest
level field should have 'vm86' in it - all subsequent fields will inherit that
name one way or another.

At a quick glance I'd do the following renames:

struct kernel_vm86_info *vm86; => struct vm86 *vm86;

- it's obviously 'information' so the _info is superfluous.

- and it's obviously embedded in a kernel structure, so the kernel_ is
superfluous as well.

Then let's look at other fields of the main structure:

struct kernel_vm86_info {
struct vm86plus_struct __user *vm86_info;
struct pt_regs regs32;
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;

unsigned long flags;
unsigned long screen_bitmap;
unsigned long cpu_type;
struct revectored_struct int_revectored;
struct revectored_struct int21_revectored;
struct vm86plus_info_struct vm86plus;
};

- Why is there a vm86.flags and a vm86.v86flags field? What's the difference
and can we eliminate the confusion factor?

- The fields flags..vm86plus seem to be an as-is copy of 'struct
vm86plus_struct'. Could this be organized in a smarter fashion?

- 'struct vm86_regs' appears to be an as-is copy of 32-bit pt_regs, plus:

unsigned short es, __esh;
unsigned short ds, __dsh;
unsigned short fs, __fsh;
unsigned short gs, __gsh;

Instead of a slightly different structure copying pt_regs, why not express it
as:

struct vm86_regs {
struct pt_regs regs;

unsigned short es, __esh;
unsigned short ds, __dsh;
unsigned short fs, __fsh;
unsigned short gs, __gsh;
};

?

- There's a number of 'long' fields which are always 32-bit, which is pretty
confusing even if it's only ever built on 32-bit kernels, can we use
u8/u16/u32/u64 for ABI components instead?

Thanks,

Ingo

Subject: [tip:x86/asm] x86/entry/vm86: Clean up saved_fs/gs

Commit-ID: 0233606ce5cf12c1a0e27cb197066ea5bc2bb488
Gitweb: http://git.kernel.org/tip/0233606ce5cf12c1a0e27cb197066ea5bc2bb488
Author: Brian Gerst <[email protected]>
AuthorDate: Sun, 19 Jul 2015 21:09:04 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 21 Jul 2015 09:12:23 +0200

x86/entry/vm86: Clean up saved_fs/gs

There is no need to save FS and non-lazy GS outside the 32-bit
regs. Lazy GS still needs to be saved because it wasn't saved
on syscall entry. Save it in the gs slot of regs32, which is
present but unused.

Signed-off-by: Brian Gerst <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/processor.h | 2 --
arch/x86/kernel/vm86_32.c | 6 ++----
2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 43e6519..f4e4e3f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -410,8 +410,6 @@ struct thread_struct {
unsigned long v86flags;
unsigned long v86mask;
unsigned long saved_sp0;
- unsigned int saved_fs;
- unsigned int saved_gs;
#endif
/* IO permissions: */
unsigned long *io_bitmap_ptr;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index fc9db6e..761a2f9 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -159,8 +159,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)

ret = KVM86->regs32;

- ret->fs = current->thread.saved_fs;
- set_user_gs(ret, current->thread.saved_gs);
+ lazy_load_gs(ret->gs);

return ret;
}
@@ -315,8 +314,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
*/
info->regs32->ax = VM86_SIGNAL;
tsk->thread.saved_sp0 = tsk->thread.sp0;
- tsk->thread.saved_fs = info->regs32->fs;
- tsk->thread.saved_gs = get_user_gs(info->regs32);
+ lazy_save_gs(info->regs32->gs);

tss = &per_cpu(cpu_tss, get_cpu());
tsk->thread.sp0 = (unsigned long) &info->VM86_TSS_ESP0;

Subject: [tip:x86/asm] x86/entry/vm86: Preserve 'orig_ax'

Commit-ID: df1ae9a5dc66d9fd57109240042372b1065d984a
Gitweb: http://git.kernel.org/tip/df1ae9a5dc66d9fd57109240042372b1065d984a
Author: Brian Gerst <[email protected]>
AuthorDate: Sun, 19 Jul 2015 21:09:05 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 21 Jul 2015 09:12:23 +0200

x86/entry/vm86: Preserve 'orig_ax'

There is no legitimate reason for usermode to modify the 'orig_ax'
field on entry to vm86 mode, so copy it from the 32-bit regs.

Signed-off-by: Brian Gerst <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/vm86_32.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 761a2f9..9a2dc80 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -294,6 +294,8 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
info->regs.pt.flags |= info->regs32->flags & ~SAFE_MASK;
info->regs.pt.flags |= X86_VM_MASK;

+ info->regs.pt.orig_ax = info->regs32->orig_ax;
+
switch (info->cpu_type) {
case CPU_286:
tsk->thread.v86mask = 0;

Subject: [tip:x86/asm] x86/entry/vm86: Move userspace accesses to do_sys_vm86()

Commit-ID: ed0b2edb61ba4e557de759093d965654186f28b2
Gitweb: http://git.kernel.org/tip/ed0b2edb61ba4e557de759093d965654186f28b2
Author: Brian Gerst <[email protected]>
AuthorDate: Sun, 19 Jul 2015 21:09:06 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 21 Jul 2015 09:12:24 +0200

x86/entry/vm86: Move userspace accesses to do_sys_vm86()

Move the userspace accesses down into the common function in
preparation for the next set of patches. Also change to copying
the fields explicitly instead of assuming a fixed order in
pt_regs and the kernel data structures.

Signed-off-by: Brian Gerst <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/processor.h | 2 +-
arch/x86/kernel/vm86_32.c | 189 +++++++++++++++++++++------------------
2 files changed, 102 insertions(+), 89 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f4e4e3f..35ad554 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -405,7 +405,7 @@ struct thread_struct {
unsigned long error_code;
#ifdef CONFIG_X86_32
/* Virtual 86 mode info */
- struct vm86_struct __user *vm86_info;
+ struct vm86plus_struct __user *vm86_info;
unsigned long screen_bitmap;
unsigned long v86flags;
unsigned long v86mask;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 9a2dc80..e6c2b47 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -90,46 +90,13 @@
#define SAFE_MASK (0xDD5)
#define RETURN_MASK (0xDFF)

-/* convert kernel_vm86_regs to vm86_regs */
-static int copy_vm86_regs_to_user(struct vm86_regs __user *user,
- const struct kernel_vm86_regs *regs)
-{
- int ret = 0;
-
- /*
- * kernel_vm86_regs is missing gs, so copy everything up to
- * (but not including) orig_eax, and then rest including orig_eax.
- */
- ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.orig_ax));
- ret += copy_to_user(&user->orig_eax, &regs->pt.orig_ax,
- sizeof(struct kernel_vm86_regs) -
- offsetof(struct kernel_vm86_regs, pt.orig_ax));
-
- return ret;
-}
-
-/* convert vm86_regs to kernel_vm86_regs */
-static int copy_vm86_regs_from_user(struct kernel_vm86_regs *regs,
- const struct vm86_regs __user *user,
- unsigned extra)
-{
- int ret = 0;
-
- /* copy ax-fs inclusive */
- ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.orig_ax));
- /* copy orig_ax-__gsh+extra */
- ret += copy_from_user(&regs->pt.orig_ax, &user->orig_eax,
- sizeof(struct kernel_vm86_regs) -
- offsetof(struct kernel_vm86_regs, pt.orig_ax) +
- extra);
- return ret;
-}
-
struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
{
struct tss_struct *tss;
struct pt_regs *ret;
- unsigned long tmp;
+ struct task_struct *tsk = current;
+ struct vm86plus_struct __user *user;
+ long err = 0;

/*
* This gets called from entry.S with interrupts disabled, but
@@ -138,23 +105,50 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
*/
local_irq_enable();

- if (!current->thread.vm86_info) {
+ if (!tsk->thread.vm86_info) {
pr_alert("no vm86_info: BAD\n");
do_exit(SIGSEGV);
}
- set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | current->thread.v86mask);
- tmp = copy_vm86_regs_to_user(&current->thread.vm86_info->regs, regs);
- tmp += put_user(current->thread.screen_bitmap, &current->thread.vm86_info->screen_bitmap);
- if (tmp) {
+ set_flags(regs->pt.flags, VEFLAGS, X86_EFLAGS_VIF | tsk->thread.v86mask);
+ user = tsk->thread.vm86_info;
+
+ if (!access_ok(VERIFY_WRITE, user, VMPI.is_vm86pus ?
+ sizeof(struct vm86plus_struct) :
+ sizeof(struct vm86_struct))) {
+ pr_alert("could not access userspace vm86_info\n");
+ do_exit(SIGSEGV);
+ }
+
+ put_user_try {
+ put_user_ex(regs->pt.bx, &user->regs.ebx);
+ put_user_ex(regs->pt.cx, &user->regs.ecx);
+ put_user_ex(regs->pt.dx, &user->regs.edx);
+ put_user_ex(regs->pt.si, &user->regs.esi);
+ put_user_ex(regs->pt.di, &user->regs.edi);
+ put_user_ex(regs->pt.bp, &user->regs.ebp);
+ put_user_ex(regs->pt.ax, &user->regs.eax);
+ put_user_ex(regs->pt.ip, &user->regs.eip);
+ put_user_ex(regs->pt.cs, &user->regs.cs);
+ put_user_ex(regs->pt.flags, &user->regs.eflags);
+ put_user_ex(regs->pt.sp, &user->regs.esp);
+ put_user_ex(regs->pt.ss, &user->regs.ss);
+ put_user_ex(regs->es, &user->regs.es);
+ put_user_ex(regs->ds, &user->regs.ds);
+ put_user_ex(regs->fs, &user->regs.fs);
+ put_user_ex(regs->gs, &user->regs.gs);
+
+ put_user_ex(tsk->thread.screen_bitmap, &user->screen_bitmap);
+ } put_user_catch(err);
+ if (err) {
pr_alert("could not access userspace vm86_info\n");
do_exit(SIGSEGV);
}

tss = &per_cpu(cpu_tss, get_cpu());
- current->thread.sp0 = current->thread.saved_sp0;
- current->thread.sysenter_cs = __KERNEL_CS;
- load_sp0(tss, &current->thread);
- current->thread.saved_sp0 = 0;
+ tsk->thread.sp0 = tsk->thread.saved_sp0;
+ tsk->thread.sysenter_cs = __KERNEL_CS;
+ load_sp0(tss, &tsk->thread);
+ tsk->thread.saved_sp0 = 0;
put_cpu();

ret = KVM86->regs32;
@@ -199,7 +193,8 @@ out:


static int do_vm86_irq_handling(int subfunction, int irqnumber);
-static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk);
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
+ struct kernel_vm86_struct *info);

SYSCALL_DEFINE1(vm86old, struct vm86_struct __user *, v86)
{
@@ -208,21 +203,8 @@ SYSCALL_DEFINE1(vm86old, struct vm86_struct __user *, v86)
* This remains on the stack until we
* return to 32 bit user space.
*/
- struct task_struct *tsk = current;
- int tmp;

- if (tsk->thread.saved_sp0)
- return -EPERM;
- tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
- offsetof(struct kernel_vm86_struct, vm86plus) -
- sizeof(info.regs));
- if (tmp)
- return -EFAULT;
- memset(&info.vm86plus, 0, (int)&info.regs32 - (int)&info.vm86plus);
- info.regs32 = current_pt_regs();
- tsk->thread.vm86_info = v86;
- do_sys_vm86(&info, tsk);
- return 0; /* we never return here */
+ return do_sys_vm86((struct vm86plus_struct __user *) v86, false, &info);
}


@@ -233,11 +215,7 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
* This remains on the stack until we
* return to 32 bit user space.
*/
- struct task_struct *tsk;
- int tmp;
- struct vm86plus_struct __user *v86;

- tsk = current;
switch (cmd) {
case VM86_REQUEST_IRQ:
case VM86_FREE_IRQ:
@@ -255,34 +233,69 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
}

/* we come here only for functions VM86_ENTER, VM86_ENTER_NO_BYPASS */
- if (tsk->thread.saved_sp0)
- return -EPERM;
- v86 = (struct vm86plus_struct __user *)arg;
- tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
- offsetof(struct kernel_vm86_struct, regs32) -
- sizeof(info.regs));
- if (tmp)
- return -EFAULT;
- info.regs32 = current_pt_regs();
- info.vm86plus.is_vm86pus = 1;
- tsk->thread.vm86_info = (struct vm86_struct __user *)v86;
- do_sys_vm86(&info, tsk);
- return 0; /* we never return here */
+ return do_sys_vm86((struct vm86plus_struct __user *) arg, true, &info);
}


-static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
+static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
+ struct kernel_vm86_struct *info)
{
struct tss_struct *tss;
-/*
- * make sure the vm86() system call doesn't try to do anything silly
- */
- info->regs.pt.ds = 0;
- info->regs.pt.es = 0;
- info->regs.pt.fs = 0;
-#ifndef CONFIG_X86_32_LAZY_GS
- info->regs.pt.gs = 0;
-#endif
+ struct task_struct *tsk = current;
+ unsigned long err = 0;
+
+ if (tsk->thread.saved_sp0)
+ return -EPERM;
+
+ if (!access_ok(VERIFY_READ, v86, plus ?
+ sizeof(struct vm86plus_struct) :
+ sizeof(struct vm86_struct)))
+ return -EFAULT;
+
+ memset(info, 0, sizeof(*info));
+ get_user_try {
+ unsigned short seg;
+ get_user_ex(info->regs.pt.bx, &v86->regs.ebx);
+ get_user_ex(info->regs.pt.cx, &v86->regs.ecx);
+ get_user_ex(info->regs.pt.dx, &v86->regs.edx);
+ get_user_ex(info->regs.pt.si, &v86->regs.esi);
+ get_user_ex(info->regs.pt.di, &v86->regs.edi);
+ get_user_ex(info->regs.pt.bp, &v86->regs.ebp);
+ get_user_ex(info->regs.pt.ax, &v86->regs.eax);
+ get_user_ex(info->regs.pt.ip, &v86->regs.eip);
+ get_user_ex(seg, &v86->regs.cs);
+ info->regs.pt.cs = seg;
+ get_user_ex(info->regs.pt.flags, &v86->regs.eflags);
+ get_user_ex(info->regs.pt.sp, &v86->regs.esp);
+ get_user_ex(seg, &v86->regs.ss);
+ info->regs.pt.ss = seg;
+ get_user_ex(info->regs.es, &v86->regs.es);
+ get_user_ex(info->regs.ds, &v86->regs.ds);
+ get_user_ex(info->regs.fs, &v86->regs.fs);
+ get_user_ex(info->regs.gs, &v86->regs.gs);
+
+ get_user_ex(info->flags, &v86->flags);
+ get_user_ex(info->screen_bitmap, &v86->screen_bitmap);
+ get_user_ex(info->cpu_type, &v86->cpu_type);
+ } get_user_catch(err);
+ if (err)
+ return err;
+
+ if (copy_from_user(&info->int_revectored, &v86->int_revectored,
+ sizeof(struct revectored_struct)))
+ return -EFAULT;
+ if (copy_from_user(&info->int21_revectored, &v86->int21_revectored,
+ sizeof(struct revectored_struct)))
+ return -EFAULT;
+ if (plus) {
+ if (copy_from_user(&info->vm86plus, &v86->vm86plus,
+ sizeof(struct vm86plus_info_struct)))
+ return -EFAULT;
+ info->vm86plus.is_vm86pus = 1;
+ }
+
+ info->regs32 = current_pt_regs();
+ tsk->thread.vm86_info = v86;

/*
* The flags register is also special: we cannot trust that the user
@@ -344,7 +357,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
"jmp resume_userspace"
: /* no outputs */
:"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
- /* we never return here */
+ unreachable(); /* we never return here */
}

static inline void return_to_32bit(struct kernel_vm86_regs *regs16, int retval)
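The get_user_try/get_user_ex block in the patch above copies each register
individually rather than memcpy()ing the whole structure, so the kernel-side
layout need not mirror the userspace struct. A minimal user-space sketch of
the same idea (hypothetical struct and field names, not the kernel's):

```c
#include <assert.h>

/* Hypothetical userspace-visible layout: field order is fixed ABI. */
struct user_regs_sketch {
	unsigned int ebx, ecx, edx;
	unsigned short cs, __csh;	/* segment + explicit padding */
};

/* Hypothetical kernel-side layout: different order and field widths. */
struct kernel_regs_sketch {
	unsigned long cx;
	unsigned long bx;
	unsigned long dx;
	unsigned long cs;
};

/* Copy field by field so neither side depends on the other's layout. */
static void copy_regs_sketch(struct kernel_regs_sketch *k,
			     const struct user_regs_sketch *u)
{
	k->bx = u->ebx;
	k->cx = u->ecx;
	k->dx = u->edx;
	k->cs = u->cs;	/* 16-bit segment widened, padding ignored */
}
```

A whole-struct copy would silently break the moment either layout changed,
which is the objection Andy raised against the v1 approach.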

2015-07-21 15:30:12

by Brian Gerst

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct

On Tue, Jul 21, 2015 at 3:11 AM, Ingo Molnar <[email protected]> wrote:
>
> * Brian Gerst <[email protected]> wrote:
>
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -110,6 +110,13 @@ void exit_thread(void)
>> kfree(bp);
>> }
>>
>> +#ifdef CONFIG_VM86
>> + if (t->vm86) {
>> + kfree(t->vm86);
>> + t->vm86 = NULL;
>> + }
>> +#endif
>
> This should be a helper:
>
> vm86__free(t->vm86)
>
> or so.
>
>> diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
>> index e6c2b47..dce0a1c 100644
>> --- a/arch/x86/kernel/vm86_32.c
>> +++ b/arch/x86/kernel/vm86_32.c
>
>> @@ -242,9 +244,16 @@ static long do_sys_vm86(struct vm86plus_struct __user *v86, bool plus,
>> {
>> struct tss_struct *tss;
>> struct task_struct *tsk = current;
>> + struct kernel_vm86_info *vm86 = tsk->thread.vm86;
>> unsigned long err = 0;
>>
>> - if (tsk->thread.saved_sp0)
>> + if (!vm86)
>> + {
>
> Non-standard style.
>
>> + if (!(vm86 = kzalloc(sizeof(*vm86), GFP_KERNEL)))
>> + return -ENOMEM;
>> + tsk->thread.vm86 = vm86;
>> + }
>> + if (vm86->saved_sp0)
>> return -EPERM;
>
>
> Btw., the variable names here are crazy. I had to look twice to realize that we
> have 'v86' and 'vm86' which are two different things.
>
> Also, vm86plus_struct variables and fields are named wildly inconsistently:
> sometimes it's 'vm86.vm86_info', sometimes it's 'v86', sometimes 'user'. Ugh.
>
> Other fields have naming inconsistencies as well: for example we have
> thread.vm86->vm86plus.vm86dbg_active. 'vm86' is repeated _three_ times in that
> name, for no good reason.
>
> So please clean up the naming to make this all easier to read. Only the highest
> level field should have 'vm86' in it - all subsequent fields will inherit that
> name one way or another.

Some of these field names are visible to userspace and can't change.

> At a quick glance I'd do the following renames:
>
> struct kernel_vm86_info *vm86; => struct vm86 *vm86;
>
> - it's obviously 'information' so the _info is superfluous.
>
> - and it's obviously embedded in a kernel structure, so the kernel_ is
> superfluous as well.

I think there is some merit to keeping the kernel prefix to make it
clear it is not the userspace version, which is similar.

> Then let's look at other fields of the main structure:
>
> struct kernel_vm86_info {
> struct vm86plus_struct __user *vm86_info;
> struct pt_regs regs32;
> unsigned long v86flags;
> unsigned long v86mask;
> unsigned long saved_sp0;
>
> unsigned long flags;
> unsigned long screen_bitmap;
> unsigned long cpu_type;
> struct revectored_struct int_revectored;
> struct revectored_struct int21_revectored;
> struct vm86plus_info_struct vm86plus;
> };
>
> - Why is there a vm86.flags and a vm86.v86flags field? What's the difference
> and can we eliminate the confusion factor?

The former is a feature flag (only used for the screen bitmap), and the
latter is a shadow copy of EFLAGS.
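The shadow copy exists because vm86 code must never really disable
interrupts: the guest's notion of IF lives in the VIF bit of v86flags, and
it is folded back in whenever the guest reads its flags. A simplified
sketch of that merge (using the standard EFLAGS bit values; the real
get_vflags() also handles IOPL and the per-CPU-type v86mask):

```c
#define X86_EFLAGS_IF  0x00000200UL	/* real interrupt flag */
#define X86_EFLAGS_VIF 0x00080000UL	/* virtual interrupt flag */

/*
 * Merge the hardware flags with the shadow copy: the guest sees IF set
 * exactly when the shadow VIF bit is set, never the real IF.
 */
static unsigned long get_vflags_sketch(unsigned long hw_flags,
				       unsigned long v86flags)
{
	unsigned long flags = hw_flags & ~X86_EFLAGS_IF;

	if (v86flags & X86_EFLAGS_VIF)
		flags |= X86_EFLAGS_IF;
	return flags;
}
```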

> - The fields flags..vm86plus seem to be an as-is copy of 'struct
> vm86plus_struct'. Could this be organized in a smarter fashion?

I specifically want to exclude retaining a copy of vm86_regs in the
kernel data. It's not needed after the syscall returns to vm86 mode.
I had originally copied all the fields after regs as a whole, but Andy
objected to that. So I changed it to copy all the fields individually
instead, so that the order of kernel vs. userspace doesn't matter.

> - 'struct vm86_regs' appears to be an as-is copy of 32-bit pt_regs, plus:
>
> unsigned short es, __esh;
> unsigned short ds, __dsh;
> unsigned short fs, __fsh;
> unsigned short gs, __gsh;
>
> Instead of a slightly different structure copying pt_regs, why not express it
> as:
>
> struct vm86_regs {
> struct pt_regs regs;
>
> unsigned short es, __esh;
> unsigned short ds, __dsh;
> unsigned short fs, __fsh;
> unsigned short gs, __gsh;
> };

This would be a userspace visible change. The register order and
field names from userspace are fixed, and I don't want it to depend on
the order of fields of the in-kernel pt_regs. It was a bad ABI design
choice, but it's too late to fix it now.

> - There's a number of 'long' fields which are always 32-bit, which is pretty
> confusing even if it's only ever built on 32-bit kernels. Can we use
> u8/u16/u32/u64 for ABI components instead?

Will do.

>
> Thanks,
>
> Ingo

--
Brian Gerst

2015-08-05 08:48:39

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Brian Gerst <[email protected]> wrote:

> > Btw., the variable names here are crazy. I had to look twice to realize that
> > we have 'v86' and 'vm86' which are two different things.
> >
> > Also, vm86plus_struct variables and fields are named wildly inconsistently:
> > sometimes it's 'vm86.vm86_info', sometimes it's 'v86', sometimes 'user'. Ugh.
> >
> > Other fields have naming inconsistencies as well: for example we have
> > thread.vm86->vm86plus.vm86dbg_active. 'vm86' is repeated _three_ times in that
> > name, for no good reason.
> >
> > So please clean up the naming to make this all easier to read. Only the
> > highest level field should have 'vm86' in it - all subsequent fields will
> > inherit that name one way or another.
>
> Some of these field names are visible to userspace and can't change.

That's a misconception: bits in the uapi headers can be renamed just fine.

The kernel ABI is that _semantics_ that user-space code relies on must not change.

Cleaning up field names is absolutely legit to do, and we've done it numerous
times in the past. Especially where they are so confusing and inconsistent as in
the vm86 code.

Thanks,

Ingo

2015-08-05 08:53:25

by Linus Torvalds

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct

On Wed, Aug 5, 2015 at 10:48 AM, Ingo Molnar <[email protected]> wrote:
>>
>> Some of these field names are visible to userspace and can't change.
>
> That's a misconception: bits in the uapi headers can be renamed just fine.

I disagree. If it causes pain for user space, we just shouldn't do it.
Some people copy the headers (preferred), others include the kernel
header directory with a include path or a symlink, and it's damn
painful if we start changing things that user space depends on.

I'd say that it's like the ABI - if people don't really notice, you
can do it, but if it breaks the build of a user app, we should be very
very careful. The breakage is often hard to fix because of nasty
versioning issues.

Our goal in life really is "don't screw up user space".

Linus

2015-08-23 11:51:19

by Ingo Molnar

Subject: Re: [PATCH 4/7] x86/vm86: Move vm86 fields out of thread_struct


* Linus Torvalds <[email protected]> wrote:

> On Wed, Aug 5, 2015 at 10:48 AM, Ingo Molnar <[email protected]> wrote:
> >>
> >> Some of these field names are visible to userspace and can't change.
> >
> > That's a misconception: bits in the uapi headers can be renamed just fine.
>
> I disagree. If it causes pain for user space, we just shouldn't do it.

Ok, agreed - I wanted to argue against the "can't change" statement and went
overboard with my own statement ...

Quite often headers can change and we've changed a number of fields in the past -
but if they cause pain (as in this case) we don't do it.

So I'd say that based on past experience:

- changing small details in uapi headers is usually fine.
- changing small details in the ABI is usually not fine.

If anything breaks then the policy is the same, regardless of likelihood:
reverting the change.

> Some people copy the headers (preferred), others include the kernel header
> directory with an include path or a symlink, and it's damn painful if we start
> changing things that user space depends on.
>
> I'd say that it's like the ABI - if people don't really notice, you can do it,
> but if it breaks the build of a user app, we should be very very careful. The
> breakage is often hard to fix because of nasty versioning issues.
>
> Our goal in life really is "don't screw up user space".

Yeah, agreed 100%.

Thanks,

Ingo