2015-07-08 01:26:10

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
in use. The code is a big undocumented mess, it's a real PITA to
test, and it looks like a big chunk of vm86_32.c is dead code. It
also plays awful games with the entry asm.

No one should be using it anyway. Use DOSBOX or KVM instead.

Mark it BROKEN. I want to remove some (obviously incorrect) exit
asm that it depends on, and I don't want to figure out how to run
severely obsolete programs just to test something that no one uses
for anything other than exploits anyway.

Signed-off-by: Andy Lutomirski <[email protected]>
---

I find it implausible that vm86_32.c isn't full or root holes. It's
also full of hilariously ugly code, it does terrible things to the
kernel stack, and its interaction with the syscall slowpath is
blatantly incorrect.

It really shouldn't have any users, anyway. It doesn't (and can't!)
work on 64-bit kernels, and the only program that even knows how it
works appears to be DOSEMU. DOSEMU doesn't even need it for most
programs (it uses modify_ldt instead if possible), and DOSBOX and
KVM are better choices anyway.

I think that even DOSEMU might be able to emulate vm86 (by emulating
instruction-by-instruction) if the vm86 syscall isn't there.

Want to be terrified? Read copy_vm86_regs_from_user. Or
mark_screen_rdonly. Or return_to_32bit. Or VM86_REQUEST_IRQ.

What do you all think? This code is a maintenance disaster, and I'd
love to see it go. This would be a nice first step.

This patch is intended for tip/x86/asm. The 32-bit part of my big
cleanup will interfere with vm86, and, while I think I fixed it up
right, I'd rather not expose everyone to the high probability of
crazy security bugs in this mess.

arch/x86/Kconfig | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aa94fd014fa2..080228bdbcda 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -997,8 +997,8 @@ config X86_THERMAL_VECTOR
depends on X86_MCE_INTEL

config VM86
- bool "Enable VM86 support" if EXPERT
- default y
+ bool "Enable VM86 support" if BROKEN
+ default n
depends on X86_32
---help---
This option is required by programs like DOSEMU to run
@@ -1006,6 +1006,12 @@ config VM86
be needed by software like XFree86 to initialize some video
cards via BIOS. Disabling this option saves about 6K.

+ Linux's vm86 support is poorly maintained, essentially never
+ tested by upstream kernel developers, has quite a few known
+ bugs, and is probably full of security holes. The only thing
+ that appears to use it is DOSEMU, and DOSBOX and KVM are
+ better options these days. Don't enable it.
+
config X86_16BIT
bool "Enable support for 16-bit segments" if EXPERT
default y
--
2.4.3


2015-07-08 02:33:44

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
> VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> in use. The code is a big undocumented mess, it's a real PITA to
> test, and it looks like a big chunk of vm86_32.c is dead code. It
> also plays awful games with the entry asm.
>
> No one should be using it anyway. Use DOSBOX or KVM instead.
>
> Mark it BROKEN. I want to remove some (obviously incorrect) exit
> asm that it depends on, and I don't want to figure out how to run
> severely obsolete programs just to test something that no one uses
> for anything other than exploits anyway.
>

while it is never great to deprecate features, in this case I am not sure
there is another choice unless someone steps up to seriously revamp this code.
(and look at it from a PREEMPT, NO_HZ etc etc angle)

if this patch would not be acceptable, at minimum we need some sort of "off by default
unless the sysadmin flips a sysfs thing", which is really just a huge hack.


so for me this is

Acked-by: Arjan van de Ven <[email protected]>

Subject: [tip:x86/asm] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

Commit-ID: 0b02e20767a3b4d843d2c58cf031d9e31f60e39d
Gitweb: http://git.kernel.org/tip/0b02e20767a3b4d843d2c58cf031d9e31f60e39d
Author: Andy Lutomirski <[email protected]>
AuthorDate: Tue, 7 Jul 2015 18:25:56 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 8 Jul 2015 11:04:45 +0200

x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

VM86 is entirely broken if ptrace, syscall auditing, or
NOHZ_FULL is in use. The code is a big undocumented mess, it's
a real PITA to test, and it looks like a big chunk of vm86_32.c
is dead code. It also plays awful games with the entry asm.

No one should be using it anyway. Use DOSBOX or KVM instead.

Mark it BROKEN. I want to remove some (obviously incorrect)
exit asm that it depends on, and I don't want to figure out how
to run severely obsolete programs just to test something that no
one uses for anything other than exploits anyway.

Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Cc: <[email protected]> # Backport it as far back as possible
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/23d4709cee2fe92c32d41b99c7a3c1823725925a.1436312944.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/Kconfig | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aa94fd0..a7648f9b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -997,8 +997,8 @@ config X86_THERMAL_VECTOR
depends on X86_MCE_INTEL

config VM86
- bool "Enable VM86 support" if EXPERT
- default y
+ bool "Enable VM86 support" if BROKEN
+ default n
depends on X86_32
---help---
This option is required by programs like DOSEMU to run
@@ -1006,6 +1006,12 @@ config VM86
be needed by software like XFree86 to initialize some video
cards via BIOS. Disabling this option saves about 6K.

+ Linux's VM86 support is poorly maintained, essentially never
+ tested by upstream kernel developers, has quite a few known
+ bugs, and is probably full of security holes. The only thing
+ that appears to use it is DOSEMU, and DOSBOX and KVM are
+ better options these days. Don't enable it.
+
config X86_16BIT
bool "Enable support for 16-bit segments" if EXPERT
default y

2015-07-08 14:00:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Tue, 7 Jul 2015, Arjan van de Ven wrote:

> On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
> > VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> > in use. The code is a big undocumented mess, it's a real PITA to
> > test, and it looks like a big chunk of vm86_32.c is dead code. It
> > also plays awful games with the entry asm.
> >
> > No one should be using it anyway. Use DOSBOX or KVM instead.
> >
> > Mark it BROKEN. I want to remove some (obviously incorrect) exit
> > asm that it depends on, and I don't want to figure out how to run
> > severely obsolete programs just to test something that no one uses
> > for anything other than exploits anyway.
> >
>
> while it is never great to deprecate features, in this case I am not sure
> there is another choice unless someone steps up to seriously revamp this code.
> (and look at it from a PREEMPT, NO_HZ etc etc angle)

Aside of being broken in so many aspects it's even more obsolete than
386 support, we should just remove it right away.

Thanks,

tglx

2015-07-08 14:04:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Thomas Gleixner <[email protected]> wrote:

> On Tue, 7 Jul 2015, Arjan van de Ven wrote:
>
> > On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
> > > VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> > > in use. The code is a big undocumented mess, it's a real PITA to
> > > test, and it looks like a big chunk of vm86_32.c is dead code. It
> > > also plays awful games with the entry asm.
> > >
> > > No one should be using it anyway. Use DOSBOX or KVM instead.
> > >
> > > Mark it BROKEN. I want to remove some (obviously incorrect) exit
> > > asm that it depends on, and I don't want to figure out how to run
> > > severely obsolete programs just to test something that no one uses
> > > for anything other than exploits anyway.
> > >
> >
> > while it is never great to deprecate features, in this case I am not sure
> > there is another choice unless someone steps up to seriously revamp this code.
> > (and look at it from a PREEMPT, NO_HZ etc etc angle)
>
> Aside of being broken in so many aspects it's even more obsolete than
> 386 support, we should just remove it right away.

Yes - marking is BROKEN essentially makes it impossible to build it without
changing the kernel source, so the next patch(es) could remove it.

But the 'marking BROKEN' patch will be much easier to backport, so I'd like to
keep it separate.

Thanks,

Ingo

2015-07-08 15:32:07

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Tue, Jul 7, 2015 at 9:25 PM, Andy Lutomirski <[email protected]> wrote:
> VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> in use. The code is a big undocumented mess, it's a real PITA to
> test, and it looks like a big chunk of vm86_32.c is dead code. It
> also plays awful games with the entry asm.
>
> No one should be using it anyway. Use DOSBOX or KVM instead.
>
> Mark it BROKEN. I want to remove some (obviously incorrect) exit
> asm that it depends on, and I don't want to figure out how to run
> severely obsolete programs just to test something that no one uses
> for anything other than exploits anyway.
>
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
>
> I find it implausible that vm86_32.c isn't full or root holes. It's
> also full of hilariously ugly code, it does terrible things to the
> kernel stack, and its interaction with the syscall slowpath is
> blatantly incorrect.
>
> It really shouldn't have any users, anyway. It doesn't (and can't!)
> work on 64-bit kernels, and the only program that even knows how it
> works appears to be DOSEMU. DOSEMU doesn't even need it for most
> programs (it uses modify_ldt instead if possible), and DOSBOX and
> KVM are better choices anyway.
>
> I think that even DOSEMU might be able to emulate vm86 (by emulating
> instruction-by-instruction) if the vm86 syscall isn't there.
>
> Want to be terrified? Read copy_vm86_regs_from_user. Or
> mark_screen_rdonly. Or return_to_32bit. Or VM86_REQUEST_IRQ.
>
> What do you all think? This code is a maintenance disaster, and I'd
> love to see it go. This would be a nice first step.
>
> This patch is intended for tip/x86/asm. The 32-bit part of my big
> cleanup will interfere with vm86, and, while I think I fixed it up
> right, I'd rather not expose everyone to the high probability of
> crazy security bugs in this mess.

I have been working on some patches to fix the ugly hacks vm86 uses
and make it more easily maintainable. The general idea is to make it
use the regular pt_regs area and save the 32-bit regs and other data
off-stack. That would allow a normal kernel exit route instead of
jumping directly into the exit asm code. It should also allow ptrace
to work with a few tweaks.

One other place to check for usage is Wine. I recall there being some
DOS compatibility stuff in there.

--
Brian Gerst

2015-07-08 16:59:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>
> if this patch would not be acceptable, at minimum we need some sort of "off
> by default
> unless the sysadmin flips a sysfs thing", which is really just a huge hack.

The only thing that matters is whether people use this or not.

If people use vm86 mode, we can't just disable it. It's that simple.
"It's poorly maintained" isn't an argument for removal. Only "nobody
cares" works as an argument for that.

My suspicion is that people still do use vm86 mode, but who knows..
Quite frankly, rather than disable it, I'd much rather see people who
modify low-level x86 code (yes, that means you, Luto) *test* it. If
you aren't willign to test the modifications you make, I don't think
those modifications should be merged, regardless of how nice a cleanup
they are.

Linus

2015-07-08 17:30:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>
>> if this patch would not be acceptable, at minimum we need some sort of "off
>> by default
>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>
> The only thing that matters is whether people use this or not.
>

I think that the world contains precisely two programs that use the
vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
are probably some exploits written by other people that I don't know
about. Certainly Spender has been patching vm86 for long enough that
he must have an exploit or two up his sleeve.)

As far as I can tell (and I'll try to test this better for real later
this week), dosemu already knows how to emulate real mode if vm86 is
unavailable. So it's unclear that turning off the vm86 syscalls
actually breaks anything whatsoever.

On the other hand, sys_vm86 fails if the syscall slow path is in use.
That means that quite a few Fedora versions (auditing), anything with
ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
is probably actually *improved* by turning off the vm86 syscalls even
for dosemu users.

And apparently Ubuntu has had CONFIG_VM86 disabled forever.

IOW, vm86 really is broken.

> If people use vm86 mode, we can't just disable it. It's that simple.
> "It's poorly maintained" isn't an argument for removal. Only "nobody
> cares" works as an argument for that.
>
> My suspicion is that people still do use vm86 mode, but who knows..
> Quite frankly, rather than disable it, I'd much rather see people who
> modify low-level x86 code (yes, that means you, Luto) *test* it. If
> you aren't willign to test the modifications you make, I don't think
> those modifications should be merged, regardless of how nice a cleanup
> they are.

I tried to test it. As far as I know, my changes in -tip have no
effect on vm86, and the changes I'm planning on sending this week will
make it work better. I still thing that Linux users should have it
configured out or deleted altogether. Especially people who care at
all about security.

It's easy to try the easy case (run from tools/testing/selftests/x86)
-- this is v4.2-rc1, but most recent versions should be identical:

$ ./entry_from_vm86_32
[RUN] #BR from vm86 mode
[OK] Exited vm86 mode due to #BR
[RUN] SYSENTER from vm86 mode
[OK] Exited vm86 mode due to unhandled GP fault

$ strace -e vm86 ./entry_from_vm86_32
[RUN] #BR from vm86 mode
vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
(Function not implemented)
[OK] Exited vm86 mode due to type 0, arg 0
[RUN] SYSENTER from vm86 mode
vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
(Function not implemented)
[OK] Exited vm86 mode due to type 0, arg 0

It only says "[OK]" because my test case isn't careful enough. That's
a failure. I suspect it was a much worse failure a couple versions
ago before my ENOSYS-reworking patch went in.

Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
really pretty bad.

This only tests easy stuff. The integration between vm86 and fault
handling is truly awful and I don't even know how to approach testing
it. I'd probably have to run twenty or thirty old real-mode games to
even exercise those code paths.

I'll try to confirm later this week that dosemu can really handle real
mode without sys_vm86.

--Andy

2015-07-08 17:49:43

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 10:30 AM, Andy Lutomirski <[email protected]> wrote:
> I'll try to confirm later this week that dosemu can really handle real
> mode without sys_vm86.

I don't know how to tell whether something is trying to use real mode,
but I can play this just fine in DOSEMU on my 64-bit laptop:

http://dosgames.com/dl.php?filename=http://www.dosgames.com/files/alleycat.zip

which suggests that it works just fine. There is most certainly no
working sys_vm86 on my laptop.

--Andy

2015-07-08 17:55:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
>
> I don't know how to tell whether something is trying to use real mode,
> but I can play this just fine in DOSEMU on my 64-bit laptop:

So a 64-bit distro obviously will never have used vm86 mode - it
doesn't work there. Never has. There's no sane way to get to vm86 mode
from long mode, that's just how the 64-bit extensions worked.

(64-bit hardware obviously does support vm86 mode, but you have to
play games with mixing long mode and CPL0 32-bit protected mode to get
there, and we never did that).

It's the 32-bit distros I would worry about. The ones that may have
well disabled emulation, because they have vm86 mode enabled.

Linus

2015-07-08 18:47:39

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 10:55 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 8, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> I don't know how to tell whether something is trying to use real mode,
>> but I can play this just fine in DOSEMU on my 64-bit laptop:
>
> So a 64-bit distro obviously will never have used vm86 mode - it
> doesn't work there. Never has. There's no sane way to get to vm86 mode
> from long mode, that's just how the 64-bit extensions worked.
>
> (64-bit hardware obviously does support vm86 mode, but you have to
> play games with mixing long mode and CPL0 32-bit protected mode to get
> there, and we never did that).

Eww. My sanity hurts just thinking about that. We used to switch in
and out of long mode for EFI mixed mode support, but that's gone now,
since it produced lovely triple faults if perf was running. As far as
I can tell, those triple faults are basically unfixable without
disabling all NMI sources across long mode switches, and 32-bit EFI
works just fine (i.e. as well as it ever did) in CPL0 compat mode.

So exiting long mode to enter v8086 mode is nuts. Entering v8086 mode
via VMX would be *much* better, but just not implementing it would be
better still.

>
> It's the 32-bit distros I would worry about. The ones that may have
> well disabled emulation, because they have vm86 mode enabled.
>

Fedora doesn't package dosemu at all, and Ubuntu turns off CONFIG_VM86
AFAIK. RPMFusion does package dosemu.

Dosemu has a --disable-cpuemu configure option. A quick check
suggests that neither RPMFusion, Gentoo, nor Arch sets that option
(why would they?).

So maybe there's a couple people with home-built --disable-cpuemu
DOSEMU versions on 32-bit kernels who have syscall auditing and
context tracking off. It's even plausible that some nonzero number of
them use new kernels, but I'd be kind of surprised.

Weighed against the fact that sys_vm86 under ptrace is probably a
minor security bug* in some circumstances, I don't think the case for
preserving vm86 support looks all that good. OTOH, if someone were to
actually complain, that would be a different story. That's why I
suggested marking it BROKEN instead of deleting it outright.

* I'm planning on fixing that particular issue regardless on whether
CONFIG_VM86 is marked BROKEN.**

** I don't know enough about the mm innards to know whether
vm86_32.c's mark_screen_rdonly is a security bug, but poking at PTEs
belonging to user addresses without even trying to see what VMAs back
them doesn't look like a good thing... And I have no clue how to fix
that without an ABI break, even if that particular ABI break might not
affect dosemu.

--Andy

2015-07-08 18:49:07

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 10:55 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Jul 8, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> I don't know how to tell whether something is trying to use real mode,
>> but I can play this just fine in DOSEMU on my 64-bit laptop:
>
> So a 64-bit distro obviously will never have used vm86 mode - it
> doesn't work there. Never has. There's no sane way to get to vm86 mode
> from long mode, that's just how the 64-bit extensions worked.
>
> (64-bit hardware obviously does support vm86 mode, but you have to
> play games with mixing long mode and CPL0 32-bit protected mode to get
> there, and we never did that).
>
> It's the 32-bit distros I would worry about. The ones that may have
> well disabled emulation, because they have vm86 mode enabled.

Speaking as the dosemu maintainer in Debian and Ubuntu, I can confirm
what Andy mentioned: dosemu will kick over to emulation if SYS_vm86
and SYS_vm86old fail. The other area I remember that used vm86 mode
was non-KMS Xorg drivers and anything using svgalib that tried to do
video card BIOS initialization.

Also, Andy, I think you weren't looking at i386 builds of Ubuntu.
Current Ubuntu, and 12.04 ("Precise") LTS (supported until 2017), and
14.04 LTS (until 2019) releases all have CONFIG_VM86.

-Kees

--
Kees Cook
Chrome OS Security

2015-07-08 18:53:34

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 11:47 AM, Andy Lutomirski <[email protected]> wrote:
> Fedora doesn't package dosemu at all, and Ubuntu turns off CONFIG_VM86
> AFAIK. RPMFusion does package dosemu.

Just for reference, here's the config on latest Ubuntu:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-vivid.git/tree/debian.master/config/config.common.ubuntu#n8204

Also Debian enables it:
http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/config/kernelarch-x86/config-arch-32?view=markup

:(

-Kees

--
Kees Cook
Chrome OS Security

2015-07-08 18:54:48

by Austin S Hemmelgarn

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On 2015-07-08 13:55, Linus Torvalds wrote:
> On Wed, Jul 8, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> I don't know how to tell whether something is trying to use real mode,
>> but I can play this just fine in DOSEMU on my 64-bit laptop:
>
> So a 64-bit distro obviously will never have used vm86 mode - it
> doesn't work there. Never has. There's no sane way to get to vm86 mode
> from long mode, that's just how the 64-bit extensions worked.
>
> (64-bit hardware obviously does support vm86 mode, but you have to
> play games with mixing long mode and CPL0 32-bit protected mode to get
> there, and we never did that).
>
> It's the 32-bit distros I would worry about. The ones that may have
> well disabled emulation, because they have vm86 mode enabled.
>
Other than the enterprise distros (which _probably_ don't even have
dosemu packages, and I'm 99% certain would have VM86 enabled only for
'backwards compatibility'), I highly doubt that there are any modern
ones that have real-mode emulation disabled in dosemu, there's just too
high of a chance of a security minded user building their own kernel
with VM86 disabled (or they just have it disabled anyway in the distro
kernel, Ubuntu does this, and I'm pretty sure that Debian and Fedora do
also). FWIW, there's no easy way to disable such emulation on Gentoo
(it is possible, it just requires some significant configuration file
hacking for portage).



Attachments:
smime.p7s (2.90 kB)
S/MIME Cryptographic Signature

2015-07-08 19:04:54

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 11:48 AM, Kees Cook <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 10:55 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Wed, Jul 8, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
>>>
>>> I don't know how to tell whether something is trying to use real mode,
>>> but I can play this just fine in DOSEMU on my 64-bit laptop:
>>
>> So a 64-bit distro obviously will never have used vm86 mode - it
>> doesn't work there. Never has. There's no sane way to get to vm86 mode
>> from long mode, that's just how the 64-bit extensions worked.
>>
>> (64-bit hardware obviously does support vm86 mode, but you have to
>> play games with mixing long mode and CPL0 32-bit protected mode to get
>> there, and we never did that).
>>
>> It's the 32-bit distros I would worry about. The ones that may have
>> well disabled emulation, because they have vm86 mode enabled.
>
> Speaking as the dosemu maintainer in Debian and Ubuntu, I can confirm
> what Andy mentioned: dosemu will kick over to emulation if SYS_vm86
> and SYS_vm86old fail. The other area I remember that used vm86 mode
> was non-KMS Xorg drivers and anything using svgalib that tried to do
> video card BIOS initialization.

Adam Jackson said on the Fedora list that everything uses x86emu these
days. And haven't modern kernels already dropped most of the UMS
support already?

>
> Also, Andy, I think you weren't looking at i386 builds of Ubuntu.
> Current Ubuntu, and 12.04 ("Precise") LTS (supported until 2017), and
> 14.04 LTS (until 2019) releases all have CONFIG_VM86.

Hmm. I was going off something someone said an IRC. Apparently I
should have double-checked.

If you have a test system easily available, can you see what happens
if you try to do:

$ sudo auditctl -e 1
$ sudo auditctl -D # just in case you had a "-a task,never" rule installed
$ dosemu

on a system with CONFIG_VM86=y? I bet it fails. Maybe it gets lucky
due to the the bogus vm86 asm code managing to explode with
eax=-ENOSYS, triggering a fallback to emulation.

--Andy

2015-07-08 19:05:46

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>>
>>> if this patch would not be acceptable, at minimum we need some sort of "off
>>> by default
>>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>>
>> The only thing that matters is whether people use this or not.
>>
>
> I think that the world contains precisely two programs that use the
> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
> are probably some exploits written by other people that I don't know
> about. Certainly Spender has been patching vm86 for long enough that
> he must have an exploit or two up his sleeve.)
>
> As far as I can tell (and I'll try to test this better for real later
> this week), dosemu already knows how to emulate real mode if vm86 is
> unavailable. So it's unclear that turning off the vm86 syscalls
> actually breaks anything whatsoever.
>
> On the other hand, sys_vm86 fails if the syscall slow path is in use.
> That means that quite a few Fedora versions (auditing), anything with
> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
> is probably actually *improved* by turning off the vm86 syscalls even
> for dosemu users.
>
> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>
> IOW, vm86 really is broken.
>
>> If people use vm86 mode, we can't just disable it. It's that simple.
>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>> cares" works as an argument for that.
>>
>> My suspicion is that people still do use vm86 mode, but who knows..
>> Quite frankly, rather than disable it, I'd much rather see people who
>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>> you aren't willign to test the modifications you make, I don't think
>> those modifications should be merged, regardless of how nice a cleanup
>> they are.
>
> I tried to test it. As far as I know, my changes in -tip have no
> effect on vm86, and the changes I'm planning on sending this week will
> make it work better. I still thing that Linux users should have it
> configured out or deleted altogether. Especially people who care at
> all about security.
>
> It's easy to try the easy case (run from tools/testing/selftests/x86)
> -- this is v4.2-rc1, but most recent versions should be identical:
>
> $ ./entry_from_vm86_32
> [RUN] #BR from vm86 mode
> [OK] Exited vm86 mode due to #BR
> [RUN] SYSENTER from vm86 mode
> [OK] Exited vm86 mode due to unhandled GP fault
>
> $ strace -e vm86 ./entry_from_vm86_32
> [RUN] #BR from vm86 mode
> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
> (Function not implemented)
> [OK] Exited vm86 mode due to type 0, arg 0
> [RUN] SYSENTER from vm86 mode
> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
> (Function not implemented)
> [OK] Exited vm86 mode due to type 0, arg 0
>
> It only says "[OK]" because my test case isn't careful enough. That's
> a failure. I suspect it was a much worse failure a couple versions
> ago before my ENOSYS-reworking patch went in.
>
> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
> really pretty bad.
>
> This only tests easy stuff. The integration between vm86 and fault
> handling is truly awful and I don't even know how to approach testing
> it. I'd probably have to run twenty or thirty old real-mode games to
> even exercise those code paths.
>
> I'll try to confirm later this week that dosemu can really handle real
> mode without sys_vm86.

None of these issues are unfixable. As I said before, many of them
can be resolved if vm86 is changed to use the normal syscall/exception
exit paths. Give me a few days to finish off that patch set.

--
Brian Gerst

2015-07-08 19:13:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Linus Torvalds <[email protected]> wrote:

> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
> >
> > if this patch would not be acceptable, at minimum we need some sort of "off by
> > default unless the sysadmin flips a sysfs thing", which is really just a huge
> > hack.
>
> The only thing that matters is whether people use this or not.
>
> If people use vm86 mode, we can't just disable it. It's that simple. "It's
> poorly maintained" isn't an argument for removal. Only "nobody cares" works as
> an argument for that.
>
> My suspicion is that people still do use vm86 mode, but who knows.. Quite
> frankly, rather than disable it, I'd much rather see people who modify low-level
> x86 code (yes, that means you, Luto) *test* it. If you aren't willign to test
> the modifications you make, I don't think those modifications should be merged,
> regardless of how nice a cleanup they are.

The dosemu case might just work due to emulation (assuming emulation is equivalent
or better than vm86 mode), but if Xorg still uses vm86 on old systems to run the
Video-BIOS, with no fallback code available, then I doubt we can remove it.

In any case it's a lot less clear-cut than I initially thought, so I've removed
the patch until it's determined whether it's still used by anything.

Thanks,

Ingo

2015-07-08 19:15:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 12:05 PM, Brian Gerst <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>>>
>>>> if this patch would not be acceptable, at minimum we need some sort of "off
>>>> by default
>>>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>>>
>>> The only thing that matters is whether people use this or not.
>>>
>>
>> I think that the world contains precisely two programs that use the
>> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
>> are probably some exploits written by other people that I don't know
>> about. Certainly Spender has been patching vm86 for long enough that
>> he must have an exploit or two up his sleeve.)
>>
>> As far as I can tell (and I'll try to test this better for real later
>> this week), dosemu already knows how to emulate real mode if vm86 is
>> unavailable. So it's unclear that turning off the vm86 syscalls
>> actually breaks anything whatsoever.
>>
>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>> That means that quite a few Fedora versions (auditing), anything with
>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>> is probably actually *improved* by turning off the vm86 syscalls even
>> for dosemu users.
>>
>> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>>
>> IOW, vm86 really is broken.
>>
>>> If people use vm86 mode, we can't just disable it. It's that simple.
>>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>>> cares" works as an argument for that.
>>>
>>> My suspicion is that people still do use vm86 mode, but who knows..
>>> Quite frankly, rather than disable it, I'd much rather see people who
>>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>>> you aren't willign to test the modifications you make, I don't think
>>> those modifications should be merged, regardless of how nice a cleanup
>>> they are.
>>
>> I tried to test it. As far as I know, my changes in -tip have no
>> effect on vm86, and the changes I'm planning on sending this week will
>> make it work better. I still thing that Linux users should have it
>> configured out or deleted altogether. Especially people who care at
>> all about security.
>>
>> It's easy to try the easy case (run from tools/testing/selftests/x86)
>> -- this is v4.2-rc1, but most recent versions should be identical:
>>
>> $ ./entry_from_vm86_32
>> [RUN] #BR from vm86 mode
>> [OK] Exited vm86 mode due to #BR
>> [RUN] SYSENTER from vm86 mode
>> [OK] Exited vm86 mode due to unhandled GP fault
>>
>> $ strace -e vm86 ./entry_from_vm86_32
>> [RUN] #BR from vm86 mode
>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>> (Function not implemented)
>> [OK] Exited vm86 mode due to type 0, arg 0
>> [RUN] SYSENTER from vm86 mode
>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>> (Function not implemented)
>> [OK] Exited vm86 mode due to type 0, arg 0
>>
>> It only says "[OK]" because my test case isn't careful enough. That's
>> a failure. I suspect it was a much worse failure a couple versions
>> ago before my ENOSYS-reworking patch went in.
>>
>> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
>> really pretty bad.
>>
>> This only tests easy stuff. The integration between vm86 and fault
>> handling is truly awful and I don't even know how to approach testing
>> it. I'd probably have to run twenty or thirty old real-mode games to
>> even exercise those code paths.
>>
>> I'll try to confirm later this week that dosemu can really handle real
>> mode without sys_vm86.
>
> None of these issues are unfixable. As I said before, many of them
> can be resolved if vm86 is changed to use the normal syscall/exception
> exit paths. Give me a few days to finish off that patch set.
>

I look forward to it.

However: I imagine that, if you do this, you may need to be quite
careful about an x86_32-ism. Currently, if you have a pt_regs pointer
for the current entry and user_mode(regs) returns true, then regs ==
current_pt_regs(). If you let user mode run with EFLAGS.VM set with
the normal tss.sp0, then this will no longer be true, as the
extra-long entry-from-v8086 frame will shift pt_regs by a few bytes.
I don't know whether this matters, but I can imagine it causing
do_signal to explode. *shudder*

Anyway, I'll send out my 32-bit cleanups for review soon. If it
conflicts with your changes, it'll be easy to fix up.

--Andy

2015-07-08 19:39:56

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 3:14 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 12:05 PM, Brian Gerst <[email protected]> wrote:
>> On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
>>> <[email protected]> wrote:
>>>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>>>>
>>>>> if this patch would not be acceptable, at minimum we need some sort of "off
>>>>> by default
>>>>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>>>>
>>>> The only thing that matters is whether people use this or not.
>>>>
>>>
>>> I think that the world contains precisely two programs that use the
>>> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
>>> are probably some exploits written by other people that I don't know
>>> about. Certainly Spender has been patching vm86 for long enough that
>>> he must have an exploit or two up his sleeve.)
>>>
>>> As far as I can tell (and I'll try to test this better for real later
>>> this week), dosemu already knows how to emulate real mode if vm86 is
>>> unavailable. So it's unclear that turning off the vm86 syscalls
>>> actually breaks anything whatsoever.
>>>
>>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>>> That means that quite a few Fedora versions (auditing), anything with
>>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>>> is probably actually *improved* by turning off the vm86 syscalls even
>>> for dosemu users.
>>>
>>> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>>>
>>> IOW, vm86 really is broken.
>>>
>>>> If people use vm86 mode, we can't just disable it. It's that simple.
>>>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>>>> cares" works as an argument for that.
>>>>
>>>> My suspicion is that people still do use vm86 mode, but who knows..
>>>> Quite frankly, rather than disable it, I'd much rather see people who
>>>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>>>> you aren't willign to test the modifications you make, I don't think
>>>> those modifications should be merged, regardless of how nice a cleanup
>>>> they are.
>>>
>>> I tried to test it. As far as I know, my changes in -tip have no
>>> effect on vm86, and the changes I'm planning on sending this week will
>>> make it work better. I still thing that Linux users should have it
>>> configured out or deleted altogether. Especially people who care at
>>> all about security.
>>>
>>> It's easy to try the easy case (run from tools/testing/selftests/x86)
>>> -- this is v4.2-rc1, but most recent versions should be identical:
>>>
>>> $ ./entry_from_vm86_32
>>> [RUN] #BR from vm86 mode
>>> [OK] Exited vm86 mode due to #BR
>>> [RUN] SYSENTER from vm86 mode
>>> [OK] Exited vm86 mode due to unhandled GP fault
>>>
>>> $ strace -e vm86 ./entry_from_vm86_32
>>> [RUN] #BR from vm86 mode
>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>>> (Function not implemented)
>>> [OK] Exited vm86 mode due to type 0, arg 0
>>> [RUN] SYSENTER from vm86 mode
>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>>> (Function not implemented)
>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>
>>> It only says "[OK]" because my test case isn't careful enough. That's
>>> a failure. I suspect it was a much worse failure a couple versions
>>> ago before my ENOSYS-reworking patch went in.
>>>
>>> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
>>> really pretty bad.
>>>
>>> This only tests easy stuff. The integration between vm86 and fault
>>> handling is truly awful and I don't even know how to approach testing
>>> it. I'd probably have to run twenty or thirty old real-mode games to
>>> even exercise those code paths.
>>>
>>> I'll try to confirm later this week that dosemu can really handle real
>>> mode without sys_vm86.
>>
>> None of these issues are unfixable. As I said before, many of them
>> can be resolved if vm86 is changed to use the normal syscall/exception
>> exit paths. Give me a few days to finish off that patch set.
>>
>
> I look forward to it.
>
> However: I imagine that, if you do this, you may need to be quite
> careful about an x86_32-ism. Currently, if you have a pt_regs pointer
> for the current entry and user_mode(regs) returns true, then regs ==
> current_pt_regs(). If you let user mode run with EFLAGS.VM set with
> the normal tss.sp0, then this will no longer be true, as the
> extra-long entry-from-v8086 frame will shift pt_regs by a few bytes.
> I don't know whether this matters, but I can imagine it causing
> do_signal to explode. *shudder*

I am aware that pt_regs is in a fixed location on the stack. What I
plan to do is increase the padding at the top of the stack if VM86 is
configured, to reserve space for the extra segment registers. Then it
will move tss.sp0 up 16 bytes when entering vm86 mode so that the
longer IRET frame is in the right place.

--
Brian Gerst

2015-07-08 19:59:53

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 12:39 PM, Brian Gerst <[email protected]> wrote:
> On Wed, Jul 8, 2015 at 3:14 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Jul 8, 2015 at 12:05 PM, Brian Gerst <[email protected]> wrote:
>>> On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
>>>> <[email protected]> wrote:
>>>>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>>>>>
>>>>>> if this patch would not be acceptable, at minimum we need some sort of "off
>>>>>> by default
>>>>>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>>>>>
>>>>> The only thing that matters is whether people use this or not.
>>>>>
>>>>
>>>> I think that the world contains precisely two programs that use the
>>>> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
>>>> are probably some exploits written by other people that I don't know
>>>> about. Certainly Spender has been patching vm86 for long enough that
>>>> he must have an exploit or two up his sleeve.)
>>>>
>>>> As far as I can tell (and I'll try to test this better for real later
>>>> this week), dosemu already knows how to emulate real mode if vm86 is
>>>> unavailable. So it's unclear that turning off the vm86 syscalls
>>>> actually breaks anything whatsoever.
>>>>
>>>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>>>> That means that quite a few Fedora versions (auditing), anything with
>>>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>>>> is probably actually *improved* by turning off the vm86 syscalls even
>>>> for dosemu users.
>>>>
>>>> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>>>>
>>>> IOW, vm86 really is broken.
>>>>
>>>>> If people use vm86 mode, we can't just disable it. It's that simple.
>>>>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>>>>> cares" works as an argument for that.
>>>>>
>>>>> My suspicion is that people still do use vm86 mode, but who knows..
>>>>> Quite frankly, rather than disable it, I'd much rather see people who
>>>>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>>>>> you aren't willign to test the modifications you make, I don't think
>>>>> those modifications should be merged, regardless of how nice a cleanup
>>>>> they are.
>>>>
>>>> I tried to test it. As far as I know, my changes in -tip have no
>>>> effect on vm86, and the changes I'm planning on sending this week will
>>>> make it work better. I still thing that Linux users should have it
>>>> configured out or deleted altogether. Especially people who care at
>>>> all about security.
>>>>
>>>> It's easy to try the easy case (run from tools/testing/selftests/x86)
>>>> -- this is v4.2-rc1, but most recent versions should be identical:
>>>>
>>>> $ ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> [OK] Exited vm86 mode due to #BR
>>>> [RUN] SYSENTER from vm86 mode
>>>> [OK] Exited vm86 mode due to unhandled GP fault
>>>>
>>>> $ strace -e vm86 ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>>>> (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>> [RUN] SYSENTER from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS
>>>> (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>>
>>>> It only says "[OK]" because my test case isn't careful enough. That's
>>>> a failure. I suspect it was a much worse failure a couple versions
>>>> ago before my ENOSYS-reworking patch went in.
>>>>
>>>> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
>>>> really pretty bad.
>>>>
>>>> This only tests easy stuff. The integration between vm86 and fault
>>>> handling is truly awful and I don't even know how to approach testing
>>>> it. I'd probably have to run twenty or thirty old real-mode games to
>>>> even exercise those code paths.
>>>>
>>>> I'll try to confirm later this week that dosemu can really handle real
>>>> mode without sys_vm86.
>>>
>>> None of these issues are unfixable. As I said before, many of them
>>> can be resolved if vm86 is changed to use the normal syscall/exception
>>> exit paths. Give me a few days to finish off that patch set.
>>>
>>
>> I look forward to it.
>>
>> However: I imagine that, if you do this, you may need to be quite
>> careful about an x86_32-ism. Currently, if you have a pt_regs pointer
>> for the current entry and user_mode(regs) returns true, then regs ==
>> current_pt_regs(). If you let user mode run with EFLAGS.VM set with
>> the normal tss.sp0, then this will no longer be true, as the
>> extra-long entry-from-v8086 frame will shift pt_regs by a few bytes.
>> I don't know whether this matters, but I can imagine it causing
>> do_signal to explode. *shudder*
>
> I am aware that pt_regs is in a fixed location on the stack. What I
> plan to do is increase the padding at the top of the stack if VM86 is
> configured, to reserve space for the extra segment registers. Then it
> will move tss.sp0 up 16 bytes when entering vm86 mode so that the
> longer IRET frame is in the right place.
>

Hmm, should work.

I wonder if the right way to do this is to set a TIF_VM86 flag and do
the fixups in enter_from_user_mode and prepare_return_to_usermode.
See the patches I just sent (and tip/x88/asm, which they apply to).

Without something like that, we'll be in the awkward position of
having some of the selectors (DS, ES, FS, and GS) in both the normal
pt_regs slot and in the extended hardware frame during execution of
normal vm86-unaware kernel code. If, on the other hand, we copied the
selectors across in enter_from_user_mode and
prepare_return_from_usermode, then pt_regs would work normally even
for tasks that are running in v8086 mode.

regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm
that decides to invoke those helpers should work fine.

--Andy

--Andy

--Andy

2015-07-09 05:52:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Andy Lutomirski <[email protected]> wrote:

> >> I look forward to it.
> >>
> >> However: I imagine that, if you do this, you may need to be quite careful
> >> about an x86_32-ism. Currently, if you have a pt_regs pointer for the
> >> current entry and user_mode(regs) returns true, then regs ==
> >> current_pt_regs(). If you let user mode run with EFLAGS.VM set with the
> >> normal tss.sp0, then this will no longer be true, as the extra-long
> >> entry-from-v8086 frame will shift pt_regs by a few bytes. I don't know
> >> whether this matters, but I can imagine it causing do_signal to explode.
> >> *shudder*
> >
> > I am aware that pt_regs is in a fixed location on the stack. What I plan to
> > do is increase the padding at the top of the stack if VM86 is configured, to
> > reserve space for the extra segment registers. Then it will move tss.sp0 up
> > 16 bytes when entering vm86 mode so that the longer IRET frame is in the right
> > place.
> >
>
> Hmm, should work.
>
> I wonder if the right way to do this is to set a TIF_VM86 flag and do the fixups
> in enter_from_user_mode and prepare_return_to_usermode. See the patches I just
> sent (and tip/x88/asm, which they apply to).
>
> Without something like that, we'll be in the awkward position of having some of
> the selectors (DS, ES, FS, and GS) in both the normal pt_regs slot and in the
> extended hardware frame during execution of normal vm86-unaware kernel code.
> If, on the other hand, we copied the selectors across in enter_from_user_mode
> and prepare_return_from_usermode, then pt_regs would work normally even for
> tasks that are running in v8086 mode.
>
> regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm that
> decides to invoke those helpers should work fine.

Btw., has anyone considered an entirely different approach: using KVM's
instruction emulator to emulate vm86 16-bit code execution? Basically the vm86
system call would be kept compatible, but fully emulated, the CPU never enters
true 16-bit mode, just iterates pt_regs as if it had.

This approach has four main advantages:

- we could remove the fragile vm86 code from the entry code

- it might even be faster for certain workloads than faulting in and out all the
time and using ancient, fragile hardware mode of the CPU. (For example it could
detect the VGA screen write patterns and accelerate them.)

- it could be made to work on 64-bit as well, FWIIW

- it would provide another angle of testing for the KVM emulator

Hm?

Thanks,

Ingo

2015-07-09 05:59:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Ingo Molnar <[email protected]> wrote:

> > Without something like that, we'll be in the awkward position of having some
> > of the selectors (DS, ES, FS, and GS) in both the normal pt_regs slot and in
> > the extended hardware frame during execution of normal vm86-unaware kernel
> > code. If, on the other hand, we copied the selectors across in
> > enter_from_user_mode and prepare_return_from_usermode, then pt_regs would work
> > normally even for tasks that are running in v8086 mode.
> >
> > regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm that
> > decides to invoke those helpers should work fine.
>
> Btw., has anyone considered an entirely different approach: using KVM's
> instruction emulator to emulate vm86 16-bit code execution? Basically the vm86
> system call would be kept compatible, but fully emulated, the CPU never enters
> true 16-bit mode, just iterates pt_regs as if it had.
>
> This approach has four main advantages:
>
> - we could remove the fragile vm86 code from the entry code
>
> - it might even be faster for certain workloads than faulting in and out all
> the time and using ancient, fragile hardware mode of the CPU. (For example it
> could detect the VGA screen write patterns and accelerate them.)
>
> - it could be made to work on 64-bit as well, FWIIW
>
> - it would provide another angle of testing for the KVM emulator

So there's a fifth advantage as well that I think needs to be stressed:

- it's an _obviously_ much more secure design, as we only iterate user-space
pt_regs and never truly touch any security relevant CPU state. The whole
nested pt_regs and different hw frame entry complications would go away
entirely. All CPU semantics would not be just assumed implicitly, but would
be very much present in the CPU emulator and would be reviewable.

Thanks,

Ingo

2015-07-09 09:03:48

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed 2015-07-08 16:00:48, Thomas Gleixner wrote:
> On Tue, 7 Jul 2015, Arjan van de Ven wrote:
>
> > On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
> > > VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> > > in use. The code is a big undocumented mess, it's a real PITA to
> > > test, and it looks like a big chunk of vm86_32.c is dead code. It
> > > also plays awful games with the entry asm.
> > >
> > > No one should be using it anyway. Use DOSBOX or KVM instead.
> > >
> > > Mark it BROKEN. I want to remove some (obviously incorrect) exit
> > > asm that it depends on, and I don't want to figure out how to run
> > > severely obsolete programs just to test something that no one uses
> > > for anything other than exploits anyway.
> > >
> >
> > while it is never great to deprecate features, in this case I am not sure
> > there is another choice unless someone steps up to seriously revamp this code.
> > (and look at it from a PREEMPT, NO_HZ etc etc angle)
>
> Aside of being broken in so many aspects it's even more obsolete than
> 386 support, we should just remove it right away.

Bad news for you:

vbetool-0.5/lrmi.c:#include <asm/vm86.h>
vbetool-0.5/lrmi.c:#include <sys/vm86.h>
vbetool-0.5/lrmi.c:#include <machine/vm86.h>
vbetool-0.5/lrmi.c: struct vm86_struct vm;
vbetool-0.5/lrmi.c: struct vm86_init_args init;
...
vbetool-0.5/lrmi.c:lrmi_vm86(struct vm86_struct *vm)
vbetool-0.5/lrmi.c:#define lrmi_vm86 vm86
vbetool-0.5/lrmi.c: fputs("vm86() failed\n", stderr);
vbetool-0.5/lrmi.c:run_vm86(void)
vbetool-0.5/lrmi.c: vret = lrmi_vm86(&context.vm);
vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
*sc)
vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
*sc)
vbetool-0.5/lrmi.c:run_vm86(void)
vbetool-0.5/lrmi.c: fprintf(stderr, "run_vm86: callback
already installed\n");

vbetool depends on it, and s2ram depends on vbetool. When we get
proper kernel drivers, this one will be solved, but it is not "more
obsolete than 386".

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-07-09 17:58:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Jul 9, 2015 2:03 AM, "Pavel Machek" <[email protected]> wrote:
>
> On Wed 2015-07-08 16:00:48, Thomas Gleixner wrote:
> > On Tue, 7 Jul 2015, Arjan van de Ven wrote:
> >
> > > On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
> > > > VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
> > > > in use. The code is a big undocumented mess, it's a real PITA to
> > > > test, and it looks like a big chunk of vm86_32.c is dead code. It
> > > > also plays awful games with the entry asm.
> > > >
> > > > No one should be using it anyway. Use DOSBOX or KVM instead.
> > > >
> > > > Mark it BROKEN. I want to remove some (obviously incorrect) exit
> > > > asm that it depends on, and I don't want to figure out how to run
> > > > severely obsolete programs just to test something that no one uses
> > > > for anything other than exploits anyway.
> > > >
> > >
> > > while it is never great to deprecate features, in this case I am not sure
> > > there is another choice unless someone steps up to seriously revamp this code.
> > > (and look at it from a PREEMPT, NO_HZ etc etc angle)
> >
> > Aside of being broken in so many aspects it's even more obsolete than
> > 386 support, we should just remove it right away.
>
> Bad news for you:
>
> vbetool-0.5/lrmi.c:#include <asm/vm86.h>
> vbetool-0.5/lrmi.c:#include <sys/vm86.h>
> vbetool-0.5/lrmi.c:#include <machine/vm86.h>
> vbetool-0.5/lrmi.c: struct vm86_struct vm;
> vbetool-0.5/lrmi.c: struct vm86_init_args init;
> ...
> vbetool-0.5/lrmi.c:lrmi_vm86(struct vm86_struct *vm)
> vbetool-0.5/lrmi.c:#define lrmi_vm86 vm86
> vbetool-0.5/lrmi.c: fputs("vm86() failed\n", stderr);
> vbetool-0.5/lrmi.c:run_vm86(void)
> vbetool-0.5/lrmi.c: vret = lrmi_vm86(&context.vm);
> vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
> *sc)
> vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
> *sc)
> vbetool-0.5/lrmi.c:run_vm86(void)
> vbetool-0.5/lrmi.c: fprintf(stderr, "run_vm86: callback
> already installed\n");
>
> vbetool depends on it, and s2ram depends on vbetool. When we get
> proper kernel drivers, this one will be solved, but it is not "more
> obsolete than 386".
>

vmetool has an x86 emulator. As far as I know, it's been there for a
long time, and I'd be surprised if it doesn't work on CONFIG_VM86=n
kernels. That being said, the code is kind of tangled and it's not
quite clear to me what's going on.

See: http://www.codon.org.uk/~mjg59/libx86/

Perhaps we should instead move CONFIG_VM86 out of EXPERT, default it
to n, and suggest that everyone running a reasonably modern distro
(2006 and up?) turn it off.

--Andy

2015-07-09 18:04:10

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Thu, Jul 9, 2015 at 10:57 AM, Andy Lutomirski <[email protected]> wrote:
> On Jul 9, 2015 2:03 AM, "Pavel Machek" <[email protected]> wrote:
>>
>> On Wed 2015-07-08 16:00:48, Thomas Gleixner wrote:
>> > On Tue, 7 Jul 2015, Arjan van de Ven wrote:
>> >
>> > > On 7/7/2015 6:25 PM, Andy Lutomirski wrote:
>> > > > VM86 is entirely broken if ptrace, syscall auditing, or NOHZ_FULL is
>> > > > in use. The code is a big undocumented mess, it's a real PITA to
>> > > > test, and it looks like a big chunk of vm86_32.c is dead code. It
>> > > > also plays awful games with the entry asm.
>> > > >
>> > > > No one should be using it anyway. Use DOSBOX or KVM instead.
>> > > >
>> > > > Mark it BROKEN. I want to remove some (obviously incorrect) exit
>> > > > asm that it depends on, and I don't want to figure out how to run
>> > > > severely obsolete programs just to test something that no one uses
>> > > > for anything other than exploits anyway.
>> > > >
>> > >
>> > > while it is never great to deprecate features, in this case I am not sure
>> > > there is another choice unless someone steps up to seriously revamp this code.
>> > > (and look at it from a PREEMPT, NO_HZ etc etc angle)
>> >
>> > Aside of being broken in so many aspects it's even more obsolete than
>> > 386 support, we should just remove it right away.
>>
>> Bad news for you:
>>
>> vbetool-0.5/lrmi.c:#include <asm/vm86.h>
>> vbetool-0.5/lrmi.c:#include <sys/vm86.h>
>> vbetool-0.5/lrmi.c:#include <machine/vm86.h>
>> vbetool-0.5/lrmi.c: struct vm86_struct vm;
>> vbetool-0.5/lrmi.c: struct vm86_init_args init;
>> ...
>> vbetool-0.5/lrmi.c:lrmi_vm86(struct vm86_struct *vm)
>> vbetool-0.5/lrmi.c:#define lrmi_vm86 vm86
>> vbetool-0.5/lrmi.c: fputs("vm86() failed\n", stderr);
>> vbetool-0.5/lrmi.c:run_vm86(void)
>> vbetool-0.5/lrmi.c: vret = lrmi_vm86(&context.vm);
>> vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
>> *sc)
>> vbetool-0.5/lrmi.c:vm86_callback(int sig, int code, struct sigcontext
>> *sc)
>> vbetool-0.5/lrmi.c:run_vm86(void)
>> vbetool-0.5/lrmi.c: fprintf(stderr, "run_vm86: callback
>> already installed\n");
>>
>> vbetool depends on it, and s2ram depends on vbetool. When we get
>> proper kernel drivers, this one will be solved, but it is not "more
>> obsolete than 386".
>>
>
> vmetool has an x86 emulator. As far as I know, it's been there for a
> long time, and I'd be surprised if it doesn't work on CONFIG_VM86=n
> kernels. That being said, the code is kind of tangled and it's not
> quite clear to me what's going on.
>
> See: http://www.codon.org.uk/~mjg59/libx86/
>
> Perhaps we should instead move CONFIG_VM86 out of EXPERT, default it
> to n, and suggest that everyone running a reasonably modern distro
> (2006 and up?) turn it off.

That seems like a good idea to me.

-Kees

--
Kees Cook
Chrome OS Security

2015-07-09 18:30:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Thu, Jul 9, 2015 at 10:57 AM, Andy Lutomirski <[email protected]> wrote:
>
> Perhaps we should instead move CONFIG_VM86 out of EXPERT, default it
> to n, and suggest that everyone running a reasonably modern distro
> (2006 and up?) turn it off.

Ack. Changing the default and trying to deprecate it sounds like a
good plan to me.

Linus

2015-07-09 18:33:34

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Wed, Jul 8, 2015 at 10:59 PM, Ingo Molnar <[email protected]> wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
>> > Without something like that, we'll be in the awkward position of having some
>> > of the selectors (DS, ES, FS, and GS) in both the normal pt_regs slot and in
>> > the extended hardware frame during execution of normal vm86-unaware kernel
>> > code. If, on the other hand, we copied the selectors across in
>> > enter_from_user_mode and prepare_return_from_usermode, then pt_regs would work
>> > normally even for tasks that are running in v8086 mode.
>> >
>> > regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm that
>> > decides to invoke those helpers should work fine.
>>
>> Btw., has anyone considered an entirely different approach: using KVM's
>> instruction emulator to emulate vm86 16-bit code execution? Basically the vm86
>> system call would be kept compatible, but fully emulated, the CPU never enters
>> true 16-bit mode, just iterates pt_regs as if it had.
>>
>> This approach has four main advantages:
>>
>> - we could remove the fragile vm86 code from the entry code
>>
>> - it might even be faster for certain workloads than faulting in and out all
>> the time and using ancient, fragile hardware mode of the CPU. (For example it
>> could detect the VGA screen write patterns and accelerate them.)
>>
>> - it could be made to work on 64-bit as well, FWIIW
>>
>> - it would provide another angle of testing for the KVM emulator
>
> So there's a fifth advantage as well that I think needs to be stressed:
>
> - it's an _obviously_ much more secure design, as we only iterate user-space
> pt_regs and never truly touch any security relevant CPU state. The whole
> nested pt_regs and different hw frame entry complications would go away
> entirely. All CPU semantics would not be just assumed implicitly, but would
> be very much present in the CPU emulator and would be reviewable.
>

Hmm.

If we did this, I think I'd prefer a slightly more general approach.
First teach KVM to support a mode in which it's purely an emulator
(Paolo: how hard is this? It would also make testing the emulator
much easier). Then re-implement vm86 on top of that.

The big downside of that, or of writing a more ad-hoc emulator, is
understanding what the semantics of all the weird vm86plus stuff is
supposed to be in the first place. It's completely undocumented and
it's not at all obvious what it's all supposed to do. This sounds
like a fairly large project.

I think I'd rather get all the distros to turn vm86 off and let it
slowly die in a dark corner. After all, dosemu and vbetool both
already contain emulators that seem to work, and dosbox (which is, by
all reports, better than dosemu) never used vm86 in the first place.

--Andy

2015-07-10 11:16:39

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN



On 09/07/2015 20:33, Andy Lutomirski wrote:
> On Wed, Jul 8, 2015 at 10:59 PM, Ingo Molnar <[email protected]> wrote:
>>
>> * Ingo Molnar <[email protected]> wrote:
>>
>>>> Without something like that, we'll be in the awkward position of having some
>>>> of the selectors (DS, ES, FS, and GS) in both the normal pt_regs slot and in
>>>> the extended hardware frame during execution of normal vm86-unaware kernel
>>>> code. If, on the other hand, we copied the selectors across in
>>>> enter_from_user_mode and prepare_return_from_usermode, then pt_regs would work
>>>> normally even for tasks that are running in v8086 mode.
>>>>
>>>> regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm that
>>>> decides to invoke those helpers should work fine.
>>>
>>> Btw., has anyone considered an entirely different approach: using KVM's
>>> instruction emulator to emulate vm86 16-bit code execution? Basically the vm86
>>> system call would be kept compatible, but fully emulated, the CPU never enters
>>> true 16-bit mode, just iterates pt_regs as if it had.
>>>
>>> This approach has four main advantages:
>>>
>>> - we could remove the fragile vm86 code from the entry code
>>>
>>> - it might even be faster for certain workloads than faulting in and out all
>>> the time and using ancient, fragile hardware mode of the CPU. (For example it
>>> could detect the VGA screen write patterns and accelerate them.)
>>>
>>> - it could be made to work on 64-bit as well, FWIIW
>>>
>>> - it would provide another angle of testing for the KVM emulator
>>
>> So there's a fifth advantage as well that I think needs to be stressed:
>>
>> - it's an _obviously_ much more secure design, as we only iterate user-space
>> pt_regs and never truly touch any security relevant CPU state. The whole
>> nested pt_regs and different hw frame entry complications would go away
>> entirely. All CPU semantics would not be just assumed implicitly, but would
>> be very much present in the CPU emulator and would be reviewable.
>>
>
> Hmm.
>
> If we did this, I think I'd prefer a slightly more general approach.
> First teach KVM to support a mode in which it's purely an emulator
> (Paolo: how hard is this? It would also make testing the emulator
> much easier).

This isn't hard, at least for Intel: make emulation_required() return
true always (and fix the fallout). However, it's not necessary. The
emulator is designed to be independent from the rest of KVM. At some
point I think Avi was testing it in userspace (or planning to do so).
So you would just move it from arch/x86/kvm to arch/x86/emulate.

The obvious downside is that the emulator isn't really designed for
speed. In KVM it's currently 1000-1500 times slower than the real
thing. Even if you modified it to remove the KVM overhead (vm86 is just
running ring 3 code; no interrupts and no pagetables to walk), it
probably would take 300-500 cycles to execute one instruction.

But it's doable.

> The big downside of that, or of writing a more ad-hoc emulator, is
> understanding what the semantics of all the weird vm86plus stuff is
> supposed to be in the first place.

Do you mean VIF/VIP and the other vm86 mode extensions? Or is vm86plus
something in Linux?

Paolo

> It's completely undocumented and
> it's not at all obvious what it's all supposed to do. This sounds
> like a fairly large project.
>
> I think I'd rather get all the distros to turn vm86 off and let it
> slowly die in a dark corner. After all, dosemu and vbetool both
> already contain emulators that seem to work, and dosbox (which is, by
> all reports, better than dosemu) never used vm86 in the first place.
>
> --Andy
>

2015-07-10 14:18:37

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

Andy Lutomirski <[email protected]> writes:

> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven <[email protected]> wrote:
>>>
>>> if this patch would not be acceptable, at minimum we need some sort of "off
>>> by default
>>> unless the sysadmin flips a sysfs thing", which is really just a huge hack.
>>
>> The only thing that matters is whether people use this or not.
>>
>
> I think that the world contains precisely two programs that use the
> vm86 syscalls. One is dosemu, and one is a test case I wrote.

Wine used to also call vm86.

> As far as I can tell (and I'll try to test this better for real later
> this week), dosemu already knows how to emulate real mode if vm86 is
> unavailable. So it's unclear that turning off the vm86 syscalls
> actually breaks anything whatsoever.

Yes. This happened after 64bit kernels became common years ago, as the
lack of vm86 on 64bit nearly killed the dosemu project.

> On the other hand, sys_vm86 fails if the syscall slow path is in use.
> That means that quite a few Fedora versions (auditing), anything with
> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
> is probably actually *improved* by turning off the vm86 syscalls even
> for dosemu users.

Is there any chance that vm86 is sufficiently badly broken before this
that we can conclude vm86 is not in use? It would really simplify this
discussion if we could point to code rot and say that it is clear that
no one has been testing this code path for ages, and that the code can't
possibly work the way it is now. That would just let us remove vm86.

> It only says "[OK]" because my test case isn't careful enough. That's
> a failure. I suspect it was a much worse failure a couple versions
> ago before my ENOSYS-reworking patch went in.
>
> I'll try to confirm later this week that dosemu can really handle real
> mode without sys_vm86.

I have not looked in ages but certainly on 64bit dosemu can.

As someone else pointed out dosemu maps the zero page so that may also
be a point where vm86 support gets broken.

Eric

2015-07-10 14:14:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Paolo Bonzini <[email protected]> wrote:

> > Hmm.
> >
> > If we did this, I think I'd prefer a slightly more general approach. First
> > teach KVM to support a mode in which it's purely an emulator (Paolo: how hard
> > is this? It would also make testing the emulator much easier).
>
> This isn't hard, at least for Intel: make emulation_required() return true
> always (and fix the fallout). However, it's not necessary. The emulator is
> designed to be independent from the rest of KVM. At some point I think Avi was
> testing it in userspace (or planning to do so). So you would just move it from
> arch/x86/kvm to arch/x86/emulate.

Very nice!

> The obvious downside is that the emulator isn't really designed for speed.
>
> In KVM it's currently 1000-1500 times slower than the real thing. Even if you
> modified it to remove the KVM overhead (vm86 is just running ring 3 code; no
> interrupts and no pagetables to walk), it probably would take 300-500 cycles to
> execute one instruction.

This needs to be tested, but I wouldn't expect it to be a big issue:

- if anyone cares they can improve its performance

- or worst case they can upgrade their tool to something newer which will use
user-space emulation of 16-bit code anyway ...

- Furthermore I suspect with vm86 we'd trap out of vm86 mode rather often - and a
single trap can take thousands of cycles. So I suspect the effective slowdown
depends on the workload.

- In the absolute worst case it will perform like a really old CPU.

Thanks,

Ingo

2015-07-10 14:24:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN



On 10/07/2015 16:13, Ingo Molnar wrote:
> > This isn't hard, at least for Intel: make emulation_required() return true
> > always (and fix the fallout). However, it's not necessary. The emulator is
> > designed to be independent from the rest of KVM. At some point I think Avi was
> > testing it in userspace (or planning to do so). So you would just move it from
> > arch/x86/kvm to arch/x86/emulate.
>
> Very nice!

Thanks. :) Mostly on behalf of the former maintainers---and the Xen
folks too, the emulator has its roots there.

So, the starting point for hooking into the emulator is struct
x86_emulate_ops (in asm/kvm_emulate.h) and the function that calls into
it in KVM is x86_emulate_instruction. You can look there to see how the
emulator can be used. If it doesn't compile straight away in userspace,
I'll gladly accept patches.

There are parts of emulation that are actually done (for simplicity and
laziness) in x86_emulate_instruction rather than emulate.c, most notably
hardware debugging support, but these aren't really needed for an
initial prototype of vm86.

A lot of the stuff in x86_emulate_instruction isn't necessary for vm86
and can be WARN()ed away, because for example IN/OUT always cause a #GP
in vm86 mode.

Paolo

2015-07-10 14:37:43

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 7:12 AM, Eric W. Biederman
<[email protected]> wrote:
> Andy Lutomirski <[email protected]> writes:
>
>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>> That means that quite a few Fedora versions (auditing), anything with
>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>> is probably actually *improved* by turning off the vm86 syscalls even
>> for dosemu users.
>
> Is there any chance that vm86 is sufficiently badly broken before this
> that we can conclude vm86 is not in use? It would really simplify this
> discussion if we could point to code rot and say that it is clear that
> no one has been testing this code path for ages, and that the code can't
> possibly work the way it is now. That would just let us remove vm86.
>

Having just written a pile of tests for it, I don't think so, as long as none
of the syscall slow path stuff is in use :(

>> It only says "[OK]" because my test case isn't careful enough. That's
>> a failure. I suspect it was a much worse failure a couple versions
>> ago before my ENOSYS-reworking patch went in.
>>
>> I'll try to confirm later this week that dosemu can really handle real
>> mode without sys_vm86.
>
> I have not looked in ages but certainly on 64bit dosemu can.
>
> As someone else pointed out dosemu maps the zero page so that may also
> be a point where vm86 support gets broken.

Right. And someone pointed out that vbetool sometimes needs access to
virtual (or emulated virtual) addresses above 3GB, and vm86 can't do
that.

--Andy

2015-07-10 14:40:22

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 4:16 AM, Paolo Bonzini <[email protected]> wrote:
>
>
> On 09/07/2015 20:33, Andy Lutomirski wrote:
>> On Wed, Jul 8, 2015 at 10:59 PM, Ingo Molnar <[email protected]> wrote:
>
>> The big downside of that, or of writing a more ad-hoc emulator, is
>> understanding what the semantics of all the weird vm86plus stuff is
>> supposed to be in the first place.
>
> Do you mean VIF/VIP and the other vm86 mode extensions? Or is vm86plus
> something in Linux?

Something in Linux written for DOSEMU's benefit. I don't really
understand what it encompasses. Oddly, Linux doesn't use the virtual
mode extensions. Instead, it emulates them (but probably not very
well). So STI manipulates a fake VIF flag and checks a fake VIP flag.

There's also a huge hack involving 0xA0000.

--Andy

2015-07-10 16:35:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 7:37 AM, Andy Lutomirski <[email protected]> wrote:
>
> Having just written a pile of tests for it, I don't think so, as long as none
> of the syscall slow path stuff is in use :(

It seems that you are thinking that people actually use vm86 mode as a
real Linux mode, and do system calls from it etc.

I'm sure that has happened in some crazy situation (people doing some
random pseudo-BIOS etc), but it's not the common situation at all.

The common situation is that you enter vm86 mode with vm86(), and that
you exit it due to one of the (many) unhandled situations or a signal
or whatever. Yeah,we handle a few sad instructions directly, but most
vm86 exits just return to user mode.

The system call paths just aren't an issue in reality, because they
just aren't used.

And I'm personally violently against Ingo's idea of emulating this
with an instruction emulator. Hell no. That's what user mode does, and
it's fine there. In the kernel, we either support the hardware vm86
mode, or we phase it out because we can show that nobody uses it any
more. None of that "let's emulate it in software" crud.

Linus

2015-07-10 16:44:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 9:35 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Jul 10, 2015 at 7:37 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> Having just written a pile of tests for it, I don't think so, as long as none
>> of the syscall slow path stuff is in use :(
>
> It seems that you are thinking that people actually use vm86 mode as a
> real Linux mode, and do system calls from it etc.

Nope.

>
> The common situation is that you enter vm86 mode with vm86(), and that
> you exit it due to one of the (many) unhandled situations or a signal
> or whatever. Yeah,we handle a few sad instructions directly, but most
> vm86 exits just return to user mode.
>
> The system call paths just aren't an issue in reality, because they
> just aren't used.
>

That's not what I mean. I'm referring to the vm86 syscall itself. If
you have a ti flag that causes the slow exit path to be used, then you
call vm86. vm86 sets up the ludicrous double stack frame that it uses
and jumps back to the exit asm. The exit asm then branches off to the
slow path, hits the notifysig_v86 kludge, calls save_v86_state, tears
down its double stack frame, and keeps meandering back through the
exit asm. We finally IRET right back to protected mode, and the code
that userspace was trying to execute in v8086 mode never actually
runs.

That code looked fishy when I first read it, and it is, indeed,
entirely incorrect.

So the vm86 syscall itself is broken if the slow path is in use.

Fortunately, you can't do an a syscall inside vm86. If you could, I
think it would be a disaster, because the double stack means that the
syscall would run in a completely bogus context.

--Andy

2015-07-10 17:04:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 9:44 AM, Andy Lutomirski <[email protected]> wrote:
>
> That's not what I mean. I'm referring to the vm86 syscall itself. If
> you have a ti flag that causes the slow exit path to be used, then you
> call vm86. vm86 sets up the ludicrous double stack frame that it uses
> and jumps back to the exit asm. The exit asm then branches off to the
> slow path, hits the notifysig_v86 kludge, calls save_v86_state, tears
> down its double stack frame, and keeps meandering back through the
> exit asm. We finally IRET right back to protected mode, and the code
> that userspace was trying to execute in v8086 mode never actually
> runs.

So?

Yes, we exit vm86 mode if anything odd happens. That's very much part
of the whole vm86() model. If the kernel needs to do anything, it
saves off the vm86 state and returns to regular 32-bit mode. That's
how it's designed to be.

What's your point?

The user mode "vm86 hypervisor" will call vm86() in a loop. Always
has. Always will.

And yes, that can mean that you never execute even a single
instruction in vm86 mode, if one of the "we have other work to do"
flags are set. Maybe a signal came in. Maybe just a delayed work
happened. Maybe it has nothing to do with user space, and we *could*
have returned to vm86 mode, but the thing is, that code sequence is
_designed_ that way - it's very much minimizing the impact of vm86
mode. Pretty much the *only* thing we ever do with the vm86 stack
still active is reschedule. Pretty much *any* other context change
issue will get rid of the vm86 mode in kernel space, saving back the
state to user space so that user space can try again.

An it was done that way to minimize the vm86 impact on the rest of the
kernel. Basically there's a few hooks in a couple of traps that say
"ok, let's handle this case for vm86 mode", and there's the "let's
reschedule without exiting the user vm86 state", but the code is
designed so that we'll just say "screw it, the user can restart, we'll
go back to normal 32-bit code because something else than just plain
returning to vm86 mode happend".

vm86() mode is not some kind of "run this DOS program to completion".
It's exactly like a (very stupid) vmx mode. There are exit conditions,
and while many of them are about the code it executes, equally many of
them are "oh, we may have some event that cannot be handled in vm86
mode like a signal happened" etc.

So yes, if the thread work flags are set, we never enter vm86 mode.
BUT THAT'S EXACTLY WHAT SHOULD HAPPEN.

It worries me that you think these kinds of fundamental issues are
completely broken.

No, I wouldn't be surprised at all if there is actual breakage, just
because vm86 mode clearly gets very little testing, but the things you
have pointed out as "broken" really haven't been as far as I can tell.

And yes, if you enable system call auditing, and you actually audit
the vm86 mode system call, that probably causes an exit condition,
which means that you can't actually run vm86 mode and make progress if
you audit that system call. Big f*cking deal. People who enable system
call auditing break many more important things (eg basic performance)
that that isn't even an argument. Do you really think that people who
wanted to run DOS games at hardware speeds wanted to _audit_ those
games? No.

Linus

2015-07-10 17:13:57

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 10:04 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Jul 10, 2015 at 9:44 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> That's not what I mean. I'm referring to the vm86 syscall itself. If
>> you have a ti flag that causes the slow exit path to be used, then you
>> call vm86. vm86 sets up the ludicrous double stack frame that it uses
>> and jumps back to the exit asm. The exit asm then branches off to the
>> slow path, hits the notifysig_v86 kludge, calls save_v86_state, tears
>> down its double stack frame, and keeps meandering back through the
>> exit asm. We finally IRET right back to protected mode, and the code
>> that userspace was trying to execute in v8086 mode never actually
>> runs.
>
> So?

>
> So yes, if the thread work flags are set, we never enter vm86 mode.
> BUT THAT'S EXACTLY WHAT SHOULD HAPPEN.
>
> It worries me that you think these kinds of fundamental issues are
> completely broken.
>

The problem is that it's *every* event. That includes this that
happen literally every time like strace. (NOHZ_FULL would count, too,
if it worked at all on 32-bit kernels.)

Try it: vm86 will make zero progress if you run it under strace. It
will also execute the trace hooks the wrong number of times, so strace
gets very confused. If someone does something daft like using a
systrace-style sandbox, it probably breaks the sandbox.

>
> And yes, if you enable system call auditing, and you actually audit
> the vm86 mode system call, that probably causes an exit condition,
> which means that you can't actually run vm86 mode and make progress if
> you audit that system call. Big f*cking deal. People who enable system
> call auditing break many more important things (eg basic performance)
> that that isn't even an argument. Do you really think that people who
> wanted to run DOS games at hardware speeds wanted to _audit_ those
> games? No.

Not at all.

It does, however, mean that Fedora/RHEL users (who use auditing by
default in most cases, sigh) have a decent change of having had a
non-working vm86 syscall for a long time. This makes me think that
there really aren't many vm86 users out there, since we'd have heard
about the breakage.

Note that audit is very special, though, since it has its own asm
path. It might actually work, but I haven't tested it.

In any event, we're quibbling about the wording of the kconfig text
here. Both Brian and I have patches that fix the ptrace problem, so
it's likely to be a nonissue in 4.3 regardless.

--Andy

2015-07-10 17:39:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 10:13 AM, Andy Lutomirski <[email protected]> wrote:
>
> The problem is that it's *every* event. That includes this that
> happen literally every time like strace. (NOHZ_FULL would count, too,
> if it worked at all on 32-bit kernels.)

But things like strace and auditing etc has probably never worked in
the first place.

So yeah, I can well imagine that vm86 isn't universally useful. And
maybe it's been effectively broken in halfway modern distributions due
to their insane use of auditing - which is wonderful, because it's
just a stronger argument for disabling it by default.

But what I'd worry about is regressions - people who actually want to
upgrade kernels, and had an old machine and had an old distro, and
just want to keep that working. They aren't interested in running
strace on their old DOS game, or on their X server that uses it to run
the video BIOS. They just want it to work.

And it doesn't look "completely broken" to me for that.

Put another way: I think vm86 is very much "legacy". Nobody cares
about it in modern environments. That's not what we should even worry
about. We shouldn't worry about new users, and we _should_ try to
discourage it. But I think we should keep it working for the cases it
used to work before.

So no marking it "BROKEN". No calling it names just because it doesn't
work in insane situations that nobody cares about. It's a legacy
thing, and it probably has very few users, but I'm getting the vibe
that you want to remove it or hate it just because it might not work
in situations that simply don't make sense in the first place, and
that it was never used for anyway.

Linus

2015-07-10 17:58:56

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 10:39 AM, Linus Torvalds
<[email protected]> wrote:
> So no marking it "BROKEN". No calling it names just because it doesn't
> work in insane situations that nobody cares about. It's a legacy
> thing, and it probably has very few users, but I'm getting the vibe
> that you want to remove it or hate it just because it might not work
> in situations that simply don't make sense in the first place, and
> that it was never used for anyway.

Oh, right, I didn't realize this was still the v1 thread. v3 no
longer calls it BROKEN.

That being said, if vm86 actually has feelings, then I'm worried :)

--Andy

2015-07-10 18:00:55

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN

On Fri, Jul 10, 2015 at 10:39:02AM -0700, Linus Torvalds wrote:
> But things like strace and auditing etc has probably never worked in
> the first place.
>
> So yeah, I can well imagine that vm86 isn't universally useful. And
> maybe it's been effectively broken in halfway modern distributions due
> to their insane use of auditing - which is wonderful, because it's
> just a stronger argument for disabling it by default.

ITYM "both"...

2015-07-11 09:18:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN


* Linus Torvalds <[email protected]> wrote:

> [...]
>
> So no marking it "BROKEN". No calling it names just because it doesn't work in
> insane situations that nobody cares about. It's a legacy thing, and it probably
> has very few users, but I'm getting the vibe that you want to remove it or hate
> it just because it might not work in situations that simply don't make sense in
> the first place, and that it was never used for anyway.

So just to make it clear that we are on the same page: I voiced a number of bad
ideas in this thread that got you (rightfully) worried. Those bad ideas are all
off the table:

- We won't mark VM86 as BROKEN (which effectively disables it permanently)

- We won't do SW emulation either.

The current plans with the vm86 ABI are the following:

- We change the name to VM86_LEGACY and mark it default n to flush out
people/distros who had it enabled for no good reason. Anyone who builds a new
kernel for an old kernel and needs it for old hardware or DOS games can still
enable it, and v86 will continue to work to the best of our abilities. (in
fact it will work better, now that we are gradually making the x86 entry code
more maintainable.)

- We enhance the help text so that people who enable it make an informed choice.

- We apply Brian's and Andy's various fixes and cleanups to fix all known vm86
bugs and to make it more maintainable.

Agreed?

Btw., what do you think about one more measure to make vm86 more configurable, and
to allow the locking down of the default some more:

- Introduce a sysctl that globally disables/enables the sys_vm86 and sys_vm86old
syscalls by default for non-privileged users, i.e. something like:

static int __read_mostly sysctl_x86_vm86_paranoia = 1;
...

switch (sysctl_x86_vm86_paranoia) {
case 0:
/* Not paranoid at all: allow everyone vm86 access: */
break;
case 1:
/* Somewhat paranoid: only allow privileged users vm86 access: */
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
break;
case 2:
default:
/* Very paranoid, turn off the syscall: */
return -EPERM;
}

Note that with this we also introduce the '2' setting: users in such a distro
could still disable vm86 globally, as if it had been turned off in the kernel
config.

Thanks,

Ingo