From: Andy Lutomirski
Date: Wed, 8 Jul 2015 12:59:23 -0700
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN
To: Brian Gerst
Cc: Linus Torvalds, Arjan van de Ven, Andy Lutomirski, "the arch/x86 maintainers", Linux Kernel Mailing List, Oleg Nesterov, Kees Cook, Peter Zijlstra, Borislav Petkov

On Wed, Jul 8, 2015 at 12:39 PM, Brian Gerst wrote:
> On Wed, Jul 8, 2015 at 3:14 PM, Andy Lutomirski wrote:
>> On Wed, Jul 8, 2015 at 12:05 PM, Brian Gerst wrote:
>>> On Wed, Jul 8, 2015 at 1:30 PM, Andy Lutomirski wrote:
>>>> On Wed, Jul 8, 2015 at 9:59 AM, Linus Torvalds wrote:
>>>>> On Tue, Jul 7, 2015 at 7:33 PM, Arjan van de Ven wrote:
>>>>>>
>>>>>> if this patch would not be acceptable, at minimum we need some sort of
>>>>>> "off by default unless the sysadmin flips a sysfs thing", which is
>>>>>> really just a huge hack.
>>>>>
>>>>> The only thing that matters is whether people use this or not.
>>>>>
>>>>
>>>> I think that the world contains precisely two programs that use the
>>>> vm86 syscalls. One is dosemu, and one is a test case I wrote. (There
>>>> are probably some exploits written by other people that I don't know
>>>> about. Certainly Spender has been patching vm86 for long enough that
>>>> he must have an exploit or two up his sleeve.)
>>>>
>>>> As far as I can tell (and I'll try to test this better for real later
>>>> this week), dosemu already knows how to emulate real mode if vm86 is
>>>> unavailable. So it's unclear that turning off the vm86 syscalls
>>>> actually breaks anything whatsoever.
>>>>
>>>> On the other hand, sys_vm86 fails if the syscall slow path is in use.
>>>> That means that quite a few Fedora versions (auditing), anything with
>>>> ptrace, seccomp (before 3.16 IIRC), and anything with context tracking
>>>> is probably actually *improved* by turning off the vm86 syscalls even
>>>> for dosemu users.
>>>>
>>>> And apparently Ubuntu has had CONFIG_VM86 disabled forever.
>>>>
>>>> IOW, vm86 really is broken.
>>>>
>>>>> If people use vm86 mode, we can't just disable it. It's that simple.
>>>>> "It's poorly maintained" isn't an argument for removal. Only "nobody
>>>>> cares" works as an argument for that.
>>>>>
>>>>> My suspicion is that people still do use vm86 mode, but who knows..
>>>>> Quite frankly, rather than disable it, I'd much rather see people who
>>>>> modify low-level x86 code (yes, that means you, Luto) *test* it. If
>>>>> you aren't willing to test the modifications you make, I don't think
>>>>> those modifications should be merged, regardless of how nice a cleanup
>>>>> they are.
>>>>
>>>> I tried to test it. As far as I know, my changes in -tip have no
>>>> effect on vm86, and the changes I'm planning on sending this week will
>>>> make it work better. I still think that Linux users should have it
>>>> configured out or deleted altogether. Especially people who care at
>>>> all about security.
>>>>
>>>> It's easy to try the easy case (run from tools/testing/selftests/x86)
>>>> -- this is v4.2-rc1, but most recent versions should be identical:
>>>>
>>>> $ ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> [OK] Exited vm86 mode due to #BR
>>>> [RUN] SYSENTER from vm86 mode
>>>> [OK] Exited vm86 mode due to unhandled GP fault
>>>>
>>>> $ strace -e vm86 ./entry_from_vm86_32
>>>> [RUN] #BR from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>> [RUN] SYSENTER from vm86 mode
>>>> vm86(0x1, 0xbfa50fcc, 0xbfa50fcc, 0x80488bb, 0x1000) = -1 ENOSYS (Function not implemented)
>>>> [OK] Exited vm86 mode due to type 0, arg 0
>>>>
>>>> It only says "[OK]" because my test case isn't careful enough. That's
>>>> a failure. I suspect it was a much worse failure a couple versions
>>>> ago before my ENOSYS-reworking patch went in.
>>>>
>>>> Replace "-e vm86" with "-e write" and be puzzled. The failure mode is
>>>> really pretty bad.
>>>>
>>>> This only tests easy stuff. The integration between vm86 and fault
>>>> handling is truly awful and I don't even know how to approach testing
>>>> it. I'd probably have to run twenty or thirty old real-mode games to
>>>> even exercise those code paths.
>>>>
>>>> I'll try to confirm later this week that dosemu can really handle real
>>>> mode without sys_vm86.
>>>
>>> None of these issues are unfixable. As I said before, many of them
>>> can be resolved if vm86 is changed to use the normal syscall/exception
>>> exit paths. Give me a few days to finish off that patch set.
>>>
>>
>> I look forward to it.
>>
>> However: I imagine that, if you do this, you may need to be quite
>> careful about an x86_32-ism. Currently, if you have a pt_regs pointer
>> for the current entry and user_mode(regs) returns true, then regs ==
>> current_pt_regs().
>> If you let user mode run with EFLAGS.VM set with
>> the normal tss.sp0, then this will no longer be true, as the
>> extra-long entry-from-v8086 frame will shift pt_regs by a few bytes.
>> I don't know whether this matters, but I can imagine it causing
>> do_signal to explode. *shudder*
>
> I am aware that pt_regs is in a fixed location on the stack. What I
> plan to do is increase the padding at the top of the stack if VM86 is
> configured, to reserve space for the extra segment registers. Then it
> will move tss.sp0 up 16 bytes when entering vm86 mode so that the
> longer IRET frame is in the right place.
>

Hmm, should work.

I wonder if the right way to do this is to set a TIF_VM86 flag and do
the fixups in enter_from_user_mode and prepare_return_to_usermode. See
the patches I just sent (and tip/x86/asm, which they apply to).

Without something like that, we'll be in the awkward position of
having some of the selectors (DS, ES, FS, and GS) in both the normal
pt_regs slot and in the extended hardware frame during execution of
normal vm86-unaware kernel code.

If, on the other hand, we copied the selectors across in
enter_from_user_mode and prepare_return_to_usermode, then pt_regs
would work normally even for tasks that are running in v8086 mode.
regs->flags & X86_EFLAGS_VM will be true regardless, so all of the asm
that decides to invoke those helpers should work fine.

--Andy
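
For reference, the frame arithmetic behind the "move tss.sp0 up 16 bytes" idea can be sketched in plain userspace C. This is only an illustrative model, not code from any patch in this thread: the helper names (eip_offset_from_top, normal_eip_offset, vm86_eip_offset, check_frames_line_up) and the constants are hypothetical. The one hardware fact assumed is that a 32-bit entry from user mode pushes SS, ESP, EFLAGS, CS, EIP, and that an entry from v8086 mode pushes GS, FS, DS, ES ahead of those five.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit words the CPU pushes on entry from protected-mode user code:
 * SS, ESP, EFLAGS, CS, EIP. */
enum { NORMAL_FRAME_WORDS = 5 };

/* Entry from v8086 mode pushes GS, FS, DS, ES first, then the same five. */
enum { VM86_FRAME_WORDS = 9 };

/* Hypothetical padding reserved at the top of the stack: room for the
 * four extra segment registers, i.e. 4 * 4 = 16 bytes. */
enum { VM86_EXTRA_BYTES = (VM86_FRAME_WORDS - NORMAL_FRAME_WORDS) * 4 };

/* Distance in bytes from the top of the stack down to the saved EIP,
 * given where sp0 sits (also measured down from the top) and how many
 * words the hardware pushes starting at sp0. */
static size_t eip_offset_from_top(size_t sp0_offset_from_top, size_t frame_words)
{
    return sp0_offset_from_top + frame_words * sizeof(uint32_t);
}

/* Normal task: sp0 sits below the reserved padding. */
size_t normal_eip_offset(void)
{
    return eip_offset_from_top(VM86_EXTRA_BYTES, NORMAL_FRAME_WORDS);
}

/* vm86 task: sp0 is moved up by the padding, so the longer frame
 * starts at the very top of the stack. */
size_t vm86_eip_offset(void)
{
    return eip_offset_from_top(0, VM86_FRAME_WORDS);
}

/* In this model the saved EIP, and hence everything laid out below it,
 * lands at the same address in both cases. */
void check_frames_line_up(void)
{
    assert(normal_eip_offset() == vm86_eip_offset());
}
```

In this model both entries leave the saved EIP 36 bytes below the top of the stack, which is the property that would keep regs == current_pt_regs() true. It says nothing about the duplicated-selector problem, which is a separate question.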