Date: Fri, 10 Jul 2015 10:04:12 -0700
Subject: Re: [PATCH] x86/kconfig/32: Mark CONFIG_VM86 as BROKEN
From: Linus Torvalds
To: Andy Lutomirski
Cc: "Eric W. Biederman", Arjan van de Ven, Andy Lutomirski, "the arch/x86 maintainers", Linux Kernel Mailing List, Oleg Nesterov, Kees Cook, Peter Zijlstra, Borislav Petkov

On Fri, Jul 10, 2015 at 9:44 AM, Andy Lutomirski wrote:
>
> That's not what I mean.  I'm referring to the vm86 syscall itself.  If
> you have a ti flag that causes the slow exit path to be used, then you
> call vm86.  vm86 sets up the ludicrous double stack frame that it uses
> and jumps back to the exit asm.  The exit asm then branches off to the
> slow path, hits the notifysig_v86 kludge, calls save_v86_state, tears
> down its double stack frame, and keeps meandering back through the
> exit asm.  We finally IRET right back to protected mode, and the code
> that userspace was trying to execute in v8086 mode never actually
> runs.

So?

Yes, we exit vm86 mode if anything odd happens. That's very much part
of the whole vm86() model. If the kernel needs to do anything, it
saves off the vm86 state and returns to regular 32-bit mode. That's
how it's designed to be. What's your point?

The user mode "vm86 hypervisor" will call vm86() in a loop. Always
has. Always will. And yes, that can mean that you never execute even a
single instruction in vm86 mode, if one of the "we have other work to
do" flags is set. Maybe a signal came in. Maybe just a delayed work
happened. Maybe it has nothing to do with user space, and we *could*
have returned to vm86 mode, but the thing is, that code sequence is
_designed_ that way - it's very much minimizing the impact of vm86
mode.

Pretty much the *only* thing we ever do with the vm86 stack still
active is reschedule. Pretty much *any* other context change issue
will get rid of the vm86 mode in kernel space, saving back the state
to user space so that user space can try again.

And it was done that way to minimize the vm86 impact on the rest of
the kernel. Basically there are a few hooks in a couple of traps that
say "ok, let's handle this case for vm86 mode", and there's the "let's
reschedule without exiting the user vm86 state", but the code is
designed so that we'll just say "screw it, the user can restart, we'll
go back to normal 32-bit code because something other than just plain
returning to vm86 mode happened".

vm86() mode is not some kind of "run this DOS program to completion".
It's exactly like a (very stupid) vmx mode. There are exit conditions,
and while many of them are about the code it executes, equally many of
them are "oh, we may have some event that cannot be handled in vm86
mode like a signal happened" etc.

So yes, if the thread work flags are set, we never enter vm86 mode.
BUT THAT'S EXACTLY WHAT SHOULD HAPPEN.

It worries me that you think these kinds of fundamental issues are
completely broken. No, I wouldn't be surprised at all if there is
actual breakage, just because vm86 mode clearly gets very little
testing, but the things you have pointed out as "broken" really
haven't been as far as I can tell.

And yes, if you enable system call auditing, and you actually audit
the vm86 mode system call, that probably causes an exit condition,
which means that you can't actually run vm86 mode and make progress if
you audit that system call. Big f*cking deal. People who enable system
call auditing break so many more important things (e.g. basic
performance) that that isn't even an argument.

Do you really think that people who wanted to run DOS games at
hardware speeds wanted to _audit_ those games? No.

                  Linus
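
The retry model described in this message (user space calls vm86() in a loop, and any pending kernel work bounces the call back without running guest code) can be illustrated with a small stand-alone simulation. This is only a sketch and does not call the real vm86() syscall. Every name in it (`enter_guest`, `run_guest`, `struct guest_state`, the `exit_reason` values, the `slice` parameter) is hypothetical and chosen for illustration, and the real ABI reports exit reasons differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit reasons; the real vm86() return values are encoded differently. */
enum exit_reason {
    EXIT_PENDING_WORK, /* the kernel had something else to do; caller should retry */
    EXIT_GUEST_DONE    /* the guest program ran to its end */
};

/* Hypothetical guest state that is saved back to user space on every exit. */
struct guest_state {
    unsigned int ip;        /* next guest instruction to execute */
    unsigned int insns_run; /* guest instructions executed so far */
};

/*
 * Stand-in for a single vm86() call. If work is pending on entry, it
 * returns immediately and the guest makes zero progress. Otherwise it
 * runs at most `slice` guest instructions and then exits again.
 */
static enum exit_reason enter_guest(struct guest_state *st, bool work_pending,
                                    unsigned int slice, unsigned int program_len)
{
    if (work_pending)
        return EXIT_PENDING_WORK; /* state untouched, nothing executed */

    while (slice-- > 0 && st->ip < program_len) {
        st->ip++;
        st->insns_run++;
    }
    return st->ip >= program_len ? EXIT_GUEST_DONE : EXIT_PENDING_WORK;
}

/*
 * Stand-in for the user-mode monitor: keep re-entering the guest until
 * it finishes. pending[i] says whether work is pending on call i; calls
 * beyond npending see no pending work. Returns the number of calls made.
 * Assumes slice > 0, otherwise the guest could never make progress.
 */
static unsigned int run_guest(struct guest_state *st, const bool *pending,
                              size_t npending, unsigned int slice,
                              unsigned int program_len)
{
    unsigned int calls = 0;

    for (;;) {
        bool work = calls < npending && pending[calls];

        calls++;
        if (enter_guest(st, work, slice, program_len) == EXIT_GUEST_DONE)
            return calls;
    }
}
```

With a zero-initialized `struct guest_state`, a five-instruction program, a slice of 3 and pending work on the first, second and fourth calls, `run_guest()` needs five calls to finish. Three of those calls execute no guest instructions at all, and the guest still completes, which is the behaviour the message argues is by design.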