Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752306AbcL3FXh (ORCPT ); Fri, 30 Dec 2016 00:23:37 -0500 Received: from mga05.intel.com ([192.55.52.43]:7353 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750720AbcL3FXf (ORCPT ); Fri, 30 Dec 2016 00:23:35 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,428,1477983600"; d="scan'208";a="917366373" Message-ID: <1483075412.106950.86.camel@ranerica-desktop> Subject: Re: [v2 5/7] x86: Add emulation code for UMIP instructions From: Ricardo Neri To: Andy Lutomirski Cc: Ingo Molnar , Thomas Gleixner , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , "linux-kernel@vger.kernel.org" , X86 ML , linux-msdos@vger.kernel.org, wine-devel@winehq.org, Andrew Morton , "H . Peter Anvin" , Brian Gerst , Chen Yucong , Chris Metcalf , Dave Hansen , Fenghua Yu , Huang Rui , Jiri Slaby , Jonathan Corbet , "Michael S . Tsirkin" , Paul Gortmaker , "Ravi V . Shankar" , Shuah Khan , Vlastimil Babka , Tony Luck , Paolo Bonzini , "Liang Z . Li" , Alexandre Julliard , Stas Sergeev Date: Thu, 29 Dec 2016 21:23:32 -0800 In-Reply-To: References: <20161224013745.108716-1-ricardo.neri-calderon@linux.intel.com> <20161224013745.108716-6-ricardo.neri-calderon@linux.intel.com> <1482885582.106950.29.camel@ranerica-desktop> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7487 Lines: 170 On Tue, 2016-12-27 at 16:48 -0800, Andy Lutomirski wrote: > On Tue, Dec 27, 2016 at 4:39 PM, Ricardo Neri > wrote: > > On Fri, 2016-12-23 at 18:11 -0800, Andy Lutomirski wrote: > >> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri > >> wrote: > >> > The feature User-Mode Instruction Prevention present in recent Intel > >> > processor prevents a group of instructions from being executed with > >> > CPL > 0. Otherwise, a general protection fault is issued. > >> > > >> > Rather than relaying this fault to the user space (in the form of a SIGSEGV > >> > signal), the instructions protected by UMIP can be emulated to provide > >> > dummy results. This allows to conserve the current kernel behavior and not > >> > reveal the system resources that UMIP intends to protect (the global > >> > descriptor and interrupt descriptor tables, the segment selectors of the > >> > local descriptor table and the task state and the machine status word). > >> > > >> > This emulation is needed because certain applications (e.g., WineHQ) rely > >> > on this subset of instructions to function. > >> > > >> > The instructions protected by UMIP can be split in two groups. Those who > >> > return a kernel memory address (sgdt and sidt) and those who return a > >> > value (sldt, str and smsw). > >> > > >> > For the instructions that return a kernel memory address, the result is > >> > emulated as the location of a dummy variable in the kernel memory space. > >> > This is needed as applications such as WineHQ rely on the result being > >> > located in the kernel memory space function. The limit for the GDT and the > >> > IDT are set to zero. > >> > >> Nak. This is a trivial KASLR bypass. Just give them hardcoded > >> values. For x86_64, I would suggest 0xfffffffffffe0000 and > >> 0xffffffffffff0000. > > > > I see. I assume you are suggesting these values for x86_64 because they > > lie in an unused hole. That makes sense to me. > > > > For the case of x86_32, I have trouble finding a suitable place as there > > are not many available holes. It could be put before VMALLOC_START or > > after VMALLOC_END but this would reveal the position of the vmalloc > > area. Although, to my knowledge, randomized memory is not available for > > x86_32. Without randomization, does it hurt to make sidt/sgdt return the > > address of a kernel static variable? > > I would just use the same addresses, truncated. There's no reason > that the address needs to be truly not present -- it just needs to be > inaccessible to user code. Anything near the top of the address space > should work. Right. Then I will reuse the same addresses. > > > > >> > >> > > >> > The instructions sldt and str return a segment selector relative to the > >> > base address of the global descriptor table. Since the actual address of > >> > such table is not revealed, it makes sense to emulate the result as zero. > >> > >> Hmm, now I wonder if anything uses SLDT to see if there is an LDT. If > >> so, we could emulate it better, but I doubt this matters. > > > > So you are saying that the emulated sldt should return a different value > > based on the presence/absence of a LDT? This could reveal this very > > fact. > > User code knows whether the LDT exists because an LDT only exists if > the program called modify_ldt(). But I doubt this matters in > practice. In such a case sldt would return a non-null segment selector. I will keep giving the null segment selector in all cases and make a note in the code. > > >> > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst, > >> > + unsigned char *data, int *data_size) > >> > +{ > >> > + unsigned long const *dummy_base_addr; > >> > + unsigned short dummy_limit = 0; > >> > + unsigned short dummy_value = 0; > >> > + > >> > + switch (umip_inst) { > >> > + /* > >> > + * These two instructions return the base address and limit of the > >> > + * global and interrupt descriptor table. The base address can be > >> > + * 32-bit or 64-bit. Limit is always 16-bit. > >> > + */ > >> > + case UMIP_SGDT: > >> > + case UMIP_SIDT: > >> > + if (umip_inst == UMIP_SGDT) > >> > + dummy_base_addr = &umip_dummy_gdt_base; > >> > + else > >> > + dummy_base_addr = &umip_dummy_idt_base; > >> > + if (X86_MODRM_MOD(insn->modrm.value) == 3) { > >> > + WARN_ONCE(1, "SGDT cannot take register as argument!\n"); > >> > >> No warnings please. > > > > I'll. Remove it. > > Thanks. In general, WARN_ONCE, etc are supposed to indicate kernel > bugs, not user bugs. Agreed. Your statement makes it very clear. I didn't have it that clear in my mind. > > >> > + int not_copied, nr_copied, reg_offset, dummy_data_size; > >> > + void __user *uaddr; > >> > + unsigned long *reg_addr; > >> > + enum umip_insn umip_inst; > >> > + > >> > + not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf)); > >> > >> This is slightly wrong due to PKRU. I doubt we care. > > > > I see. If I am not mistaken, if the memory is protected by a protection > > key this would cause a page fault. I'll make a note of it. > > Exactly. This is correct behavior unless the key happens to be set up > so it can be executed but not read, in which case emulation will fail. If we can't read we can't emulate anyways. I will add another note. > > >> > >> > + nr_copied = sizeof(buf) - not_copied; > >> > + /* > >> > + * The decoder _should_ fail nicely if we pass it a short buffer. > >> > + * But, let's not depend on that implementation detail. If we > >> > + * did not get anything, just error out now. > >> > + */ > >> > + if (!nr_copied) > >> > + return -EFAULT; > >> > >> If the caller cares about EINVAL vs EFAULT, it cares because it is > >> considering changing the signal to a fake page fault. If so, then > >> this should be EINVAL -- failure to read the text should just prevent > >> emulation. > > > > I see. The caller in this case do_general_protection, which will issue a > > SIGSEGV to the user space anyways. I don't think it cares about the > > EINVAL vs EFAULT. It does care about whether the emulation was > > successful. > > Maybe just make it return bool then? But fixing up the return codes > would be fine, too. I just think that, if it returns int, the value > should be meaningful. Right. I have a small query/proposal related to this below. > > >> > + if (nr_copied > 0) > >> > + return -EFAULT; > >> > >> This should be the only EFAULT case. > > Should this be EFAULT event if the caller cares only about successful > > (return 0) vs failed (return non-0) emulation? > > In theory this particular error would be a page fault not a general > protection fault (in the UMIP off case). If you were emulating it > extra carefully, you could change the signal accordingly. But, as I > said, I really doubt this matters. If simple enough and for the sake of accuracy, I could try to issue the page fault. It seems to me that this entitles calling force_sig_info_fault in this particular case as opposed to the force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk) that do_general_protection calls. Thanks and BR, Ricardo