From: Andy Lutomirski
Date: Tue, 21 Jul 2015 17:28:01 -0700
Subject: Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous
To: Andrew Cooper
Cc: Boris Ostrovsky, Andy Lutomirski, Peter Zijlstra, Steven Rostedt, "security@kernel.org", X86 ML, Borislav Petkov, Sasha Levin, "linux-kernel@vger.kernel.org", Konrad Rzeszutek Wilk, stable, Jan Beulich, xen-devel
In-Reply-To: <55AEE21E.80108@citrix.com>
References: <55AEBF76.4040501@oracle.com> <55AED813.5020603@citrix.com> <55AEE21E.80108@citrix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper wrote:
> On 22/07/2015 01:07, Andy Lutomirski wrote:
>> On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
>> wrote:
>>> On 21/07/2015 22:53, Boris Ostrovsky wrote:
>>>> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
>>>>> --- a/arch/x86/include/asm/mmu_context.h
>>>>> +++ b/arch/x86/include/asm/mmu_context.h
>>>>> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
>>>>> *mm) {}
>>>>> #endif
>>>>> /*
>>>>> + * ldt_structs can be allocated, used, and freed, but they are never
>>>>> + * modified while live.
>>>>> + */
>>>>> +struct ldt_struct {
>>>>> + int size;
>>>>> + int __pad; /* keep the descriptors naturally aligned. */
>>>>> + struct desc_struct entries[];
>>>>> +};
>>>>
>>>>
>>>> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>>>>
>>>> Jan, Andrew?
>>> PV guests are not permitted to have writeable mappings to the frames
>>> making up the GDT and LDT, so it cannot make unaudited changes to
>>> loadable descriptors. In particular, for a 32bit PV guest, it is only
>>> the segment limit which protects Xen from the ring1 guest kernel.
>>>
>>> A lot of this code hasn't been touched in years, and it certainly
>>> predates me. The alignment requirement appears to come from the virtual
>>> region Xen uses to map the guests GDT and LDT. Strict alignment is
>>> required for the GDT so Xen's descriptors starting at 0xe0xx are
>>> correct, but the LDT alignment seems to be a side effect of similar
>>> codepaths.
>>>
>>> For an LDT smaller than 8192 entries, I can't see any specific reason
>>> for enforcing alignment, other than "that's the way it has always been".
>>>
>>> However, the guest would still have to relinquish write access to all
>>> frames which make up the LDT, which looks to be a bit of an issue given
>>> the snippet above.
>> Does the LDT itself need to be aligned or just the address passed to
>> paravirt_alloc_ldt?
>
> The address which Xen receives needs to be aligned.
>
> It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
> it is passed is page aligned, and passes it straight through.

xen_alloc_ldt is just fiddling with protection though, I think. Isn't
it xen_set_ldt that's the meat? We could easily pass xen_alloc_ldt a
pointer to the ldt_struct.

>
>>
>>> I think I have a solution, but I doubt it is going to be very popular.
>>>
>>> * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
>>> backend can choose an alignment if needed
>>> * Make absolutely certain that __pad has the value 0 (so size and __pad
>>> combined don't look like a present descriptor)
>>> * Never hand selector 0x0008 to unsuspecting users.
>> Yuck.
>
> I actually meant 0x0004, but yes. Very much yuck.
>
>>
>>> This will allow ldt_struct itself to be page aligned, and for the size
>>> field to sit across the base/limit field of what would logically be
>>> selector 0x0008 There would be some issues accessing size. To load
>>> frames as an LDT, a guest must drop all refs to the page so that its
>>> type may be changed from writeable to segdesc. After that, an
>>> update_descriptor hypercall can be used to change size, and I believe
>>> the guest may subsequently recreate read-only mappings to the frames in
>>> question (although frankly it is getting late so you will want to double
>>> check all of this).
>>>
>>> Anyhow, this looks like an issue which should be fixed up with slightly
>>> more PVOps, rather than enforcing a Xen view of the world on native Linux.
>>>
>> I could presumably make the allocation the other way around so the
>> size is at the end. I could even use two separate allocations if
>> needed.
>
> I suspect two separate allocations would be the better solution, as it
> means that the size field doesn't need to be subject to funny page
> permissions.

True. OTOH we never write to the size field after allocating the thing.

--Andy
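
For illustration only, here is a rough userspace sketch in C of the two
layouts discussed in this thread. It is not the kernel code. The names
desc_sketch, ldt_inline, ldt_split, SKETCH_PAGE_SIZE and the helper
functions are all hypothetical, and a 4096-byte page and 8-byte descriptor
are assumed. The first layout mirrors the quoted patch, where the
descriptors start 8 bytes into the allocation and so cannot be
page-aligned if the struct itself is. The second keeps the size in an
ordinary header and puts the descriptors in their own page-aligned
allocation, along the lines of the "two separate allocations" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the kernel's 8-byte struct desc_struct. */
struct desc_sketch {
    uint32_t a, b;
};
_Static_assert(sizeof(struct desc_sketch) == 8, "descriptor must be 8 bytes");

/* Layout as in the quoted patch: header and descriptors share one block. */
struct ldt_inline {
    int size;
    int pad;                       /* plays the role of __pad in the patch */
    struct desc_sketch entries[];  /* begins 8 bytes into the allocation */
};

/* Alternative: descriptors live in their own page-aligned allocation. */
struct ldt_split {
    struct desc_sketch *entries;
    int size;
};

/* Byte offset of the descriptor array inside the single-block layout. */
size_t ldt_inline_entries_offset(void)
{
    return offsetof(struct ldt_inline, entries);
}

int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % SKETCH_PAGE_SIZE) == 0;
}

/* Allocate a split LDT with `size` zeroed entries; NULL on failure. */
struct ldt_split *ldt_split_alloc(int size)
{
    if (size <= 0)
        return NULL;

    /* aligned_alloc wants the byte count to be a multiple of the alignment. */
    size_t bytes = (size_t)size * sizeof(struct desc_sketch);
    size_t rounded = (bytes + SKETCH_PAGE_SIZE - 1)
                     / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;

    struct ldt_split *ldt = malloc(sizeof(*ldt));
    if (!ldt)
        return NULL;

    ldt->entries = aligned_alloc(SKETCH_PAGE_SIZE, rounded);
    if (!ldt->entries) {
        free(ldt);
        return NULL;
    }
    assert(is_page_aligned(ldt->entries));
    memset(ldt->entries, 0, rounded);
    ldt->size = size;
    return ldt;
}

void ldt_split_free(struct ldt_split *ldt)
{
    if (!ldt)
        return;
    free(ldt->entries);
    free(ldt);
}
```

In the split layout only the descriptor pages would need to become
read-only for a PV guest, while the size field stays in normal writable
memory. Whether that fits the real paravirt hooks is not something this
sketch can show.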