From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Andi Kleen, Thomas Gleixner,
    Josh Poimboeuf, Dave Hansen, David Woodhouse, Guenter Roeck
Subject: [PATCH 4.4 32/43] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
Date: Tue, 14 Aug 2018 19:18:08 +0200
Message-Id: <20180814171519.219354177@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180814171517.014285600@linuxfoundation.org>
References: <20180814171517.014285600@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

4.4-stable review patch.
If anyone has any objections, please let me know.

------------------

From: Andi Kleen

commit 42e4089c7890725fcd329999252dc489b72f2921 upstream

For L1TF, PROT_NONE mappings are protected by inverting the PFN in the
page table entry. This sets the high bits in the CPU's address space,
thus making sure not to point an unmapped entry to valid cached memory.

Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such a high mapping was mapped to unprivileged users,
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.

To avoid this, forbid PROT_NONE mappings or mprotect for high MMIO mappings.

Valid page mappings are allowed because the system is then unsafe anyway.

It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact, this is only enforced if the mapping actually refers
to a high MMIO address (defined as the MAX_PA-1 bit being set), and the
check is also skipped for root.

For mmaps this is straightforward and can be handled in vm_insert_pfn and
in remap_pfn_range().

For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed, a lot of state has been changed and it would be difficult to
undo on an error. Since this is an uncommon case, use a separate early
page table walk pass for MMIO PROT_NONE mappings that checks for this
condition early. For non-MMIO and non-PROT_NONE there are no changes.

[dwmw2: Backport to 4.9]
[groeck: Backport to 4.4]

Signed-off-by: Andi Kleen
Signed-off-by: Thomas Gleixner
Reviewed-by: Josh Poimboeuf
Acked-by: Dave Hansen
Signed-off-by: David Woodhouse
Signed-off-by: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

---
 arch/x86/include/asm/pgtable.h |    8 ++++++
 arch/x86/mm/mmap.c             |   21 +++++++++++++++++
 include/asm-generic/pgtable.h  |   12 ++++++++++
 mm/memory.c                    |   29 ++++++++++++++++------
 mm/mprotect.c                  |   49 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 112 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -942,6 +942,14 @@ static inline pte_t pte_swp_clear_soft_d
 }
 #endif
 
+#define __HAVE_ARCH_PFN_MODIFY_ALLOWED 1
+extern bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot);
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+	return boot_cpu_has_bug(X86_BUG_L1TF);
+}
+
 #include <asm-generic/pgtable.h>
 
 #endif	/* __ASSEMBLY__ */
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -121,3 +121,24 @@ const char *arch_vma_name(struct vm_area
 		return "[mpx]";
 	return NULL;
 }
+
+/*
+ * Only allow root to set high MMIO mappings to PROT_NONE.
+ * This prevents an unpriv. user to set them to PROT_NONE and invert
+ * them, then pointing to valid memory for L1TF speculation.
+ *
+ * Note: for locked down kernels may want to disable the root override.
+ */
+bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+	if (!boot_cpu_has_bug(X86_BUG_L1TF))
+		return true;
+	if (!__pte_needs_invert(pgprot_val(prot)))
+		return true;
+	/* If it's real memory always allow */
+	if (pfn_valid(pfn))
+		return true;
+	if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
+		return false;
+	return true;
+}
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -805,4 +805,16 @@ static inline int pmd_free_pte_page(pmd_
 #define io_remap_pfn_range remap_pfn_range
 #endif
 
+#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
+static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
+{
+	return true;
+}
+
+static inline bool arch_has_pfn_modify_check(void)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1645,6 +1645,9 @@ int vm_insert_pfn_prot(struct vm_area_st
 	if (track_pfn_insert(vma, &pgprot, pfn))
 		return -EINVAL;
 
+	if (!pfn_modify_allowed(pfn, pgprot))
+		return -EACCES;
+
 	ret = insert_pfn(vma, addr, pfn, pgprot);
 
 	return ret;
@@ -1663,6 +1666,9 @@ int vm_insert_mixed(struct vm_area_struc
 	if (track_pfn_insert(vma, &pgprot, pfn))
 		return -EINVAL;
 
+	if (!pfn_modify_allowed(pfn, pgprot))
+		return -EACCES;
+
 	/*
 	 * If we don't have pte special, then we have to use the pfn_valid()
 	 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
@@ -1691,6 +1697,7 @@ static int remap_pte_range(struct mm_str
 {
 	pte_t *pte;
 	spinlock_t *ptl;
+	int err = 0;
 
 	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
@@ -1698,12 +1705,16 @@ static int remap_pte_range(struct mm_str
 	arch_enter_lazy_mmu_mode();
 	do {
 		BUG_ON(!pte_none(*pte));
+		if (!pfn_modify_allowed(pfn, prot)) {
+			err = -EACCES;
+			break;
+		}
 		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
-	return 0;
+	return err;
 }
 
 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
@@ -1712,6 +1723,7 @@ static inline int remap_pmd_range(struct
 {
 	pmd_t *pmd;
 	unsigned long next;
+	int err;
 
 	pfn -= addr >> PAGE_SHIFT;
 	pmd = pmd_alloc(mm, pud, addr);
@@ -1720,9 +1732,10 @@ static inline int remap_pmd_range(struct
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
-		if (remap_pte_range(mm, pmd, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
+		err = remap_pte_range(mm, pmd, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			return err;
 	} while (pmd++, addr = next, addr != end);
 	return 0;
 }
@@ -1733,6 +1746,7 @@ static inline int remap_pud_range(struct
 {
 	pud_t *pud;
 	unsigned long next;
+	int err;
 
 	pfn -= addr >> PAGE_SHIFT;
 	pud = pud_alloc(mm, pgd, addr);
@@ -1740,9 +1754,10 @@ static inline int remap_pud_range(struct
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
-		if (remap_pmd_range(mm, pud, addr, next,
-				pfn + (addr >> PAGE_SHIFT), prot))
-			return -ENOMEM;
+		err = remap_pmd_range(mm, pud, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot);
+		if (err)
+			return err;
 	} while (pud++, addr = next, addr != end);
 	return 0;
 }
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -255,6 +255,42 @@ unsigned long change_protection(struct v
 	return pages;
 }
 
+static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
+			       unsigned long next, struct mm_walk *walk)
+{
+	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+		0 : -EACCES;
+}
+
+static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
+				   unsigned long addr, unsigned long next,
+				   struct mm_walk *walk)
+{
+	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
+		0 : -EACCES;
+}
+
+static int prot_none_test(unsigned long addr, unsigned long next,
+			  struct mm_walk *walk)
+{
+	return 0;
+}
+
+static int prot_none_walk(struct vm_area_struct *vma, unsigned long start,
+			   unsigned long end, unsigned long newflags)
+{
+	pgprot_t new_pgprot = vm_get_page_prot(newflags);
+	struct mm_walk prot_none_walk = {
+		.pte_entry = prot_none_pte_entry,
+		.hugetlb_entry = prot_none_hugetlb_entry,
+		.test_walk = prot_none_test,
+		.mm = current->mm,
+		.private = &new_pgprot,
+	};
+
+	return walk_page_range(start, end, &prot_none_walk);
+}
+
 int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
@@ -273,6 +309,19 @@ mprotect_fixup(struct vm_area_struct *vm
 	}
 
 	/*
+	 * Do PROT_NONE PFN permission checks here when we can still
+	 * bail out without undoing a lot of state. This is a rather
+	 * uncommon case, so doesn't need to be very optimized.
+	 */
+	if (arch_has_pfn_modify_check() &&
+	    (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
+	    (newflags & (VM_READ|VM_WRITE|VM_EXEC)) == 0) {
+		error = prot_none_walk(vma, start, end, newflags);
+		if (error)
+			return error;
+	}
+
+	/*
 	 * If we make a private mapping writable we increase our commit;
 	 * but (without finer accounting) cannot reduce our commit if we
 	 * make it unwritable again. hugetlb mapping were accounted for