From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Matthew Wilcox, Mike Kravetz, Andrew Morton, Michal Hocko, Hugh Dickins, Naoya Horiguchi, "Aneesh Kumar K.V", Andrea Arcangeli, "Kirill A.Shutemov", Davidlohr Bueso, Prakash Sangappa, Linus Torvalds
Subject: [PATCH 5.8 081/232] hugetlbfs: remove call to huge_pte_alloc without
 i_mmap_rwsem
Date: Thu, 20 Aug 2020 11:18:52 +0200
Message-Id: <20200820091616.739964406@linuxfoundation.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200820091612.692383444@linuxfoundation.org>
References: <20200820091612.692383444@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Mike Kravetz

commit 34ae204f18519f0920bd50a644abd6fefc8dbfcf upstream.

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode. This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed. When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning of
hugetlb_fault was missed.

Unfortunately, that else clause is exercised when there is no page table
entry. This will likely lead to a call to huge_pmd_share. If
huge_pmd_share thinks pmd sharing is possible, it will traverse the mapping
tree (i_mmap) without holding i_mmap_rwsem. If someone else is modifying
the tree, bad things such as addressing exceptions or worse could happen.

Simply remove the else clause. It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.

To prevent this type of issue in the future, add routines to assert that
i_mmap_rwsem is held, and call these routines in huge pmd sharing
routines.

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Suggested-by: Matthew Wilcox
Signed-off-by: Mike Kravetz
Signed-off-by: Andrew Morton
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Naoya Horiguchi
Cc: "Aneesh Kumar K.V"
Cc: Andrea Arcangeli
Cc: "Kirill A.Shutemov"
Cc: Davidlohr Bueso
Cc: Prakash Sangappa
Cc:
Link: http://lkml.kernel.org/r/e670f327-5cf9-1959-96e4-6dc7cc30d3d5@oracle.com
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/fs.h      |   10 ++++++++++
 include/linux/hugetlb.h |    8 +++++---
 mm/hugetlb.c            |   15 +++++++--------
 mm/rmap.c               |    2 +-
 4 files changed, 23 insertions(+), 12 deletions(-)

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -549,6 +549,16 @@ static inline void i_mmap_unlock_read(st
 	up_read(&mapping->i_mmap_rwsem);
 }
 
+static inline void i_mmap_assert_locked(struct address_space *mapping)
+{
+	lockdep_assert_held(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_assert_write_locked(struct address_space *mapping)
+{
+	lockdep_assert_held_write(&mapping->i_mmap_rwsem);
+}
+
 /*
  * Might pages of this file be mapped into userspace?
  */
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -164,7 +164,8 @@ pte_t *huge_pte_alloc(struct mm_struct *
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+				unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
@@ -203,8 +204,9 @@ static inline struct address_space *huge
 	return NULL;
 }
 
-static inline int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr,
-					pte_t *ptep)
+static inline int huge_pmd_unshare(struct mm_struct *mm,
+					struct vm_area_struct *vma,
+					unsigned long *addr, pte_t *ptep)
 {
 	return 0;
 }
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3952,7 +3952,7 @@ void __unmap_hugepage_range(struct mmu_g
 			continue;
 
 		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, &address, ptep)) {
+		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
 			spin_unlock(ptl);
 			/*
 			 * We just unmapped a page of PMDs by clearing a PUD.
@@ -4539,10 +4539,6 @@ vm_fault_t hugetlb_fault(struct mm_struc
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
-	} else {
-		ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
-		if (!ptep)
-			return VM_FAULT_OOM;
 	}
 
 	/*
@@ -5019,7 +5015,7 @@ unsigned long hugetlb_change_protection(
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, &address, ptep)) {
+		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
 			pages++;
 			spin_unlock(ptl);
 			shared_pmd = true;
@@ -5400,12 +5396,14 @@ out:
  * returns: 1 successfully unmapped a shared pte page
  *	    0 the underlying pte page is not shared, or it is the last user
  */
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+					unsigned long *addr, pte_t *ptep)
 {
 	pgd_t *pgd = pgd_offset(mm, *addr);
 	p4d_t *p4d = p4d_offset(pgd, *addr);
 	pud_t *pud = pud_offset(p4d, *addr);
 
+	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
 	BUG_ON(page_count(virt_to_page(ptep)) == 0);
 	if (page_count(virt_to_page(ptep)) == 1)
 		return 0;
@@ -5423,7 +5421,8 @@ pte_t *huge_pmd_share(struct mm_struct *
 	return NULL;
 }
 
-int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
+				unsigned long *addr, pte_t *ptep)
 {
 	return 0;
 }
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1469,7 +1469,7 @@ static bool try_to_unmap_one(struct page
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
-			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
+			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
 				/*
 				 * huge_pmd_unshare unmapped an entire PMD
 				 * page. There is no way of knowing exactly