From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Mike Kravetz , Naoya Horiguchi , Davidlohr Bueso , Joonsoo Kim , "Kirill A .
 Shutemov" , Michal Hocko , Andrew Morton , Linus Torvalds
Subject: [PATCH 5.0 070/123] hugetlb: use same fault hash key for shared and private mappings
Date: Mon, 20 May 2019 14:14:10 +0200
Message-Id: <20190520115249.495415792@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190520115245.439864225@linuxfoundation.org>
References: <20190520115245.439864225@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Mike Kravetz

commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently. The key for shared and private mappings is
different. Shared keys off address_space and file index. Private keys
off mm and virtual address. Consider a private mapping of a populated
hugetlbfs file. A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages. It uses the address_space file index key. However, private
mappings will use a different key and could race with this code to map
the file page. This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped. A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914ebf7
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings. This
results in potentially more hash collisions. However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
Reviewed-by: Davidlohr Bueso
Cc: Joonsoo Kim
Cc: "Kirill A . Shutemov"
Cc: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

---
 fs/hugetlbfs/inode.c    |    7 ++-----
 include/linux/hugetlb.h |    4 +---
 mm/hugetlb.c            |   22 ++++++----------------
 mm/userfaultfd.c        |    3 +--
 4 files changed, 10 insertions(+), 26 deletions(-)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -426,9 +426,7 @@ static void remove_inode_hugepages(struc
 			u32 hash;
 
 			index = page->index;
-			hash = hugetlb_fault_mutex_hash(h, current->mm,
-							&pseudo_vma,
-							mapping, index, 0);
+			hash = hugetlb_fault_mutex_hash(h, mapping, index, 0);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 			/*
@@ -625,8 +623,7 @@ static long hugetlbfs_fallocate(struct f
 		addr = index * hpage_size;
 
 		/* mutex taken here, fault path and hole punch */
-		hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
-						index, addr);
+		hash = hugetlb_fault_mutex_hash(h, mapping, index, addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -123,9 +123,7 @@ void move_hugetlb_state(struct page *old
 void free_huge_page(struct page *page);
 void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
-u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
-				struct vm_area_struct *vma,
-				struct address_space *mapping,
+u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping,
 				pgoff_t idx, unsigned long address);
 
 pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3777,8 +3777,7 @@ retry:
 			 * handling userfault. Reacquire after handling
 			 * fault to make calling code simpler.
 			 */
-			hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping,
-							idx, haddr);
+			hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			ret = handle_userfault(&vmf, VM_UFFD_MISSING);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -3886,21 +3885,14 @@ backout_unlocked:
 }
 
 #ifdef CONFIG_SMP
-u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
-			    struct vm_area_struct *vma,
-			    struct address_space *mapping,
+u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping,
 			    pgoff_t idx, unsigned long address)
 {
 	unsigned long key[2];
 	u32 hash;
 
-	if (vma->vm_flags & VM_SHARED) {
-		key[0] = (unsigned long) mapping;
-		key[1] = idx;
-	} else {
-		key[0] = (unsigned long) mm;
-		key[1] = address >> huge_page_shift(h);
-	}
+	key[0] = (unsigned long) mapping;
+	key[1] = idx;
 
 	hash = jhash2((u32 *)&key, sizeof(key)/sizeof(u32), 0);
 
@@ -3911,9 +3903,7 @@ u32 hugetlb_fault_mutex_hash(struct hsta
  * For uniprocesor systems we always use a single mutex, so just
  * return 0 and avoid the hashing overhead.
  */
-u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
-			    struct vm_area_struct *vma,
-			    struct address_space *mapping,
+u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping,
 			    pgoff_t idx, unsigned long address)
 {
 	return 0;
@@ -3958,7 +3948,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
 	 * get spurious allocation failures if two CPUs race to instantiate
 	 * the same page in the page cache.
 	 */
-	hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr);
+	hash = hugetlb_fault_mutex_hash(h, mapping, idx, haddr);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	entry = huge_ptep_get(ptep);
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -271,8 +271,7 @@ retry:
 		 */
 		idx = linear_page_index(dst_vma, dst_addr);
 		mapping = dst_vma->vm_file->f_mapping;
-		hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping,
-								idx, dst_addr);
+		hash = hugetlb_fault_mutex_hash(h, mapping, idx, dst_addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		err = -ENOMEM;
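
The key change described in the changelog can be illustrated with a
minimal user-space sketch in C. This is not kernel code: struct
fault_key, old_fault_key(), new_fault_key(), fault_key_equal(),
fault_slot() and the table size are hypothetical names picked for the
example, and a simple FNV-1a-style mix stands in for the kernel's
jhash2(). The sketch only shows which inputs feed the key under each
scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size for illustration; must be a power of two. */
#define NUM_FAULT_MUTEXES 64u

/* Two-word key, mirroring the key[2] array in hugetlb_fault_mutex_hash(). */
struct fault_key {
	uint64_t k0;
	uint64_t k1;
};

/* Old scheme: shared keys off (mapping, idx), private off (mm, vaddr). */
static struct fault_key old_fault_key(bool shared, uint64_t mapping,
				      uint64_t idx, uint64_t mm,
				      uint64_t address, unsigned int huge_shift)
{
	struct fault_key key;

	if (shared) {
		key.k0 = mapping;
		key.k1 = idx;
	} else {
		key.k0 = mm;
		key.k1 = address >> huge_shift;
	}
	return key;
}

/* New scheme: every mapping keys off (mapping, idx). */
static struct fault_key new_fault_key(uint64_t mapping, uint64_t idx)
{
	struct fault_key key = { .k0 = mapping, .k1 = idx };

	return key;
}

static bool fault_key_equal(struct fault_key a, struct fault_key b)
{
	return a.k0 == b.k0 && a.k1 == b.k1;
}

/*
 * Assumed stand-in for jhash2(): an FNV-1a-style mix over the key's
 * 32-bit halves, masked down to a slot in the mutex table.
 */
static uint32_t fault_slot(struct fault_key key)
{
	const uint32_t words[4] = {
		(uint32_t)key.k0, (uint32_t)(key.k0 >> 32),
		(uint32_t)key.k1, (uint32_t)(key.k1 >> 32),
	};
	uint32_t hash = 2166136261u;

	for (int i = 0; i < 4; i++) {
		hash ^= words[i];
		hash *= 16777619u;
	}
	return hash & (NUM_FAULT_MUTEXES - 1);
}
```

With the old scheme, a hole punch on file index 3 builds the key
(mapping, 3), while a private fault on the same page builds
(mm, address >> huge_shift). The two keys differ, so nothing forces the
two paths onto the same mutex. With new_fault_key() on both paths the
keys are identical, so fault_slot() always returns the same slot for
both.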