Subject: Re: [RFC PATCH 1/2] mm/hugetlbfs: Attempt PUD_SIZE mapping alignment if PMD sharing enabled
From: Mike Kravetz
To: Hillf Danton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org
Cc: Hugh Dickins, Naoya Horiguchi, Kirill A. Shutemov, David Rientjes, Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Catalin Marinas, Will Deacon, Steve Capper, Andrew Morton
Date: Tue, 29 Mar 2016 09:29:36 -0700
Message-ID: <56FAAD70.1020806@oracle.com>
In-Reply-To: <024b01d1896e$2e600e70$8b202b50$@alibaba-inc.com>
References: <1459213970-17957-1-git-send-email-mike.kravetz@oracle.com> <1459213970-17957-2-git-send-email-mike.kravetz@oracle.com> <024b01d1896e$2e600e70$8b202b50$@alibaba-inc.com>

On 03/28/2016 08:50 PM, Hillf Danton wrote:
>>
>> When creating a hugetlb mapping, attempt PUD_SIZE alignment if the
>> following conditions are met:
>> - Address passed to mmap or shmat is NULL
>> - The mapping is flagged as shared
>> - The mapping is at least PUD_SIZE in length
>> If a PUD_SIZE aligned mapping can not be created, then fall back to a
>> huge page size mapping.
>>
>> Signed-off-by: Mike Kravetz
>> ---
>>  fs/hugetlbfs/inode.c | 29 +++++++++++++++++++++++++++--
>>  1 file changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 540ddc9..22b2e38 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -175,6 +175,17 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>>  	struct vm_area_struct *vma;
>>  	struct hstate *h = hstate_file(file);
>>  	struct vm_unmapped_area_info info;
>> +	bool pud_size_align = false;
>> +	unsigned long ret_addr;
>> +
>> +	/*
>> +	 * If PMD sharing is enabled, align to PUD_SIZE to facilitate
>> +	 * sharing.  Only attempt alignment if no address was passed in,
>> +	 * flags indicate sharing and size is big enough.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
>> +	    !addr && flags & MAP_SHARED && len >= PUD_SIZE)
>> +		pud_size_align = true;
>>
>>  	if (len & ~huge_page_mask(h))
>>  		return -EINVAL;
>> @@ -199,9 +210,23 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>>  	info.length = len;
>>  	info.low_limit = TASK_UNMAPPED_BASE;
>>  	info.high_limit = TASK_SIZE;
>> -	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> +	if (pud_size_align)
>> +		info.align_mask = PAGE_MASK & (PUD_SIZE - 1);
>> +	else
>> +		info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>>  	info.align_offset = 0;
>> -	return vm_unmapped_area(&info);
>> +	ret_addr = vm_unmapped_area(&info);
>> +
>> +	/*
>> +	 * If failed with PUD_SIZE alignment, try again with huge page
>> +	 * size alignment.
>> +	 */
>
> Can we avoid going another round as long as it is a file with
> the PUD page size?

Yes, that brings up a good point.  Since we only do PMD sharing with
PMD_SIZE huge pages, that should be part of the check as to whether we
try PUD_SIZE alignment.
The initial check should be expanded as follows:

	if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && !addr &&
	    flags & MAP_SHARED && huge_page_size(h) == PMD_SIZE &&
	    len >= PUD_SIZE)
		pud_size_align = true;

With that check, pud_size_align remains false for a file with the PUD
page size, and we do not retry.

-- 
Mike Kravetz

>
> Hillf
>> +	if ((ret_addr & ~PAGE_MASK) && pud_size_align) {
>> +		info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> +		ret_addr = vm_unmapped_area(&info);
>> +	}
>> +
>> +	return ret_addr;
>>  }
>>  #endif
>>
>> --
>> 2.4.3
>
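
For anyone following the arithmetic, here is a rough user-space sketch of
the expanded check and of the two align_mask values discussed above.  It is
not kernel code: the SK_/sk_ names are hypothetical, and the constants
assume x86_64 with 4 KiB base pages (2 MiB PMD_SIZE, 1 GiB PUD_SIZE).
Other architectures and configurations will differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 values with 4 KiB base pages; not taken from the patch. */
#define SK_PAGE_SIZE (UINT64_C(1) << 12)
#define SK_PMD_SIZE  (UINT64_C(1) << 21)	/* 2 MiB */
#define SK_PUD_SIZE  (UINT64_C(1) << 30)	/* 1 GiB */
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

static_assert(SK_PUD_SIZE > SK_PMD_SIZE, "a PUD must span more than one PMD");

/*
 * Hypothetical stand-in for the expanded check: PUD_SIZE alignment is
 * only attempted for shared mappings of PMD_SIZE huge pages with no
 * address hint and a length of at least PUD_SIZE.
 */
bool sk_want_pud_align(bool pmd_share_enabled, uint64_t addr, bool shared,
		       uint64_t hpage_size, uint64_t len)
{
	return pmd_share_enabled && !addr && shared &&
	       hpage_size == SK_PMD_SIZE && len >= SK_PUD_SIZE;
}

/*
 * Mirrors the two info.align_mask assignments.  Since huge_page_mask(h)
 * is ~(huge_page_size(h) - 1), PAGE_MASK & ~huge_page_mask(h) reduces to
 * PAGE_MASK & (huge_page_size(h) - 1).
 */
uint64_t sk_align_mask(bool pud_align, uint64_t hpage_size)
{
	uint64_t size = pud_align ? SK_PUD_SIZE : hpage_size;

	return SK_PAGE_MASK & (size - 1);
}
```

Under these assumptions, a file with PUD-sized huge pages never sets the
flag, so the second vm_unmapped_area() call is only reachable for PMD_SIZE
huge pages.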