Date: Tue, 23 Jan 2024 09:41:29 +0000
From: Ryan Roberts
To: Yang Shi
Cc: Matthew Wilcox, Yang Shi, riel@surriel.com, cl@linux.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20231214223423.1133074-1-yang@os.amperecomputing.com>
 <1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com>

On 22/01/2024 19:43, Yang Shi wrote:
> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts wrote:
>>
>> On 20/01/2024 16:39, Matthew Wilcox wrote:
>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>>>> However, after this patch, each allocation is in its own VMA, and there
>>>> is a 2M gap between each VMA. This causes 2 problems: 1) mmap becomes
>>>> MUCH slower because there are so many VMAs to check to find a new 1G
>>>> gap. 2) It fails once it hits the VMA limit
>>>> (/proc/sys/vm/max_map_count). Hitting this limit then causes a
>>>> subsequent calloc() to fail, which causes the test to fail.
>>>>
>>>> Looking at the code, I think the problem is that arm64 selects
>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area()
>>>> allocates len+2M then always aligns to the bottom of the discovered
>>>> gap. That causes the 2M hole. As far as I can see, x86 allocates bottom
>>>> up, so you don't get a hole.
>>>
>>> As a quick hack, perhaps
>>>
>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>> take-the-top-half
>>> #else
>>> current-take-bottom-half-code
>>> #endif
>>>
>>> ?
>
> Thanks for the suggestion. It makes sense to me. Doing the alignment
> needs to take this into account.
>
>>
>> There is a general problem though, in that there is a trade-off between
>> abutting VMAs and aligning them to PMD boundaries. This patch has decided
>> that in general the latter is preferable. The case I'm hitting is special
>> though, in that both requirements could be achieved but currently are not.
>>
>> The below fixes it, but I feel like there should be some bitwise magic
>> that would give the correct answer without the conditional - but my head
>> is gone and I can't see it. Any thoughts?
>
> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> the conditional either.
>
>>
>> Beyond this, though, there is also a latent bug where the offset provided
>> to mmap() is carried all the way through to the get_unmapped_area()
>> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for
>> arches that use the default get_unmapped_area(), any non-zero offset
>> would not have been used. But this change starts using it, which is
>> incorrect. That said, there are some arches that override the default
>> get_unmapped_area() and do use the offset. So I'm not sure if this is a
>> bug, or a feature that lets user space pass an arbitrary value to the
>> implementation for anon memory?
>
> Thanks for noticing this. If I read the code correctly, the pgoff is used
> by some arches to work around VIPT caches, and it looks like it is for
> shared mappings only (I just checked arm and mips). And I believe
> everybody assumes 0 should be used when doing an anonymous mapping. The
> offset should have nothing to do with finding a proper unmapped virtual
> area. But the pgoff does make sense for file THP due to the alignment
> requirements. I think it should be zeroed for anonymous mappings, like:
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2ff79b1d1564..a9ed353ce627 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>                 pgoff = 0;
>                 get_area = shmem_get_unmapped_area;
>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +               pgoff = 0;
>                 /* Ensures that larger anonymous mappings are THP aligned. */
>                 get_area = thp_get_unmapped_area;
>         }

I think it would be cleaner to just zero pgoff if file==NULL; then it covers
the shared case, the THP case, and the non-THP case properly. I'll prepare a
separate patch for this.
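Something like this, perhaps (untested sketch only - the exact surrounding
context in mm/mmap.c's get_unmapped_area() may differ):

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ get_unmapped_area @@
+	/*
+	 * Anonymous mappings have no meaningful file offset, so zero pgoff
+	 * before any get_unmapped_area() implementation can consume it.
+	 */
+	if (!file)
+		pgoff = 0;
+
 	if (file) {

That way none of the individual branches needs to repeat the zeroing.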
>
>>
>> Finally, the second test failure I reported (ksm_tests) is actually
>> caused by a bug in the test code, but provoked by this change. So I'll
>> send out a fix for the test code separately.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 4f542444a91f..68ac54117c77 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>  {
>>         loff_t off_end = off + len;
>>         loff_t off_align = round_up(off, size);
>> -       unsigned long len_pad, ret;
>> +       unsigned long len_pad, ret, off_sub;
>>
>>         if (off_end <= off_align || (off_end - off_align) < size)
>>                 return 0;
>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>         if (ret == addr)
>>                 return addr;
>>
>> -       ret += (off - ret) & (size - 1);
>> +       off_sub = (off - ret) & (size - 1);
>> +
>> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
>> +           !off_sub)
>> +               return ret + size;
>> +
>> +       ret += off_sub;
>>         return ret;
>>  }
>
> I didn't spot any problem; would you please come up with a formal patch?

Yeah, I'll aim to post today.
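As an aside, the placement behaviour discussed above is easy to observe from
user space with something like the below (illustrative sketch only: it
assumes a 2M PMD size and a top-down mmap layout, and the exact gaps will
vary by kernel, arch and layout):

#include <stdio.h>
#include <sys/mman.h>

/*
 * Map several anonymous regions larger than a 2M PMD and print where they
 * land. On a kernel with the top-down issue described above, each mapping
 * is PMD-aligned but leaves a ~2M hole below its neighbour; once fixed,
 * the mappings can abut while staying aligned.
 */
#define MAP_SZ (10UL * 1024 * 1024)	/* > 2M, so THP alignment applies */
#define PMD_SZ (2UL * 1024 * 1024)	/* assumed PMD size */

int main(void)
{
	unsigned long prev_base = 0;

	for (int i = 0; i < 8; i++) {
		void *p = mmap(NULL, MAP_SZ, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* Gap up to the previous mapping (top-down layout assumed). */
		long gap = prev_base ?
			(long)(prev_base - ((unsigned long)p + MAP_SZ)) : 0;
		printf("base=%p pmd_aligned=%d gap=%ld\n", p,
		       ((unsigned long)p & (PMD_SZ - 1)) == 0, gap);
		prev_base = (unsigned long)p;
	}
	return 0;
}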