Date: Tue, 26 Jan 2021 19:47:22 +1000
From: Nicholas Piggin
Subject: Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
To: Andrew Morton, Ding Tianhong, linux-mm@kvack.org
Cc: Christophe Leroy, Christoph Hellwig, Jonathan Cameron, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Rick Edgecombe
References: <20210126044510.2491820-1-npiggin@gmail.com> <20210126044510.2491820-13-npiggin@gmail.com> <0f360e6e-6d34-19ce-6c76-a17a5f4f7fc3@huawei.com>
In-Reply-To: <0f360e6e-6d34-19ce-6c76-a17a5f4f7fc3@huawei.com>
Message-Id: <1611653945.t3oot63nwn.astroid@bobo.none>

Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
> On 2021/1/26 12:45, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> support PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>> 
>> Architectures must ensure that any arch specific vmalloc allocations
>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>> module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>> 
>> When hugepage vmalloc mappings are enabled in the next patch, this
>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>> 
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation; a boot option, nohugevmalloc, is added to disable it.
>> 
>> Signed-off-by: Nicholas Piggin
>> ---
>>  arch/Kconfig            |  11 ++
>>  include/linux/vmalloc.h |  21 ++++
>>  mm/page_alloc.c         |   5 +-
>>  mm/vmalloc.c            | 215 +++++++++++++++++++++++++++++++---------
>>  4 files changed, 205 insertions(+), 47 deletions(-)
>> 
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..eef170e0c9b8 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  config HAVE_ARCH_HUGE_VMAP
>>  	bool
>>  
>> +#
>> +# Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +# arch_vmap_pmd_supported() returns true), and they must make no assumptions
>> +# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>> +# can be used to prohibit arch-specific allocations from using hugepages to
>> +# help with this (e.g., modules may require it).
>> +#
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +	depends on HAVE_ARCH_HUGE_VMAP
>> +	bool
>> +
>>  config ARCH_WANT_HUGE_PMD_SHARE
>>  	bool
>>  
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 99ea72d547dc..93270adf5db5 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -25,6 +25,7 @@ struct notifier_block;	/* in notifier.h */
>>  #define VM_NO_GUARD		0x00000040	/* don't add guard page */
>>  #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
>>  #define VM_MAP_PUT_PAGES	0x00000100	/* put pages and free array in vfree */
>> +#define VM_NO_HUGE_VMAP		0x00000200	/* force PAGE_SIZE pte mapping */
>>  
>>  /*
>>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
>> @@ -59,6 +60,9 @@ struct vm_struct {
>>  	unsigned long		size;
>>  	unsigned long		flags;
>>  	struct page		**pages;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> +	unsigned int		page_order;
>> +#endif
>>  	unsigned int		nr_pages;
>>  	phys_addr_t		phys_addr;
>>  	const void		*caller;
> 
> Hi Nicholas:
> 
> A suggestion :)
> 
> The page order is only used to indicate the huge page mapping for the vm
> area, and it is only valid when the size is bigger than PMD_SIZE. Could we
> use the vm flags instead, e.g. define a new flag named VM_HUGEPAGE? That
> would not change struct vm_struct, and it would be easier for me to
> backport the series to our own branches (based on the LTS version).

Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any
time soon (or maybe they will for edge case configurations? It would be
trivial to add support for them.)

The other concern I have is that Christophe IIRC was asking about
implementing a mapping for PPC which used TLB mappings of a different size
than the kernel page table tree. Although I guess we could deal with that
when it comes.

I like the flexibility of page_order, though. How hard would it be for
you to do the backport with VM_HUGEPAGE yourself?

I should also say, thanks for all the review and testing from the Huawei
team. Do you have an x86 patch?

Thanks,
Nick
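
For readers following the flag-vs-page_order question, a minimal sketch of
what the suggested flag-based alternative might look like. The flag value and
the helper name vm_huge_page_order() are hypothetical, not part of the posted
series; this only illustrates the trade-off discussed above.

	/*
	 * Illustrative sketch only: mark huge-page-mapped areas with a vm
	 * flag instead of recording an explicit page order in vm_struct.
	 * VM_HUGE_PAGES and vm_huge_page_order() are hypothetical names.
	 */
	#include <linux/vmalloc.h>
	#include <linux/pgtable.h>

	#define VM_HUGE_PAGES	0x00000400	/* area mapped with PMD-sized pages */

	static inline unsigned int vm_huge_page_order(const struct vm_struct *vm)
	{
		/*
		 * A single flag can only describe one huge mapping size
		 * (PMD). An explicit page_order field, as in the patch, is
		 * what would also allow PUD (e.g. 1GB) or other orders later.
		 */
		return (vm->flags & VM_HUGE_PAGES) ? PMD_SHIFT - PAGE_SHIFT : 0;
	}

Under this scheme the allocator would set the flag when it succeeds in mapping
an area with PMD-sized pages, and readers would derive the order from the flag
rather than from vm->page_order, keeping the struct layout unchanged for
backports at the cost of the flexibility noted in the reply.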