2021-12-27 14:49:48

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v2 0/3] mm: support huge vmalloc mapping on arm64/x86

Huge vmalloc mappings is supported on PPC[1], but this feature should
be not only used on PPC, it could be used on arch support HAVE_ARCH_HUGE_VMAP
and PMD sized vmap mappings. this patchset is to enable this feature
on arm64/x86.

There are some disadvantages about this feature[2], one of the main
concerns is the possible memory fragmentation/waste in some scenarios,
also archs must ensure that any arch specific vmalloc allocations that
require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.

Based on the above considerations, we add the first patch is to let
user to control huge vmalloc mapping default behavior. Meanwhile,
add new kernel parameter hugevmalloc=on/off to enable/disable this
feature at boot time, nohugevmalloc parameter is still supported.

The later two patches to enable this feature on arm64/x86, select
HAVE_ARCH_HUGE_VMALLOC and mark VM_NO_HUGE_VMAP in arch's module_alloc().

This patchset based on next-20211224.

v2:
- Default y for HUGE_VMALLOC_DEFAULT_ENABLED, not only select it on PPC
- Fix copy/type error
- Mark VM_NO_HUGE_VMAP in module_alloc() on arm64/x86

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

Kefeng Wang (3):
mm: vmalloc: Let user to control huge vmalloc default behavior
arm64: Support huge vmalloc mappings
x86: Support huge vmalloc mappings

.../admin-guide/kernel-parameters.txt | 14 +++++++++++++-
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/module.c | 5 +++--
arch/x86/Kconfig | 1 +
arch/x86/kernel/module.c | 4 ++--
mm/Kconfig | 8 ++++++++
mm/vmalloc.c | 18 +++++++++++++++++-
7 files changed, 45 insertions(+), 6 deletions(-)

--
2.26.2



2021-12-27 14:49:48

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v2 2/3] arm64: Support huge vmalloc mappings

This patch select HAVE_ARCH_HUGE_VMALLOC to let arm64 support huge
vmalloc mappings.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++--
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/module.c | 5 +++--
3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7b2f900fd243..e3f9fd7ec106 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1639,7 +1639,7 @@
precedence over memory_hotplug.memmap_on_memory.


- hugevmalloc= [PPC] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
+ hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
Format: { on | off }
Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.

@@ -3424,7 +3424,7 @@

nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.

- nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.
+ nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.

nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3bb0b67292b5..c34bbb4482b0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARM64
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_COMPILER_H
+ select HAVE_ARCH_HUGE_VMALLOC
select HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 309a27553c87..af7b4cbace2b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -36,7 +36,8 @@ void *module_alloc(unsigned long size)
module_alloc_end = MODULES_END;

p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
- module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
+ module_alloc_end, gfp_mask, PAGE_KERNEL,
+ VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));

if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
@@ -55,7 +56,7 @@ void *module_alloc(unsigned long size)
*/
p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
module_alloc_base + SZ_2G, GFP_KERNEL,
- PAGE_KERNEL, 0, NUMA_NO_NODE,
+ PAGE_KERNEL, VM_NO_HUGE_VMAP, NUMA_NO_NODE,
__builtin_return_address(0));

if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
--
2.26.2


2021-12-27 14:49:52

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v2 1/3] mm: vmalloc: Let user to control huge vmalloc default behavior

Introduce HUGE_VMALLOC_DEFAULT_ENABLED and make it default y, this
let user to choose whether or not enable huge vmalloc mappings by
default.

Meanwhile, add new hugevmalloc=on/off parameter to enable or disable
this feature at boot time, nohugevmalloc is still supported and
equivalent to hugevmalloc=off.

Signed-off-by: Kefeng Wang <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 12 ++++++++++++
mm/Kconfig | 8 ++++++++
mm/vmalloc.c | 18 +++++++++++++++++-
3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a069d8fe2fee..7b2f900fd243 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1638,6 +1638,18 @@
If both parameters are enabled, hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.

+
+ hugevmalloc= [PPC] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
+ Format: { on | off }
+ Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.
+
+ This parameter enables/disables kernel huge vmalloc
+ mappings at boot time.
+
+ on: Enable the feature
+ off: Disable the feature
+ Equivalent to: nohugevmalloc
+
hung_task_panic=
[KNL] Should the hung task detector generate panics.
Format: 0 | 1
diff --git a/mm/Kconfig b/mm/Kconfig
index a99bd499ef51..8d8a92f22905 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -262,6 +262,14 @@ config HUGETLB_PAGE_SIZE_VARIABLE
HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
on a platform.

+config HUGE_VMALLOC_DEFAULT_ENABLED
+ bool "Enable huge vmalloc mappings by default"
+ default y
+ depends on HAVE_ARCH_HUGE_VMALLOC
+ help
+ Enable huge vmalloc mappings by default, this value could be overridden
+ by hugevmalloc=off|on.
+
config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9bf838817a47..0d0f8deb5639 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -60,7 +60,7 @@ static const unsigned int ioremap_max_page_shift = PAGE_SHIFT;
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */

#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
-static bool __ro_after_init vmap_allow_huge = true;
+static bool __ro_after_init vmap_allow_huge = IS_ENABLED(CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED);

static int __init set_nohugevmalloc(char *str)
{
@@ -68,6 +68,22 @@ static int __init set_nohugevmalloc(char *str)
return 0;
}
early_param("nohugevmalloc", set_nohugevmalloc);
+
+static int __init set_hugevmalloc(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!strcmp(str, "on"))
+ vmap_allow_huge = true;
+ else if (!strcmp(str, "off"))
+ vmap_allow_huge = false;
+ else
+ return -EINVAL;
+
+ return 0;
+}
+early_param("hugevmalloc", set_hugevmalloc);
#else /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
static const bool vmap_allow_huge = false;
#endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
--
2.26.2


2021-12-27 14:49:54

by Kefeng Wang

[permalink] [raw]
Subject: [PATCH v2 3/3] x86: Support huge vmalloc mappings

This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
support huge vmalloc mappings.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++--
arch/x86/Kconfig | 1 +
arch/x86/kernel/module.c | 4 ++--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e3f9fd7ec106..ffce6591ae64 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1639,7 +1639,7 @@
precedence over memory_hotplug.memmap_on_memory.


- hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
+ hugevmalloc= [KNL,PPC,ARM64,X86] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
Format: { on | off }
Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.

@@ -3424,7 +3424,7 @@

nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.

- nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.
+ nohugevmalloc [KNL,PPC,ARM64,X86] Disable kernel huge vmalloc mappings.

nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ebe8fc76949a..f6bf6675bbe7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -157,6 +157,7 @@ config X86
select HAVE_ACPI_APEI_NMI if ACPI
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
+ select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 95fa745e310a..6bf5cb7d876a 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)

p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
- MODULES_END, gfp_mask,
- PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
+ MODULES_END, gfp_mask, PAGE_KERNEL,
+ VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
vfree(p);
--
2.26.2


2021-12-27 16:00:42

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

On 12/27/21 6:59 AM, Kefeng Wang wrote:
> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
> support huge vmalloc mappings.

In general, this seems interesting and the diff is simple. But, I don't
see _any_ x86-specific data. I think the bare minimum here would be a
few kernel compiles and some 'perf stat' data for some TLB events.

> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 95fa745e310a..6bf5cb7d876a 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>
> p = __vmalloc_node_range(size, MODULE_ALIGN,
> MODULES_VADDR + get_module_load_offset(),
> - MODULES_END, gfp_mask,
> - PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
> + MODULES_END, gfp_mask, PAGE_KERNEL,
> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
> __builtin_return_address(0));
> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
> vfree(p);

To figure out what's going on in this hunk, I had to look at the cover
letter (which I wasn't cc'd on). That's not great and it means that
somebody who stumbles upon this in the code is going to have a really
hard time figuring out what is going on. Cover letters don't make it
into git history.

This desperately needs a comment and some changelog material in *this*
patch.

But, even the description from the cover letter is sparse:

> There are some disadvantages about this feature[2], one of the main
> concerns is the possible memory fragmentation/waste in some scenarios,
> also archs must ensure that any arch specific vmalloc allocations that
> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.

That just says that x86 *needs* PAGE_SIZE allocations. But, what
happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)? Will the
subsequent permission changes just fragment the 2M mapping?

2021-12-27 17:36:55

by William Kucharski

[permalink] [raw]
Subject: Re: (No subject)

You should also fix the existing typo in the documentation (inline):

> On Dec 27, 2021, at 07:49, Kefeng Wang <[email protected]> wrote:
>
> This patch select HAVE_ARCH_HUGE_VMALLOC to let arm64 support huge
> vmalloc mappings.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Signed-off-by: Kefeng Wang <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 ++--
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/module.c | 5 +++--
> 3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7b2f900fd243..e3f9fd7ec106 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1639,7 +1639,7 @@
> precedence over memory_hotplug.memmap_on_memory.
>
>
> - hugevmalloc= [PPC] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> Format: { on | off }
> Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.

"Reguires" should be "Requires."

>
> @@ -3424,7 +3424,7 @@
>
> nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
>
> - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.
> + nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.
>
> nosmt [KNL,S390] Disable symmetric multithreading (SMT).
> Equivalent to smt=1.
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3bb0b67292b5..c34bbb4482b0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -142,6 +142,7 @@ config ARM64
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_BITREVERSE
> select HAVE_ARCH_COMPILER_H
> + select HAVE_ARCH_HUGE_VMALLOC
> select HAVE_ARCH_HUGE_VMAP
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_JUMP_LABEL_RELATIVE
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index 309a27553c87..af7b4cbace2b 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -36,7 +36,8 @@ void *module_alloc(unsigned long size)
> module_alloc_end = MODULES_END;
>
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> - module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
> + module_alloc_end, gfp_mask, PAGE_KERNEL,
> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP,
> NUMA_NO_NODE, __builtin_return_address(0));
>
> if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> @@ -55,7 +56,7 @@ void *module_alloc(unsigned long size)
> */
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> module_alloc_base + SZ_2G, GFP_KERNEL,
> - PAGE_KERNEL, 0, NUMA_NO_NODE,
> + PAGE_KERNEL, VM_NO_HUGE_VMAP, NUMA_NO_NODE,
> __builtin_return_address(0));
>
> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
> --
> 2.26.2
>
>

2021-12-28 01:37:00

by Kefeng Wang

[permalink] [raw]
Subject: Re: (No subject)


On 2021/12/28 1:35, William Kucharski wrote:
> You should also fix the existing typo in the documentation (inline):
>
>> On Dec 27, 2021, at 07:49, Kefeng Wang <[email protected]> wrote:
>>
>> This patch select HAVE_ARCH_HUGE_VMALLOC to let arm64 support huge
>> vmalloc mappings.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Signed-off-by: Kefeng Wang <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 4 ++--
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/kernel/module.c | 5 +++--
>> 3 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 7b2f900fd243..e3f9fd7ec106 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1639,7 +1639,7 @@
>> precedence over memory_hotplug.memmap_on_memory.
>>
>>
>> - hugevmalloc= [PPC] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> + hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> Format: { on | off }
>> Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.
> "Reguires" should be "Requires."
Will fix, thanks.
>
>> @@ -3424,7 +3424,7 @@
>>
>> nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
>>
>> - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.
>> + nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.
>>
>> nosmt [KNL,S390] Disable symmetric multithreading (SMT).
>> Equivalent to smt=1.
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 3bb0b67292b5..c34bbb4482b0 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -142,6 +142,7 @@ config ARM64
>> select HAVE_ARCH_AUDITSYSCALL
>> select HAVE_ARCH_BITREVERSE
>> select HAVE_ARCH_COMPILER_H
>> + select HAVE_ARCH_HUGE_VMALLOC
>> select HAVE_ARCH_HUGE_VMAP
>> select HAVE_ARCH_JUMP_LABEL
>> select HAVE_ARCH_JUMP_LABEL_RELATIVE
>> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
>> index 309a27553c87..af7b4cbace2b 100644
>> --- a/arch/arm64/kernel/module.c
>> +++ b/arch/arm64/kernel/module.c
>> @@ -36,7 +36,8 @@ void *module_alloc(unsigned long size)
>> module_alloc_end = MODULES_END;
>>
>> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>> - module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
>> + module_alloc_end, gfp_mask, PAGE_KERNEL,
>> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP,
>> NUMA_NO_NODE, __builtin_return_address(0));
>>
>> if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
>> @@ -55,7 +56,7 @@ void *module_alloc(unsigned long size)
>> */
>> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>> module_alloc_base + SZ_2G, GFP_KERNEL,
>> - PAGE_KERNEL, 0, NUMA_NO_NODE,
>> + PAGE_KERNEL, VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>> __builtin_return_address(0));
>>
>> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>> --
>> 2.26.2
>>
>>

2021-12-28 10:26:46

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings


On 2021/12/27 23:56, Dave Hansen wrote:
> On 12/27/21 6:59 AM, Kefeng Wang wrote:
>> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
>> support huge vmalloc mappings.
> In general, this seems interesting and the diff is simple. But, I don't
> see _any_ x86-specific data. I think the bare minimum here would be a
> few kernel compiles and some 'perf stat' data for some TLB events.

When the feature supported on ppc,

commit 8abddd968a303db75e4debe77a3df484164f1f33
Author: Nicholas Piggin <[email protected]>
Date:   Mon May 3 19:17:55 2021 +1000

    powerpc/64s/radix: Enable huge vmalloc mappings

    This reduces TLB misses by nearly 30x on a `git diff` workload on a
    2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
    to vfs hashes being allocated with 2MB pages.

But the data could be different on different machine/arch.

>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>> index 95fa745e310a..6bf5cb7d876a 100644
>> --- a/arch/x86/kernel/module.c
>> +++ b/arch/x86/kernel/module.c
>> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>>
>> p = __vmalloc_node_range(size, MODULE_ALIGN,
>> MODULES_VADDR + get_module_load_offset(),
>> - MODULES_END, gfp_mask,
>> - PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>> + MODULES_END, gfp_mask, PAGE_KERNEL,
>> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>> __builtin_return_address(0));
>> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>> vfree(p);
> To figure out what's going on in this hunk, I had to look at the cover
> letter (which I wasn't cc'd on). That's not great and it means that
> somebody who stumbles upon this in the code is going to have a really
> hard time figuring out what is going on. Cover letters don't make it
> into git history.
Sorry for that, will add more into arch's patch changelog.
> This desperately needs a comment and some changelog material in *this*
> patch.
>
> But, even the description from the cover letter is sparse:
>
>> There are some disadvantages about this feature[2], one of the main
>> concerns is the possible memory fragmentation/waste in some scenarios,
>> also archs must ensure that any arch specific vmalloc allocations that
>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
> That just says that x86 *needs* PAGE_SIZE allocations. But, what
> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)? Will the
> subsequent permission changes just fragment the 2M mapping?
> .

Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.

When module alloc with STRICT_MODULE_RWX on x86, it calls
__change_page_attr()

from set_memory_ro/rw/nx which will split large page, so there is no
need to make

module alloc with HUGE_VMALLOC.

>
>
>
>



2021-12-28 16:15:07

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>> There are some disadvantages about this feature[2], one of the main
>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>> also archs must ensure that any arch specific vmalloc allocations that
>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>> subsequent permission changes just fragment the 2M mapping?
>
> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>
> When module alloc with STRICT_MODULE_RWX on x86, it calls
> __change_page_attr()
>
> from set_memory_ro/rw/nx which will split large page, so there is no
> need to make
>
> module alloc with HUGE_VMALLOC.

This all sounds very fragile to me. Every time a new architecture would
get added for huge vmalloc() support, the developer needs to know to go
find that architecture's module_alloc() and add this flag. They next
guy is going to forget, just like you did.

Considering that this is not a hot path, a weak function would be a nice
choice:

/* vmalloc() flags used for all module allocations. */
unsigned long __weak arch_module_vm_flags()
{
/*
* Modules use a single, large vmalloc(). Different
* permissions are applied later and will fragment
* huge mappings. Avoid using huge pages for modules.
*/
return VM_NO_HUGE_VMAP;
}

Stick that in some the common module code, next to:

> void * __weak module_alloc(unsigned long size)
> {
> return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
...

Then, put arch_module_vm_flags() in *all* of the module_alloc()
implementations, including the generic one. That way (even with a new
architecture) whoever copies-and-pastes their module_alloc()
implementation is likely to get it right. The next guy who just does a
"select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.

VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.

2021-12-29 11:01:24

by Kefeng Wang

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings


On 2021/12/29 0:14, Dave Hansen wrote:
> On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>>> There are some disadvantages about this feature[2], one of the main
>>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>>> also archs must ensure that any arch specific vmalloc allocations that
>>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>>> subsequent permission changes just fragment the 2M mapping?
>> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>>
>> When module alloc with STRICT_MODULE_RWX on x86, it calls
>> __change_page_attr()
>>
>> from set_memory_ro/rw/nx which will split large page, so there is no
>> need to make
>>
>> module alloc with HUGE_VMALLOC.
> This all sounds very fragile to me. Every time a new architecture would
> get added for huge vmalloc() support, the developer needs to know to go
> find that architecture's module_alloc() and add this flag. They next
> guy is going to forget, just like you did.
>
> Considering that this is not a hot path, a weak function would be a nice
> choice:
>
> /* vmalloc() flags used for all module allocations. */
> unsigned long __weak arch_module_vm_flags()
> {
> /*
> * Modules use a single, large vmalloc(). Different
> * permissions are applied later and will fragment
> * huge mappings. Avoid using huge pages for modules.
> */
> return VM_NO_HUGE_VMAP;

For x86, it only fragment, but for arm64, due to apply_to_page_range() in

set_memory_*, without this flag maybe crash. Whatever, we need this

flag for module.

> }
>
> Stick that in some the common module code, next to:
>
>> void * __weak module_alloc(unsigned long size)
>> {
>> return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> ...
>
> Then, put arch_module_vm_flags() in *all* of the module_alloc()
> implementations, including the generic one. That way (even with a new
> architecture) whoever copies-and-pastes their module_alloc()
> implementation is likely to get it right. The next guy who just does a
> "select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.

OK, Let me check the VM_FLUSH_RESET_PERMS and try about this way.

Thanks.

>
> VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.
> .

2022-01-16 01:09:17

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] arm64: Support huge vmalloc mappings



Le 27/12/2021 à 15:59, Kefeng Wang a écrit :
> This patch select HAVE_ARCH_HUGE_VMALLOC to let arm64 support huge
> vmalloc mappings.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Signed-off-by: Kefeng Wang <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 ++--
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/module.c | 5 +++--
> 3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7b2f900fd243..e3f9fd7ec106 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1639,7 +1639,7 @@
> precedence over memory_hotplug.memmap_on_memory.
>
>
> - hugevmalloc= [PPC] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> Format: { on | off }
> Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.
>
> @@ -3424,7 +3424,7 @@
>
> nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
>
> - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.
> + nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.
>
> nosmt [KNL,S390] Disable symmetric multithreading (SMT).
> Equivalent to smt=1.
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3bb0b67292b5..c34bbb4482b0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -142,6 +142,7 @@ config ARM64
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_BITREVERSE
> select HAVE_ARCH_COMPILER_H
> + select HAVE_ARCH_HUGE_VMALLOC
> select HAVE_ARCH_HUGE_VMAP
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_JUMP_LABEL_RELATIVE
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index 309a27553c87..af7b4cbace2b 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -36,7 +36,8 @@ void *module_alloc(unsigned long size)
> module_alloc_end = MODULES_END;
>
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> - module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
> + module_alloc_end, gfp_mask, PAGE_KERNEL,
> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP,

you should add a comment like powerpc (commit 8abddd968a30
("powerpc/64s/radix: Enable huge vmalloc mappings")) to explain why this
requires VM_NO_HUGE_VMAP

> NUMA_NO_NODE, __builtin_return_address(0));
>
> if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> @@ -55,7 +56,7 @@ void *module_alloc(unsigned long size)
> */
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> module_alloc_base + SZ_2G, GFP_KERNEL,
> - PAGE_KERNEL, 0, NUMA_NO_NODE,
> + PAGE_KERNEL, VM_NO_HUGE_VMAP, NUMA_NO_NODE,

Same

> __builtin_return_address(0));
>
> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {

2022-01-16 02:08:17

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings



Le 27/12/2021 à 15:59, Kefeng Wang a écrit :
> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
> support huge vmalloc mappings.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Signed-off-by: Kefeng Wang <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 ++--
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/module.c | 4 ++--
> 3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index e3f9fd7ec106..ffce6591ae64 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1639,7 +1639,7 @@
> precedence over memory_hotplug.memmap_on_memory.
>
>
> - hugevmalloc= [KNL,PPC,ARM64] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + hugevmalloc= [KNL,PPC,ARM64,X86] Reguires CONFIG_HAVE_ARCH_HUGE_VMALLOC
> Format: { on | off }
> Default set by CONFIG_HUGE_VMALLOC_DEFAULT_ENABLED.
>
> @@ -3424,7 +3424,7 @@
>
> nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
>
> - nohugevmalloc [KNL,PPC,ARM64] Disable kernel huge vmalloc mappings.
> + nohugevmalloc [KNL,PPC,ARM64,X86] Disable kernel huge vmalloc mappings.
>
> nosmt [KNL,S390] Disable symmetric multithreading (SMT).
> Equivalent to smt=1.
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ebe8fc76949a..f6bf6675bbe7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -157,6 +157,7 @@ config X86
> select HAVE_ACPI_APEI_NMI if ACPI
> select HAVE_ALIGNED_STRUCT_PAGE if SLUB
> select HAVE_ARCH_AUDITSYSCALL
> + select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP
> select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_JUMP_LABEL_RELATIVE
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 95fa745e310a..6bf5cb7d876a 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>
> p = __vmalloc_node_range(size, MODULE_ALIGN,
> MODULES_VADDR + get_module_load_offset(),
> - MODULES_END, gfp_mask,
> - PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
> + MODULES_END, gfp_mask, PAGE_KERNEL,
> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,

you should add a comment like powerpc (commit 8abddd968a30
("powerpc/64s/radix: Enable huge vmalloc mappings")) to explain why this
requires VM_NO_HUGE_VMAP

> __builtin_return_address(0));
> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
> vfree(p);

2022-01-16 02:08:37

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] mm: support huge vmalloc mapping on arm64/x86



Le 27/12/2021 à 15:59, Kefeng Wang a écrit :
> Huge vmalloc mappings is supported on PPC[1], but this feature should
> be not only used on PPC, it could be used on arch support HAVE_ARCH_HUGE_VMAP
> and PMD sized vmap mappings. this patchset is to enable this feature
> on arm64/x86.
>
> There are some disadvantages about this feature[2], one of the main

There are some disadvantage, ok, so are there advantages as well ?

> concerns is the possible memory fragmentation/waste in some scenarios,
> also archs must ensure that any arch specific vmalloc allocations that
> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>
> Based on the above considerations, we add the first patch is to let
> user to control huge vmalloc mapping default behavior. Meanwhile,
> add new kernel parameter hugevmalloc=on/off to enable/disable this
> feature at boot time, nohugevmalloc parameter is still supported.
>
> The later two patches to enable this feature on arm64/x86, select
> HAVE_ARCH_HUGE_VMALLOC and mark VM_NO_HUGE_VMAP in arch's module_alloc().
>
> This patchset based on next-20211224.
>
> v2:
> - Default y for HUGE_VMALLOC_DEFAULT_ENABLED, not only select it on PPC
> - Fix copy/type error
> - Mark VM_NO_HUGE_VMAP in module_alloc() on arm64/x86
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
>
> Kefeng Wang (3):
> mm: vmalloc: Let user to control huge vmalloc default behavior
> arm64: Support huge vmalloc mappings
> x86: Support huge vmalloc mappings
>
> .../admin-guide/kernel-parameters.txt | 14 +++++++++++++-
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/module.c | 5 +++--
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/module.c | 4 ++--
> mm/Kconfig | 8 ++++++++
> mm/vmalloc.c | 18 +++++++++++++++++-
> 7 files changed, 45 insertions(+), 6 deletions(-)
>

2022-01-16 02:08:47

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings



Le 28/12/2021 à 11:26, Kefeng Wang a écrit :
>
> On 2021/12/27 23:56, Dave Hansen wrote:
>> On 12/27/21 6:59 AM, Kefeng Wang wrote:
>>> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
>>> support huge vmalloc mappings.
>> In general, this seems interesting and the diff is simple.  But, I don't
>> see _any_ x86-specific data.  I think the bare minimum here would be a
>> few kernel compiles and some 'perf stat' data for some TLB events.
>
> When the feature supported on ppc,
>
> commit 8abddd968a303db75e4debe77a3df484164f1f33
> Author: Nicholas Piggin <[email protected]>
> Date:   Mon May 3 19:17:55 2021 +1000
>
>     powerpc/64s/radix: Enable huge vmalloc mappings
>
>     This reduces TLB misses by nearly 30x on a `git diff` workload on a
>     2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
>     to vfs hashes being allocated with 2MB pages.
>
> But the data could be different on different machine/arch.
>
>>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>>> index 95fa745e310a..6bf5cb7d876a 100644
>>> --- a/arch/x86/kernel/module.c
>>> +++ b/arch/x86/kernel/module.c
>>> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>>>       p = __vmalloc_node_range(size, MODULE_ALIGN,
>>>                       MODULES_VADDR + get_module_load_offset(),
>>> -                    MODULES_END, gfp_mask,
>>> -                    PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>>> +                    MODULES_END, gfp_mask, PAGE_KERNEL,
>>> +                    VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>>>                       __builtin_return_address(0));
>>>       if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>>>           vfree(p);
>> To figure out what's going on in this hunk, I had to look at the cover
>> letter (which I wasn't cc'd on).  That's not great and it means that
>> somebody who stumbles upon this in the code is going to have a really
>> hard time figuring out what is going on.  Cover letters don't make it
>> into git history.
> Sorry for that, will add more into arch's patch changelog.
>> This desperately needs a comment and some changelog material in *this*
>> patch.
>>
>> But, even the description from the cover letter is sparse:
>>
>>> There are some disadvantages about this feature[2], one of the main
>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>> also archs must ensure that any arch specific vmalloc allocations that
>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>> subsequent permission changes just fragment the 2M mapping?
>> .
>
> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>
> When module alloc with STRICT_MODULE_RWX on x86, it calls
> __change_page_attr()
>
> from set_memory_ro/rw/nx which will split large page, so there is no
> need to make
>
> module alloc with HUGE_VMALLOC.
>

Maybe there is no need to perform the module alloc with HUGE_VMALLOC,
but it least it would still work if you do so.

Powerpc did add VM_NO_HUGE_VMAP temporarily and for some reason which is
explained in a comment.

If x86 already has the necessary logic to handle it, why add
VM_NO_HUGE_VMAP ?

Christophe

2022-01-16 02:09:29

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings



Le 28/12/2021 à 17:14, Dave Hansen a écrit :
> On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>>> There are some disadvantages about this feature[2], one of the main
>>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>>> also archs must ensure that any arch specific vmalloc allocations that
>>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>>> subsequent permission changes just fragment the 2M mapping?
>>
>> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>>
>> When module alloc with STRICT_MODULE_RWX on x86, it calls
>> __change_page_attr()
>>
>> from set_memory_ro/rw/nx which will split large page, so there is no
>> need to make
>>
>> module alloc with HUGE_VMALLOC.
>
> This all sounds very fragile to me. Every time a new architecture would
> get added for huge vmalloc() support, the developer needs to know to go
> find that architecture's module_alloc() and add this flag. They next
> guy is going to forget, just like you did.

That's not correct from my point of view.

When powerpc added that, a clear comment explains why:


+ /*
+ * Don't do huge page allocations for modules yet until more testing
+ * is done. STRICT_MODULE_RWX may require extra work to support this
+ * too.
+ */

So as you can see, this is something specific to powerpc and temporary.

>
> Considering that this is not a hot path, a weak function would be a nice
> choice:
>
> /* vmalloc() flags used for all module allocations. */
> unsigned long __weak arch_module_vm_flags()
> {
> /*
> * Modules use a single, large vmalloc(). Different
> * permissions are applied later and will fragment
> * huge mappings. Avoid using huge pages for modules.
> */

Why ? Not everybody use STRICT_MODULES_RWX.
Even if you do so, you can still benefit from huge pages for modules.

Why make what was initially a temporary precaution for powerpc become a
definitive default limitation for all ?

> return VM_NO_HUGE_VMAP;
> }
>
> Stick that in some the common module code, next to:
>
>> void * __weak module_alloc(unsigned long size)
>> {
>> return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> ...
>
> Then, put arch_module_vm_flags() in *all* of the module_alloc()
> implementations, including the generic one. That way (even with a new
> architecture) whoever copies-and-pastes their module_alloc()
> implementation is likely to get it right. The next guy who just does a
> "select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.
>
> VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.

2022-01-16 02:10:18

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings



Le 29/12/2021 à 12:01, Kefeng Wang a écrit :
>
> On 2021/12/29 0:14, Dave Hansen wrote:
>> On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>>>> There are some disadvantages about this feature[2], one of the main
>>>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>>>> also archs must ensure that any arch specific vmalloc allocations that
>>>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>>>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>>>> subsequent permission changes just fragment the 2M mapping?
>>> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>>>
>>> When module alloc with STRICT_MODULE_RWX on x86, it calls
>>> __change_page_attr()
>>>
>>> from set_memory_ro/rw/nx which will split large page, so there is no
>>> need to make
>>>
>>> module alloc with HUGE_VMALLOC.
>> This all sounds very fragile to me.  Every time a new architecture would
>> get added for huge vmalloc() support, the developer needs to know to go
>> find that architecture's module_alloc() and add this flag.  They next
>> guy is going to forget, just like you did.
>>
>> Considering that this is not a hot path, a weak function would be a nice
>> choice:
>>
>> /* vmalloc() flags used for all module allocations. */
>> unsigned long __weak arch_module_vm_flags()
>> {
>>     /*
>>      * Modules use a single, large vmalloc().  Different
>>      * permissions are applied later and will fragment
>>      * huge mappings.  Avoid using huge pages for modules.
>>      */
>>     return VM_NO_HUGE_VMAP;
>
> For x86, it only fragment, but for arm64, due to apply_to_page_range() in
>
> set_memory_*, without this flag maybe crash. Whatever, we need this
>
> flag for module.

I see no reason to have this flag by default.

Only ARM should have it if necessary, with a comment explaining why just
like powerpc.

And maybe the flag should be there only when STRICT_MODULE_RWX is selected.

>
>> }
>>
>> Stick that in some the common module code, next to:
>>
>>> void * __weak module_alloc(unsigned long size)
>>> {
>>>          return __vmalloc_node_range(size, 1, VMALLOC_START,
>>> VMALLOC_END,
>> ...
>>
>> Then, put arch_module_vm_flags() in *all* of the module_alloc()
>> implementations, including the generic one.  That way (even with a new
>> architecture) whoever copies-and-pastes their module_alloc()
>> implementation is likely to get it right.  The next guy who just does a
>> "select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.
>
> OK, Let me check the VM_FLUSH_RESET_PERMS and try about this way.
>
> Thanks.
>
>>
>> VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.
>> .

2022-01-19 12:02:38

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

Excerpts from Dave Hansen's message of December 29, 2021 2:14 am:
> On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>>> There are some disadvantages about this feature[2], one of the main
>>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>>> also archs must ensure that any arch specific vmalloc allocations that
>>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>>> subsequent permission changes just fragment the 2M mapping?
>>
>> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>>
>> When module alloc with STRICT_MODULE_RWX on x86, it calls
>> __change_page_attr()
>>
>> from set_memory_ro/rw/nx which will split large page, so there is no
>> need to make
>>
>> module alloc with HUGE_VMALLOC.
>
> This all sounds very fragile to me. Every time a new architecture would
> get added for huge vmalloc() support, the developer needs to know to go
> find that architecture's module_alloc() and add this flag.

This is documented in the Kconfig.

#
# Archs that select this would be capable of PMD-sized vmaps (i.e.,
# arch_vmap_pmd_supported() returns true), and they must make no assumptions
# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
# can be used to prohibit arch-specific allocations from using hugepages to
# help with this (e.g., modules may require it).
#
config HAVE_ARCH_HUGE_VMALLOC
depends on HAVE_ARCH_HUGE_VMAP
bool

Is it really fair to say it's *very* fragile? Surely it's reasonable to
read the (not very long) documentation ad understand the consequences for
the arch code before enabling it.

> They next
> guy is going to forget, just like you did.

The miss here could just be a simple oversight or thinko, and caught by
review, as happens to a lot of things.

>
> Considering that this is not a hot path, a weak function would be a nice
> choice:
>
> /* vmalloc() flags used for all module allocations. */
> unsigned long __weak arch_module_vm_flags()
> {
> /*
> * Modules use a single, large vmalloc(). Different
> * permissions are applied later and will fragment
> * huge mappings. Avoid using huge pages for modules.
> */
> return VM_NO_HUGE_VMAP;
> }
>
> Stick that in some the common module code, next to:

Then they have to think about it even less, so I don't know if that's an
improvement. I don't know what else an arch might be doing with these
allocations, at least modules will blow up pretty quickly, who knows
what other rare code relies on 4k vmalloc mappings?

The huge vmalloc option is not supposed to be easy to enable. This is
the same problem Andy was having with the TLB shootdown patches, he
didn't read the documentation and thought it was supposed to be a
trivial thing anybody could enable without thinking about it, and was
dutifully pointing out the the nasty "bugs" the feature has in it if
x86 were to enable it improperly.

Thanks,
Nick

>
>> void * __weak module_alloc(unsigned long size)
>> {
>> return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> ...
>
> Then, put arch_module_vm_flags() in *all* of the module_alloc()
> implementations, including the generic one. That way (even with a new
> architecture) whoever copies-and-pastes their module_alloc()
> implementation is likely to get it right. The next guy who just does a
> "select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.
>
> VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.
>

2022-01-19 15:23:23

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] mm: vmalloc: Let user to control huge vmalloc default behavior

Excerpts from Kefeng Wang's message of December 28, 2021 12:59 am:
> Introduce HUGE_VMALLOC_DEFAULT_ENABLED and make it default y, this
> let user to choose whether or not enable huge vmalloc mappings by
> default.
>
> Meanwhile, add new hugevmalloc=on/off parameter to enable or disable
> this feature at boot time, nohugevmalloc is still supported and
> equivalent to hugevmalloc=off.

Runtime options are bad enough, Kconfig and boot options are even worse.

The 'nohugevmalloc' option mirrors 'nohugeiomap' and is not expected to
ever be understood by an administrator unless a kernel developer is
working with them to hunt down a regression.

IMO there should be no new options. You could switch it off for
CONFIG_BASE_SMALL perhaps, and otherwise just try to work on heuristics
first. Bring in new options once it's proven they're needed.

Aside from that, thanks for working on these ports, great work.

Thanks,
Nick

2022-01-20 19:18:19

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

On 1/17/22 6:46 PM, Nicholas Piggin wrote:
>> This all sounds very fragile to me. Every time a new architecture would
>> get added for huge vmalloc() support, the developer needs to know to go
>> find that architecture's module_alloc() and add this flag.
> This is documented in the Kconfig.
>
> #
> # Archs that select this would be capable of PMD-sized vmaps (i.e.,
> # arch_vmap_pmd_supported() returns true), and they must make no assumptions
> # that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> # can be used to prohibit arch-specific allocations from using hugepages to
> # help with this (e.g., modules may require it).
> #
> config HAVE_ARCH_HUGE_VMALLOC
> depends on HAVE_ARCH_HUGE_VMAP
> bool
>
> Is it really fair to say it's *very* fragile? Surely it's reasonable to
> read the (not very long) documentation ad understand the consequences for
> the arch code before enabling it.

Very fragile or not, I think folks are likely to get it wrong. It would
be nice to have it default *everyone* to safe and slow and make *sure*
they go look at the architecture modules code itself before enabling
this for modules.

Just from that Kconfig text, I don't think I'd know off the top of my
head what do do for x86, or what code I needed to go touch.

2022-01-21 16:28:15

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

Excerpts from Dave Hansen's message of January 19, 2022 3:28 am:
> On 1/17/22 6:46 PM, Nicholas Piggin wrote:
>>> This all sounds very fragile to me. Every time a new architecture would
>>> get added for huge vmalloc() support, the developer needs to know to go
>>> find that architecture's module_alloc() and add this flag.
>> This is documented in the Kconfig.
>>
>> #
>> # Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> # arch_vmap_pmd_supported() returns true), and they must make no assumptions
>> # that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>> # can be used to prohibit arch-specific allocations from using hugepages to
>> # help with this (e.g., modules may require it).
>> #
>> config HAVE_ARCH_HUGE_VMALLOC
>> depends on HAVE_ARCH_HUGE_VMAP
>> bool
>>
>> Is it really fair to say it's *very* fragile? Surely it's reasonable to
>> read the (not very long) documentation ad understand the consequences for
>> the arch code before enabling it.
>
> Very fragile or not, I think folks are likely to get it wrong. It would
> be nice to have it default *everyone* to safe and slow and make *sure*

It's not safe to enable though. That's the problem. If it was just
modules then you'd have a point but it could be anything.

> they go look at the architecture modules code itself before enabling
> this for modules.

This is required not just for modules for the whole arch code, it
has to be looked at and decided this will work.

> Just from that Kconfig text, I don't think I'd know off the top of my
> head what do do for x86, or what code I needed to go touch.

You have to make sure arch/x86 makes no assumptions that vmalloc memory
is backed by PAGE_SIZE ptes. If you can't do that then you shouldn't
enable the option. The option can not explain it any more because any
arch could do anything with its mappings. The module code is an example,
not the recipe.

Thanks,
Nick

2022-01-21 19:16:15

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] mm: vmalloc: Let user to control huge vmalloc default behavior

On Wed, Jan 19, 2022 at 08:57:58PM +0800, Kefeng Wang wrote:
> Only parts of our products wants this feature,  we add some interfaces which
> only
>
> alloc hugevmalloc for them, eg,
> vmap_hugepage/vmalloc_hugepage/remap_vmalloc_hugepage_range..
>
> for our products, but it's not the choice of most products, also add
> nohugevmalloc
>
> for most products is expensive, so this is the reason for adding the patch.
>
> more config/cmdline are more flexible for test/products,

But why do only some products want it? What goes wrong if all products
enable it? Features should be auto-tuning, not relying on admins to
understand them.