2022-04-12 12:48:52

by Christoph Hellwig

Subject: Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP

On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
> Huge page backed vmalloc memory could benefit performance in many cases.
> Since some users of vmalloc may not be ready to handle huge pages,
> VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
> pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
> that may try to allocate >= PMD_SIZE pages, but are not ready to handle
> huge pages properly.

This is a good place to document what the problems are, and how they are
hard to track down (e.g. because the allocations are passed down I/O
stacks)

>
> Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
> users that benefit from huge pages could ask specifically.
>
> Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().

We still need to find out what the primary users of the large vmalloc
hashes were and convert them.

> +extern void *vmalloc_huge(unsigned long size) __alloc_size(1);

No need for the extern.

> +EXPORT_SYMBOL(vmalloc_huge);

EXPORT_SYMBOL_GPL for all advanced vmalloc functionality, please.
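
For readers following the opt-out versus opt-in distinction discussed in this patch, here is a minimal userspace sketch of the two policies. It is only a model: the `demo_` names, the flag bit values, and the 2 MiB `DEMO_PMD_SIZE` are assumptions made for illustration and are not the kernel's definitions. The real vmalloc code also checks architecture support and other conditions that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real constants live in kernel headers. */
#define DEMO_PMD_SIZE            (2UL * 1024 * 1024) /* assumed 2 MiB huge page */
#define DEMO_VM_NO_HUGE_VMAP     0x1UL               /* models the old opt-out flag */
#define DEMO_VM_ALLOW_HUGE_VMAP  0x2UL               /* models the new opt-in flag */

/*
 * Opt-out policy: every large enough allocation is eligible for huge
 * pages unless the caller explicitly refuses them.
 */
static bool demo_huge_opt_out(size_t size, unsigned long vm_flags)
{
	return size >= DEMO_PMD_SIZE && !(vm_flags & DEMO_VM_NO_HUGE_VMAP);
}

/*
 * Opt-in policy: huge pages are considered only when the caller
 * explicitly asks for them.
 */
static bool demo_huge_opt_in(size_t size, unsigned long vm_flags)
{
	return size >= DEMO_PMD_SIZE && (vm_flags & DEMO_VM_ALLOW_HUGE_VMAP);
}
```

In this model, a caller that passes no flags and requests at least `DEMO_PMD_SIZE` bytes is eligible for huge pages under the opt-out policy but not under the opt-in one, which is why the opt-in scheme no longer depends on annotating every caller that cannot handle huge pages.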


2022-04-12 20:50:58

by Song Liu

Subject: Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP

On Mon, Apr 11, 2022 at 9:18 PM Christoph Hellwig wrote:
>
> On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
> > Huge page backed vmalloc memory could benefit performance in many cases.
> > Since some users of vmalloc may not be ready to handle huge pages,
> > VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
> > pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
> > that may try to allocate >= PMD_SIZE pages, but are not ready to handle
> > huge pages properly.
>
> This is a good place to document what the problems are, and how they are
> hard to track down (e.g. because the allocations are passed down I/O
> stacks)

Will add it in v3.

>
> >
> > Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
> > users that benefit from huge pages could ask specifically.
> >
> > Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().
>
> We still need to find out what the primary users of the large vmalloc
> hashes were and convert them.

@ Claudio and Nicholas,

Could you please help identify users of large vmalloc? So far, I found
alloc_large_system_hash(), and something like the following seems to
work:

diff --git i/mm/page_alloc.c w/mm/page_alloc.c
index 6e5b4488a0c5..20d38b8482c4 100644
--- i/mm/page_alloc.c
+++ w/mm/page_alloc.c
@@ -8919,7 +8919,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
 		} else if (get_order(size) >= MAX_ORDER || hashdist) {
-			table = __vmalloc(size, gfp_flags);
+			table = __vmalloc_huge(size, gfp_flags);
 			virt = true;
 			if (table)
 				huge = is_vm_area_hugepages(table);
diff --git i/mm/vmalloc.c w/mm/vmalloc.c
index 7cc2be6a7554..cbadbe83e6a6 100644
--- i/mm/vmalloc.c
+++ w/mm/vmalloc.c
@@ -3253,6 +3253,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(__vmalloc_huge);
+
 /**
  * vmalloc - allocate virtually contiguous memory
  * @size: allocation size


>
> > +extern void *vmalloc_huge(unsigned long size) __alloc_size(1);
>
> No need for the extern.
>
> > +EXPORT_SYMBOL(vmalloc_huge);
>
> EXPORT_SYMBOL_GPL for all advanced vmalloc functionality, please.

Will fix these in v3.

Thanks,
Song

2022-04-22 21:49:47

by Nicholas Piggin

Subject: Re: [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP

Excerpts from Song Liu's message of April 12, 2022 4:00 pm:
> On Mon, Apr 11, 2022 at 9:18 PM Christoph Hellwig wrote:
>>
>> On Mon, Apr 11, 2022 at 04:35:46PM -0700, Song Liu wrote:
>> > Huge page backed vmalloc memory could benefit performance in many cases.
>> > Since some users of vmalloc may not be ready to handle huge pages,
>> > VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
>> > pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
>> > that may try to allocate >= PMD_SIZE pages, but are not ready to handle
>> > huge pages properly.
>>
>> This is a good place to document what the problems are, and how they are
>> hard to track down (e.g. because the allocations are passed down I/O
>> stacks)
>
> Will add it in v3.
>
>>
>> >
>> > Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
>> > users that benefit from huge pages could ask specifically.
>> >
>> > Also, replace vmalloc_no_huge() with opt-in helper vmalloc_huge().
>>
>> We still need to find out what the primary users of the large vmalloc
>> hashes were and convert them.
>
> @ Claudio and Nicholas,
>
> Could you please help identify users of large vmalloc? So far, I found
> alloc_large_system_hash(), and something like the following seems to
> work:

The large system hashes were the main ones I was interested in. IIRC
there were a few more in some drivers or tracing things depending on
config but those are less important (to me at least).

Curious what the problem is though. powerpc so far has not required
any special case outside arch/powerpc/ for this so I would much
prefer x86 to fix itself rather than add APIs which non-arch code
really shouldn't need to know about.

Thanks,
Nick