LinuxLists.cc - Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE

2022-04-22 02:11:52

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

On Thu, Apr 21, 2022 at 8:47 AM Edgecombe, Rick P
<[email protected]> wrote:
>
> I wonder if it
> might have to do with the vmalloc huge pages using compound pages, then
> some caller doing vmalloc_to_page() and getting surprised with what
> they could get away with in the struct page.

Very likely. We have 100+ users of vmalloc_to_page() in random
drivers, and the gpu code does show up on that list.

And it is very much another case of "it's always been broken, but
enabling it on x86 made the breakage actually show up in real life".

Linus

2022-04-22 14:43:08

by Nicholas Piggin

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

Excerpts from Linus Torvalds's message of April 22, 2022 2:15 am:
> On Thu, Apr 21, 2022 at 8:47 AM Edgecombe, Rick P
> <[email protected]> wrote:
>>
>> I wonder if it
>> might have to do with the vmalloc huge pages using compound pages, then
>> some caller doing vmalloc_to_page() and getting surprised with what
>> they could get away with in the struct page.
>
> Very likely. We have 100+ users of vmalloc_to_page() in random
> drivers, and the gpu code does show up on that list.
>
> And it is very much another case of "it's always been broken, but
> enabling it on x86 made the breakage actually show up in real life".

Okay that looks like a valid breakage. *Possibly* fb_deferred_io_fault()
using pages vmalloced to screen_buffer? Or a couple of the gpu drivers
are playing with page->mapping as well, not sure if they're vmalloced.

But the fix is this (untested at the moment). There is no fundamental
reason why any driver should care about allocation size; it's a simple
bug in my code that missed that case. The whole point of the design is
that it's transparent to callers!

Thanks,
Nick

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..70933f4ed069 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2925,12 +2925,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
if (nr != nr_pages_request)
break;
}
- } else
- /*
- * Compound pages required for remap_vmalloc_page if
- * high-order pages.
- */
- gfp |= __GFP_COMP;
+ }

/* High-order pages or fallback path if "bulk" fails. */

@@ -2944,6 +2939,13 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_node(nid, gfp, order);
if (unlikely(!page))
break;
+ /*
+ * Higher order allocations must be able to be treated as
+ * independent small pages by callers (as they can with
+ * small page allocs).
+ */
+ if (order)
+ split_page(page, order);

/*
* Careful, we allocate and map page-order pages, but
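
To illustrate why the patch drops __GFP_COMP and calls split_page(),
here is a minimal userspace sketch. It is a toy model, not kernel code:
struct toy_page and the toy_* helpers are hypothetical names invented
for this illustration, and reference counting is the only behaviour
modelled. It assumes, as in the kernel, that a non-compound high-order
allocation gives only the first page a reference, and that split_page()
refuses compound pages and gives every remaining sub-page its own
reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; not the real kernel layout. */
struct toy_page {
	int refcount;   /* 0: cannot be freed or pinned on its own */
	bool compound;  /* part of a compound (toy __GFP_COMP) allocation */
};

/* Model alloc_pages(): only the first of (1 << order) pages is refcounted. */
void toy_alloc_pages(struct toy_page *pages, unsigned int order, bool comp)
{
	size_t n = (size_t)1 << order;

	for (size_t i = 0; i < n; i++) {
		pages[i].refcount = (i == 0) ? 1 : 0;
		pages[i].compound = comp && order > 0;
	}
}

/* Model split_page(): every sub-page gets its own reference. */
void toy_split_page(struct toy_page *pages, unsigned int order)
{
	size_t n = (size_t)1 << order;

	assert(!pages[0].compound);     /* compound pages cannot be split this way */
	assert(pages[0].refcount > 0);  /* the block must be allocated */

	for (size_t i = 1; i < n; i++)
		pages[i].refcount = 1;
}

/* Count sub-pages a caller could treat as ordinary order-0 pages. */
size_t toy_count_independent(const struct toy_page *pages, unsigned int order)
{
	size_t n = (size_t)1 << order;
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i].refcount > 0 && !pages[i].compound)
			count++;
	return count;
}
```

For an order-2 block allocated without the compound flag, the model
reports one independent page right after toy_alloc_pages() and four
after toy_split_page(), which is the property a vmalloc_to_page()
caller relies on.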

2022-04-22 21:26:44

by Edgecombe, Rick P

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

On Fri, 2022-04-22 at 10:12 +1000, Nicholas Piggin wrote:
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..70933f4ed069 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2925,12 +2925,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> if (nr != nr_pages_request)
> break;
> }
> - } else
> - /*
> - * Compound pages required for remap_vmalloc_page if
> - * high-order pages.
> - */
> - gfp |= __GFP_COMP;
> + }
>
> /* High-order pages or fallback path if "bulk" fails. */
>
> @@ -2944,6 +2939,13 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> page = alloc_pages_node(nid, gfp, order);
> if (unlikely(!page))
> break;
> + /*
> + * Higher order allocations must be able to be treated as
> + * independent small pages by callers (as they can with
> + * small page allocs).
> + */
> + if (order)
> + split_page(page, order);
>
> /*
> * Careful, we allocate and map page-order pages, but

FWIW, I like this direction. I think it needs to free them differently,
though, since the free path currently assumes they are high-order pages.
I also wonder whether we would still need vm_struct->page_order, and all
the places that percolates out to: basically every place that iterates
through vm_struct->pages with page_order stepping.
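
A rough sketch of the difference in the free path, as a userspace toy
rather than kernel code. The function names are hypothetical; each
counter increment stands in for one call into the page allocator. The
first loop mirrors the page_order stepping described above, and the
second shows what the loop would look like if every entry in the pages
array were an independent order-0 page.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Frees issued when stepping by page_order: one high-order free per step. */
size_t toy_free_calls_stepped(size_t nr_pages, unsigned int page_order)
{
	size_t calls = 0;

	assert(page_order < sizeof(size_t) * CHAR_BIT);  /* keep the shift defined */

	for (size_t i = 0; i < nr_pages; i += (size_t)1 << page_order)
		calls++;  /* stands in for __free_pages(pages[i], page_order) */
	return calls;
}

/* Frees issued once every entry is an independent order-0 page. */
size_t toy_free_calls_split(size_t nr_pages)
{
	size_t calls = 0;

	for (size_t i = 0; i < nr_pages; i++)
		calls++;  /* stands in for __free_pages(pages[i], 0) */
	return calls;
}
```

With 1024 entries and page_order 9, the stepped loop issues 2 frees
while the split loop issues 1024, one per 4k page, and no longer needs
page_order at all.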

Besides fixing the bisected issue (hopefully), it also more cleanly
separates the mapping from the backing allocation logic. And since all
the pages are then 4k (from the page allocator's perspective), it would
be easier to support sizes that are not huge-page aligned, i.e. not use
up a whole additional 2MB page if you only need 4k more of allocation
size.
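
The arithmetic behind that last point can be sketched as follows. This
is a hedged illustration only: the constants assume 4 KiB base pages
and 2 MiB huge pages (matching the figures in the message), and the
toy_* function names are hypothetical, not kernel helpers.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL                /* assumed base page size */
#define TOY_PMD_SIZE  (2UL * 1024 * 1024)   /* assumed huge page size */

/* Round size up to the next multiple of align. */
unsigned long toy_round_up(unsigned long size, unsigned long align)
{
	assert(align != 0);
	return (size + align - 1) / align * align;
}

/* Backing memory if the whole allocation must be huge-page aligned. */
unsigned long toy_backing_huge_aligned(unsigned long size)
{
	return toy_round_up(size, TOY_PMD_SIZE);
}

/* Backing memory if the tail can be made of individual 4k pages. */
unsigned long toy_backing_small_tail(unsigned long size)
{
	return toy_round_up(size, TOY_PAGE_SIZE);
}

/* Bytes saved by allowing a 4k-granular tail. */
unsigned long toy_wasted_bytes(unsigned long size)
{
	return toy_backing_huge_aligned(size) - toy_backing_small_tail(size);
}
```

For a request of 2 MiB + 4 KiB, the huge-aligned scheme backs 4 MiB,
the 4k-tail scheme backs 2 MiB + 4 KiB, and the difference is
2 MiB - 4 KiB.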