2021-02-17 21:01:40

by Mike Kravetz

Subject: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

page structs are not guaranteed to be contiguous for gigantic pages. The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.

If update_and_free_page encounters non-contiguous page structs, we can
see “BUG: Bad page state in process …” errors.
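
For illustration, the unsafe pointer arithmetic versus a pfn based
lookup looks roughly like this (sketch only; nth_subpage() is a made-up
helper, the actual fix below steps through subpages with mem_map_next()):

	/*
	 * With SPARSEMEM and !SPARSEMEM_VMEMMAP, struct pages are only
	 * contiguous within a MAX_ORDER_NR_PAGES aligned block, so plain
	 * pointer arithmetic can walk off the end of one mem_map block
	 * instead of reaching the i-th subpage.
	 */
	static struct page *nth_subpage(struct page *head, unsigned long i)
	{
		/* Wrong for gigantic pages spanning mem_map blocks: */
		/*	return head + i;			      */

		/* A pfn based lookup works in any memory model: */
		return pfn_to_page(page_to_pfn(head) + i);
	}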

Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the
gigantic page will be allocated.
Zi Yan outlined steps to reproduce here [1].

[1] https://lore.kernel.org/linux-mm/[email protected]/

Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Cc: <[email protected]>
---
mm/hugetlb.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..94e9fa803294 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1312,14 +1312,16 @@ static inline void destroy_compound_gigantic_page(struct page *page,
static void update_and_free_page(struct hstate *h, struct page *page)
{
int i;
+ struct page *subpage = page;

if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;

h->nr_huge_pages--;
h->nr_huge_pages_node[page_to_nid(page)]--;
- for (i = 0; i < pages_per_huge_page(h); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+ for (i = 0; i < pages_per_huge_page(h);
+ i++, subpage = mem_map_next(subpage, page, i)) {
+ subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced | 1 << PG_dirty |
1 << PG_active | 1 << PG_private |
1 << PG_writeback);
--
2.29.2


2021-02-17 22:30:31

by Andrew Morton

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:

> page structs are not guaranteed to be contiguous for gigantic pages. The
> routine update_and_free_page can encounter a gigantic page, yet it assumes
> page structs are contiguous when setting page flags in subpages.
>
> If update_and_free_page encounters non-contiguous page structs, we can
> see “BUG: Bad page state in process …” errors.
>
> Non-contiguous page structs are generally not an issue. However, they can
> exist with a specific kernel configuration and hotplug operations. For
> example: Configure the kernel with CONFIG_SPARSEMEM and
> !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the
> gigantic page will be allocated.
> Zi Yan outlined steps to reproduce here [1].
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
>
> Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")

June 2014. That's a long lurk time for a bug. I wonder if some later
commit revealed it.

I guess it doesn't matter a lot, but some -stable kernel maintainers
might wonder if they really need this fix...


2021-02-17 22:30:36

by Mike Kravetz

Subject: [PATCH 2/2] hugetlb: fix copy_huge_page_from_user contig page struct assumption

page structs are not guaranteed to be contiguous for gigantic pages.
The routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.

Since page structs for the target gigantic page are not contiguous,
the data copied from user space could overwrite other pages not
associated with the gigantic page and cause data corruption.

Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the
gigantic page will be allocated.

Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <[email protected]>
Cc: <[email protected]>
---
mm/memory.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..241bec4199b5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5173,17 +5173,19 @@ long copy_huge_page_from_user(struct page *dst_page,
void *page_kaddr;
unsigned long i, rc = 0;
unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+ struct page *subpage = dst_page;

- for (i = 0; i < pages_per_huge_page; i++) {
+ for (i = 0; i < pages_per_huge_page;
+ i++, subpage = mem_map_next(subpage, dst_page, i)) {
if (allow_pagefault)
- page_kaddr = kmap(dst_page + i);
+ page_kaddr = kmap(subpage);
else
- page_kaddr = kmap_atomic(dst_page + i);
+ page_kaddr = kmap_atomic(subpage);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
- kunmap(dst_page + i);
+ kunmap(subpage);
else
kunmap_atomic(page_kaddr);

--
2.29.2

2021-02-17 22:34:18

by Mike Kravetz

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 2/17/21 11:02 AM, Andrew Morton wrote:
> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>
>> page structs are not guaranteed to be contiguous for gigantic pages. The
>> routine update_and_free_page can encounter a gigantic page, yet it assumes
>> page structs are contiguous when setting page flags in subpages.
>>
>> If update_and_free_page encounters non-contiguous page structs, we can
>> see “BUG: Bad page state in process …” errors.
>>
>> Non-contiguous page structs are generally not an issue. However, they can
>> exist with a specific kernel configuration and hotplug operations. For
>> example: Configure the kernel with CONFIG_SPARSEMEM and
>> !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the
>> gigantic page will be allocated.
>> Zi Yan outlined steps to reproduce here [1].
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
>
> June 2014. That's a long lurk time for a bug. I wonder if some later
> commit revealed it.
>
> I guess it doesn't matter a lot, but some -stable kernel maintainers
> might wonder if they really need this fix...

I am not sure how common a CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP
config is. On the more popular architectures, it is not the default.
But you can build a kernel with such options, and then you need to
hotplug add memory and allocate a gigantic page there.

It is unlikely to happen, but possible since Zi could force the BUG.

The copy_huge_page_from_user bug requires the same unusual configuration
and is just as unlikely to occur. But since it can overwrite somewhat
random pages, I would feel better if it were fixed.
--
Mike Kravetz

2021-02-18 17:36:28

by Matthew Wilcox

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
> > page structs are not guaranteed to be contiguous for gigantic pages. The
>
> June 2014. That's a long lurk time for a bug. I wonder if some later
> commit revealed it.

I would suggest that gigantic pages have not seen much use. Certainly
performance with Intel CPUs on benchmarks that I've been involved with
showed lower performance with 1GB pages than with 2MB pages until quite
recently.

2021-02-18 19:04:08

by Jason Gunthorpe

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> > On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
> > > page structs are not guaranteed to be contiguous for gigantic pages. The
> >
> > June 2014. That's a long lurk time for a bug. I wonder if some later
> > commit revealed it.
>
> I would suggest that gigantic pages have not seen much use. Certainly
> performance with Intel CPUs on benchmarks that I've been involved with
> showed lower performance with 1GB pages than with 2MB pages until quite
> recently.

I suggested in another thread that maybe it is time to consider
dropping this "feature"

If it has been slightly broken for 7 years it seems a good bet it
isn't actually being used.

The cost to fix GUP to be compatible with this will hurt normal
GUP performance - and again, that nobody has hit this bug in GUP
further suggests the feature isn't used..
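
The fast path today just assumes tail struct pages are adjacent,
roughly like this (illustrative only, not an exact quote of the gup
code; record_compound_subpages() is a made-up name):

	/* Illustrative gup-fast style subpage recording: assumes the
	 * struct pages of a compound page are contiguous. */
	static void record_compound_subpages(struct page *page, int nr_pages,
					     struct page **pages)
	{
		int nr;

		for (nr = 0; nr < nr_pages; nr++)
			pages[nr] = page++;
	}

Replacing that page++ step with a pfn lookup in the hottest loop is the
kind of cost I mean.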

Jason

2021-02-18 19:06:05

by Zi Yan

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:

> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>
>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>> commit revealed it.
>>
>> I would suggest that gigantic pages have not seen much use. Certainly
>> performance with Intel CPUs on benchmarks that I've been involved with
>> showed lower performance with 1GB pages than with 2MB pages until quite
>> recently.
>
> I suggested in another thread that maybe it is time to consider
> dropping this "feature"

You mean dropping gigantic page support in hugetlb?

>
> If it has been slightly broken for 7 years it seems a good bet it
> isn't actually being used.
>
> The cost to fix GUP to be compatible with this will hurt normal
> GUP performance - and again, that nobody has hit this bug in GUP
> further suggests the feature isn't used..

An easy fix might be to make gigantic hugetlb pages depend on
CONFIG_SPARSEMEM_VMEMMAP, which guarantees all struct pages are contiguous.
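
Something along these lines, perhaps (rough, untested sketch; hugetlb's
existing gigantic_page_runtime_supported() could grow the check, or it
could be expressed as a Kconfig dependency instead):

	/* Only advertise runtime gigantic page support when the memmap
	 * backing a gigantic page is guaranteed virtually contiguous. */
	static inline bool gigantic_page_runtime_supported(void)
	{
		return IS_ENABLED(CONFIG_ARCH_HAS_GIGANTIC_PAGE) &&
		       IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP);
	}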



Best Regards,
Yan Zi



2021-02-18 19:10:27

by Jason Gunthorpe

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>
> > On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
> >> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> >>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
> >>>> page structs are not guaranteed to be contiguous for gigantic pages. The
> >>>
> >>> June 2014. That's a long lurk time for a bug. I wonder if some later
> >>> commit revealed it.
> >>
> >> I would suggest that gigantic pages have not seen much use. Certainly
> >> performance with Intel CPUs on benchmarks that I've been involved with
> >> showed lower performance with 1GB pages than with 2MB pages until quite
> >> recently.
> >
> > I suggested in another thread that maybe it is time to consider
> > dropping this "feature"
>
> You mean dropping gigantic page support in hugetlb?

No, I mean dropping support for arches that want to do:

tail_page != head_page + tail_page_nr

because they can't allocate the required page array either virtually
or physically contiguously.

It seems like quite a burden on the core mm for a very niche, and
maybe even non-existent, case.

It was originally done for PPC, can these PPC systems use VMEMMAP now?

> > The cost to fix GUP to be compatible with this will hurt normal
> > GUP performance - and again, that nobody has hit this bug in GUP
> > further suggests the feature isn't used..
>
> An easy fix might be to make gigantic hugetlb pages depend on
> CONFIG_SPARSEMEM_VMEMMAP, which guarantees all struct pages are contiguous.

Yes, exactly.

Jason

2021-02-18 19:11:29

by Mike Kravetz

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 2/18/21 9:25 AM, Jason Gunthorpe wrote:
> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>
>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>> commit revealed it.
>>
>> I would suggest that gigantic pages have not seen much use. Certainly
>> performance with Intel CPUs on benchmarks that I've been involved with
>> showed lower performance with 1GB pages than with 2MB pages until quite
>> recently.
>
> I suggested in another thread that maybe it is time to consider
> dropping this "feature"
>
> If it has been slightly broken for 7 years it seems a good bet it
> isn't actually being used.
>
> The cost to fix GUP to be compatible with this will hurt normal
> GUP performance - and again, that nobody has hit this bug in GUP
> further suggests the feature isn't used..

I was thinking that we could detect these 'unusual' configurations and only
do the slower page struct walking in those cases. However, we would need to
do some research to make sure we have taken into account all possible config
options which can produce non-contiguous page structs. That should have zero
performance impact in the 'normal' cases.

I suppose we could prohibit gigantic pages in these 'unusual' configurations.
It would require some research to see if this 'may' impact someone.
--
Mike Kravetz

2021-02-18 19:11:56

by Zi Yan

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:

> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>
>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>>>
>>>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>>>> commit revealed it.
>>>>
>>>> I would suggest that gigantic pages have not seen much use. Certainly
>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>> recently.
>>>
>>> I suggested in another thread that maybe it is time to consider
>>> dropping this "feature"
>>
>> You mean dropping gigantic page support in hugetlb?
>
> No, I mean dropping support for arches that want to do:
>
> tail_page != head_page + tail_page_nr
>
> because they can't allocate the required page array either virtually
> or physically contiguously.
>
> It seems like quite a burden on the core mm for a very niche, and
> maybe even non-existent, case.
>
> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>
>>> The cost to fix GUP to be compatible with this will hurt normal
>>> GUP performance - and again, that nobody has hit this bug in GUP
>>> further suggests the feature isn't used..
>>
>> An easy fix might be to make gigantic hugetlb pages depend on
>> CONFIG_SPARSEMEM_VMEMMAP, which guarantees all struct pages are contiguous.
>
> Yes, exactly.

I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
are ordered based on physical addresses? I just wonder for two PFN ranges,
e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
[128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
are in front of [0 - 128MB] in the vmemmap due to online ordering.



Best Regards,
Yan Zi



2021-02-18 19:15:32

by Mike Kravetz

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 2/18/21 9:40 AM, Zi Yan wrote:
> On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:
>
>> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>>
>>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>>>>
>>>>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>>>>> commit revealed it.
>>>>>
>>>>> I would suggest that gigantic pages have not seen much use. Certainly
>>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>>> recently.
>>>>
>>>> I suggested in another thread that maybe it is time to consider
>>>> dropping this "feature"
>>>
>>> You mean dropping gigantic page support in hugetlb?
>>
>> No, I mean dropping support for arches that want to do:
>>
>> tail_page != head_page + tail_page_nr
>>
>> because they can't allocate the required page array either virtually
>> or physically contiguously.
>>
>> It seems like quite a burden on the core mm for a very niche, and
>> maybe even non-existent, case.
>>
>> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>>
>>>> The cost to fix GUP to be compatible with this will hurt normal
>>>> GUP performance - and again, that nobody has hit this bug in GUP
>>>> further suggests the feature isn't used..
>>>
>>> An easy fix might be to make gigantic hugetlb pages depend on
>>> CONFIG_SPARSEMEM_VMEMMAP, which guarantees all struct pages are contiguous.
>>
>> Yes, exactly.
>
> I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
> PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
> are ordered based on physical addresses? I just wonder for two PFN ranges,
> e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
> [128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
> are in front of [0 - 128MB] in the vmemmap due to online ordering.

I have not looked at the code which does the onlining and vmemmap setup.
But, these definitions make me believe it is true:

#elif defined(CONFIG_SPARSEMEM_VMEMMAP)

/* memmap is virtually contiguous. */
#define __pfn_to_page(pfn) (vmemmap + (pfn))
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
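
So, for example (illustration only, struct_page_distance() is a made-up
helper):

	/* With a virtually contiguous memmap the pointer difference
	 * between two struct pages equals their pfn difference:
	 *   (vmemmap + pfn_a) - (vmemmap + pfn_b) == pfn_a - pfn_b
	 * regardless of the order in which sections were onlined. */
	static inline long struct_page_distance(unsigned long pfn_a,
						unsigned long pfn_b)
	{
		return pfn_to_page(pfn_a) - pfn_to_page(pfn_b);
	}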

--
Mike Kravetz

2021-02-18 19:35:28

by Zi Yan

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 18 Feb 2021, at 12:51, Mike Kravetz wrote:

> On 2/18/21 9:40 AM, Zi Yan wrote:
>> On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:
>>
>>> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>>>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>>>
>>>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>>>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>>>>>
>>>>>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>>>>>> commit revealed it.
>>>>>>
>>>>>> I would suggest that gigantic pages have not seen much use. Certainly
>>>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>>>> recently.
>>>>>
>>>>> I suggested in another thread that maybe it is time to consider
>>>>> dropping this "feature"
>>>>
>>>> You mean dropping gigantic page support in hugetlb?
>>>
>>> No, I mean dropping support for arches that want to do:
>>>
>>> tail_page != head_page + tail_page_nr
>>>
>>> because they can't allocate the required page array either virtually
>>> or physically contiguously.
>>>
>>> It seems like quite a burden on the core mm for a very niche, and
>>> maybe even non-existent, case.
>>>
>>> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>>>
>>>>> The cost to fix GUP to be compatible with this will hurt normal
>>>>> GUP performance - and again, that nobody has hit this bug in GUP
>>>>> further suggests the feature isn't used..
>>>>
>>>> An easy fix might be to make gigantic hugetlb pages depend on
>>>> CONFIG_SPARSEMEM_VMEMMAP, which guarantees all struct pages are contiguous.
>>>
>>> Yes, exactly.
>>
>> I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
>> PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
>> are ordered based on physical addresses? I just wonder for two PFN ranges,
>> e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
>> [128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
>> are in front of [0 - 128MB] in the vmemmap due to online ordering.
>
> I have not looked at the code which does the onlining and vmemmap setup.
> But, these definitions make me believe it is true:
>
> #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> /* memmap is virtually contiguous. */
> #define __pfn_to_page(pfn) (vmemmap + (pfn))
> #define __page_to_pfn(page) (unsigned long)((page) - vmemmap)

Makes sense. Thank you for checking.

I guess making gigantic pages depend on CONFIG_SPARSEMEM_VMEMMAP might
be a good way of simplifying the code and avoiding future bugs, unless
there is an arch that really needs gigantic pages and cannot have VMEMMAP.


Best Regards,
Yan Zi



2021-02-18 21:47:36

by Mike Kravetz

Subject: Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

On 2/18/21 9:34 AM, Mike Kravetz wrote:
> On 2/18/21 9:25 AM, Jason Gunthorpe wrote:
>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <[email protected]> wrote:
>>>>> page structs are not guaranteed to be contiguous for gigantic pages. The
>>>>
>>>> June 2014. That's a long lurk time for a bug. I wonder if some later
>>>> commit revealed it.
>>>
>>> I would suggest that gigantic pages have not seen much use. Certainly
>>> performance with Intel CPUs on benchmarks that I've been involved with
>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>> recently.
>>
>> I suggested in another thread that maybe it is time to consider
>> dropping this "feature"
>>
>> If it has been slightly broken for 7 years it seems a good bet it
>> isn't actually being used.
>>
>> The cost to fix GUP to be compatible with this will hurt normal
>> GUP performance - and again, that nobody has hit this bug in GUP
>> further suggests the feature isn't used..
>
> I was thinking that we could detect these 'unusual' configurations and only
> do the slower page struct walking in those cases. However, we would need to
> do some research to make sure we have taken into account all possible config
> options which can produce non-contiguous page structs. That should have zero
> performance impact in the 'normal' cases.

What about something like the following patch, and making all code that
wants to scan gigantic page subpages use mem_map_next()?

From 95b0384bd5d7f0435546bdd3c01c478724ae0166 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <[email protected]>
Date: Thu, 18 Feb 2021 13:35:02 -0800
Subject: [PATCH] mm: define PFN_PAGE_MAP_LINEAR to optimize gigantic page
scans

Signed-off-by: Mike Kravetz <[email protected]>
---
arch/ia64/include/asm/page.h | 1 +
arch/m68k/include/asm/page_no.h | 1 +
include/asm-generic/memory_model.h | 2 ++
mm/internal.h | 2 ++
4 files changed, 6 insertions(+)

diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index b69a5499d75b..8f4288862ec8 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -106,6 +106,7 @@ extern struct page *vmem_map;
#ifdef CONFIG_DISCONTIGMEM
# define page_to_pfn(page) ((unsigned long) (page - vmem_map))
# define pfn_to_page(pfn) (vmem_map + (pfn))
+# define PFN_PAGE_MAP_LINEAR
# define __pfn_to_phys(pfn) PFN_PHYS(pfn)
#else
# include <asm-generic/memory_model.h>
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index 6bbe52025de3..cafc0731a42c 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -28,6 +28,7 @@ extern unsigned long memory_end;

#define pfn_to_page(pfn) virt_to_page(pfn_to_virt(pfn))
#define page_to_pfn(page) virt_to_pfn(page_to_virt(page))
+#define PFN_PAGE_MAP_LINEAR
#define pfn_valid(pfn) ((pfn) < max_mapnr)

#define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index 7637fb46ba4f..8ac4c48dbf22 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -33,6 +33,7 @@
#define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))
#define __page_to_pfn(page) ((unsigned long)((page) - mem_map) + \
ARCH_PFN_OFFSET)
+#define PFN_PAGE_MAP_LINEAR
#elif defined(CONFIG_DISCONTIGMEM)

#define __pfn_to_page(pfn) \
@@ -53,6 +54,7 @@
/* memmap is virtually contiguous. */
#define __pfn_to_page(pfn) (vmemmap + (pfn))
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
+#define PFN_PAGE_MAP_LINEAR

#elif defined(CONFIG_SPARSEMEM)
/*
diff --git a/mm/internal.h b/mm/internal.h
index 25d2b2439f19..64cc5069047c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -454,12 +454,14 @@ static inline struct page *mem_map_offset(struct page *base, int offset)
static inline struct page *mem_map_next(struct page *iter,
struct page *base, int offset)
{
+#ifndef PFN_PAGE_MAP_LINEAR
if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
unsigned long pfn = page_to_pfn(base) + offset;
if (!pfn_valid(pfn))
return NULL;
return pfn_to_page(pfn);
}
+#endif
return iter + 1;
}

--
2.29.2
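
Callers that scan gigantic page subpages would then keep the same loop
shape as patch 1 of this series, e.g. (sketch, with h and page as in
update_and_free_page()):

	int i;
	struct page *subpage = page;

	for (i = 0; i < pages_per_huge_page(h);
	     i++, subpage = mem_map_next(subpage, page, i)) {
		/* operate on subpage; with PFN_PAGE_MAP_LINEAR defined
		 * mem_map_next() reduces to a plain "iter + 1" step */
	}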