After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
struct page of reserved memory is zeroed. This causes page->flags to be 0
and fixes issues related to reading /proc/kpageflags, for example, of
reserved memory.

The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory. That assumption is no
longer true after the aforementioned commit.

There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited only
to pages on the zone's freelist.

Note that pfn_valid() can be true for reserved memory: there is a backing
struct page. The check for page_to_nid(page) is also buggy but reserved
memory normally only appears on node 0 so the zeroing doesn't affect this.

Move the debug checks to after verifying PageBuddy is true. This isolates
the scope of the checks to only be for buddy pages which are on the zone's
freelist which move_freepages_block() is operating on. In this case, an
incorrect node or zone is a bug worthy of being warned about (and the
examination of struct page is acceptable because this memory is not
reserved).

Signed-off-by: David Rientjes <[email protected]>
---
 mm/page_alloc.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2238,27 +2238,12 @@ static int move_freepages(struct zone *zone,
 	unsigned int order;
 	int pages_moved = 0;
 
-#ifndef CONFIG_HOLES_IN_ZONE
-	/*
-	 * page_zone is not safe to call in this context when
-	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
-	 * anyway as we check zone boundaries in move_freepages_block().
-	 * Remove at a later date when no bug reports exist related to
-	 * grouping pages by mobility
-	 */
-	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
-	          pfn_valid(page_to_pfn(end_page)) &&
-	          page_zone(start_page) != page_zone(end_page));
-#endif
 	for (page = start_page; page <= end_page;) {
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
 			continue;
 		}
 
-		/* Make sure we are not inadvertently changing nodes */
-		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
-
 		if (!PageBuddy(page)) {
 			/*
 			 * We assume that pages that could be isolated for
@@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *zone,
 			continue;
 		}
 
+		/* Make sure we are not inadvertently changing nodes */
+		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
+		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
+
 		order = page_order(page);
 		move_to_free_area(page, &zone->free_area[order], migratetype);
 		page += 1 << order;

On 8/13/19 5:37 AM, David Rientjes wrote:
> After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
> struct page of reserved memory is zeroed. This causes page->flags to be 0
> and fixes issues related to reading /proc/kpageflags, for example, of
> reserved memory.
>
> The VM_BUG_ON() in move_freepages_block(), however, assumes that
> page_zone() is meaningful even for reserved memory. That assumption is no
> longer true after the aforementioned commit.
How come move_freepages_block() gets called on reserved memory in
the first place?
> There's no reason why move_freepages_block() should be testing the
> legitimacy of page_zone() for reserved memory; its scope is limited only
> to pages on the zone's freelist.
>
> Note that pfn_valid() can be true for reserved memory: there is a backing
> struct page. The check for page_to_nid(page) is also buggy but reserved
> memory normally only appears on node 0 so the zeroing doesn't affect this.
>
> Move the debug checks to after verifying PageBuddy is true. This isolates
> the scope of the checks to only be for buddy pages which are on the zone's
> freelist which move_freepages_block() is operating on. In this case, an
> incorrect node or zone is a bug worthy of being warned about (and the
> examination of struct page is acceptable because this memory is not
> reserved).
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> mm/page_alloc.c | 19 ++++---------------
> 1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2238,27 +2238,12 @@ static int move_freepages(struct zone *zone,
> unsigned int order;
> int pages_moved = 0;
>
> -#ifndef CONFIG_HOLES_IN_ZONE
> - /*
> - * page_zone is not safe to call in this context when
> - * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
> - * anyway as we check zone boundaries in move_freepages_block().
> - * Remove at a later date when no bug reports exist related to
> - * grouping pages by mobility
> - */
> - VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
> - pfn_valid(page_to_pfn(end_page)) &&
> - page_zone(start_page) != page_zone(end_page));
> -#endif
> for (page = start_page; page <= end_page;) {
> if (!pfn_valid_within(page_to_pfn(page))) {
> page++;
> continue;
> }
>
> - /* Make sure we are not inadvertently changing nodes */
> - VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
> -
> if (!PageBuddy(page)) {
> /*
> * We assume that pages that could be isolated for
> @@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *zone,
> continue;
> }
>
> + /* Make sure we are not inadvertently changing nodes */
> + VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
> + VM_BUG_ON_PAGE(page_zone(page) != zone, page);
The latter check implies the former, so if it's to stay, the first
one could be removed and the comment adjusted s/nodes/zones/
> +
> order = page_order(page);
> move_to_free_area(page, &zone->free_area[order], migratetype);
> page += 1 << order;
>
On Tue, 13 Aug 2019, Vlastimil Babka wrote:
> > After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
> > struct page of reserved memory is zeroed. This causes page->flags to be 0
> > and fixes issues related to reading /proc/kpageflags, for example, of
> > reserved memory.
> >
> > The VM_BUG_ON() in move_freepages_block(), however, assumes that
> > page_zone() is meaningful even for reserved memory. That assumption is no
> > longer true after the aforementioned commit.
>
> How come move_freepages_block() gets called on reserved memory in
> the first place?
>
It's simply math after finding a valid free page from the per-zone free
area to use as fallback. We find the beginning and end of the pageblock
of the valid page and that can bring us into memory that was reserved per
the e820. pfn_valid() is still true (it's backed by a struct page), but
since it's zeroed we shouldn't make any inferences here about comparing
its node or zone. The current node check just happens to succeed most of
the time by luck because reserved memory typically appears on node 0.

The fix here is to validate that we actually have buddy pages before
testing if there's any type of zone or node strangeness going on.
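
To make the math concrete, here is a minimal userspace sketch (not the
kernel code). The pageblock order of 9 and the helper
block_overlaps_reserved() are assumptions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 2^9 pages per pageblock. */
#define PAGEBLOCK_ORDER		9UL
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start_pfn(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn of the pageblock that contains pfn. */
static unsigned long pageblock_end_pfn(unsigned long pfn)
{
	return pageblock_start_pfn(pfn) + PAGEBLOCK_NR_PAGES - 1;
}

/*
 * Hypothetical helper: true if the pageblock containing pfn overlaps the
 * reserved pfn range [res_start, res_end).
 */
static bool block_overlaps_reserved(unsigned long pfn,
				    unsigned long res_start,
				    unsigned long res_end)
{
	return pageblock_start_pfn(pfn) < res_end &&
	       pageblock_end_pfn(pfn) >= res_start;
}

/* Example: pfns [0, 0x100) are reserved, pfn 0x100 is a free buddy page. */
static void pageblock_example(void)
{
	/* Rounding down from 0x100 lands on pfn 0, inside the reserved range. */
	assert(pageblock_start_pfn(0x100) == 0x0);
	assert(block_overlaps_reserved(0x100, 0x0, 0x100));
}
```

Calling pageblock_example() from a main() exercises the case described
above: the free page itself is fine, but the pageblock walk starts in
reserved memory.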
> > There's no reason why move_freepages_block() should be testing the
> > legitimacy of page_zone() for reserved memory; its scope is limited only
> > to pages on the zone's freelist.
> >
> > Note that pfn_valid() can be true for reserved memory: there is a backing
> > struct page. The check for page_to_nid(page) is also buggy but reserved
> > memory normally only appears on node 0 so the zeroing doesn't affect this.
> >
> > Move the debug checks to after verifying PageBuddy is true. This isolates
> > the scope of the checks to only be for buddy pages which are on the zone's
> > freelist which move_freepages_block() is operating on. In this case, an
> > incorrect node or zone is a bug worthy of being warned about (and the
> > examination of struct page is acceptable because this memory is not
> > reserved).
> >
> > Signed-off-by: David Rientjes <[email protected]>
> > ---
> > mm/page_alloc.c | 19 ++++---------------
> > 1 file changed, 4 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2238,27 +2238,12 @@ static int move_freepages(struct zone *zone,
> > unsigned int order;
> > int pages_moved = 0;
> >
> > -#ifndef CONFIG_HOLES_IN_ZONE
> > - /*
> > - * page_zone is not safe to call in this context when
> > - * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
> > - * anyway as we check zone boundaries in move_freepages_block().
> > - * Remove at a later date when no bug reports exist related to
> > - * grouping pages by mobility
> > - */
> > - VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
> > - pfn_valid(page_to_pfn(end_page)) &&
> > - page_zone(start_page) != page_zone(end_page));
> > -#endif
> > for (page = start_page; page <= end_page;) {
> > if (!pfn_valid_within(page_to_pfn(page))) {
> > page++;
> > continue;
> > }
> >
> > - /* Make sure we are not inadvertently changing nodes */
> > - VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
> > -
> > if (!PageBuddy(page)) {
> > /*
> > * We assume that pages that could be isolated for
> > @@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *zone,
> > continue;
> > }
> >
> > + /* Make sure we are not inadvertently changing nodes */
> > + VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
> > + VM_BUG_ON_PAGE(page_zone(page) != zone, page);
>
> The latter check implies the former check, so if it's to stay, the first
> one could be removed and comment adjusted s/nodes/zones/
>
Does it? The first is checking for a corrupted page_to_nid; the second is
checking for a corrupted or unexpected page_zone. What's being tested
here is the state of struct page, as it was prior to this patch, not
the state of struct zone.
On Mon, 12 Aug 2019 20:37:11 -0700 (PDT) David Rientjes <[email protected]> wrote:
> After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
> struct page of reserved memory is zeroed. This causes page->flags to be 0
> and fixes issues related to reading /proc/kpageflags, for example, of
> reserved memory.
>
> The VM_BUG_ON() in move_freepages_block(), however, assumes that
> page_zone() is meaningful even for reserved memory. That assumption is no
> longer true after the aforementioned commit.
>
> There's no reason why move_freepages_block() should be testing the
> legitimacy of page_zone() for reserved memory; its scope is limited only
> to pages on the zone's freelist.
>
> Note that pfn_valid() can be true for reserved memory: there is a backing
> struct page. The check for page_to_nid(page) is also buggy but reserved
> memory normally only appears on node 0 so the zeroing doesn't affect this.
>
> Move the debug checks to after verifying PageBuddy is true. This isolates
> the scope of the checks to only be for buddy pages which are on the zone's
> freelist which move_freepages_block() is operating on. In this case, an
> incorrect node or zone is a bug worthy of being warned about (and the
> examination of struct page is acceptable because this memory is not
> reserved).
I'm thinking Fixes:907ec5fca3dc and Cc:stable? But 907ec5fca3dc is
almost a year old, so you were doing something special to trigger this?
On Tue, 13 Aug 2019, Andrew Morton wrote:
> > After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
> > struct page of reserved memory is zeroed. This causes page->flags to be 0
> > and fixes issues related to reading /proc/kpageflags, for example, of
> > reserved memory.
> >
> > The VM_BUG_ON() in move_freepages_block(), however, assumes that
> > page_zone() is meaningful even for reserved memory. That assumption is no
> > longer true after the aforementioned commit.
> >
> > There's no reason why move_freepages_block() should be testing the
> > legitimacy of page_zone() for reserved memory; its scope is limited only
> > to pages on the zone's freelist.
> >
> > Note that pfn_valid() can be true for reserved memory: there is a backing
> > struct page. The check for page_to_nid(page) is also buggy but reserved
> > memory normally only appears on node 0 so the zeroing doesn't affect this.
> >
> > Move the debug checks to after verifying PageBuddy is true. This isolates
> > the scope of the checks to only be for buddy pages which are on the zone's
> > freelist which move_freepages_block() is operating on. In this case, an
> > incorrect node or zone is a bug worthy of being warned about (and the
> > examination of struct page is acceptable because this memory is not
> > reserved).
>
> I'm thinking Fixes:907ec5fca3dc and Cc:stable? But 907ec5fca3dc is
> almost a year old, so you were doing something special to trigger this?
>
We noticed it almost immediately after bringing 907ec5fca3dc in on
CONFIG_DEBUG_VM builds. It depends on finding specific free pages in the
per-zone free area where the math in move_freepages() will bring the start
or end pfn into reserved memory and wanting to claim that entire pageblock
as a new migratetype. So the path will be rare, require CONFIG_DEBUG_VM,
and require fallback to a different migratetype.

Some struct pages of reserved memory were already zeroed before
907ec5fca3dc, so it theoretically could trigger before this commit. I think
it's rare enough, and under a config option that most people don't run,
that others may not have noticed. I wouldn't argue against a stable tag and
the backport should be easy enough, but I probably wouldn't single out a
commit that this is fixing.
On 8/13/19 7:22 PM, David Rientjes wrote:
> On Tue, 13 Aug 2019, Vlastimil Babka wrote:
>
>> > After commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages"),
>> > struct page of reserved memory is zeroed. This causes page->flags to be 0
>> > and fixes issues related to reading /proc/kpageflags, for example, of
>> > reserved memory.
>> >
>> > The VM_BUG_ON() in move_freepages_block(), however, assumes that
>> > page_zone() is meaningful even for reserved memory. That assumption is no
>> > longer true after the aforementioned commit.
>>
>> How come move_freepages_block() gets called on reserved memory in
>> the first place?
>>
>
> It's simply math after finding a valid free page from the per-zone free
> area to use as fallback. We find the beginning and end of the pageblock
> of the valid page and that can bring us into memory that was reserved per
> the e820. pfn_valid() is still true (it's backed by a struct page), but
> since it's zero'd we shouldn't make any inferences here about comparing
> its node or zone. The current node check just happens to succeed most of
> the time by luck because reserved memory typically appears on node 0.
>
> The fix here is to validate that we actually have buddy pages before
> testing if there's any type of zone or node strangeness going on.
I see, thanks.
>> > @@ -2273,6 +2258,10 @@ static int move_freepages(struct zone *zone,
>> > continue;
>> > }
>> >
>> > + /* Make sure we are not inadvertently changing nodes */
>> > + VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
>> > + VM_BUG_ON_PAGE(page_zone(page) != zone, page);
>>
>> The latter check implies the former check, so if it's to stay, the first
>> one could be removed and comment adjusted s/nodes/zones/
>>
>
> Does it? The first is checking for a corrupted page_to_nid the second is
> checking for a corrupted or unexpected page_zone. What's being tested
> here is the state of struct page, as it was previous to this patch, not
> the state of struct zone.
page_zone() calls page_to_nid() internally, so if nid was wrong, the resulting
zone pointer would also be wrong. But if you want more fine-grained bug output,
that's fine.
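
For reference, a small userspace model of that dependency. Only the shape
of page_zone() follows include/linux/mm.h; the bit layout, the array
sizes and make_flags() are made-up assumptions for illustration:

```c
#include <assert.h>

/* Toy sizes and bit positions: assumptions, not the kernel's layout. */
#define MAX_NODES	2
#define MAX_ZONES	4
#define NODE_SHIFT	8
#define NODE_MASK	0x1UL
#define ZONE_SHIFT	4
#define ZONE_MASK	0x3UL

struct zone { int unused; };
struct pglist_data { struct zone node_zones[MAX_ZONES]; };
struct page { unsigned long flags; };

static struct pglist_data node_data[MAX_NODES];

static int page_to_nid(const struct page *page)
{
	return (int)((page->flags >> NODE_SHIFT) & NODE_MASK);
}

static int page_zonenum(const struct page *page)
{
	return (int)((page->flags >> ZONE_SHIFT) & ZONE_MASK);
}

/* The zone is looked up through the node id, so a wrong nid gives a wrong zone. */
static struct zone *page_zone(const struct page *page)
{
	return &node_data[page_to_nid(page)].node_zones[page_zonenum(page)];
}

/* Hypothetical helper to build flags for a given node and zone index. */
static unsigned long make_flags(int nid, int zone_idx)
{
	return ((unsigned long)nid << NODE_SHIFT) |
	       ((unsigned long)zone_idx << ZONE_SHIFT);
}
```

In this model a zeroed struct page decodes to node 0, zone index 0. If the
zone being operated on is zone index 2 of node 0, the nid comparison passes
while the page_zone() comparison fails, which is the "succeeds by luck"
case discussed above.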
On Tue, 13 Aug 2019 16:31:35 -0700 (PDT) David Rientjes <[email protected]> wrote:
> > > Move the debug checks to after verifying PageBuddy is true. This isolates
> > > the scope of the checks to only be for buddy pages which are on the zone's
> > > freelist which move_freepages_block() is operating on. In this case, an
> > > incorrect node or zone is a bug worthy of being warned about (and the
> > > examination of struct page is acceptable because this memory is not
> > > reserved).
> >
> > I'm thinking Fixes:907ec5fca3dc and Cc:stable? But 907ec5fca3dc is
> > almost a year old, so you were doing something special to trigger this?
> >
>
> We noticed it almost immediately after bringing 907ec5fca3dc in on
> CONFIG_DEBUG_VM builds. It depends on finding specific free pages in the
> per-zone free area where the math in move_freepages() will bring the start
> or end pfn into reserved memory and wanting to claim that entire pageblock
> as a new migratetype. So the path will be rare, require CONFIG_DEBUG_VM,
> and require fallback to a different migratetype.
>
> Some struct pages were already zeroed from reserve pages before
> 907ec5fca3dc so it theoretically could trigger before this commit. I think
> it's rare enough under a config option that most people don't run that
> others may not have noticed. I wouldn't argue against a stable tag and
> the backport should be easy enough, but probably wouldn't single out a
> commit that this is fixing.
OK, thanks. I added the above two paragraphs to the changelog and
removed the Fixes:
Hopefully Mel will be able to review this for us.
On Wed, Aug 14, 2019 at 03:49:29PM -0700, Andrew Morton wrote:
> On Tue, 13 Aug 2019 16:31:35 -0700 (PDT) David Rientjes <[email protected]> wrote:
>
> > > > Move the debug checks to after verifying PageBuddy is true. This isolates
> > > > the scope of the checks to only be for buddy pages which are on the zone's
> > > > freelist which move_freepages_block() is operating on. In this case, an
> > > > incorrect node or zone is a bug worthy of being warned about (and the
> > > > examination of struct page is acceptable because this memory is not
> > > > reserved).
> > >
> > > I'm thinking Fixes:907ec5fca3dc and Cc:stable? But 907ec5fca3dc is
> > > almost a year old, so you were doing something special to trigger this?
> > >
> >
> > We noticed it almost immediately after bringing 907ec5fca3dc in on
> > CONFIG_DEBUG_VM builds. It depends on finding specific free pages in the
> > per-zone free area where the math in move_freepages() will bring the start
> > or end pfn into reserved memory and wanting to claim that entire pageblock
> > as a new migratetype. So the path will be rare, require CONFIG_DEBUG_VM,
> > and require fallback to a different migratetype.
> >
> > Some struct pages were already zeroed from reserve pages before
> > 907ec5fca3dc so it theoretically could trigger before this commit. I think
> > it's rare enough under a config option that most people don't run that
> > others may not have noticed. I wouldn't argue against a stable tag and
> > the backport should be easy enough, but probably wouldn't single out a
> > commit that this is fixing.
>
> OK, thanks. I added the above two paragraphs to the changelog and
> removed the Fixes:
>
> Hopefully Mel will be able to review this for us.
Bit late as I was offline but FWIW
Acked-by: Mel Gorman <[email protected]>
That said, the overhead of the debugging check is higher with this
patch although it'll only affect debug builds and the path is not
particularly hot. If this was a concern, I think it would be reasonable
to simply remove the debugging check as the zone boundaries are checked
in move_freepages_block and we never expect a zone/node to be smaller
than a pageblock and stuck in the middle of another zone.
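
A rough userspace sketch of that boundary check, for anyone following
along. This is a model, not the kernel source: struct zone_span and
clamp_block_to_zone() are hypothetical names, and the pageblock order of
9 is an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 2^9 pages per pageblock. */
#define PAGEBLOCK_NR_PAGES	(1UL << 9)

/* Hypothetical stand-in for the pfn range covered by a zone. */
struct zone_span {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static bool zone_spans_pfn(const struct zone_span *zone, unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/*
 * Compute the pageblock around pfn without crossing zone boundaries:
 * if the block starts before the zone, start at pfn itself; if it ends
 * past the zone, refuse to move anything (return false).
 */
static bool clamp_block_to_zone(const struct zone_span *zone,
				unsigned long pfn,
				unsigned long *start_pfn,
				unsigned long *end_pfn)
{
	*start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	*end_pfn = *start_pfn + PAGEBLOCK_NR_PAGES - 1;

	if (!zone_spans_pfn(zone, *start_pfn))
		*start_pfn = pfn;
	if (!zone_spans_pfn(zone, *end_pfn))
		return false;
	return true;
}
```

With a zone covering pfns [0x300, 0x1300), a page at pfn 0x300 gets its
start clamped to 0x300 rather than 0x200, and a page at pfn 0x1250 is
rejected because its pageblock would end at 0x13ff, outside the zone.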
--
Mel Gorman
SUSE Labs