2016-03-14 07:31:50

by Joonsoo Kim

Subject: [PATCH 0/6] Add zone overlapping check

From: Joonsoo Kim <[email protected]>

Hello, all.

This patchset deals with some problematic sites that iterate over pfn ranges.

There is a system where nodes' pfn ranges overlap, like the following:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take this overlap into account when iterating over
a pfn range.

I audited many iterating sites that use pfn_valid(), pfn_valid_within(),
zone_start_pfn and so on, and the others look safe to me. This is
a preparation step for the new CMA implementation, ZONE_CMA [1], since
it would easily overlap with other zones. But the zone overlap
check is also needed in the general case, so I am sending it separately.
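
All six patches apply the same pattern. As a rough sketch (the helper
name here is made up for illustration; the real sites differ in detail):

static void iterate_zone_pfns_sketch(struct zone *zone)
{
	unsigned long pfn, end_pfn = zone_end_pfn(zone);
	struct page *page;

	for (pfn = zone->zone_start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;

		page = pfn_to_page(pfn);

		/*
		 * The zone's [start, end) span covers this pfn, but an
		 * overlapping zone may actually own it.
		 */
		if (page_zone(page) != zone)
			continue;

		/* ... process page ... */
	}
}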

This is based on next-20160311.

Thanks.

[1]: https://lkml.org/lkml/2015/2/12/95

Joonsoo Kim (6):
mm/page_alloc: fix same zone check in __pageblock_pfn_to_page()
mm/hugetlb: add same zone check in pfn_range_valid_gigantic()
mm/memory_hotplug: add comment to some functions related to memory
hotplug
mm/vmstat: add zone range overlapping check
mm/page_owner: add zone range overlapping check
power: add zone range overlapping check

mm/hugetlb.c | 9 ++++++---
mm/page_alloc.c | 10 +++++++---
mm/page_isolation.c | 1 +
mm/page_owner.c | 3 +++
mm/vmstat.c | 7 +++++++
5 files changed, 24 insertions(+), 6 deletions(-)

--
1.9.1


2016-03-14 07:32:09

by Joonsoo Kim

Subject: [PATCH 1/6] mm/page_alloc: fix same zone check in __pageblock_pfn_to_page()

From: Joonsoo Kim <[email protected]>

There is a system where nodes' pfn ranges overlap, like the following:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take this overlap into account when iterating over
a pfn range.

In __pageblock_pfn_to_page(), there is a check for this, but it is not
sufficient: it cannot distinguish the case where the zone id is the same
but the node id is different. This patch fixes it.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8120f07..93293b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1173,8 +1173,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,

end_page = pfn_to_page(end_pfn);

- /* This gives a shorter code than deriving page_zone(end_page) */
- if (page_zone_id(start_page) != page_zone_id(end_page))
+ if (zone != page_zone(end_page))
return NULL;

return start_page;
--
1.9.1

2016-03-14 07:32:17

by Joonsoo Kim

Subject: [PATCH 3/6] mm/memory_hotplug: add comment to some functions related to memory hotplug

From: Joonsoo Kim <[email protected]>

__offline_isolated_pages() and test_pages_isolated() are used by memory
hotplug. These functions require that the range be in a single zone,
but there is no check in them because memory hotplug verifies this
before calling them. So as not to confuse future users of these
functions, this patch adds comments to them.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 3 ++-
mm/page_isolation.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 93293b4..08d5536 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7260,7 +7260,8 @@ void zone_pcp_reset(struct zone *zone)

#ifdef CONFIG_MEMORY_HOTREMOVE
/*
- * All pages in the range must be isolated before calling this.
+ * All pages in the range must be in a single zone and isolated
+ * before calling this.
*/
void
__offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 92c4c36..f4c0a9b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -246,6 +246,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
return pfn;
}

+/* Caller should ensure that requested range is in a single zone */
int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
bool skip_hwpoisoned_pages)
{
--
1.9.1

2016-03-14 07:32:31

by Joonsoo Kim

Subject: [PATCH 4/6] mm/vmstat: add zone range overlapping check

From: Joonsoo Kim <[email protected]>

There is a system where nodes' pfn ranges overlap, like the following:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take this overlap into account when iterating over
a pfn range.

There are two places in vmstat.c that iterate over a pfn range without
considering this overlap. Add the check.

Without this patch, the above system could over-count the number of
pageblocks in a zone.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/vmstat.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5e43004..0a726e3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1010,6 +1010,9 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
if (!memmap_valid_within(pfn, page, zone))
continue;

+ if (page_zone(page) != zone)
+ continue;
+
mtype = get_pageblock_migratetype(page);

if (mtype < MIGRATE_TYPES)
@@ -1076,6 +1079,10 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
continue;

page = pfn_to_page(pfn);
+
+ if (page_zone(page) != zone)
+ continue;
+
if (PageBuddy(page)) {
pfn += (1UL << page_order(page)) - 1;
continue;
--
1.9.1

2016-03-14 07:32:41

by Joonsoo Kim

Subject: [PATCH 5/6] mm/page_owner: add zone range overlapping check

From: Joonsoo Kim <[email protected]>

There is a system where nodes' pfn ranges overlap, like the following:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take this overlap into account when iterating over
a pfn range.

There is one place in page_owner.c that iterates over a pfn range without
considering this overlap. Add the check.

Without this patch, the above system could over-count the number of early
allocated pages (those allocated before page_owner is activated).

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_owner.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index ac3d8d1..438768c 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -301,6 +301,9 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)

page = pfn_to_page(pfn);

+ if (page_zone(page) != zone)
+ continue;
+
/*
* We are safe to check buddy flag and order, because
* this is init stage and only single thread runs.
--
1.9.1

2016-03-14 07:32:48

by Joonsoo Kim

Subject: [PATCH 6/6] power: add zone range overlapping check

From: Joonsoo Kim <[email protected]>

There is a system where nodes' pfn ranges overlap, like the following:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take this overlap into account when iterating over
a pfn range.

mark_free_pages() iterates over the requested zone's pfn range and first
clears the whole range in the bitmap, then marks the zone's free pages
in it. If another zone overlaps this range, the clearing step could wipe
out bits that the other zone has already set, and later users of the
bitmap would see the wrong state. To prevent this, this patch adds a
zone check to mark_free_pages().
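
For context, the relevant shape of mark_free_pages() (simplified from
mm/page_alloc.c; locking and the empty-zone check are omitted) is:

void mark_free_pages(struct zone *zone)
{
	unsigned long pfn, max_zone_pfn = zone_end_pfn(zone);
	unsigned int order, t;
	struct page *page;

	/*
	 * Phase 1: clear the bit of every valid pfn in the zone's span.
	 * Without the zone check added by this patch, this also clears
	 * bits that an overlapping zone has already set.
	 */
	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
		if (pfn_valid(pfn)) {
			page = pfn_to_page(pfn);
			if (!swsusp_page_is_forbidden(page))
				swsusp_unset_page_free(page);
		}

	/* Phase 2: set the bit of each page on this zone's free lists */
	for_each_migratetype_order(order, t) {
		list_for_each_entry(page,
				&zone->free_area[order].free_list[t], lru) {
			unsigned long i;

			pfn = page_to_pfn(page);
			for (i = 0; i < (1UL << order); i++)
				swsusp_set_page_free(pfn_to_page(pfn + i));
		}
	}
}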

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 08d5536..998f636 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2136,6 +2136,10 @@ void mark_free_pages(struct zone *zone)
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
+
+ if (page_zone(page) != zone)
+ continue;
+
if (!swsusp_page_is_forbidden(page))
swsusp_unset_page_free(page);
}
--
1.9.1

2016-03-14 07:33:25

by Joonsoo Kim

Subject: [PATCH 2/6] mm/hugetlb: add same zone check in pfn_range_valid_gigantic()

From: Joonsoo Kim <[email protected]>

alloc_gigantic_page() uses alloc_contig_range(), which requires that
the requested range be in a single zone. To satisfy that requirement,
add this check to pfn_range_valid_gigantic().
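
For context, alloc_gigantic_page() scans each zone of the node for an
aligned candidate range, roughly like this (simplified from mm/hugetlb.c
with this patch applied; the lock-drop comment is shortened):

static struct page *alloc_gigantic_page(int nid, unsigned int order)
{
	unsigned long nr_pages = 1 << order;
	unsigned long ret, pfn, flags;
	struct zone *z;

	z = NODE_DATA(nid)->node_zones;
	for (; z - NODE_DATA(nid)->node_zones < MAX_NR_ZONES; z++) {
		spin_lock_irqsave(&z->lock, flags);

		pfn = ALIGN(z->zone_start_pfn, nr_pages);
		while (zone_spans_last_pfn(z, pfn, nr_pages)) {
			/* passing 'z' lets the range be rejected if any
			 * page in it belongs to another zone */
			if (pfn_range_valid_gigantic(z, pfn, nr_pages)) {
				/* alloc_contig_range() takes the zone
				 * lock itself, so drop it here */
				spin_unlock_irqrestore(&z->lock, flags);
				ret = __alloc_gigantic_page(pfn, nr_pages);
				if (!ret)
					return pfn_to_page(pfn);
				spin_lock_irqsave(&z->lock, flags);
			}
			pfn += nr_pages;
		}

		spin_unlock_irqrestore(&z->lock, flags);
	}

	return NULL;
}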

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/hugetlb.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 06058ea..daceeb5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1030,8 +1030,8 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
}

-static bool pfn_range_valid_gigantic(unsigned long start_pfn,
- unsigned long nr_pages)
+static bool pfn_range_valid_gigantic(struct zone *z,
+ unsigned long start_pfn, unsigned long nr_pages)
{
unsigned long i, end_pfn = start_pfn + nr_pages;
struct page *page;
@@ -1042,6 +1042,9 @@ static bool pfn_range_valid_gigantic(unsigned long start_pfn,

page = pfn_to_page(i);

+ if (page_zone(page) != z)
+ return false;
+
if (PageReserved(page))
return false;

@@ -1074,7 +1077,7 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)

pfn = ALIGN(z->zone_start_pfn, nr_pages);
while (zone_spans_last_pfn(z, pfn, nr_pages)) {
- if (pfn_range_valid_gigantic(pfn, nr_pages)) {
+ if (pfn_range_valid_gigantic(z, pfn, nr_pages)) {
/*
* We release the zone lock here because
* alloc_contig_range() will also lock the zone
--
1.9.1

2016-03-21 11:37:36

by Mel Gorman

Subject: Re: [PATCH 1/6] mm/page_alloc: fix same zone check in __pageblock_pfn_to_page()

On Mon, Mar 14, 2016 at 04:31:32PM +0900, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is a system where nodes' pfn ranges overlap, like the following:
>
> -----pfn-------->
> N0 N1 N2 N0 N1 N2
>
> Therefore, we need to take this overlap into account when iterating
> over a pfn range.
>
> In __pageblock_pfn_to_page(), there is a check for this, but it is
> not sufficient: it cannot distinguish the case where the zone id is
> the same but the node id is different. This patch fixes it.
>

I think you may be mixing up page_zone_id with page_zonenum.
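
For reference, simplified from the include/linux/mm.h definitions (the
real code goes through the page-flags layout macros):

/*
 * Zone index only: two zones of the same type on different nodes
 * compare equal here.
 */
static inline enum zone_type page_zonenum(const struct page *page)
{
	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

/*
 * Combined node+zone id: this already distinguishes zones on
 * different nodes, which is what the existing check relies on.
 */
static inline int page_zone_id(struct page *page)
{
	return (page->flags >> ZONEID_PGSHIFT) & ZONEID_MASK;
}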

--
Mel Gorman
SUSE Labs

2016-03-22 05:01:17

by Joonsoo Kim

Subject: Re: [PATCH 1/6] mm/page_alloc: fix same zone check in __pageblock_pfn_to_page()

On Mon, Mar 21, 2016 at 11:37:19AM +0000, Mel Gorman wrote:
> On Mon, Mar 14, 2016 at 04:31:32PM +0900, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > There is a system where nodes' pfn ranges overlap, like the following:
> >
> > -----pfn-------->
> > N0 N1 N2 N0 N1 N2
> >
> > Therefore, we need to take this overlap into account when iterating
> > over a pfn range.
> >
> > In __pageblock_pfn_to_page(), there is a check for this, but it is
> > not sufficient: it cannot distinguish the case where the zone id is
> > the same but the node id is different. This patch fixes it.
> >
>
> I think you may be mixing up page_zone_id with page_zonenum.

Oops... Indeed.

I will drop the patch. Thanks for catching it. :)

Thanks.

2016-03-23 14:38:23

by Vlastimil Babka

Subject: Re: [PATCH 2/6] mm/hugetlb: add same zone check in pfn_range_valid_gigantic()

On 03/14/2016 08:31 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> alloc_gigantic_page() uses alloc_contig_range(), which requires that
> the requested range be in a single zone. To satisfy that requirement,
> add this check to pfn_range_valid_gigantic().
>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2016-03-23 14:38:48

by Vlastimil Babka

Subject: Re: [PATCH 3/6] mm/memory_hotplug: add comment to some functions related to memory hotplug

On 03/14/2016 08:31 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> __offline_isolated_pages() and test_pages_isolated() are used by memory
> hotplug. These functions require that the range be in a single zone,
> but there is no check in them because memory hotplug verifies this
> before calling them. So as not to confuse future users of these
> functions, this patch adds comments to them.
>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2016-03-23 14:47:51

by Vlastimil Babka

Subject: Re: [PATCH 4/6] mm/vmstat: add zone range overlapping check

On 03/14/2016 08:31 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is a system where nodes' pfn ranges overlap, like the following:
>
> -----pfn-------->
> N0 N1 N2 N0 N1 N2
>
> Therefore, we need to take this overlap into account when iterating
> over a pfn range.
>
> There are two places in vmstat.c that iterate over a pfn range without
> considering this overlap. Add the check.
>
> Without this patch, the above system could over-count the number of
> pageblocks in a zone.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/vmstat.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 5e43004..0a726e3 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1010,6 +1010,9 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
> if (!memmap_valid_within(pfn, page, zone))
> continue;

The above already does this for each page within the block, but it's
guarded by CONFIG_ARCH_HAS_HOLES_MEMORYMODEL. I guess that's not the
case on your system, right?

I guess your added check should go above this, though. Also what about
employing pageblock_pfn_to_page() here and in all other applicable
places, so it's unified and optimized by zone->contiguous?
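
Roughly something like this (untested sketch, assuming
pageblock_pfn_to_page() from mm/internal.h is made visible to vmstat.c;
variable names as in pagetypeinfo_showblockcount_print()):

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		struct page *page;

		/*
		 * Returns NULL for holes and for blocks that are not
		 * fully within 'zone'; cheap when zone->contiguous.
		 */
		page = pageblock_pfn_to_page(pfn, pfn + pageblock_nr_pages,
						zone);
		if (!page)
			continue;

		mtype = get_pageblock_migratetype(page);
		if (mtype < MIGRATE_TYPES)
			count[mtype]++;
	}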

>
> + if (page_zone(page) != zone)
> + continue;
> +
> mtype = get_pageblock_migratetype(page);
>
> if (mtype < MIGRATE_TYPES)
> @@ -1076,6 +1079,10 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
> continue;
>
> page = pfn_to_page(pfn);
> +
> + if (page_zone(page) != zone)
> + continue;
> +
> if (PageBuddy(page)) {
> pfn += (1UL << page_order(page)) - 1;
> continue;
>

2016-03-24 00:06:57

by Joonsoo Kim

Subject: Re: [PATCH 4/6] mm/vmstat: add zone range overlapping check

On Wed, Mar 23, 2016 at 03:47:46PM +0100, Vlastimil Babka wrote:
> On 03/14/2016 08:31 AM, [email protected] wrote:
> >From: Joonsoo Kim <[email protected]>
> >
> >There is a system where nodes' pfn ranges overlap, like the following:
> >
> >-----pfn-------->
> >N0 N1 N2 N0 N1 N2
> >
> >Therefore, we need to take this overlap into account when iterating
> >over a pfn range.
> >
> >There are two places in vmstat.c that iterate over a pfn range without
> >considering this overlap. Add the check.
> >
> >Without this patch, the above system could over-count the number of
> >pageblocks in a zone.
> >
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >---
> > mm/vmstat.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/mm/vmstat.c b/mm/vmstat.c
> >index 5e43004..0a726e3 100644
> >--- a/mm/vmstat.c
> >+++ b/mm/vmstat.c
> >@@ -1010,6 +1010,9 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
> > if (!memmap_valid_within(pfn, page, zone))
> > continue;
>
> The above already does this for each page within the block, but it's
> guarded by CONFIG_ARCH_HAS_HOLES_MEMORYMODEL. I guess that's not the
> case on your system, right?
>
> I guess your added check should go above this, though. Also what
> about employing pageblock_pfn_to_page() here and in all other
> applicable places, so it's unified and optimized by
> zone->contiguous?

The comment on memmap_valid_within() in mmzone.h says that the
page_zone() linkages could be broken on such a system even if
pfn_valid() returns true. So we cannot do the zone check before it.
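
For reference, the implementation behind that comment (mm/mmzone.c,
roughly) is:

int memmap_valid_within(unsigned long pfn,
			struct page *page, struct zone *zone)
{
	if (page_to_pfn(page) != pfn)
		return 0;

	if (page_zone(page) != zone)
		return 0;

	return 1;
}

So on those systems page_zone() cannot be trusted until
memmap_valid_within() has validated the pfn linkage, which is why the
added check has to come after it.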

In fact, I wonder how such a system works fine at all, given that there
are many pfn iterators which don't check memmap_valid_within(). I guess
there may be enough constraints in practice. Anyway, I think that is a
separate issue and can be revisited later.

Thanks.