2017-06-01 08:37:55

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 0/2] memory hotplug follow up fixes

Hi Andrew,
Heiko Carstens has noticed an unexpected memory online behavior for the
default onlining (aka MMOP_ONLINE_KEEP) and also online_kernel to other
kernel zones than ZONE_NORMAL. Both fixes are rather straightforward
so could you add them to the memory hotplug rewrite pile please?

Shortlog
Michal Hocko (2):
mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

Diffstat
drivers/base/memory.c | 2 +-
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 36 +++++++++++++++++++++++++++++-------
3 files changed, 32 insertions(+), 8 deletions(-)


2017-06-01 08:37:59

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

From: Michal Hocko <[email protected]>

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:
DMA [mem 0x0000000000000000-0x000000007fffffff]
Normal [mem 0x0000000080000000-0x000000017fffffff]

$ cat /sys/devices/system/memory/block_size_bytes
10000000
$ cat /sys/devices/system/memory/memory5/valid_zones
DMA
$ echo 0 > /sys/devices/system/memory/memory5/online
$ cat /sys/devices/system/memory/memory5/valid_zones
Normal
$ echo 1 > /sys/devices/system/memory/memory5/online
Normal

$ cat /proc/zoneinfo
Node 0, zone DMA
spanned 524288 <-----
present 458752
managed 455078
start_pfn: 0 <-----

Node 0, zone Normal
spanned 720896
present 589824
managed 571648
start_pfn: 327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL. This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one. If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Reported-by: Heiko Carstens <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
drivers/base/memory.c | 2 +-
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 27 ++++++++++++++++++++++++---
3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b86fda30ce62..c7c4e0325cdb 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,

nid = pfn_to_nid(start_pfn);
if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
- strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
+ strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
append = true;
}

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9e0249d0f5e4..ed167541e4fc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
unsigned long pnum);
extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
int online_type);
+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
+ unsigned long nr_pages);
#endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b3895fd609f4..a0348de3e18c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
{
struct pglist_data *pgdat = NODE_DATA(nid);
struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
- struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL];
+ struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);

/*
* TODO there shouldn't be any inherent reason to have ZONE_NORMAL
@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
return true;
return movable_zone->zone_start_pfn >= pfn + nr_pages;
} else if (online_type == MMOP_ONLINE_MOVABLE) {
- return zone_end_pfn(normal_zone) <= pfn;
+ return zone_end_pfn(default_zone) <= pfn;
}

/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
}

/*
+ * Returns a default kernel memory zone for the given pfn range.
+ * If no kernel zone covers this pfn range it will automatically go
+ * to the ZONE_NORMAL.
+ */
+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
+ unsigned long nr_pages)
+{
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ int zid;
+
+ for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+ struct zone *zone = &pgdat->node_zones[zid];
+
+ if (zone_intersects(zone, start_pfn, nr_pages))
+ return zone;
+ }
+
+ return &pgdat->node_zones[ZONE_NORMAL];
+}
+
+/*
* Associates the given pfn range with the given node and the zone appropriate
* for the given online type.
*/
@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
unsigned long start_pfn, unsigned long nr_pages)
{
struct pglist_data *pgdat = NODE_DATA(nid);
- struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
+ struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);

if (online_type == MMOP_ONLINE_KEEP) {
struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
--
2.11.0

2017-06-01 08:38:21

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior

From: Michal Hocko <[email protected]>

Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
$ grep . memory3?/valid_zones
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable

$ echo online_movable > memory34/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory35/valid_zones:Movable
memory36/valid_zones:Movable
memory37/valid_zones:Movable

$ echo online > memory36/state
$ grep . memory3?/valid_zones
memory34/valid_zones:Movable
memory36/valid_zones:Normal
memory37/valid_zones:Movable

so we have effectivelly punched a hole into the movable zone. The
problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
It only checks whether the given range is already part of the movable
zone which is not the case here as only memory34 is in the zone. Fix
this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
is false then we can be sure that movable onlining is the right thing to
do.

Reported-by: Heiko Carstens <[email protected]>
Tested-by: Heiko Carstens <[email protected]>
Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memory_hotplug.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a895df2397e..b3895fd609f4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
if (online_type == MMOP_ONLINE_KEEP) {
struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
/*
- * MMOP_ONLINE_KEEP inherits the current zone which is
- * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
- * already.
+ * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
+ * movable zone if that is not possible (e.g. we are within
+ * or past the existing movable zone)
*/
- if (zone_intersects(movable_zone, start_pfn, nr_pages))
+ if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+ MMOP_ONLINE_KERNEL))
zone = movable_zone;
} else if (online_type == MMOP_ONLINE_MOVABLE) {
zone = &pgdat->node_zones[ZONE_MOVABLE];
--
2.11.0

2017-06-01 12:32:48

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Normal Movable
> memory35/valid_zones:Normal Movable
> memory36/valid_zones:Normal Movable
> memory37/valid_zones:Normal Movable
>
> $ echo online_movable > memory34/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory35/valid_zones:Movable
> memory36/valid_zones:Movable
> memory37/valid_zones:Movable
>
> $ echo online > memory36/state
> $ grep . memory3?/valid_zones
> memory34/valid_zones:Movable
> memory36/valid_zones:Normal
> memory37/valid_zones:Movable
>
> so we have effectivelly punched a hole into the movable zone. The
> problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> It only checks whether the given range is already part of the movable
> zone which is not the case here as only memory34 is in the zone. Fix
> this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> is false then we can be sure that movable onlining is the right thing to
> do.
>
> Reported-by: Heiko Carstens <[email protected]>
> Tested-by: Heiko Carstens <[email protected]>
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"

Just fold it there before sending to Linus, right?

> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

> ---
> mm/memory_hotplug.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a895df2397e..b3895fd609f4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -950,11 +950,12 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> if (online_type == MMOP_ONLINE_KEEP) {
> struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
> /*
> - * MMOP_ONLINE_KEEP inherits the current zone which is
> - * ZONE_NORMAL by default but we might be within ZONE_MOVABLE
> - * already.
> + * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> + * movable zone if that is not possible (e.g. we are within
> + * or past the existing movable zone)
> */
> - if (zone_intersects(movable_zone, start_pfn, nr_pages))
> + if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
> + MMOP_ONLINE_KERNEL))
> zone = movable_zone;
> } else if (online_type == MMOP_ONLINE_MOVABLE) {
> zone = &pgdat->node_zones[ZONE_MOVABLE];
>

2017-06-01 12:40:27

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior

On Thu 01-06-17 14:32:42, Vlastimil Babka wrote:
> On 06/01/2017 10:37 AM, Michal Hocko wrote:
> > From: Michal Hocko <[email protected]>
> >
> > Heiko Carstens has noticed that the MMOP_ONLINE_KEEP is broken currently
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Normal Movable
> > memory35/valid_zones:Normal Movable
> > memory36/valid_zones:Normal Movable
> > memory37/valid_zones:Normal Movable
> >
> > $ echo online_movable > memory34/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory35/valid_zones:Movable
> > memory36/valid_zones:Movable
> > memory37/valid_zones:Movable
> >
> > $ echo online > memory36/state
> > $ grep . memory3?/valid_zones
> > memory34/valid_zones:Movable
> > memory36/valid_zones:Normal
> > memory37/valid_zones:Movable
> >
> > so we have effectivelly punched a hole into the movable zone. The
> > problem is that move_pfn_range() check for MMOP_ONLINE_KEEP is wrong.
> > It only checks whether the given range is already part of the movable
> > zone which is not the case here as only memory34 is in the zone. Fix
> > this by using allow_online_pfn_range(..., MMOP_ONLINE_KERNEL) if that
> > is false then we can be sure that movable onlining is the right thing to
> > do.
> >
> > Reported-by: Heiko Carstens <[email protected]>
> > Tested-by: Heiko Carstens <[email protected]>
> > Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>
> Just fold it there before sending to Linus, right?

I do not have a strong preference. The changelog could still be helpful
for reference. The original patch is quite large and details like this
are likely to get lost there.

>
> > Signed-off-by: Michal Hocko <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>

Thanks!
--
Michal Hocko
SUSE Labs

2017-06-01 12:57:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

On 06/01/2017 10:37 AM, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Heiko Carstens has noticed that he can generate overlapping zones for
> ZONE_DMA and ZONE_NORMAL:
> DMA [mem 0x0000000000000000-0x000000007fffffff]
> Normal [mem 0x0000000080000000-0x000000017fffffff]
>
> $ cat /sys/devices/system/memory/block_size_bytes
> 10000000
> $ cat /sys/devices/system/memory/memory5/valid_zones
> DMA
> $ echo 0 > /sys/devices/system/memory/memory5/online
> $ cat /sys/devices/system/memory/memory5/valid_zones
> Normal
> $ echo 1 > /sys/devices/system/memory/memory5/online
> Normal
>
> $ cat /proc/zoneinfo
> Node 0, zone DMA
> spanned 524288 <-----
> present 458752
> managed 455078
> start_pfn: 0 <-----
>
> Node 0, zone Normal
> spanned 720896
> present 589824
> managed 571648
> start_pfn: 327680 <-----
>
> The reason is that we assume that the default zone for kernel onlining
> is ZONE_NORMAL. This was a simplification introduced by the memory
> hotplug rework and it is easily fixable by checking the range overlap in
> the zone order and considering the first matching zone as the default
> one. If there is no such zone then assume ZONE_NORMAL as we have been
> doing so far.
>
> Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
> Reported-by: Heiko Carstens <[email protected]>
> Tested-by: Heiko Carstens <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2017-06-22 02:32:48

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
>From: Michal Hocko <[email protected]>
>
>Heiko Carstens has noticed that he can generate overlapping zones for
>ZONE_DMA and ZONE_NORMAL:
>DMA [mem 0x0000000000000000-0x000000007fffffff]
>Normal [mem 0x0000000080000000-0x000000017fffffff]
>
>$ cat /sys/devices/system/memory/block_size_bytes
>10000000
>$ cat /sys/devices/system/memory/memory5/valid_zones
>DMA
>$ echo 0 > /sys/devices/system/memory/memory5/online
>$ cat /sys/devices/system/memory/memory5/valid_zones
>Normal
>$ echo 1 > /sys/devices/system/memory/memory5/online
>Normal
>
>$ cat /proc/zoneinfo
>Node 0, zone DMA
>spanned 524288 <-----
>present 458752
>managed 455078
>start_pfn: 0 <-----
>
>Node 0, zone Normal
>spanned 720896
>present 589824
>managed 571648
>start_pfn: 327680 <-----
>
>The reason is that we assume that the default zone for kernel onlining
>is ZONE_NORMAL. This was a simplification introduced by the memory
>hotplug rework and it is easily fixable by checking the range overlap in
>the zone order and considering the first matching zone as the default
>one. If there is no such zone then assume ZONE_NORMAL as we have been
>doing so far.
>
>Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>Reported-by: Heiko Carstens <[email protected]>
>Tested-by: Heiko Carstens <[email protected]>
>Signed-off-by: Michal Hocko <[email protected]>
>---
> drivers/base/memory.c | 2 +-
> include/linux/memory_hotplug.h | 2 ++
> mm/memory_hotplug.c | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index b86fda30ce62..c7c4e0325cdb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
>
> nid = pfn_to_nid(start_pfn);
> if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
>- strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
>+ strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> append = true;
> }
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 9e0249d0f5e4..ed167541e4fc 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> unsigned long pnum);
> extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> int online_type);
>+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
>+ unsigned long nr_pages);
> #endif /* __LINUX_MEMORY_HOTPLUG_H */
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b3895fd609f4..a0348de3e18c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> {
> struct pglist_data *pgdat = NODE_DATA(nid);
> struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>- struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL];
>+ struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
>
> /*
> * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
>@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> return true;
> return movable_zone->zone_start_pfn >= pfn + nr_pages;
> } else if (online_type == MMOP_ONLINE_MOVABLE) {
>- return zone_end_pfn(normal_zone) <= pfn;
>+ return zone_end_pfn(default_zone) <= pfn;
> }
>
> /* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
>@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> }
>
> /*
>+ * Returns a default kernel memory zone for the given pfn range.
>+ * If no kernel zone covers this pfn range it will automatically go
>+ * to the ZONE_NORMAL.
>+ */
>+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>+ unsigned long nr_pages)
>+{
>+ struct pglist_data *pgdat = NODE_DATA(nid);
>+ int zid;
>+
>+ for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>+ struct zone *zone = &pgdat->node_zones[zid];
>+
>+ if (zone_intersects(zone, start_pfn, nr_pages))
>+ return zone;
>+ }
>+
>+ return &pgdat->node_zones[ZONE_NORMAL];
>+}

Hmm... a corner case jumped into my mind which may invalidate this
calculation.

The case is:


Zone: | DMA | DMA32 | NORMAL |
v v v v

Phy mem: [ ] [ ]

^ ^ ^ ^
Node: | Node0 | | Node1 |
A B C D


The key point is
1. There is a hole between Node0 and Node1
2. The hole sits in a non-normal zone

Let's mark the boundary as A, B, C, D. Then we would have
node0->zone[dma21] = [A, B]
node1->zone[dma32] = [C, D]

If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
if we want to hotplug a range in [B, C] on node1, it will introduce the
overlapped zone. Because the range [B, C] intersects none of the existing
zones on node1.

Do you think this is possible?

>+
>+/*
> * Associates the given pfn range with the given node and the zone appropriate
> * for the given online type.
> */
>@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> unsigned long start_pfn, unsigned long nr_pages)
> {
> struct pglist_data *pgdat = NODE_DATA(nid);
>- struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
>+ struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
>
> if (online_type == MMOP_ONLINE_KEEP) {
> struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>--
>2.11.0

--
Wei Yang
Help you, Help me


Attachments:
(No filename) (5.81 kB)
signature.asc (819.00 B)
Download all attachments

2017-06-22 18:17:03

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

[Again, please try to trim your quoted response to the minimum]

On Thu 22-06-17 10:32:43, Wei Yang wrote:
> On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
[...]
> >@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> > }
> >
> > /*
> >+ * Returns a default kernel memory zone for the given pfn range.
> >+ * If no kernel zone covers this pfn range it will automatically go
> >+ * to the ZONE_NORMAL.
> >+ */
> >+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> >+ unsigned long nr_pages)
> >+{
> >+ struct pglist_data *pgdat = NODE_DATA(nid);
> >+ int zid;
> >+
> >+ for (zid = 0; zid <= ZONE_NORMAL; zid++) {
> >+ struct zone *zone = &pgdat->node_zones[zid];
> >+
> >+ if (zone_intersects(zone, start_pfn, nr_pages))
> >+ return zone;
> >+ }
> >+
> >+ return &pgdat->node_zones[ZONE_NORMAL];
> >+}
>
> Hmm... a corner case jumped into my mind which may invalidate this
> calculation.
>
> The case is:
>
>
> Zone: | DMA | DMA32 | NORMAL |
> v v v v
>
> Phy mem: [ ] [ ]
>
> ^ ^ ^ ^
> Node: | Node0 | | Node1 |
> A B C D
>
>
> The key point is
> 1. There is a hole between Node0 and Node1
> 2. The hole sits in a non-normal zone
>
> Let's mark the boundary as A, B, C, D. Then we would have
> node0->zone[dma21] = [A, B]
> node1->zone[dma32] = [C, D]
>
> If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
> if we want to hotplug a range in [B, C] on node1, it will introduce the
> overlapped zone. Because the range [B, C] intersects none of the existing
> zones on node1.
>
> Do you think this is possible?

Yes, it is possible. I would be much more more surprised if it was real
as well. Fixing that would require to use arch_zone_{lowest,highest}_possible_pfn
which is not available after init section disappears and I am not even
sure we should care. I would rather wait for a real life example of such
a configuration to fix it.
--
Michal Hocko
SUSE Labs

2017-06-23 01:37:27

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

On Thu, Jun 22, 2017 at 08:16:57PM +0200, Michal Hocko wrote:
>>
>> Hmm... a corner case jumped into my mind which may invalidate this
>> calculation.
>>
>> The case is:
>>
>>
>> Zone: | DMA | DMA32 | NORMAL |
>> v v v v
>>
>> Phy mem: [ ] [ ]
>>
>> ^ ^ ^ ^
>> Node: | Node0 | | Node1 |
>> A B C D
>>
>>
>> The key point is
>> 1. There is a hole between Node0 and Node1
>> 2. The hole sits in a non-normal zone
>>
>> Let's mark the boundary as A, B, C, D. Then we would have
>> node0->zone[dma21] = [A, B]
>> node1->zone[dma32] = [C, D]
>>
>> If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
>> if we want to hotplug a range in [B, C] on node1, it will introduce the
>> overlapped zone. Because the range [B, C] intersects none of the existing
>> zones on node1.
>>
>> Do you think this is possible?
>
>Yes, it is possible. I would be much more more surprised if it was real
>as well. Fixing that would require to use arch_zone_{lowest,highest}_possible_pfn
>which is not available after init section disappears and I am not even
>sure we should care. I would rather wait for a real life example of such
>a configuration to fix it.

Yep, not easy to fix, so wait for real case.

Or possible to add a line in commit log?

>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me


Attachments:
(No filename) (1.54 kB)
signature.asc (819.00 B)
Download all attachments