2017-07-08 01:31:19

by Wei Yang

Subject: [PATCH V3 0/3] Refine numa_emulation

My previous patch "x86/mm/numa: Remove numa_nodemask_from_meminfo()" hits a
problem in numa_emulation. The reason is that numa_nodes_parsed is not set
correctly after emulation.

This patch set fixes that and also includes two code refinements.

Detailed discussions are in this thread:

https://lkml.org/lkml/2017/3/13/1230

and the test results are posted here:

https://lkml.org/lkml/2017/4/10/641

V3:
* remove the error branch and split the loop into a function
* refine the comment

V2:
* refresh the change log based on David's comments
* use nodes_clear()

Wei Yang (3):
  x86/numa_emulation: refine the calculation of max_emu_nid and
    dfl_phys_nid
  x86/numa_emulation: assign physnode_mask directly from
    numa_nodes_parsed
  x86/numa_emulation: restructures numa_nodes_parsed from emulated nodes

 arch/x86/mm/numa_emulation.c | 55 ++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

--
2.11.0


2017-07-08 01:31:24

by Wei Yang

Subject: [PATCH V3 2/3] x86/numa_emulation: assign physnode_mask directly from numa_nodes_parsed

numa_init() has already called init_func(), which is responsible for
setting numa_nodes_parsed, so use this nodemask instead of re-finding it
when calling numa_emulation().

This patch gets the physnode_mask directly from numa_nodes_parsed. At
the same time, it corrects the comment of these two functions.

Signed-off-by: Wei Yang <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
---
arch/x86/mm/numa_emulation.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index a6d55308660f..80904ede2e7f 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -75,13 +75,15 @@ static int __init emu_setup_memblk(struct numa_meminfo *ei,

/*
* Sets up nr_nodes fake nodes interleaved over physical nodes ranging from addr
- * to max_addr. The return value is the number of nodes allocated.
+ * to max_addr.
+ *
+ * Returns zero on success or negative on error.
*/
static int __init split_nodes_interleave(struct numa_meminfo *ei,
struct numa_meminfo *pi,
u64 addr, u64 max_addr, int nr_nodes)
{
- nodemask_t physnode_mask = NODE_MASK_NONE;
+ nodemask_t physnode_mask = numa_nodes_parsed;
u64 size;
int big;
int nid = 0;
@@ -116,9 +118,6 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
return -1;
}

- for (i = 0; i < pi->nr_blks; i++)
- node_set(pi->blk[i].nid, physnode_mask);
-
/*
* Continue to fill physical nodes with fake nodes until there is no
* memory left on any of them.
@@ -200,13 +199,15 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)

/*
* Sets up fake nodes of `size' interleaved over physical nodes ranging from
- * `addr' to `max_addr'. The return value is the number of nodes allocated.
+ * `addr' to `max_addr'.
+ *
+ * Returns zero on success or negative on error.
*/
static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
struct numa_meminfo *pi,
u64 addr, u64 max_addr, u64 size)
{
- nodemask_t physnode_mask = NODE_MASK_NONE;
+ nodemask_t physnode_mask = numa_nodes_parsed;
u64 min_size;
int nid = 0;
int i, ret;
@@ -231,9 +232,6 @@ static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
}
size &= FAKE_NODE_MIN_HASH_MASK;

- for (i = 0; i < pi->nr_blks; i++)
- node_set(pi->blk[i].nid, physnode_mask);
-
/*
* Fill physical nodes with fake nodes of size until there is no memory
* left on any of them.
--
2.11.0
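
As a reading aid for patch 2/3, here is a minimal userspace sketch of the two ways to obtain the physical node mask. It is not kernel code: toy_nodemask_t, toy_memblk, toy_meminfo, mask_from_blocks() and mask_from_parsed() are hypothetical stand-ins, and the sketch simply assumes that the parsed mask already matches the nids found in the memory blocks, which is the premise stated in the changelog above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's nodemask_t and numa_meminfo differ. */
typedef uint64_t toy_nodemask_t;

struct toy_memblk {
	uint64_t start;
	uint64_t end;
	int nid;
};

struct toy_meminfo {
	int nr_blks;
	struct toy_memblk blk[8];
};

/* Old approach: rebuild the physical node mask from the memory blocks. */
static toy_nodemask_t mask_from_blocks(const struct toy_meminfo *pi)
{
	toy_nodemask_t mask = 0;

	for (int i = 0; i < pi->nr_blks; i++) {
		/* The toy mask only has room for nids 0..63. */
		assert(pi->blk[i].nid >= 0 && pi->blk[i].nid < 64);
		mask |= (toy_nodemask_t)1 << pi->blk[i].nid;
	}
	return mask;
}

/* New approach: reuse the mask that the init function already filled in. */
static toy_nodemask_t mask_from_parsed(toy_nodemask_t nodes_parsed)
{
	return nodes_parsed;
}
```

If the init function recorded exactly the nids that appear in the blocks, both helpers return the same mask, so the loop is redundant. Whether that premise always holds on real hardware is a question for the kernel code, not something this sketch demonstrates.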

2017-07-08 01:31:22

by Wei Yang

Subject: [PATCH V3 1/3] x86/numa_emulation: refine the calculation of max_emu_nid and dfl_phys_nid

max_emu_nid and dfl_phys_nid are calculated from emu_nid_to_phys[], which is
calculated in split_nodes_xxx_interleave(). From the logic in these
functions, it is assured that emu_nid_to_phys[] has meaningful values if they
return successfully, which ensures dfl_phys_nid will get a valid value.

This patch removes the error branch to check invalid dfl_phys_nid and
abstracts this part into a function for readability.

Signed-off-by: Wei Yang <[email protected]>
---
arch/x86/mm/numa_emulation.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index a8f90ce3dedf..a6d55308660f 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -280,6 +280,22 @@ static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
return 0;
}

+int __init setup_emu2phys_nid(int *dfl_phys_nid)
+{
+ int i, max_emu_nid = 0;
+
+ *dfl_phys_nid = NUMA_NO_NODE;
+ for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++) {
+ if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
+ max_emu_nid = i;
+ if (*dfl_phys_nid == NUMA_NO_NODE)
+ *dfl_phys_nid = emu_nid_to_phys[i];
+ }
+ }
+
+ return max_emu_nid;
+}
+
/**
* numa_emulation - Emulate NUMA nodes
* @numa_meminfo: NUMA configuration to massage
@@ -376,19 +392,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
* Determine the max emulated nid and the default phys nid to use
* for unmapped nodes.
*/
- max_emu_nid = 0;
- dfl_phys_nid = NUMA_NO_NODE;
- for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++) {
- if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
- max_emu_nid = i;
- if (dfl_phys_nid == NUMA_NO_NODE)
- dfl_phys_nid = emu_nid_to_phys[i];
- }
- }
- if (dfl_phys_nid == NUMA_NO_NODE) {
- pr_warning("NUMA: Warning: can't determine default physical node, disabling emulation\n");
- goto no_emu;
- }
+ max_emu_nid = setup_emu2phys_nid(&dfl_phys_nid);

/* commit */
*numa_meminfo = ei;
--
2.11.0
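
For readers who want to try the helper from patch 1/3 outside the kernel, the following is a minimal userspace sketch of the same loop. MAX_NODES and reset_emu_map() are hypothetical additions for the sketch, and emu_nid_to_phys[] here is a toy array rather than the kernel's table. NUMA_NO_NODE is defined as -1, as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)
#define MAX_NODES 8	/* hypothetical size; the kernel table is larger */

/* Toy stand-in for the kernel's emu_nid_to_phys[] table. */
static int emu_nid_to_phys[MAX_NODES];

/* Hypothetical helper: mark every emulated nid as unmapped. */
static void reset_emu_map(void)
{
	for (size_t i = 0; i < MAX_NODES; i++)
		emu_nid_to_phys[i] = NUMA_NO_NODE;
}

/*
 * Same loop as in the patch: return the highest mapped emulated nid and
 * store the physical nid of the first mapped entry in *dfl_phys_nid.
 */
static int setup_emu2phys_nid(int *dfl_phys_nid)
{
	int max_emu_nid = 0;

	assert(dfl_phys_nid != NULL);
	*dfl_phys_nid = NUMA_NO_NODE;
	for (size_t i = 0; i < MAX_NODES; i++) {
		if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
			max_emu_nid = (int)i;
			if (*dfl_phys_nid == NUMA_NO_NODE)
				*dfl_phys_nid = emu_nid_to_phys[i];
		}
	}
	return max_emu_nid;
}
```

Note that with an entirely unmapped table the helper returns 0 and leaves *dfl_phys_nid at NUMA_NO_NODE without any diagnostic, which is the case the removed error branch used to catch.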

2017-07-08 01:31:29

by Wei Yang

Subject: [PATCH V3 3/3] x86/numa_emulation: restructures numa_nodes_parsed from emulated nodes

With NUMA emulation, the system may contain more or fewer nodes than there
are physical nodes. In numa_emulation(), numa_meminfo/numa_distance/
__apicid_to_node are restructured according to the emulated nodes, but
numa_nodes_parsed is not.

This patch restructures numa_nodes_parsed from emulated nodes.

Signed-off-by: Wei Yang <[email protected]>
Acked-by: David Rientjes <[email protected]>
---
arch/x86/mm/numa_emulation.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 80904ede2e7f..d805162e6045 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -395,6 +395,13 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
/* commit */
*numa_meminfo = ei;

+ /* Make sure numa_nodes_parsed only contains emulated nodes */
+ nodes_clear(numa_nodes_parsed);
+ for (i = 0; i < ARRAY_SIZE(ei.blk); i++)
+ if (ei.blk[i].start != ei.blk[i].end &&
+ ei.blk[i].nid != NUMA_NO_NODE)
+ node_set(ei.blk[i].nid, numa_nodes_parsed);
+
/*
* Transform __apicid_to_node table to use emulated nids by
* reverse-mapping phys_nid. The maps should always exist but fall
--
2.11.0
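
The recalculation in patch 3/3 can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel implementation: the toy_* types, TOY_NR_BLKS and recalc_nodes_parsed() are hypothetical, and a 64-bit integer stands in for nodemask_t. As in the patch, the loop walks every slot of the block array and skips empty blocks and blocks without a node.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define TOY_NR_BLKS 8	/* hypothetical; not the kernel's array size */

typedef uint64_t toy_nodemask_t;

struct toy_memblk {
	uint64_t start;
	uint64_t end;
	int nid;
};

struct toy_meminfo {
	int nr_blks;
	struct toy_memblk blk[TOY_NR_BLKS];
};

/* Build a fresh node mask that contains only the emulated nodes. */
static toy_nodemask_t recalc_nodes_parsed(const struct toy_meminfo *ei)
{
	toy_nodemask_t mask = 0;	/* plays the role of nodes_clear() */

	for (int i = 0; i < TOY_NR_BLKS; i++) {
		const struct toy_memblk *b = &ei->blk[i];

		/* Skip empty blocks and blocks that belong to no node. */
		if (b->start == b->end || b->nid == NUMA_NO_NODE)
			continue;
		assert(b->nid >= 0 && b->nid < 64);
		mask |= (toy_nodemask_t)1 << b->nid;	/* node_set() */
	}
	return mask;
}
```

Unused slots in a zero-initialised array have start == end, so they are skipped even though the loop does not consult nr_blks.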

Subject: [tip:x86/debug] x86/numa_emulation: Refine the calculation of max_emu_nid and dfl_phys_nid

Commit-ID: 158f424f427e686816bc64cd623e3bfc3390dfb0
Gitweb: http://git.kernel.org/tip/158f424f427e686816bc64cd623e3bfc3390dfb0
Author: Wei Yang <[email protected]>
AuthorDate: Sat, 8 Jul 2017 09:30:57 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 18 Jul 2017 11:16:49 +0200

x86/numa_emulation: Refine the calculation of max_emu_nid and dfl_phys_nid

max_emu_nid and dfl_phys_nid is calculated from emu_nid_to_phys[], which is
calculated in split_nodes_xxx_interleave(). From the logic in these
functions, it is assured the emu_nid_to_phys[] has meaningful value if it
return successfully and ensures dfl_phys_nid will get a valid value.

This patch removes the error branch to check invalid dfl_phys_nid and
abstracts out this part to a function for readability.

Signed-off-by: Wei Yang <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/numa_emulation.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index a8f90ce..a6d5530 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -280,6 +280,22 @@ static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
return 0;
}

+int __init setup_emu2phys_nid(int *dfl_phys_nid)
+{
+ int i, max_emu_nid = 0;
+
+ *dfl_phys_nid = NUMA_NO_NODE;
+ for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++) {
+ if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
+ max_emu_nid = i;
+ if (*dfl_phys_nid == NUMA_NO_NODE)
+ *dfl_phys_nid = emu_nid_to_phys[i];
+ }
+ }
+
+ return max_emu_nid;
+}
+
/**
* numa_emulation - Emulate NUMA nodes
* @numa_meminfo: NUMA configuration to massage
@@ -376,19 +392,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
* Determine the max emulated nid and the default phys nid to use
* for unmapped nodes.
*/
- max_emu_nid = 0;
- dfl_phys_nid = NUMA_NO_NODE;
- for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++) {
- if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
- max_emu_nid = i;
- if (dfl_phys_nid == NUMA_NO_NODE)
- dfl_phys_nid = emu_nid_to_phys[i];
- }
- }
- if (dfl_phys_nid == NUMA_NO_NODE) {
- pr_warning("NUMA: Warning: can't determine default physical node, disabling emulation\n");
- goto no_emu;
- }
+ max_emu_nid = setup_emu2phys_nid(&dfl_phys_nid);

/* commit */
*numa_meminfo = ei;

Subject: [tip:x86/debug] x86/numa_emulation: Assign physnode_mask directly from numa_nodes_parsed

Commit-ID: d80a9eb3c78d7d0c823a8224cd6e3b37ebdfd8cd
Gitweb: http://git.kernel.org/tip/d80a9eb3c78d7d0c823a8224cd6e3b37ebdfd8cd
Author: Wei Yang <[email protected]>
AuthorDate: Sat, 8 Jul 2017 09:30:58 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 18 Jul 2017 11:16:49 +0200

x86/numa_emulation: Assign physnode_mask directly from numa_nodes_parsed

numa_init() has already called init_func(), which is responsible for
setting numa_nodes_parsed, so use this nodemask instead of re-finding it
when calling numa_emulation().

This patch gets the physnode_mask directly from numa_nodes_parsed. At
the same time, it corrects the comment of these two functions.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/numa_emulation.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index a6d5530..80904ed 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -75,13 +75,15 @@ static int __init emu_setup_memblk(struct numa_meminfo *ei,

/*
* Sets up nr_nodes fake nodes interleaved over physical nodes ranging from addr
- * to max_addr. The return value is the number of nodes allocated.
+ * to max_addr.
+ *
+ * Returns zero on success or negative on error.
*/
static int __init split_nodes_interleave(struct numa_meminfo *ei,
struct numa_meminfo *pi,
u64 addr, u64 max_addr, int nr_nodes)
{
- nodemask_t physnode_mask = NODE_MASK_NONE;
+ nodemask_t physnode_mask = numa_nodes_parsed;
u64 size;
int big;
int nid = 0;
@@ -116,9 +118,6 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
return -1;
}

- for (i = 0; i < pi->nr_blks; i++)
- node_set(pi->blk[i].nid, physnode_mask);
-
/*
* Continue to fill physical nodes with fake nodes until there is no
* memory left on any of them.
@@ -200,13 +199,15 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)

/*
* Sets up fake nodes of `size' interleaved over physical nodes ranging from
- * `addr' to `max_addr'. The return value is the number of nodes allocated.
+ * `addr' to `max_addr'.
+ *
+ * Returns zero on success or negative on error.
*/
static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
struct numa_meminfo *pi,
u64 addr, u64 max_addr, u64 size)
{
- nodemask_t physnode_mask = NODE_MASK_NONE;
+ nodemask_t physnode_mask = numa_nodes_parsed;
u64 min_size;
int nid = 0;
int i, ret;
@@ -231,9 +232,6 @@ static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
}
size &= FAKE_NODE_MIN_HASH_MASK;

- for (i = 0; i < pi->nr_blks; i++)
- node_set(pi->blk[i].nid, physnode_mask);
-
/*
* Fill physical nodes with fake nodes of size until there is no memory
* left on any of them.

Subject: [tip:x86/debug] x86/numa_emulation: Recalculate numa_nodes_parsed from emulated nodes

Commit-ID: 4f167201edda7cd7525cc7f23944731ef5dd99a8
Gitweb: http://git.kernel.org/tip/4f167201edda7cd7525cc7f23944731ef5dd99a8
Author: Wei Yang <[email protected]>
AuthorDate: Sat, 8 Jul 2017 09:30:59 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 18 Jul 2017 11:16:49 +0200

x86/numa_emulation: Recalculate numa_nodes_parsed from emulated nodes

When emulating NUMA, the kernel's emulated NUMA configuration may contain
more or less nodes than there are physical nodes.

In numa_emulation(), we recalculate numa_meminfo/numa_distance/__apicid_to_node
according to the number of emulated nodes, except numa_nodes_parsed, which is
arguably an omission.

Recalculate numa_nodes_parsed as well.

Signed-off-by: Wei Yang <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Changelog fixes. ]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/numa_emulation.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 80904ed..d805162 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -395,6 +395,13 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
/* commit */
*numa_meminfo = ei;

+ /* Make sure numa_nodes_parsed only contains emulated nodes */
+ nodes_clear(numa_nodes_parsed);
+ for (i = 0; i < ARRAY_SIZE(ei.blk); i++)
+ if (ei.blk[i].start != ei.blk[i].end &&
+ ei.blk[i].nid != NUMA_NO_NODE)
+ node_set(ei.blk[i].nid, numa_nodes_parsed);
+
/*
* Transform __apicid_to_node table to use emulated nids by
* reverse-mapping phys_nid. The maps should always exist but fall

2017-07-18 11:04:04

by Borislav Petkov

Subject: Re: [PATCH V3 1/3] x86/numa_emulation: refine the calculation of max_emu_nid and dfl_phys_nid

On Sat, Jul 08, 2017 at 09:30:57AM +0800, Wei Yang wrote:
> max_emu_nid and dfl_phys_nid are calculated from emu_nid_to_phys[], which is
> calculated in split_nodes_xxx_interleave(). From the logic in these

$ git grep split_nodes_xxx_interleave
$

> functions, it is assured that emu_nid_to_phys[] has meaningful values if they
> return successfully, which ensures dfl_phys_nid will get a valid value.
>
> This patch removes the error branch to check invalid dfl_phys_nid and

So the check doesn't hurt anyone.

On the contrary - it is an "assertion" of sorts in otherwise complex
code and actually documents the fact that by then emu_nid_to_phys[]
needs to be set up properly.

And it is especially useful if someone decides to change that code in
the future, for whatever reason, and gets to hit that check - it'll even
be helpful in that case.

So I'd vote for keeping that check and not doing anything.

While we're at it, never say "this patch" in a commit message - that is
tautologically obvious.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
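
To make the review point concrete, here is a hedged userspace sketch of what keeping the check could look like: the lookup reports failure when no emulated nid maps to a physical node, so the caller can fall back instead of continuing with NUMA_NO_NODE. The function name find_default_phys_nid() and its parameters are hypothetical and do not appear in the patch or in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUMA_NO_NODE (-1)

/*
 * Hypothetical variant of the lookup that keeps the sanity check.
 * Returns false when no entry of emu_to_phys[] is mapped, so the
 * caller can disable emulation instead of using an invalid node.
 */
static bool find_default_phys_nid(const int *emu_to_phys, size_t n,
				  int *max_emu_nid, int *dfl_phys_nid)
{
	assert(emu_to_phys && max_emu_nid && dfl_phys_nid);

	*max_emu_nid = 0;
	*dfl_phys_nid = NUMA_NO_NODE;
	for (size_t i = 0; i < n; i++) {
		if (emu_to_phys[i] != NUMA_NO_NODE) {
			*max_emu_nid = (int)i;
			if (*dfl_phys_nid == NUMA_NO_NODE)
				*dfl_phys_nid = emu_to_phys[i];
		}
	}

	/* The "assertion of sorts": the table must have been set up by now. */
	if (*dfl_phys_nid == NUMA_NO_NODE) {
		fprintf(stderr, "NUMA: can't determine default physical node, disabling emulation\n");
		return false;
	}
	return true;
}
```

The check costs one comparison at boot and documents the invariant for anyone who later changes how the table is filled in.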

2017-07-26 03:19:37

by Wei Yang

Subject: Re: [PATCH V3 1/3] x86/numa_emulation: refine the calculation of max_emu_nid and dfl_phys_nid

On Tue, Jul 18, 2017 at 01:03:39PM +0200, Borislav Petkov wrote:
>On Sat, Jul 08, 2017 at 09:30:57AM +0800, Wei Yang wrote:
>> max_emu_nid and dfl_phys_nid are calculated from emu_nid_to_phys[], which is
>> calculated in split_nodes_xxx_interleave(). From the logic in these
>
>$ git grep split_nodes_xxx_interleave
>$
>
>> functions, it is assured that emu_nid_to_phys[] has meaningful values if they
>> return successfully, which ensures dfl_phys_nid will get a valid value.
>>
>> This patch removes the error branch to check invalid dfl_phys_nid and
>
>So the check doesn't hurt anyone.
>
>On the contrary - it is an "assertion" of sorts in otherwise complex
>code and actually documents the fact that by then emu_nid_to_phys[]
>needs to be set up properly.
>
>And it is especially useful if someone decides to change that code in
>the future, for whatever reason, and gets to hit that check - it'll even
>be helpful in that case.
>
>So I'd vote for keeping that check and not doing anything.
>
>While we're at it, never say "this patch" in a commit message - that is
>tautologically obvious.
>

Hi, Borislav

Thanks for your comments; I will keep this in mind.

>--
>Regards/Gruss,
> Boris.
>
>ECO tip #101: Trim your mails when you reply.
>--

--
Wei Yang
Help you, Help me

