2023-07-11 20:40:22

by Steve Wahl

[permalink] [raw]
Subject: [PATCH] x86/platform/uv: Abort UV initialization when reduced nr_cpus requires it

When nr_cpus is set to a smaller number than actually present, there
is some node-to-socket mapping info we won't get access to in
build_socket_tables(). This could later result in using a -1 value
for some array indexing, and eventual kernel page faults.

To avoid this, if any unfilled table entries are found, print a
warning message, and resume initializing, acting as if this is not a
UV system. UV features will be unavailable, but we will not cause
kernel dumps.

This is a condition we expect only in platform debugging situations,
not in day-to-day operation.

Fixes: 8a50c5851927 ("x86/platform/uv: UV support for sub-NUMA clustering")
Signed-off-by: Steve Wahl <[email protected]>
---
arch/x86/kernel/apic/x2apic_uv_x.c | 58 ++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d9384d5b4b8e..8cf3f61b0000 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -1567,11 +1567,13 @@ static void __init free_1_to_1_table(unsigned short **tp, char *tname, int min,
* If the number of nodes is >1 per socket, socket to node table will
* contain lowest node number on that socket.
*/
-static void __init build_socket_tables(void)
+static int __init build_socket_tables(void)
{
struct uv_gam_range_entry *gre = uv_gre_table;
int nums, numn, nump;
- int cpu, i, lnid;
+ int cpu, i, lnid, nid;
+ int sockid;
+ int rc = 0;
int minsock = _min_socket;
int maxsock = _max_socket;
int minpnode = _min_pnode;
@@ -1580,11 +1582,12 @@ static void __init build_socket_tables(void)
if (!gre) {
if (is_uv2_hub() || is_uv3_hub()) {
pr_info("UV: No UVsystab socket table, ignoring\n");
- return;
+ return 0;
}
pr_err("UV: Error: UVsystab address translations not available!\n");
WARN_ON_ONCE(!gre);
- return;
+ rc = -EINVAL;
+ goto err_free_tables;
}

numn = num_possible_nodes();
@@ -1596,10 +1599,8 @@ static void __init build_socket_tables(void)
|| (alloc_conv_table(nums, &_socket_to_pnode) < 0)
|| (alloc_conv_table(numn, &_node_to_socket) < 0)
|| (alloc_conv_table(nums, &_socket_to_node) < 0)) {
- kfree(_pnode_to_socket);
- kfree(_socket_to_pnode);
- kfree(_node_to_socket);
- return;
+ rc = -ENOMEM;
+ goto err_free_tables;
}

/* Fill in pnode/node/addr conversion list values: */
@@ -1623,9 +1624,9 @@ static void __init build_socket_tables(void)
/* Set socket -> node values: */
lnid = NUMA_NO_NODE;
for_each_possible_cpu(cpu) {
- int nid = cpu_to_node(cpu);
- int apicid, sockid;
+ int apicid;

+ nid = cpu_to_node(cpu);
if (lnid == nid)
continue;
lnid = nid;
@@ -1647,6 +1648,28 @@ static void __init build_socket_tables(void)
_socket_to_node[sockid - minsock]);
}

+ /*
+ * If nr_cpus=<val> is used to reduce the cpu count below
+ * what's actually present, the cpu loop above may not find
+ * all the node to socket mappings needed to complete these
+ * tables. Abort UV init and act like a non-uv system if this
+ * happens.
+ */
+ for_each_node(nid) {
+ if (_node_to_socket[nid] == SOCK_EMPTY) {
+ pr_err("UV: Incomplete node table (nr_cpus too small?)\n");
+ rc = -EINVAL;
+ goto err_free_tables;
+ }
+ }
+ for (sockid = 0; sockid < nums; sockid++) {
+ if (_socket_to_node[sockid] == SOCK_EMPTY) {
+ pr_err("UV: Incomplete socket table (nr_cpus too small?)\n");
+ rc = -EINVAL;
+ goto err_free_tables;
+ }
+ }
+
/*
* If e.g. socket id == pnode for all pnodes,
* system runs faster by removing corresponding conversion table.
@@ -1655,6 +1678,17 @@ static void __init build_socket_tables(void)
FREE_1_TO_1_TABLE(_node_to_socket, _min_socket, nums, numn);
FREE_1_TO_1_TABLE(_socket_to_pnode, _min_pnode, nums, nump);
FREE_1_TO_1_TABLE(_pnode_to_socket, _min_pnode, nums, nump);
+
+ return 0;
+
+ err_free_tables:
+ kfree(_pnode_to_socket);
+ kfree(_socket_to_pnode);
+ kfree(_node_to_socket);
+ kfree(_socket_to_node);
+ /* make is_uv_system() return false from now on */
+ uv_system_type = UV_NONE;
+ return rc;
}

/* Check which reboot to use */
@@ -1763,7 +1797,9 @@ static void __init uv_system_init_hub(void)
return;
}

- build_socket_tables();
+ if (build_socket_tables() < 0)
+ return;
+
build_uv_gr_table();
set_block_size();
uv_init_hub_info(&hub_info);
--
2.26.2



2023-07-11 23:55:52

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/platform/uv: Abort UV initialization when reduced nr_cpus requires it

On 7/11/23 13:26, Steve Wahl wrote:
> When nr_cpus is set to a smaller number than actually present, there
> is some node-to-socket mapping info we won't get access to in

First of all, no "we's" in commit messages.

> https://www.kernel.org/doc/html/next/process/maintainer-tip.html#changelog

> build_socket_tables(). This could later result in using a -1 value
> for some array indexing, and eventual kernel page faults.
>
> To avoid this, if any unfilled table entries are found, print a
> warning message, and resume initializing, acting as if this is not a
> UV system. UV features will be unavailable, but we will not cause
> kernel dumps.
>
> This is a condition we expect only in platform debugging situations,
> not in day-to-day operation.

This seems like a hack.

The real problem is that you've got an online Linux NUMA node with no
CPUs. uv_system_init_hub() (probably) goes and does:

> for_each_node(nodeid)
> __uv_hub_info_list[nodeid] = uv_hub_info_list_blade[uv_node_to_blade_id(nodeid)];

But the node=>blade lookup uses socket numbers. No CPUs means no socket
numbers. You _have_ the blade information _somewhere_. Is there really
no other way to map it to a NUMA node than using the CPU apicid?

2023-07-12 21:54:08

by Steve Wahl

[permalink] [raw]
Subject: Re: [PATCH] x86/platform/uv: Abort UV initialization when reduced nr_cpus requires it

On Tue, Jul 11, 2023 at 04:07:55PM -0700, Dave Hansen wrote:
> On 7/11/23 13:26, Steve Wahl wrote:
> > When nr_cpus is set to a smaller number than actually present, there
> > is some node-to-socket mapping info we won't get access to in
>
> First of all, no "we's" in commit messages.
>
> > https://www.kernel.org/doc/html/next/process/maintainer-tip.html#changelog

Ah, I was trying to be imperative in the description of what to do,
but didn't understand it applied as much to the description of what
happened in the past that needs to be fixed. I will fix this.

> > build_socket_tables(). This could later result in using a -1 value
> > for some array indexing, and eventual kernel page faults.
> >
> > To avoid this, if any unfilled table entries are found, print a
> > warning message, and resume initializing, acting as if this is not a
> > UV system. UV features will be unavailable, but we will not cause
> > kernel dumps.
> >
> > This is a condition we expect only in platform debugging situations,
> > not in day-to-day operation.
>
> This seems like a hack.
>
> The real problem is that you've got an online Linux NUMA node with no
> CPUs. uv_system_init_hub() (probably) goes and does:
>
> > for_each_node(nodeid)
> > __uv_hub_info_list[nodeid] = uv_hub_info_list_blade[uv_node_to_blade_id(nodeid)];
>
> But the node=>blade lookup uses socket numbers. No CPUs means no socket
> numbers. You _have_ the blade information _somewhere_. Is there really
> no other way to map it to a NUMA node than using the CPU apicid?

I will see if I can find a better place to obtain this information.

Thank you.

--> Steve Wahl

--
Steve Wahl, Hewlett Packard Enterprise