2024-04-30 09:27:28

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 0/7] SRAT/CEDT fixes and updates

Some fixes and updates for SRAT/CEDT parsing code. Patches can be
applied individually and are independent.

First patch fixes a page fault during boot (fix as suggested by Dan).

Patches 2 to 4 remove architectural code no longer needed.

Patches 5 to 7 add diagnostic printouts for CEDT.

Changelog:

v6:
* rebased onto cxl/fixes
* fixed 0day build errors in patch #1:
https://github.com/intel-lab-lkp/linux/commits/Robert-Richter/x86-numa-Fix-SRAT-lookup-of-CFMWS-ranges-with-numa_fill_memblks/20240429-205337

v5:
* dropped: "x86/numa: Fix SRAT lookup of CFMWS ranges with
numa_fill_memblks()"
* added: "ACPI/NUMA: Return memblk modification state from
numa_fill_memblks()"
* conditionally print CEDT extended memblks

v4:
* updated SOB chains and desription
* added patch "x86/numa: Remove numa_fill_memblks() from sparsemem.h
using __weak"
* Reordered patches to move CEDT table printout as an option at the
end
* split print table patch and added: "ACPI/NUMA: Add log messages for
memory ranges found in CEDT"

v3:
* Rebased onto v6.9-rc1
* Fixing x86 build error in sparsemem.h [Dan/Alison]
* Added CEDT node info [Alison]
* Use pr_debug() for table output [Dan]
* Refactoring split in 3 patches [Dan]
* Fixed performance regression introduced [kbot]
* Fixed checkpatch issues [Dan]

Robert Richter (7):
x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()
ACPI/NUMA: Remove architecture dependent remainings
ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()
ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into
acpi_parse_memory_affinity()
ACPI/NUMA: Return memblk modification state from numa_fill_memblks()
ACPI/NUMA: Add log messages for memory ranges found in CEDT
ACPI/NUMA: Print CXL Early Discovery Table (CEDT)

arch/x86/include/asm/sparsemem.h | 2 -
arch/x86/mm/numa.c | 37 +++---
drivers/acpi/numa/srat.c | 207 +++++++++++++++++++++++--------
include/linux/acpi.h | 5 -
include/linux/numa.h | 7 +-
5 files changed, 176 insertions(+), 82 deletions(-)


base-commit: 5d211c7090590033581175d6405ae40917ca3a06
--
2.39.2



2024-04-30 09:27:48

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

For configurations that have the kconfig option NUMA_KEEP_MEMINFO
disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
SRAT lookup fails then because an existing SRAT memory range cannot be
found for a CFMWS address range. This causes the addition of a
duplicate numa_memblk with a different node id and a subsequent page
fault and kernel crash during boot.

Fix this by making numa_fill_memblks() always available regardless of
NUMA_KEEP_MEMINFO.

The fix also removes numa_fill_memblks() from sparsemem.h using
__weak.

From Dan:

"""
It just feels like numa_fill_memblks() has absolutely no business being
defined in arch/x86/include/asm/sparsemem.h.

The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
applied to memory ranges hot-onlined by the CXL driver.

It belongs right next to numa_add_memblk(), and I suspect
arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
what to do about the fact that linux/numa.h does not include asm/numa.h
and that all implementations either provide numa_add_memblk() or select
the generic implementation.

So I would prefer that this do the proper fix and get
numa_fill_memblks() completely out of the sparsemem.h path.

Something like the following which boots for me.
"""

Note that the issue was initially introduced with [1]. But since
phys_to_target_node() was originally used that returned the valid node
0, an additional numa_memblk was not added. Though, the node id was
wrong too, a message is seen then in the logs:

kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",

[1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
CFMWS not in SRAT")

Suggested-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
Cc: Derick Marks <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Alison Schofield <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
Authorship can be changed to Dan's if he wants to but that needs his
Signed-off-by.
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/sparsemem.h | 2 --
arch/x86/mm/numa.c | 4 ++--
drivers/acpi/numa/srat.c | 5 +++++
include/linux/numa.h | 7 +------
4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 1be13b2dfe8b..64df897c0ee3 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -37,8 +37,6 @@ extern int phys_to_target_node(phys_addr_t start);
#define phys_to_target_node phys_to_target_node
extern int memory_add_physaddr_to_nid(u64 start);
#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
-extern int numa_fill_memblks(u64 start, u64 end);
-#define numa_fill_memblks numa_fill_memblks
#endif
#endif /* __ASSEMBLY__ */

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 65e9a6e391c0..ce84ba86e69e 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -929,6 +929,8 @@ int memory_add_physaddr_to_nid(u64 start)
}
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);

+#endif
+
static int __init cmp_memblk(const void *a, const void *b)
{
const struct numa_memblk *ma = *(const struct numa_memblk **)a;
@@ -1001,5 +1003,3 @@ int __init numa_fill_memblks(u64 start, u64 end)
}
return 0;
}
-
-#endif
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index e45e64993c50..3b09fd39eeb4 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -208,6 +208,11 @@ int __init srat_disabled(void)
return acpi_numa < 0;
}

+__weak int __init numa_fill_memblks(u64 start, u64 end)
+{
+ return NUMA_NO_MEMBLK;
+}
+
#if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
/*
* Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for
diff --git a/include/linux/numa.h b/include/linux/numa.h
index 915033a75731..1d43371fafd2 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -36,12 +36,7 @@ int memory_add_physaddr_to_nid(u64 start);
int phys_to_target_node(u64 start);
#endif

-#ifndef numa_fill_memblks
-static inline int __init numa_fill_memblks(u64 start, u64 end)
-{
- return NUMA_NO_MEMBLK;
-}
-#endif
+int numa_fill_memblks(u64 start, u64 end);

#else /* !CONFIG_NUMA */
static inline int numa_nearest_node(int node, unsigned int state)
--
2.39.2


2024-04-30 09:28:01

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 3/7] ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()

After removing architectural code the helper function
acpi_numa_slit_init() is no longer needed. Squash it into
acpi_parse_slit(). No functional changes intended.

Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/acpi/numa/srat.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index e4d53e3660fd..430ddcfb8312 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -218,10 +218,16 @@ __weak int __init numa_fill_memblks(u64 start, u64 end)
* I/O localities since SRAT does not list them. I/O localities are
* not supported at this point.
*/
-static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
+static int __init acpi_parse_slit(struct acpi_table_header *table)
{
+ struct acpi_table_slit *slit = (struct acpi_table_slit *)table;
int i, j;

+ if (!slit_valid(slit)) {
+ pr_info("SLIT table looks invalid. Not used.\n");
+ return -EINVAL;
+ }
+
for (i = 0; i < slit->locality_count; i++) {
const int from_node = pxm_to_node(i);

@@ -238,6 +244,8 @@ static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
slit->entry[slit->locality_count * i + j]);
}
}
+
+ return 0;
}

static int __init
@@ -341,19 +349,6 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
return 0;
}

-static int __init acpi_parse_slit(struct acpi_table_header *table)
-{
- struct acpi_table_slit *slit = (struct acpi_table_slit *)table;
-
- if (!slit_valid(slit)) {
- pr_info("SLIT table looks invalid. Not used.\n");
- return -EINVAL;
- }
- acpi_numa_slit_init(slit);
-
- return 0;
-}
-
void __init __weak
acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
{
--
2.39.2


2024-04-30 09:28:23

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 4/7] ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()

After removing architectural code the helper function
acpi_numa_memory_affinity_init() is no longer needed. Squash it into
acpi_parse_memory_affinity(). No functional changes intended.

While at it, fixing checkpatch complaints in code moved.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/acpi/numa/srat.c | 40 +++++++++++++++++-----------------------
1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 430ddcfb8312..e3f26e71637a 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -248,22 +248,30 @@ static int __init acpi_parse_slit(struct acpi_table_header *table)
return 0;
}

+static int parsed_numa_memblks __initdata;
+
static int __init
-acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+acpi_parse_memory_affinity(union acpi_subtable_headers *header,
+ const unsigned long table_end)
{
+ struct acpi_srat_mem_affinity *ma;
u64 start, end;
u32 hotpluggable;
int node, pxm;

+ ma = (struct acpi_srat_mem_affinity *)header;
+
+ acpi_table_print_srat_entry(&header->common);
+
if (srat_disabled())
- goto out_err;
+ return 0;
if (ma->header.length < sizeof(struct acpi_srat_mem_affinity)) {
pr_err("SRAT: Unexpected header length: %d\n",
ma->header.length);
goto out_err_bad_srat;
}
if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
- goto out_err;
+ return 0;
hotpluggable = IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE);

@@ -301,11 +309,15 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)

max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));

+ parsed_numa_memblks++;
+
return 0;
+
out_err_bad_srat:
+ /* Just disable SRAT, but do not fail and ignore errors. */
bad_srat();
-out_err:
- return -EINVAL;
+
+ return 0;
}

static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
@@ -438,24 +450,6 @@ acpi_parse_gi_affinity(union acpi_subtable_headers *header,
}
#endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */

-static int __initdata parsed_numa_memblks;
-
-static int __init
-acpi_parse_memory_affinity(union acpi_subtable_headers * header,
- const unsigned long end)
-{
- struct acpi_srat_mem_affinity *memory_affinity;
-
- memory_affinity = (struct acpi_srat_mem_affinity *)header;
-
- acpi_table_print_srat_entry(&header->common);
-
- /* let architecture-dependent part to do it */
- if (!acpi_numa_memory_affinity_init(memory_affinity))
- parsed_numa_memblks++;
- return 0;
-}
-
static int __init acpi_parse_srat(struct acpi_table_header *table)
{
struct acpi_table_srat *srat = (struct acpi_table_srat *)table;
--
2.39.2


2024-04-30 09:28:37

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 5/7] ACPI/NUMA: Return memblk modification state from numa_fill_memblks()

When registering a memory range a possibly overlapping memory block
will be extended instead of creating a new one. If both ranges exactly
overlap, the blocks remain unchanged and are just reused. The
information if a memblock was extended is useful for diagnostics.

Change return code of numa_fill_memblks() to also report if memblocks
have been modified.

Link: https://lore.kernel.org/all/ZiqnbD0CB9WUL1zu@aschofie-mobl2/T/#u
Cc: Alison Schofield <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/mm/numa.c | 33 ++++++++++++++++++---------------
drivers/acpi/numa/srat.c | 5 +++--
2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index ce84ba86e69e..e34e96d57656 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -950,15 +950,16 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
* address range @start-@end
*
* RETURNS:
- * 0 : Success
- * NUMA_NO_MEMBLK : No memblks exist in address range @start-@end
+ * NUMA_NO_MEMBLK if no memblks exist in address range @start-@end,
+ * zero on success without blocks modified and non-zero positive
+ * values on success with blocks modified.
*/

int __init numa_fill_memblks(u64 start, u64 end)
{
struct numa_memblk **blk = &numa_memblk_list[0];
struct numa_meminfo *mi = &numa_meminfo;
- int count = 0;
+ int count = 0, modified = 0;
u64 prev_end;

/*
@@ -981,25 +982,27 @@ int __init numa_fill_memblks(u64 start, u64 end)
/* Sort the list of pointers in memblk->start order */
sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);

- /* Make sure the first/last memblks include start/end */
- blk[0]->start = min(blk[0]->start, start);
- blk[count - 1]->end = max(blk[count - 1]->end, end);
-
/*
* Fill any gaps by tracking the previous memblks
* end address and backfilling to it if needed.
*/
- prev_end = blk[0]->end;
- for (int i = 1; i < count; i++) {
+ prev_end = start;
+ for (int i = 0; i < count; i++) {
struct numa_memblk *curr = blk[i];

- if (prev_end >= curr->start) {
- if (prev_end < curr->end)
- prev_end = curr->end;
- } else {
+ if (prev_end < curr->start) {
curr->start = prev_end;
- prev_end = curr->end;
+ modified = 1;
}
+
+ if (prev_end < curr->end)
+ prev_end = curr->end;
}
- return 0;
+
+ if (blk[count - 1]->end < end) {
+ blk[count - 1]->end = end;
+ modified = 1;
+ }
+
+ return modified;
}
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index e3f26e71637a..76b39a6d3aef 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -326,7 +326,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
struct acpi_cedt_cfmws *cfmws;
int *fake_pxm = arg;
u64 start, end;
- int node;
+ int node, modified;

cfmws = (struct acpi_cedt_cfmws *)header;
start = cfmws->base_hpa;
@@ -338,7 +338,8 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
* found for any portion of the window to cover the entire
* window.
*/
- if (!numa_fill_memblks(start, end))
+ modified = numa_fill_memblks(start, end);
+ if (modified != NUMA_NO_MEMBLK)
return 0;

/* No SRAT description. Create a new node. */
--
2.39.2


2024-04-30 09:37:54

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 6/7] ACPI/NUMA: Add log messages for memory ranges found in CEDT

Adding a pr_info() when successfully adding a CFMWS memory range.

Suggested-by: Alison Schofield <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/acpi/numa/srat.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 76b39a6d3aef..34ecf2dc912f 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -339,8 +339,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
* window.
*/
modified = numa_fill_memblks(start, end);
- if (modified != NUMA_NO_MEMBLK)
+ if (modified != NUMA_NO_MEMBLK) {
+ if (modified)
+ pr_info("CEDT: memblk extended [mem %#010Lx-%#010Lx]\n",
+ (unsigned long long) start, (unsigned long long) end - 1);
return 0;
+ }

/* No SRAT description. Create a new node. */
node = acpi_map_pxm_to_node(*fake_pxm);
@@ -355,8 +359,13 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
node, start, end);
}
+
node_set(node, numa_nodes_parsed);

+ pr_info("CEDT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
+ node, *fake_pxm,
+ (unsigned long long) start, (unsigned long long) end - 1);
+
/* Set the next available fake_pxm value */
(*fake_pxm)++;
return 0;
--
2.39.2


2024-04-30 09:37:56

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 2/7] ACPI/NUMA: Remove architecture dependent remainings

With the removal of the Itanium architecture [1] the last architecture
dependent functions:

acpi_numa_slit_init(), acpi_numa_memory_affinity_init()

were removed. Remove its remainings in the header files too and make
them static.

[1] commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")

Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/acpi/numa/srat.c | 16 ++--------------
include/linux/acpi.h | 5 -----
2 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 3b09fd39eeb4..e4d53e3660fd 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -213,13 +213,12 @@ __weak int __init numa_fill_memblks(u64 start, u64 end)
return NUMA_NO_MEMBLK;
}

-#if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
/*
* Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for
* I/O localities since SRAT does not list them. I/O localities are
* not supported at this point.
*/
-void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
+static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
{
int i, j;

@@ -241,11 +240,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
}
}

-/*
- * Default callback for parsing of the Proximity Domain <-> Memory
- * Area mappings
- */
-int __init
+static int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
u64 start, end;
@@ -345,13 +340,6 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
(*fake_pxm)++;
return 0;
}
-#else
-static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
- void *arg, const unsigned long table_end)
-{
- return 0;
-}
-#endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */

static int __init acpi_parse_slit(struct acpi_table_header *table)
{
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 34829f2c517a..2c227b61a452 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -242,9 +242,6 @@ static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
return gicc->flags & ACPI_MADT_ENABLED;
}

-/* the following numa functions are architecture-dependent */
-void acpi_numa_slit_init (struct acpi_table_slit *slit);
-
#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
void acpi_numa_processor_affinity_init (struct acpi_srat_cpu_affinity *pa);
#else
@@ -267,8 +264,6 @@ static inline void
acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
#endif

-int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
-
#ifndef PHYS_CPUID_INVALID
typedef u32 phys_cpuid_t;
#define PHYS_CPUID_INVALID (phys_cpuid_t)(-1)
--
2.39.2


2024-04-30 09:37:58

by Robert Richter

[permalink] [raw]
Subject: [PATCH v6 7/7] ACPI/NUMA: Print CXL Early Discovery Table (CEDT)

The CEDT contains similar entries as the SRAT. For diagnostic reasons
print the CEDT same style as the SRAT.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/acpi/numa/srat.c | 111 +++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 34ecf2dc912f..fa21d4d5fccf 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -320,6 +320,114 @@ acpi_parse_memory_affinity(union acpi_subtable_headers *header,
return 0;
}

+static int __init
+__acpi_table_print_cedt_entry(union acpi_subtable_headers *__header,
+ void *arg, const unsigned long table_end)
+{
+ struct acpi_cedt_header *header = (struct acpi_cedt_header *)__header;
+
+ switch (header->type) {
+ case ACPI_CEDT_TYPE_CHBS:
+ {
+ struct acpi_cedt_chbs *p =
+ (struct acpi_cedt_chbs *)header;
+
+ if (header->length < sizeof(struct acpi_cedt_chbs)) {
+ pr_warn("CEDT: unsupported CHBS entry: size %d\n",
+ header->length);
+ break;
+ }
+
+ pr_debug("CEDT: CHBS (0x%llx length 0x%llx uid %lu) %s (%d)\n",
+ (unsigned long long)p->base,
+ (unsigned long long)p->length,
+ (unsigned long)p->uid,
+ (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) ?
+ "cxl11" :
+ (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20) ?
+ "cxl20" :
+ "unsupported version",
+ p->cxl_version);
+ }
+ break;
+ case ACPI_CEDT_TYPE_CFMWS:
+ {
+ struct acpi_cedt_cfmws *p =
+ (struct acpi_cedt_cfmws *)header;
+ int eiw_to_ways[] = {1, 2, 4, 8, 16, 3, 6, 12};
+ int targets = -1;
+
+ if (header->length < sizeof(struct acpi_cedt_cfmws)) {
+ pr_warn("CEDT: unsupported CFMWS entry: size %d\n",
+ header->length);
+ break;
+ }
+
+ if (p->interleave_ways < ARRAY_SIZE(eiw_to_ways))
+ targets = eiw_to_ways[p->interleave_ways];
+ if (header->length < struct_size(
+ p, interleave_targets, targets))
+ targets = -1;
+
+ pr_debug("CEDT: CFMWS (0x%llx length 0x%llx) with %d target%s",
+ (unsigned long long)p->base_hpa,
+ (unsigned long long)p->window_size,
+ targets, targets > 1 ? "s" : "");
+ for (int i = 0; i < targets; i++)
+ pr_cont("%s%lu", i ? ", " : " (",
+ (unsigned long)p->interleave_targets[i]);
+ pr_cont("%s%s%s%s%s%s\n",
+ targets > 0 ? ")" : "",
+ (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE2) ?
+ " type2" : "",
+ (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE3) ?
+ " type3" : "",
+ (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE) ?
+ " volatile" : "",
+ (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_PMEM) ?
+ " pmem" : "",
+ (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED) ?
+ " fixed" : "");
+ }
+ break;
+ case ACPI_CEDT_TYPE_CXIMS:
+ {
+ struct acpi_cedt_cxims *p =
+ (struct acpi_cedt_cxims *)header;
+
+ if (header->length < sizeof(struct acpi_cedt_cxims)) {
+ pr_warn("CEDT: unsupported CXIMS entry: size %d\n",
+ header->length);
+ break;
+ }
+
+ pr_debug("CEDT: CXIMS (hbig %u nr_xormaps %u)\n",
+ (unsigned int)p->hbig,
+ (unsigned int)p->nr_xormaps);
+ }
+ break;
+ default:
+ pr_warn("CEDT: Found unsupported entry (type = 0x%x)\n",
+ header->type);
+ break;
+ }
+
+ return 0;
+}
+
+static void __init acpi_table_print_cedt_entry(enum acpi_cedt_type id)
+{
+ acpi_table_parse_cedt(id, __acpi_table_print_cedt_entry, NULL);
+}
+
+static void __init acpi_table_print_cedt(void)
+{
+ /* Print only implemented CEDT types */
+ acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CHBS);
+ acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CFMWS);
+ acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CXIMS);
+}
+
static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
void *arg, const unsigned long table_end)
{
@@ -518,6 +626,9 @@ int __init acpi_numa_init(void)
/* SLIT: System Locality Information Table */
acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);

+ /* CEDT: CXL Early Discovery Table */
+ acpi_table_print_cedt();
+
/*
* CXL Fixed Memory Window Structures (CFMWS) must be parsed
* after the SRAT. Create NUMA Nodes for CXL memory ranges that
--
2.39.2


2024-04-30 14:49:14

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

On Tue, 30 Apr 2024 11:21:54 +0200
Robert Richter <[email protected]> wrote:

> For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> SRAT lookup fails then because an existing SRAT memory range cannot be
> found for a CFMWS address range. This causes the addition of a
> duplicate numa_memblk with a different node id and a subsequent page
> fault and kernel crash during boot.
>
> Fix this by making numa_fill_memblks() always available regardless of
> NUMA_KEEP_MEMINFO.
>
> The fix also removes numa_fill_memblks() from sparsemem.h using
> __weak.
>
> From Dan:
>
> """
> It just feels like numa_fill_memblks() has absolutely no business being
> defined in arch/x86/include/asm/sparsemem.h.
>
> The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> applied to memory ranges hot-onlined by the CXL driver.
>
> It belongs right next to numa_add_memblk(), and I suspect
> arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> what to do about the fact that linux/numa.h does not include asm/numa.h
> and that all implementations either provide numa_add_memblk() or select
> the generic implementation.
>
> So I would prefer that this do the proper fix and get
> numa_fill_memblks() completely out of the sparsemem.h path.
>
> Something like the following which boots for me.
> """
>
> Note that the issue was initially introduced with [1]. But since
> phys_to_target_node() was originally used that returned the valid node
> 0, an additional numa_memblk was not added. Though, the node id was
> wrong too, a message is seen then in the logs:
>
> kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
>
> [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> CFMWS not in SRAT")
>
> Suggested-by: Dan Williams <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
> Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> Cc: Derick Marks <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Whilst I'm not particularly keen on an arch specific solution for this
and the stub is effectively pointless beyond making the build work, I guess
this works well enough for now.

Reviewed-by: Jonathan Cameron <[email protected]>

I was aiming to post the ARM64 handling this cycle but it hasn't quite happened yet :(
Maybe we can look at whether there is a better level share at than
the whole function once that is done.

Jonathan

> ---
> Authorship can be changed to Dan's if he wants to but that needs his
> Signed-off-by.
> Signed-off-by: Robert Richter <[email protected]>
> ---
> arch/x86/include/asm/sparsemem.h | 2 --
> arch/x86/mm/numa.c | 4 ++--
> drivers/acpi/numa/srat.c | 5 +++++
> include/linux/numa.h | 7 +------
> 4 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
> index 1be13b2dfe8b..64df897c0ee3 100644
> --- a/arch/x86/include/asm/sparsemem.h
> +++ b/arch/x86/include/asm/sparsemem.h
> @@ -37,8 +37,6 @@ extern int phys_to_target_node(phys_addr_t start);
> #define phys_to_target_node phys_to_target_node
> extern int memory_add_physaddr_to_nid(u64 start);
> #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
> -extern int numa_fill_memblks(u64 start, u64 end);
> -#define numa_fill_memblks numa_fill_memblks
> #endif
> #endif /* __ASSEMBLY__ */
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 65e9a6e391c0..ce84ba86e69e 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -929,6 +929,8 @@ int memory_add_physaddr_to_nid(u64 start)
> }
> EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>
> +#endif
> +
> static int __init cmp_memblk(const void *a, const void *b)
> {
> const struct numa_memblk *ma = *(const struct numa_memblk **)a;
> @@ -1001,5 +1003,3 @@ int __init numa_fill_memblks(u64 start, u64 end)
> }
> return 0;
> }
> -
> -#endif
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index e45e64993c50..3b09fd39eeb4 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -208,6 +208,11 @@ int __init srat_disabled(void)
> return acpi_numa < 0;
> }
>
> +__weak int __init numa_fill_memblks(u64 start, u64 end)
> +{
> + return NUMA_NO_MEMBLK;
> +}
> +
> #if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
> /*
> * Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for
> diff --git a/include/linux/numa.h b/include/linux/numa.h
> index 915033a75731..1d43371fafd2 100644
> --- a/include/linux/numa.h
> +++ b/include/linux/numa.h
> @@ -36,12 +36,7 @@ int memory_add_physaddr_to_nid(u64 start);
> int phys_to_target_node(u64 start);
> #endif
>
> -#ifndef numa_fill_memblks
> -static inline int __init numa_fill_memblks(u64 start, u64 end)
> -{
> - return NUMA_NO_MEMBLK;
> -}
> -#endif
> +int numa_fill_memblks(u64 start, u64 end);
>
> #else /* !CONFIG_NUMA */
> static inline int numa_nearest_node(int node, unsigned int state)


2024-04-30 14:54:58

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 3/7] ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()

On Tue, 30 Apr 2024 11:21:56 +0200
Robert Richter <[email protected]> wrote:

> After removing architectural code the helper function
> acpi_numa_slit_init() is no longer needed. Squash it into
> acpi_parse_slit(). No functional changes intended.
>
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
Make sense
Reviewed-by: Jonathan Cameron <[email protected]>

2024-04-30 14:57:22

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] ACPI/NUMA: Remove architecture dependent remainings

On Tue, 30 Apr 2024 11:21:55 +0200
Robert Richter <[email protected]> wrote:

> With the removal of the Itanium architecture [1] the last architecture
> dependent functions:
>
> acpi_numa_slit_init(), acpi_numa_memory_affinity_init()
>
> were removed. Remove its remainings in the header files too and make
> them static.
>
> [1] commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")
>
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
Trivial comment inline.

Reviewed-by: Jonathan Cameron <[email protected]>

> ---
> drivers/acpi/numa/srat.c | 16 ++--------------
> include/linux/acpi.h | 5 -----
> 2 files changed, 2 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 3b09fd39eeb4..e4d53e3660fd 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -213,13 +213,12 @@ __weak int __init numa_fill_memblks(u64 start, u64 end)
> return NUMA_NO_MEMBLK;
> }
>
> -#if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
> /*
> * Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for

Doesn't like like a callback to me. It's just a normal function.

> * I/O localities since SRAT does not list them. I/O localities are
> * not supported at this point.
> */
> -void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> +static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> {
> int i, j;
>
> @@ -241,11 +240,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> }
> }
>
> -/*
> - * Default callback for parsing of the Proximity Domain <-> Memory
> - * Area mappings
> - */
> -int __init
> +static int __init
> acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> {
> u64 start, end;
> @@ -345,13 +340,6 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> (*fake_pxm)++;
> return 0;
> }
> -#else
> -static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> - void *arg, const unsigned long table_end)
> -{
> - return 0;
> -}
> -#endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>
> static int __init acpi_parse_slit(struct acpi_table_header *table)
> {
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 34829f2c517a..2c227b61a452 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -242,9 +242,6 @@ static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> return gicc->flags & ACPI_MADT_ENABLED;
> }
>
> -/* the following numa functions are architecture-dependent */
> -void acpi_numa_slit_init (struct acpi_table_slit *slit);
> -
> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> void acpi_numa_processor_affinity_init (struct acpi_srat_cpu_affinity *pa);
> #else
> @@ -267,8 +264,6 @@ static inline void
> acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
> #endif
>
> -int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
> -
> #ifndef PHYS_CPUID_INVALID
> typedef u32 phys_cpuid_t;
> #define PHYS_CPUID_INVALID (phys_cpuid_t)(-1)


2024-04-30 15:14:31

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 5/7] ACPI/NUMA: Return memblk modification state from numa_fill_memblks()

On Tue, 30 Apr 2024 11:21:58 +0200
Robert Richter <[email protected]> wrote:

> When registering a memory range a possibly overlapping memory block
> will be extended instead of creating a new one. If both ranges exactly
> overlap, the blocks remain unchanged and are just reused. The
> information if a memblock was extended is useful for diagnostics.
>
> Change return code of numa_fill_memblks() to also report if memblocks
> have been modified.
>
> Link: https://lore.kernel.org/all/ZiqnbD0CB9WUL1zu@aschofie-mobl2/T/#u
> Cc: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Seems correct to me.

Reviewed-by: Jonathan Cameron <[email protected]>

2024-04-30 15:34:53

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 6/7] ACPI/NUMA: Add log messages for memory ranges found in CEDT

On Tue, 30 Apr 2024 11:21:59 +0200
Robert Richter <[email protected]> wrote:

> Adding a pr_info() when successfully adding a CFMWS memory range.
>
> Suggested-by: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Hmm. I'm a bit doubtful this will work for other architectures
as arm64 at least has two sets of memblocks and the holes probably
want to go in memblock.reserved rather than in memblock.memory.
I think we would want to reflect where the extra memblks was added.

However. I'm not 100% sure on what that ends up like as I've not
written an appropriate numa_fill_memblks() yet, so I guess for
now it's fine here, and maybe it will get pushed into the arch
specific code when a second architecture implements numa_fill_memblks()
if some architectures want to return more detailed info.

So, I've argued myself around to
Reviewed-by: Jonathan Cameron <[email protected]>

Jonathan

p.s. Unrelated whitespace change but I guess can cope with that...

> ---
> drivers/acpi/numa/srat.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 76b39a6d3aef..34ecf2dc912f 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -339,8 +339,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> * window.
> */
> modified = numa_fill_memblks(start, end);
> - if (modified != NUMA_NO_MEMBLK)
> + if (modified != NUMA_NO_MEMBLK) {
> + if (modified)
> + pr_info("CEDT: memblk extended [mem %#010Lx-%#010Lx]\n",
> + (unsigned long long) start, (unsigned long long) end - 1);
> return 0;
> + }
>
> /* No SRAT description. Create a new node. */
> node = acpi_map_pxm_to_node(*fake_pxm);
> @@ -355,8 +359,13 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> node, start, end);
> }
> +
> node_set(node, numa_nodes_parsed);
>
> + pr_info("CEDT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> + node, *fake_pxm,
> + (unsigned long long) start, (unsigned long long) end - 1);
> +
> /* Set the next available fake_pxm value */
> (*fake_pxm)++;
> return 0;


2024-04-30 15:38:04

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 4/7] ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()

On Tue, 30 Apr 2024 11:21:57 +0200
Robert Richter <[email protected]> wrote:

> After removing architectural code the helper function
> acpi_numa_memory_affinity_init() is no longer needed. Squash it into
> acpi_parse_memory_affinity(). No functional changes intended.
>
> While at it, fixing checkpatch complaints in code moved.
>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Looks good to me. Printing an SRAT entry we then throw away is a bit
odd but it always did that.


Reviewed-by: Jonathan Cameron <[email protected]>



2024-04-30 15:49:21

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 5/7] ACPI/NUMA: Return memblk modification state from numa_fill_memblks()

On Tue, Apr 30, 2024 at 11:21:58AM +0200, Robert Richter wrote:
> When registering a memory range a possibly overlapping memory block
> will be extended instead of creating a new one. If both ranges exactly
> overlap, the blocks remain unchanged and are just reused. The
> information if a memblock was extended is useful for diagnostics.
>
> Change return code of numa_fill_memblks() to also report if memblocks
> have been modified.


Reviewed-by: Alison Schofield <[email protected]>


>
> Link: https://lore.kernel.org/all/ZiqnbD0CB9WUL1zu@aschofie-mobl2/T/#u
> Cc: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> arch/x86/mm/numa.c | 33 ++++++++++++++++++---------------
> drivers/acpi/numa/srat.c | 5 +++--
> 2 files changed, 21 insertions(+), 17 deletions(-)

snip

2024-04-30 15:51:19

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 6/7] ACPI/NUMA: Add log messages for memory ranges found in CEDT

On Tue, Apr 30, 2024 at 11:21:59AM +0200, Robert Richter wrote:
> Adding a pr_info() when successfully adding a CFMWS memory range.

Reviewed-by: Alison Schofield <[email protected]>

>
> Suggested-by: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 76b39a6d3aef..34ecf2dc912f 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -339,8 +339,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> * window.
> */
> modified = numa_fill_memblks(start, end);
> - if (modified != NUMA_NO_MEMBLK)
> + if (modified != NUMA_NO_MEMBLK) {
> + if (modified)
> + pr_info("CEDT: memblk extended [mem %#010Lx-%#010Lx]\n",
> + (unsigned long long) start, (unsigned long long) end - 1);
> return 0;
> + }
>
> /* No SRAT description. Create a new node. */
> node = acpi_map_pxm_to_node(*fake_pxm);
> @@ -355,8 +359,13 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> node, start, end);
> }
> +
> node_set(node, numa_nodes_parsed);
>
> + pr_info("CEDT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
> + node, *fake_pxm,
> + (unsigned long long) start, (unsigned long long) end - 1);
> +
> /* Set the next available fake_pxm value */
> (*fake_pxm)++;
> return 0;
> --
> 2.39.2
>

2024-04-30 16:01:53

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 4/7] ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()

On Tue, Apr 30, 2024 at 11:21:57AM +0200, Robert Richter wrote:
> After removing architectural code the helper function
> acpi_numa_memory_affinity_init() is no longer needed. Squash it into
> acpi_parse_memory_affinity(). No functional changes intended.
>
> While at it, fixing checkpatch complaints in code moved.
>

Reviewed-by: Alison Schofield <[email protected]>

> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 40 +++++++++++++++++-----------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 430ddcfb8312..e3f26e71637a 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -248,22 +248,30 @@ static int __init acpi_parse_slit(struct acpi_table_header *table)
> return 0;
> }
>
> +static int parsed_numa_memblks __initdata;
> +
> static int __init
> -acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> +acpi_parse_memory_affinity(union acpi_subtable_headers *header,
> + const unsigned long table_end)
> {
> + struct acpi_srat_mem_affinity *ma;
> u64 start, end;
> u32 hotpluggable;
> int node, pxm;
>
> + ma = (struct acpi_srat_mem_affinity *)header;
> +
> + acpi_table_print_srat_entry(&header->common);
> +
> if (srat_disabled())
> - goto out_err;
> + return 0;
> if (ma->header.length < sizeof(struct acpi_srat_mem_affinity)) {
> pr_err("SRAT: Unexpected header length: %d\n",
> ma->header.length);
> goto out_err_bad_srat;
> }
> if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
> - goto out_err;
> + return 0;
> hotpluggable = IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
> (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE);
>
> @@ -301,11 +309,15 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>
> max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));
>
> + parsed_numa_memblks++;
> +
> return 0;
> +
> out_err_bad_srat:
> + /* Just disable SRAT, but do not fail and ignore errors. */
> bad_srat();
> -out_err:
> - return -EINVAL;
> +
> + return 0;
> }
>
> static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> @@ -438,24 +450,6 @@ acpi_parse_gi_affinity(union acpi_subtable_headers *header,
> }
> #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>
> -static int __initdata parsed_numa_memblks;
> -
> -static int __init
> -acpi_parse_memory_affinity(union acpi_subtable_headers * header,
> - const unsigned long end)
> -{
> - struct acpi_srat_mem_affinity *memory_affinity;
> -
> - memory_affinity = (struct acpi_srat_mem_affinity *)header;
> -
> - acpi_table_print_srat_entry(&header->common);
> -
> - /* let architecture-dependent part to do it */
> - if (!acpi_numa_memory_affinity_init(memory_affinity))
> - parsed_numa_memblks++;
> - return 0;
> -}
> -
> static int __init acpi_parse_srat(struct acpi_table_header *table)
> {
> struct acpi_table_srat *srat = (struct acpi_table_srat *)table;
> --
> 2.39.2
>

2024-04-30 16:02:22

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 3/7] ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()

On Tue, Apr 30, 2024 at 11:21:56AM +0200, Robert Richter wrote:
> After removing architectural code the helper function
> acpi_numa_slit_init() is no longer needed. Squash it into
> acpi_parse_slit(). No functional changes intended.

Reviewed-by: Alison Schofield <[email protected]>

>
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index e4d53e3660fd..430ddcfb8312 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -218,10 +218,16 @@ __weak int __init numa_fill_memblks(u64 start, u64 end)
> * I/O localities since SRAT does not list them. I/O localities are
> * not supported at this point.
> */
> -static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> +static int __init acpi_parse_slit(struct acpi_table_header *table)
> {
> + struct acpi_table_slit *slit = (struct acpi_table_slit *)table;
> int i, j;
>
> + if (!slit_valid(slit)) {
> + pr_info("SLIT table looks invalid. Not used.\n");
> + return -EINVAL;
> + }
> +
> for (i = 0; i < slit->locality_count; i++) {
> const int from_node = pxm_to_node(i);
>
> @@ -238,6 +244,8 @@ static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> slit->entry[slit->locality_count * i + j]);
> }
> }
> +
> + return 0;
> }
>
> static int __init
> @@ -341,19 +349,6 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> return 0;
> }
>
> -static int __init acpi_parse_slit(struct acpi_table_header *table)
> -{
> - struct acpi_table_slit *slit = (struct acpi_table_slit *)table;
> -
> - if (!slit_valid(slit)) {
> - pr_info("SLIT table looks invalid. Not used.\n");
> - return -EINVAL;
> - }
> - acpi_numa_slit_init(slit);
> -
> - return 0;
> -}
> -
> void __init __weak
> acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
> {
> --
> 2.39.2
>

2024-04-30 16:02:34

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 2/7] ACPI/NUMA: Remove architecture dependent remainings

On Tue, Apr 30, 2024 at 11:21:55AM +0200, Robert Richter wrote:
> With the removal of the Itanium architecture [1] the last architecture
> dependent functions:
>
> acpi_numa_slit_init(), acpi_numa_memory_affinity_init()
>
> were removed. Remove its remainings in the header files too and make
> them static.
>
> [1] commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")


Reviewed-by: Alison Schofield <[email protected]>

>
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 16 ++--------------
> include/linux/acpi.h | 5 -----
> 2 files changed, 2 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 3b09fd39eeb4..e4d53e3660fd 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -213,13 +213,12 @@ __weak int __init numa_fill_memblks(u64 start, u64 end)
> return NUMA_NO_MEMBLK;
> }
>
> -#if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
> /*
> * Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for
> * I/O localities since SRAT does not list them. I/O localities are
> * not supported at this point.
> */
> -void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> +static void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> {
> int i, j;
>
> @@ -241,11 +240,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> }
> }
>
> -/*
> - * Default callback for parsing of the Proximity Domain <-> Memory
> - * Area mappings
> - */
> -int __init
> +static int __init
> acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> {
> u64 start, end;
> @@ -345,13 +340,6 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> (*fake_pxm)++;
> return 0;
> }
> -#else
> -static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> - void *arg, const unsigned long table_end)
> -{
> - return 0;
> -}
> -#endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>
> static int __init acpi_parse_slit(struct acpi_table_header *table)
> {
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 34829f2c517a..2c227b61a452 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -242,9 +242,6 @@ static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> return gicc->flags & ACPI_MADT_ENABLED;
> }
>
> -/* the following numa functions are architecture-dependent */
> -void acpi_numa_slit_init (struct acpi_table_slit *slit);
> -
> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> void acpi_numa_processor_affinity_init (struct acpi_srat_cpu_affinity *pa);
> #else
> @@ -267,8 +264,6 @@ static inline void
> acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
> #endif
>
> -int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
> -
> #ifndef PHYS_CPUID_INVALID
> typedef u32 phys_cpuid_t;
> #define PHYS_CPUID_INVALID (phys_cpuid_t)(-1)
> --
> 2.39.2
>

2024-04-30 16:17:23

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

On Tue, Apr 30, 2024 at 11:21:54AM +0200, Robert Richter wrote:
> For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> SRAT lookup fails then because an existing SRAT memory range cannot be
> found for a CFMWS address range. This causes the addition of a
> duplicate numa_memblk with a different node id and a subsequent page
> fault and kernel crash during boot.
>
> Fix this by making numa_fill_memblks() always available regardless of
> NUMA_KEEP_MEMINFO.
>
> The fix also removes numa_fill_memblks() from sparsemem.h using
> __weak.
>
> From Dan:
>
> """
> It just feels like numa_fill_memblks() has absolutely no business being
> defined in arch/x86/include/asm/sparsemem.h.
>
> The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> applied to memory ranges hot-onlined by the CXL driver.
>
> It belongs right next to numa_add_memblk(), and I suspect
> arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> what to do about the fact that linux/numa.h does not include asm/numa.h
> and that all implementations either provide numa_add_memblk() or select
> the generic implementation.
>
> So I would prefer that this do the proper fix and get
> numa_fill_memblks() completely out of the sparsemem.h path.
>
> Something like the following which boots for me.
> """
>
> Note that the issue was initially introduced with [1]. But since
> phys_to_target_node() was originally used that returned the valid node
> 0, an additional numa_memblk was not added. Though, the node id was
> wrong too, a message is seen then in the logs:
>
> kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
>
> [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> CFMWS not in SRAT")
>

For the commit log above -
Perhaps the Dan quote can be reduced to a note about the implementation
choice. Folks can look up the Lore thread if history is needed.

For the code -
Reviewed-by: Alison Schofield <[email protected]>


> Suggested-by: Dan Williams <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
> Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> Cc: Derick Marks <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> Authorship can be changed to Dan's if he wants to but that needs his
> Signed-off-by.
> Signed-off-by: Robert Richter <[email protected]>
> ---
> arch/x86/include/asm/sparsemem.h | 2 --
> arch/x86/mm/numa.c | 4 ++--
> drivers/acpi/numa/srat.c | 5 +++++
> include/linux/numa.h | 7 +------
> 4 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
> index 1be13b2dfe8b..64df897c0ee3 100644
> --- a/arch/x86/include/asm/sparsemem.h
> +++ b/arch/x86/include/asm/sparsemem.h
> @@ -37,8 +37,6 @@ extern int phys_to_target_node(phys_addr_t start);
> #define phys_to_target_node phys_to_target_node
> extern int memory_add_physaddr_to_nid(u64 start);
> #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
> -extern int numa_fill_memblks(u64 start, u64 end);
> -#define numa_fill_memblks numa_fill_memblks
> #endif
> #endif /* __ASSEMBLY__ */
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 65e9a6e391c0..ce84ba86e69e 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -929,6 +929,8 @@ int memory_add_physaddr_to_nid(u64 start)
> }
> EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>
> +#endif
> +
> static int __init cmp_memblk(const void *a, const void *b)
> {
> const struct numa_memblk *ma = *(const struct numa_memblk **)a;
> @@ -1001,5 +1003,3 @@ int __init numa_fill_memblks(u64 start, u64 end)
> }
> return 0;
> }
> -
> -#endif
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index e45e64993c50..3b09fd39eeb4 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -208,6 +208,11 @@ int __init srat_disabled(void)
> return acpi_numa < 0;
> }
>
> +__weak int __init numa_fill_memblks(u64 start, u64 end)
> +{
> + return NUMA_NO_MEMBLK;
> +}
> +
> #if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
> /*
> * Callback for SLIT parsing. pxm_to_node() returns NUMA_NO_NODE for
> diff --git a/include/linux/numa.h b/include/linux/numa.h
> index 915033a75731..1d43371fafd2 100644
> --- a/include/linux/numa.h
> +++ b/include/linux/numa.h
> @@ -36,12 +36,7 @@ int memory_add_physaddr_to_nid(u64 start);
> int phys_to_target_node(u64 start);
> #endif
>
> -#ifndef numa_fill_memblks
> -static inline int __init numa_fill_memblks(u64 start, u64 end)
> -{
> - return NUMA_NO_MEMBLK;
> -}
> -#endif
> +int numa_fill_memblks(u64 start, u64 end);
>
> #else /* !CONFIG_NUMA */
> static inline int numa_nearest_node(int node, unsigned int state)
> --
> 2.39.2
>

2024-04-30 16:23:34

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 7/7] ACPI/NUMA: Print CXL Early Discovery Table (CEDT)

On Tue, 30 Apr 2024 11:22:00 +0200
Robert Richter <[email protected]> wrote:

> The CEDT contains similar entries as the SRAT. For diagnostic reasons
> print the CEDT same style as the SRAT.
>
> Signed-off-by: Robert Richter <[email protected]>
I'm fairly sure the interleave ways conversion is wrong.
Otherwise all trivial stuff.

Jonathan

> ---
> drivers/acpi/numa/srat.c | 111 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 111 insertions(+)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 34ecf2dc912f..fa21d4d5fccf 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -320,6 +320,114 @@ acpi_parse_memory_affinity(union acpi_subtable_headers *header,
> return 0;
> }
>
> +static int __init
> +__acpi_table_print_cedt_entry(union acpi_subtable_headers *__header,
> + void *arg, const unsigned long table_end)
> +{
> + struct acpi_cedt_header *header = (struct acpi_cedt_header *)__header;
> +
> + switch (header->type) {
> + case ACPI_CEDT_TYPE_CHBS:
> + {
> + struct acpi_cedt_chbs *p =
> + (struct acpi_cedt_chbs *)header;
> +
> + if (header->length < sizeof(struct acpi_cedt_chbs)) {
> + pr_warn("CEDT: unsupported CHBS entry: size %d\n",
> + header->length);
> + break;

Might as well return.

> + }
> +
> + pr_debug("CEDT: CHBS (0x%llx length 0x%llx uid %lu) %s (%d)\n",
> + (unsigned long long)p->base,
> + (unsigned long long)p->length,

The printk docs https://docs.kernel.org/core-api/printk-formats.html
suggest you shouldn't need the casts though I appreciate other functions in here
are doing this.

> + (unsigned long)p->uid,
> + (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) ?
> + "cxl11" :
> + (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20) ?
> + "cxl20" :
> + "unsupported version",

That seems harsh. Like all ACPI tables, these should be backwards compatible.
So not so much unsupported as "newer version". Breakage happens, but it is rare
and for the rest of the kernel I don' think we check this.

Also can we switch to the 3.1 spec terms. RCH etc. Though the term for 2.0+ in the
table definition for CHBS is the nasty:
"Host Bridge that is associated with one or more CXL root ports."

> + p->cxl_version);
> + }
> + break;

Trivial but I love early returns as they tend to avoid lots of scrolling to see where
the break goes and it is unlikely there will ever be anything to do after this.

> + case ACPI_CEDT_TYPE_CFMWS:
> + {
> + struct acpi_cedt_cfmws *p =
> + (struct acpi_cedt_cfmws *)header;
> + int eiw_to_ways[] = {1, 2, 4, 8, 16, 3, 6, 12};
> + int targets = -1;
> +
> + if (header->length < sizeof(struct acpi_cedt_cfmws)) {
> + pr_warn("CEDT: unsupported CFMWS entry: size %d\n",
> + header->length);
> + break;

Might as well return.

> + }
> +
> + if (p->interleave_ways < ARRAY_SIZE(eiw_to_ways))
> + targets = eiw_to_ways[p->interleave_ways];

That looks wrong for 3, 6, 12 as index is 0x8, 0x9, 0xA not 5 6 7
Don't we have a function to decode this somewhere than can be reused?

> + if (header->length < struct_size(
> + p, interleave_targets, targets))
> + targets = -1;
> +
> + pr_debug("CEDT: CFMWS (0x%llx length 0x%llx) with %d target%s",
> + (unsigned long long)p->base_hpa,
> + (unsigned long long)p->window_size,
> + targets, targets > 1 ? "s" : "");
> + for (int i = 0; i < targets; i++)
> + pr_cont("%s%lu", i ? ", " : " (",
> + (unsigned long)p->interleave_targets[i]);
> + pr_cont("%s%s%s%s%s%s\n",
> + targets > 0 ? ")" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE2) ?
> + " type2" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE3) ?
> + " type3" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE) ?
> + " volatile" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_PMEM) ?
> + " pmem" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED) ?
> + " fixed" : "");
> + }
> + break;
return

> + case ACPI_CEDT_TYPE_CXIMS:
> + {
> + struct acpi_cedt_cxims *p =
> + (struct acpi_cedt_cxims *)header;
> +
> + if (header->length < sizeof(struct acpi_cedt_cxims)) {
> + pr_warn("CEDT: unsupported CXIMS entry: size %d\n",
> + header->length);
> + break;
return
> + }
> +
> + pr_debug("CEDT: CXIMS (hbig %u nr_xormaps %u)\n",
> + (unsigned int)p->hbig,
> + (unsigned int)p->nr_xormaps);
> + }
> + break;
return
> + default:
> + pr_warn("CEDT: Found unsupported entry (type = 0x%x)\n",
> + header->type);
> + break;
return
> + }
> +
> + return 0;
> +}
> +
> +static void __init acpi_table_print_cedt_entry(enum acpi_cedt_type id)
> +{
> + acpi_table_parse_cedt(id, __acpi_table_print_cedt_entry, NULL);
> +}
> +
> +static void __init acpi_table_print_cedt(void)
> +{
> + /* Print only implemented CEDT types */
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CHBS);
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CFMWS);
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CXIMS);
> +}
> +
> static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> void *arg, const unsigned long table_end)
> {
> @@ -518,6 +626,9 @@ int __init acpi_numa_init(void)
> /* SLIT: System Locality Information Table */
> acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>
> + /* CEDT: CXL Early Discovery Table */
> + acpi_table_print_cedt();
> +
> /*
> * CXL Fixed Memory Window Structures (CFMWS) must be parsed
> * after the SRAT. Create NUMA Nodes for CXL memory ranges that


2024-04-30 16:23:53

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH v6 7/7] ACPI/NUMA: Print CXL Early Discovery Table (CEDT)

On Tue, Apr 30, 2024 at 11:22:00AM +0200, Robert Richter wrote:
> The CEDT contains similar entries as the SRAT. For diagnostic reasons
> print the CEDT same style as the SRAT.


Reviewed-by: Alison Schofield <[email protected]>

>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 111 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 111 insertions(+)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 34ecf2dc912f..fa21d4d5fccf 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -320,6 +320,114 @@ acpi_parse_memory_affinity(union acpi_subtable_headers *header,
> return 0;
> }
>
> +static int __init
> +__acpi_table_print_cedt_entry(union acpi_subtable_headers *__header,
> + void *arg, const unsigned long table_end)
> +{
> + struct acpi_cedt_header *header = (struct acpi_cedt_header *)__header;
> +
> + switch (header->type) {
> + case ACPI_CEDT_TYPE_CHBS:
> + {
> + struct acpi_cedt_chbs *p =
> + (struct acpi_cedt_chbs *)header;
> +
> + if (header->length < sizeof(struct acpi_cedt_chbs)) {
> + pr_warn("CEDT: unsupported CHBS entry: size %d\n",
> + header->length);
> + break;
> + }
> +
> + pr_debug("CEDT: CHBS (0x%llx length 0x%llx uid %lu) %s (%d)\n",
> + (unsigned long long)p->base,
> + (unsigned long long)p->length,
> + (unsigned long)p->uid,
> + (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) ?
> + "cxl11" :
> + (p->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20) ?
> + "cxl20" :
> + "unsupported version",
> + p->cxl_version);
> + }
> + break;
> + case ACPI_CEDT_TYPE_CFMWS:
> + {
> + struct acpi_cedt_cfmws *p =
> + (struct acpi_cedt_cfmws *)header;
> + int eiw_to_ways[] = {1, 2, 4, 8, 16, 3, 6, 12};
> + int targets = -1;
> +
> + if (header->length < sizeof(struct acpi_cedt_cfmws)) {
> + pr_warn("CEDT: unsupported CFMWS entry: size %d\n",
> + header->length);
> + break;
> + }
> +
> + if (p->interleave_ways < ARRAY_SIZE(eiw_to_ways))
> + targets = eiw_to_ways[p->interleave_ways];
> + if (header->length < struct_size(
> + p, interleave_targets, targets))
> + targets = -1;
> +
> + pr_debug("CEDT: CFMWS (0x%llx length 0x%llx) with %d target%s",
> + (unsigned long long)p->base_hpa,
> + (unsigned long long)p->window_size,
> + targets, targets > 1 ? "s" : "");
> + for (int i = 0; i < targets; i++)
> + pr_cont("%s%lu", i ? ", " : " (",
> + (unsigned long)p->interleave_targets[i]);
> + pr_cont("%s%s%s%s%s%s\n",
> + targets > 0 ? ")" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE2) ?
> + " type2" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE3) ?
> + " type3" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE) ?
> + " volatile" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_PMEM) ?
> + " pmem" : "",
> + (p->restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED) ?
> + " fixed" : "");
> + }
> + break;
> + case ACPI_CEDT_TYPE_CXIMS:
> + {
> + struct acpi_cedt_cxims *p =
> + (struct acpi_cedt_cxims *)header;
> +
> + if (header->length < sizeof(struct acpi_cedt_cxims)) {
> + pr_warn("CEDT: unsupported CXIMS entry: size %d\n",
> + header->length);
> + break;
> + }
> +
> + pr_debug("CEDT: CXIMS (hbig %u nr_xormaps %u)\n",
> + (unsigned int)p->hbig,
> + (unsigned int)p->nr_xormaps);
> + }
> + break;
> + default:
> + pr_warn("CEDT: Found unsupported entry (type = 0x%x)\n",
> + header->type);
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static void __init acpi_table_print_cedt_entry(enum acpi_cedt_type id)
> +{
> + acpi_table_parse_cedt(id, __acpi_table_print_cedt_entry, NULL);
> +}
> +
> +static void __init acpi_table_print_cedt(void)
> +{
> + /* Print only implemented CEDT types */
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CHBS);
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CFMWS);
> + acpi_table_print_cedt_entry(ACPI_CEDT_TYPE_CXIMS);
> +}
> +
> static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> void *arg, const unsigned long table_end)
> {
> @@ -518,6 +626,9 @@ int __init acpi_numa_init(void)
> /* SLIT: System Locality Information Table */
> acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>
> + /* CEDT: CXL Early Discovery Table */
> + acpi_table_print_cedt();
> +
> /*
> * CXL Fixed Memory Window Structures (CFMWS) must be parsed
> * after the SRAT. Create NUMA Nodes for CXL memory ranges that
> --
> 2.39.2
>

2024-04-30 16:42:25

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

Robert Richter wrote:
> For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> SRAT lookup fails then because an existing SRAT memory range cannot be
> found for a CFMWS address range. This causes the addition of a
> duplicate numa_memblk with a different node id and a subsequent page
> fault and kernel crash during boot.
>
> Fix this by making numa_fill_memblks() always available regardless of
> NUMA_KEEP_MEMINFO.
>
> The fix also removes numa_fill_memblks() from sparsemem.h using
> __weak.
>
> From Dan:
>
> """
> It just feels like numa_fill_memblks() has absolutely no business being
> defined in arch/x86/include/asm/sparsemem.h.
>
> The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> applied to memory ranges hot-onlined by the CXL driver.
>
> It belongs right next to numa_add_memblk(), and I suspect
> arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> what to do about the fact that linux/numa.h does not include asm/numa.h
> and that all implementations either provide numa_add_memblk() or select
> the generic implementation.
>
> So I would prefer that this do the proper fix and get
> numa_fill_memblks() completely out of the sparsemem.h path.
>
> Something like the following which boots for me.
> """
>
> Note that the issue was initially introduced with [1]. But since
> phys_to_target_node() was originally used that returned the valid node
> 0, an additional numa_memblk was not added. Though, the node id was
> wrong too, a message is seen then in the logs:
>
> kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
>
> [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> CFMWS not in SRAT")
>
> Suggested-by: Dan Williams <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
> Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> Cc: Derick Marks <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Looks good,

Reviewed-by: Dan Williams <[email protected]>

2024-04-30 17:06:58

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v6 5/7] ACPI/NUMA: Return memblk modification state from numa_fill_memblks()

Robert Richter wrote:
> When registering a memory range a possibly overlapping memory block
> will be extended instead of creating a new one. If both ranges exactly
> overlap, the blocks remain unchanged and are just reused. The
> information if a memblock was extended is useful for diagnostics.

What diagnostic flow is this useful for?

I feel like this post-hoc debug prints for problems we never expect to
have again, or is there an enduring need for this?

2024-04-30 17:14:18

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v6 6/7] ACPI/NUMA: Add log messages for memory ranges found in CEDT

Robert Richter wrote:
> Adding a pr_info() when successfully adding a CFMWS memory range.
>
> Suggested-by: Alison Schofield <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/acpi/numa/srat.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 76b39a6d3aef..34ecf2dc912f 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -339,8 +339,12 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> * window.
> */
> modified = numa_fill_memblks(start, end);
> - if (modified != NUMA_NO_MEMBLK)
> + if (modified != NUMA_NO_MEMBLK) {
> + if (modified)
> + pr_info("CEDT: memblk extended [mem %#010Lx-%#010Lx]\n",
> + (unsigned long long) start, (unsigned long long) end - 1);

This one still feels too verbose to me, can it not be gleaned from other
startup messages?

2024-05-02 12:00:25

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

On 30.04.24 15:48:56, Jonathan Cameron wrote:
> On Tue, 30 Apr 2024 11:21:54 +0200
> Robert Richter <[email protected]> wrote:
>
> > For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> > disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> > SRAT lookup fails then because an existing SRAT memory range cannot be
> > found for a CFMWS address range. This causes the addition of a
> > duplicate numa_memblk with a different node id and a subsequent page
> > fault and kernel crash during boot.
> >
> > Fix this by making numa_fill_memblks() always available regardless of
> > NUMA_KEEP_MEMINFO.
> >
> > The fix also removes numa_fill_memblks() from sparsemem.h using
> > __weak.
> >
> > From Dan:
> >
> > """
> > It just feels like numa_fill_memblks() has absolutely no business being
> > defined in arch/x86/include/asm/sparsemem.h.
> >
> > The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> > applied to memory ranges hot-onlined by the CXL driver.
> >
> > It belongs right next to numa_add_memblk(), and I suspect
> > arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> > what to do about the fact that linux/numa.h does not include asm/numa.h
> > and that all implementations either provide numa_add_memblk() or select
> > the generic implementation.
> >
> > So I would prefer that this do the proper fix and get
> > numa_fill_memblks() completely out of the sparsemem.h path.
> >
> > Something like the following which boots for me.
> > """
> >
> > Note that the issue was initially introduced with [1]. But since
> > phys_to_target_node() was originally used that returned the valid node
> > 0, an additional numa_memblk was not added. Though, the node id was
> > wrong too, a message is seen then in the logs:
> >
> > kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
> >
> > [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> > CFMWS not in SRAT")
> >
> > Suggested-by: Dan Williams <[email protected]>
> > Link: https://lore.kernel.org/all/[email protected]/
> > Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> > Cc: Derick Marks <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Alison Schofield <[email protected]>
> > Signed-off-by: Robert Richter <[email protected]>
>
> Whilst I'm not particularly keen on an arch specific solution for this
> and the stub is effectively pointless beyond making the build work, I guess
> this works well enough for now.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
>
> I was aiming to post the ARM64 handling this cycle but it hasn't quite happened yet :(
> Maybe we can look at whether there is a better level share at than
> the whole function once that is done.

Thanks for review.

It seems better to change x86 to use the generic implementation of
numa_add_memblk() in drivers/base/arch_numa.c. That already contains
code to deal with and merge overlapping blocks, it also checks memory
attributes. But that is not scope of this patch.

-Robert

2024-05-02 12:11:31

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

On 30.04.24 09:16:56, Alison Schofield wrote:
> On Tue, Apr 30, 2024 at 11:21:54AM +0200, Robert Richter wrote:
> > For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> > disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> > SRAT lookup fails then because an existing SRAT memory range cannot be
> > found for a CFMWS address range. This causes the addition of a
> > duplicate numa_memblk with a different node id and a subsequent page
> > fault and kernel crash during boot.
> >
> > Fix this by making numa_fill_memblks() always available regardless of
> > NUMA_KEEP_MEMINFO.
> >
> > The fix also removes numa_fill_memblks() from sparsemem.h using
> > __weak.
> >
> > From Dan:
> >
> > """
> > It just feels like numa_fill_memblks() has absolutely no business being
> > defined in arch/x86/include/asm/sparsemem.h.
> >
> > The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> > applied to memory ranges hot-onlined by the CXL driver.
> >
> > It belongs right next to numa_add_memblk(), and I suspect
> > arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> > what to do about the fact that linux/numa.h does not include asm/numa.h
> > and that all implementations either provide numa_add_memblk() or select
> > the generic implementation.
> >
> > So I would prefer that this do the proper fix and get
> > numa_fill_memblks() completely out of the sparsemem.h path.
> >
> > Something like the following which boots for me.
> > """
> >
> > Note that the issue was initially introduced with [1]. But since
> > phys_to_target_node() was originally used that returned the valid node
> > 0, an additional numa_memblk was not added. Though, the node id was
> > wrong too, a message is seen then in the logs:
> >
> > kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
> >
> > [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> > CFMWS not in SRAT")
> >
>
> For the commit log above -
> Perhaps the Dan quote can be reduced to a note about the implementation
> choice. Folks can look up the Lore thread if history is needed.
>
> For the code -
> Reviewed-by: Alison Schofield <[email protected]>

Adjusted description, thanks.

-Robert

2024-05-02 12:45:44

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v6 5/7] ACPI/NUMA: Return memblk modification state from numa_fill_memblks()

On 30.04.24 10:05:39, Dan Williams wrote:
> Robert Richter wrote:
> > When registering a memory range a possibly overlapping memory block
> > will be extended instead of creating a new one. If both ranges exactly
> > overlap, the blocks remain unchanged and are just reused. The
> > information if a memblock was extended is useful for diagnostics.
>
> What diagnostic flow is this useful for?
>
> I feel like this post-hoc debug prints for problems we never expect to
> have again, or is there an enduring need for this?

As we are already targeting -rc7, I am going to drop the printout
patches to not block the first patches in this series and will post
them again after the next merge window.

The SRAT logging is very useful to get a picture of the firmware
provided mem blocks. For NUMA debugging these are very helpful, esp.
when working with unmodified generic or distribution kernels. As CFMWS
entries add additional similar information as SRAT, that information
is no longer complete on systems with a CEDT.

The patches just enable the same level of logging for CFMWS as for
SRAT which just adds a single line (info level) per CFMWS entry (in
total just a few). The table printouts are at debug level already. Of
course, there are always other ways to get this information from, but
a grep of dmesg makes things very easy to check things.

Thanks,

-Robert

2024-05-02 12:54:10

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v6 7/7] ACPI/NUMA: Print CXL Early Discovery Table (CEDT)

On 30.04.24 17:22:31, Jonathan Cameron wrote:
> On Tue, 30 Apr 2024 11:22:00 +0200
> Robert Richter <[email protected]> wrote:
>
> > The CEDT contains similar entries as the SRAT. For diagnostic reasons
> > print the CEDT same style as the SRAT.
> >
> > Signed-off-by: Robert Richter <[email protected]>
> I'm fairly sure the interleave ways conversion is wrong.
> Otherwise all trivial stuff.

Yes, that's broken.

I will come up with an update of the last 3 patches of the series
after the merge window. Let's see we can also solve the arch dependend
issue we are facing in patch #6.

Thanks,

-Robert

2024-05-02 16:45:54

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v6 1/7] x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()

On Thu, 2 May 2024 13:59:52 +0200
Robert Richter <[email protected]> wrote:

> On 30.04.24 15:48:56, Jonathan Cameron wrote:
> > On Tue, 30 Apr 2024 11:21:54 +0200
> > Robert Richter <[email protected]> wrote:
> >
> > > For configurations that have the kconfig option NUMA_KEEP_MEMINFO
> > > disabled numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
> > > SRAT lookup fails then because an existing SRAT memory range cannot be
> > > found for a CFMWS address range. This causes the addition of a
> > > duplicate numa_memblk with a different node id and a subsequent page
> > > fault and kernel crash during boot.
> > >
> > > Fix this by making numa_fill_memblks() always available regardless of
> > > NUMA_KEEP_MEMINFO.
> > >
> > > The fix also removes numa_fill_memblks() from sparsemem.h using
> > > __weak.
> > >
> > > From Dan:
> > >
> > > """
> > > It just feels like numa_fill_memblks() has absolutely no business being
> > > defined in arch/x86/include/asm/sparsemem.h.
> > >
> > > The only use for numa_fill_memblks() is to arrange for NUMA nodes to be
> > > applied to memory ranges hot-onlined by the CXL driver.
> > >
> > > It belongs right next to numa_add_memblk(), and I suspect
> > > arch/x86/include/asm/sparsemem.h was only chosen to avoid figuring out
> > > what to do about the fact that linux/numa.h does not include asm/numa.h
> > > and that all implementations either provide numa_add_memblk() or select
> > > the generic implementation.
> > >
> > > So I would prefer that this do the proper fix and get
> > > numa_fill_memblks() completely out of the sparsemem.h path.
> > >
> > > Something like the following which boots for me.
> > > """
> > >
> > > Note that the issue was initially introduced with [1]. But since
> > > phys_to_target_node() was originally used that returned the valid node
> > > 0, an additional numa_memblk was not added. Though, the node id was
> > > wrong too, a message is seen then in the logs:
> > >
> > > kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
> > >
> > > [1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> > > CFMWS not in SRAT")
> > >
> > > Suggested-by: Dan Williams <[email protected]>
> > > Link: https://lore.kernel.org/all/[email protected]/
> > > Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> > > Cc: Derick Marks <[email protected]>
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Alison Schofield <[email protected]>
> > > Signed-off-by: Robert Richter <[email protected]>
> >
> > Whilst I'm not particularly keen on an arch specific solution for this
> > and the stub is effectively pointless beyond making the build work, I guess
> > this works well enough for now.
> >
> > Reviewed-by: Jonathan Cameron <[email protected]>
> >
> > I was aiming to post the ARM64 handling this cycle but it hasn't quite happened yet :(
> > Maybe we can look at whether there is a better level share at than
> > the whole function once that is done.
>
> Thanks for review.
>
> It seems better to change x86 to use the generic implementation of
> numa_add_memblk() in drivers/base/arch_numa.c. That already contains
> code to deal with and merge overlapping blocks, it also checks memory
> attributes. But that is not scope of this patch.
>
There is some history that Dan pointed me at a while back... Maybe this one is fine
but in general people have tried and given up on unifying x86 memblock handling
with the generic version :(
https://lore.kernel.org/linux-mm/159457121480.754248.17292511837648775358.stgit@dwillia2-desk3.amr.corp.intel.com/

> -Robert