This series attempts to move the ARM64 NUMA implementation to common
code so that RISC-V can reuse it instead of reimplementing it.
RISC-V specific bits are based on initial work done by Greentime Hu [1] but
modified to reuse the common implementation to avoid duplication.
[1] https://lkml.org/lkml/2020/1/10/233
This series has been tested on QEMU with NUMA enabled for both RISC-V and ARM64.
It would be great if somebody could test it on NUMA-capable ARM64 hardware platforms.
This patch series doesn't modify the maintainers list for the common code (arch_numa)
as I am not sure whether somebody from the ARM64 community or Greg should take up the
maintainership. Ganapatrao was the original author of the arm64 version.
I would be happy to update that in the next revision once it is decided.
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 486 MB
node 0 free: 470 MB
node 1 cpus: 4 5 6 7
node 1 size: 424 MB
node 1 free: 408 MB
node distances:
node 0 1
0: 10 20
1: 20 10
# numactl -show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7
cpubind: 0 1
nodebind: 0 1
membind: 0 1
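
For readers unfamiliar with the "node distances" table above, here is a
minimal userspace sketch of how such a matrix can be queried. It is not
the kernel implementation: the sketch_* names and the fixed 2-node table
are assumptions made for illustration, and only the 10/20 values come
from the numactl output.

```c
#include <assert.h>

/* Hypothetical: two nodes, matching the QEMU run shown above. */
#define SKETCH_NR_NODES 2

/* Row = source node, column = destination node (values from numactl). */
static const int sketch_distance[SKETCH_NR_NODES][SKETCH_NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Look up the distance between two nodes, with bounds checks. */
static inline int sketch_node_distance(int from, int to)
{
	assert(from >= 0 && from < SKETCH_NR_NODES);
	assert(to >= 0 && to < SKETCH_NR_NODES);
	return sketch_distance[from][to];
}

/* Return the closest node other than 'from', or -1 if there is none. */
static inline int sketch_nearest_remote_node(int from)
{
	int best = -1;
	int node;

	assert(from >= 0 && from < SKETCH_NR_NODES);
	for (node = 0; node < SKETCH_NR_NODES; node++) {
		if (node == from)
			continue;
		if (best < 0 ||
		    sketch_distance[from][node] < sketch_distance[from][best])
			best = node;
	}
	return best;
}
```

A local access costs 10 and a remote access costs 20 in this table, so
an allocator that prefers the smallest distance stays on the local node
whenever it can.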
The patches are also available at
https://github.com/atishp04/linux/tree/5.10_numa_unified_v4
For RISC-V, the following QEMU series is a prerequisite (already available upstream):
https://patchwork.kernel.org/project/qemu-devel/list/?series=303313
Testing:
RISC-V:
Tested in QEMU and on a 2-socket OmniXtend FPGA.
ARM64:
2-socket Kunpeng 920 (4 nodes, around 250G per node)
Tested-by: Jonathan Cameron <[email protected]>
There may be some minor conflicts with Mike's cleanup series [2] depending on the
order in which these two series are accepted. I can rebase on top of his series
if required.
[2] https://lkml.org/lkml/2020/8/18/754
Changes from v3->v4:
1. Removed redundant duplicate header.
2. Added Reviewed-by tags.
Changes from v2->v3:
1. Added Acked-by/Reviewed-by tags.
2. Replaced asm/acpi.h with linux/acpi.h
3. Defined arch_acpi_numa_init as static.
Changes from v1->v2:
1. Replaced ARM64 specific compile time protection with ACPI specific ones.
2. Dropped common pcibus_to_node changes. Added required changes in RISC-V.
3. Fixed a few typos.
Atish Patra (4):
numa: Move numa implementation to common code
arm64, numa: Change the numa init functions name to be generic
riscv: Separate memory init from paging init
riscv: Add numa support for riscv64 platform
Greentime Hu (1):
riscv: Add support pte_protnone and pmd_protnone if
CONFIG_NUMA_BALANCING
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/numa.h | 45 +----------------
arch/arm64/kernel/acpi_numa.c | 13 -----
arch/arm64/mm/Makefile | 1 -
arch/arm64/mm/init.c | 4 +-
arch/riscv/Kconfig | 31 +++++++++++-
arch/riscv/include/asm/mmzone.h | 13 +++++
arch/riscv/include/asm/numa.h | 8 +++
arch/riscv/include/asm/pci.h | 14 ++++++
arch/riscv/include/asm/pgtable.h | 21 ++++++++
arch/riscv/kernel/setup.c | 11 ++++-
arch/riscv/kernel/smpboot.c | 12 ++++-
arch/riscv/mm/init.c | 10 +++-
drivers/base/Kconfig | 6 +++
drivers/base/Makefile | 1 +
.../mm/numa.c => drivers/base/arch_numa.c | 30 ++++++++++--
include/asm-generic/numa.h | 49 +++++++++++++++++++
17 files changed, 199 insertions(+), 71 deletions(-)
create mode 100644 arch/riscv/include/asm/mmzone.h
create mode 100644 arch/riscv/include/asm/numa.h
rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (95%)
create mode 100644 include/asm-generic/numa.h
--
2.25.1
From: Greentime Hu <[email protected]>
These two functions are used to distinguish between PROT_NONE
protections and NUMA hinting fault protections.
Signed-off-by: Greentime Hu <[email protected]>
---
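
A minimal userspace sketch of the bit test this patch adds, for
illustration only. The SKETCH_* bit positions are assumptions; the real
values come from the RISC-V page table bit definitions. The point is
that an entry counts as "protnone" only when the PROT_NONE bit is set
and the PRESENT bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit positions, chosen only to make the sketch concrete. */
#define SKETCH_PAGE_PRESENT	(1UL << 0)
#define SKETCH_PAGE_PROT_NONE	(1UL << 1)

/* True only if PROT_NONE is set while PRESENT is clear. */
static inline bool sketch_pte_protnone(unsigned long pteval)
{
	return (pteval & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROT_NONE)) ==
	       SKETCH_PAGE_PROT_NONE;
}

/* Walk the four combinations of the two bits. */
static inline void sketch_protnone_selfcheck(void)
{
	assert(!sketch_pte_protnone(0UL));
	assert(!sketch_pte_protnone(SKETCH_PAGE_PRESENT));
	assert(sketch_pte_protnone(SKETCH_PAGE_PROT_NONE));
	assert(!sketch_pte_protnone(SKETCH_PAGE_PRESENT |
				    SKETCH_PAGE_PROT_NONE));
}
```

A plain mask test (val & PROT_NONE) would also match present entries
that happen to have the bit set, which is why the comparison is against
both bits.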
arch/riscv/include/asm/pgtable.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 515b42f98d34..2751110675e6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -183,6 +183,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
}
+static inline pte_t pmd_pte(pmd_t pmd)
+{
+ return __pte(pmd_val(pmd));
+}
+
/* Yields the page frame number (PFN) of a page table entry */
static inline unsigned long pte_pfn(pte_t pte)
{
@@ -286,6 +291,21 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}
+#ifdef CONFIG_NUMA_BALANCING
+/*
+ * See the comment in include/asm-generic/pgtable.h
+ */
+static inline int pte_protnone(pte_t pte)
+{
+ return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+ return pte_protnone(pmd_pte(pmd));
+}
+#endif
+
/* Modify page protection bits */
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
--
2.25.1
Currently, we perform some memory init functions in paging init. But
that will be an issue for NUMA support, where the DT needs to be
unflattened before NUMA initialization and memblock_present() can only
be called after NUMA initialization.
Move the memory initialization related functions to a separate function.
Signed-off-by: Atish Patra <[email protected]>
Reviewed-by: Greentime Hu <[email protected]>
---
arch/riscv/include/asm/pgtable.h | 1 +
arch/riscv/kernel/setup.c | 1 +
arch/riscv/mm/init.c | 6 +++++-
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index eaea1f717010..515b42f98d34 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -466,6 +466,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
extern void *dtb_early_va;
void setup_bootmem(void);
void paging_init(void);
+void misc_mem_init(void);
#define FIRST_USER_ADDRESS 0
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 2c6dd329312b..07fa6d13367e 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -78,6 +78,7 @@ void __init setup_arch(char **cmdline_p)
#else
unflatten_device_tree();
#endif
+ misc_mem_init();
#ifdef CONFIG_SWIOTLB
swiotlb_init(1);
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index ed6e83871112..114c3966aadb 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -565,8 +565,12 @@ static void __init resource_init(void)
void __init paging_init(void)
{
setup_vm_final();
- sparse_init();
setup_zero_page();
+}
+
+void __init misc_mem_init(void)
+{
+ sparse_init();
zone_sizes_init();
resource_init();
}
--
2.25.1
As we are now using the generic NUMA implementation, rename the ACPI and
NUMA init functions to reflect that they are generic.
Signed-off-by: Atish Patra <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
---
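
A simplified userspace sketch of the fallback order in arch_numa_init()
as it appears in the diff below. It is not the kernel code: the sketch_*
names are hypothetical, and the real function hands each init callback
to numa_init() rather than calling it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*sketch_init_fn)(void);

enum sketch_numa_source {
	SKETCH_NUMA_ACPI,
	SKETCH_NUMA_DT,
	SKETCH_NUMA_DUMMY,
};

/* Stand-ins for init callbacks: 0 means success, non-zero failure. */
static inline int sketch_ok(void) { return 0; }
static inline int sketch_fail(void) { return -1; }

/*
 * Try ACPI when ACPI is enabled, DT when it is disabled, and fall
 * back to a single dummy node if NUMA is off or the chosen method
 * fails.
 */
static inline enum sketch_numa_source
sketch_arch_numa_init(bool numa_off, bool acpi_disabled,
		      sketch_init_fn acpi_init, sketch_init_fn of_init)
{
	assert(acpi_init != NULL && of_init != NULL);

	if (!numa_off) {
		if (!acpi_disabled && !acpi_init())
			return SKETCH_NUMA_ACPI;
		if (acpi_disabled && !of_init())
			return SKETCH_NUMA_DT;
	}
	return SKETCH_NUMA_DUMMY;
}
```

Note that a failed ACPI init does not fall through to DT: the DT path
is only tried when ACPI is disabled, so the result in that case is the
dummy single-node configuration.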
arch/arm64/kernel/acpi_numa.c | 13 -------------
arch/arm64/mm/init.c | 4 ++--
drivers/base/arch_numa.c | 30 +++++++++++++++++++++++++-----
include/asm-generic/numa.h | 4 ++--
4 files changed, 29 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
index 7ff800045434..96502ff92af5 100644
--- a/arch/arm64/kernel/acpi_numa.c
+++ b/arch/arm64/kernel/acpi_numa.c
@@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
node_set(node, numa_nodes_parsed);
}
-
-int __init arm64_acpi_numa_init(void)
-{
- int ret;
-
- ret = acpi_numa_init();
- if (ret) {
- pr_info("Failed to initialise from firmware\n");
- return ret;
- }
-
- return srat_disabled() ? -EINVAL : 0;
-}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..93b660229e1d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -418,10 +418,10 @@ void __init bootmem_init(void)
max_pfn = max_low_pfn = max;
min_low_pfn = min;
- arm64_numa_init();
+ arch_numa_init();
/*
- * must be done after arm64_numa_init() which calls numa_init() to
+ * must be done after arch_numa_init() which calls numa_init() to
* initialize node_online_map that gets used in hugetlb_cma_reserve()
* while allocating required CMA size across online nodes.
*/
diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 73f8b49d485c..74b4f2ddad70 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -13,7 +13,6 @@
#include <linux/module.h>
#include <linux/of.h>
-#include <asm/acpi.h>
#include <asm/sections.h>
struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
@@ -444,16 +443,37 @@ static int __init dummy_numa_init(void)
return 0;
}
+#ifdef CONFIG_ACPI_NUMA
+static int __init arch_acpi_numa_init(void)
+{
+ int ret;
+
+ ret = acpi_numa_init();
+ if (ret) {
+ pr_info("Failed to initialise from firmware\n");
+ return ret;
+ }
+
+ return srat_disabled() ? -EINVAL : 0;
+}
+#else
+static int __init arch_acpi_numa_init(void)
+{
+ return -EOPNOTSUPP;
+}
+
+#endif
+
/**
- * arm64_numa_init() - Initialize NUMA
+ * arch_numa_init() - Initialize NUMA
*
* Try each configured NUMA initialization method until one succeeds. The
- * last fallback is dummy single node config encomapssing whole memory.
+ * last fallback is dummy single node config encompassing whole memory.
*/
-void __init arm64_numa_init(void)
+void __init arch_numa_init(void)
{
if (!numa_off) {
- if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))
+ if (!acpi_disabled && !numa_init(arch_acpi_numa_init))
return;
if (acpi_disabled && !numa_init(of_numa_init))
return;
diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
index 2718d5a6ff03..e7962db4ba44 100644
--- a/include/asm-generic/numa.h
+++ b/include/asm-generic/numa.h
@@ -27,7 +27,7 @@ static inline const struct cpumask *cpumask_of_node(int node)
}
#endif
-void __init arm64_numa_init(void);
+void __init arch_numa_init(void);
int __init numa_add_memblk(int nodeid, u64 start, u64 end);
void __init numa_set_distance(int from, int to, int distance);
void __init numa_free_distance(void);
@@ -41,7 +41,7 @@ void numa_remove_cpu(unsigned int cpu);
static inline void numa_store_cpu_info(unsigned int cpu) { }
static inline void numa_add_cpu(unsigned int cpu) { }
static inline void numa_remove_cpu(unsigned int cpu) { }
-static inline void arm64_numa_init(void) { }
+static inline void arch_numa_init(void) { }
static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
#endif /* CONFIG_NUMA */
--
2.25.1
Use the generic NUMA implementation to add NUMA support for RISC-V.
This is based on Greentime's patch [1] but modified to use the generic
NUMA implementation, with a few more fixes.
[1] https://lkml.org/lkml/2020/1/10/233
Co-developed-by: Greentime Hu <[email protected]>
Signed-off-by: Greentime Hu <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
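
A small sketch of what the NODES_SHIFT option added below means in
practice: the maximum node count is a power of two derived from the
shift. The sketch_* names are hypothetical; the 1..10 range mirrors the
Kconfig entry in this patch.

```c
#include <assert.h>

/* Range taken from the Kconfig entry "range 1 10". */
#define SKETCH_NODES_SHIFT_MIN	1U
#define SKETCH_NODES_SHIFT_MAX	10U

/* Maximum number of NUMA nodes for a given shift: 1 << shift. */
static inline unsigned int sketch_max_numnodes(unsigned int nodes_shift)
{
	assert(nodes_shift >= SKETCH_NODES_SHIFT_MIN);
	assert(nodes_shift <= SKETCH_NODES_SHIFT_MAX);
	return 1U << nodes_shift;
}
```

With the default of 2 this gives room for 4 nodes, and the upper bound
of 10 allows up to 1024.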
arch/riscv/Kconfig | 31 ++++++++++++++++++++++++++++++-
arch/riscv/include/asm/mmzone.h | 13 +++++++++++++
arch/riscv/include/asm/numa.h | 8 ++++++++
arch/riscv/include/asm/pci.h | 14 ++++++++++++++
arch/riscv/kernel/setup.c | 10 ++++++++--
arch/riscv/kernel/smpboot.c | 12 +++++++++++-
arch/riscv/mm/init.c | 4 +++-
7 files changed, 87 insertions(+), 5 deletions(-)
create mode 100644 arch/riscv/include/asm/mmzone.h
create mode 100644 arch/riscv/include/asm/numa.h
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index df18372861d8..7beb6ddb6eb1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -137,7 +137,7 @@ config PAGE_OFFSET
default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
config ARCH_FLATMEM_ENABLE
- def_bool y
+ def_bool !NUMA
config ARCH_SPARSEMEM_ENABLE
def_bool y
@@ -295,6 +295,35 @@ config TUNE_GENERIC
endchoice
+# Common NUMA Features
+config NUMA
+ bool "NUMA Memory Allocation and Scheduler Support"
+ select GENERIC_ARCH_NUMA
+ select OF_NUMA
+ select ARCH_SUPPORTS_NUMA_BALANCING
+ help
+ Enable NUMA (Non-Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory of the CPU and add some more NUMA awareness to the kernel.
+
+config NODES_SHIFT
+ int "Maximum NUMA Nodes (as a power of 2)"
+ range 1 10
+ default "2"
+ depends on NEED_MULTIPLE_NODES
+ help
+ Specify the maximum number of NUMA Nodes available on the target
+ system. Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+ def_bool y
+ depends on NUMA
+
+config NEED_PER_CPU_EMBED_FIRST_CHUNK
+ def_bool y
+ depends on NUMA
+
config RISCV_ISA_C
bool "Emit compressed instructions when building Linux"
default y
diff --git a/arch/riscv/include/asm/mmzone.h b/arch/riscv/include/asm/mmzone.h
new file mode 100644
index 000000000000..fa17e01d9ab2
--- /dev/null
+++ b/arch/riscv/include/asm/mmzone.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMZONE_H
+#define __ASM_MMZONE_H
+
+#ifdef CONFIG_NUMA
+
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_MMZONE_H */
diff --git a/arch/riscv/include/asm/numa.h b/arch/riscv/include/asm/numa.h
new file mode 100644
index 000000000000..8c8cf4297cc3
--- /dev/null
+++ b/arch/riscv/include/asm/numa.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_NUMA_H
+#define __ASM_NUMA_H
+
+#include <asm/topology.h>
+#include <asm-generic/numa.h>
+
+#endif /* __ASM_NUMA_H */
diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
index 1c473a1bd986..658e112c3ce7 100644
--- a/arch/riscv/include/asm/pci.h
+++ b/arch/riscv/include/asm/pci.h
@@ -32,6 +32,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
/* always show the domain in /proc */
return 1;
}
+
+#ifdef CONFIG_NUMA
+
+static inline int pcibus_to_node(struct pci_bus *bus)
+{
+ return dev_to_node(&bus->dev);
+}
+#ifndef cpumask_of_pcibus
+#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
+ cpu_all_mask : \
+ cpumask_of_node(pcibus_to_node(bus)))
+#endif
+#endif /* CONFIG_NUMA */
+
#endif /* CONFIG_PCI */
#endif /* _ASM_RISCV_PCI_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 07fa6d13367e..53a806a9cbaf 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -101,13 +101,19 @@ void __init setup_arch(char **cmdline_p)
static int __init topology_init(void)
{
- int i;
+ int i, ret;
+
+ for_each_online_node(i)
+ register_one_node(i);
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_devices, i);
cpu->hotpluggable = cpu_has_hotplug(i);
- register_cpu(cpu, i);
+ ret = register_cpu(cpu, i);
+ if (unlikely(ret))
+ pr_warn("Warning: %s: register_cpu %d failed (%d)\n",
+ __func__, i, ret);
}
return 0;
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 96167d55ed98..5e276c25646f 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -27,6 +27,7 @@
#include <asm/cpu_ops.h>
#include <asm/irq.h>
#include <asm/mmu_context.h>
+#include <asm/numa.h>
#include <asm/tlbflush.h>
#include <asm/sections.h>
#include <asm/sbi.h>
@@ -45,13 +46,18 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
{
int cpuid;
int ret;
+ unsigned int curr_cpuid;
+
+ curr_cpuid = smp_processor_id();
+ numa_store_cpu_info(curr_cpuid);
+ numa_add_cpu(curr_cpuid);
/* This covers non-smp usecase mandated by "nosmp" option */
if (max_cpus == 0)
return;
for_each_possible_cpu(cpuid) {
- if (cpuid == smp_processor_id())
+ if (cpuid == curr_cpuid)
continue;
if (cpu_ops[cpuid]->cpu_prepare) {
ret = cpu_ops[cpuid]->cpu_prepare(cpuid);
@@ -59,6 +65,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
continue;
}
set_cpu_present(cpuid, true);
+ numa_store_cpu_info(cpuid);
}
}
@@ -79,6 +86,7 @@ void __init setup_smp(void)
if (hart == cpuid_to_hartid_map(0)) {
BUG_ON(found_boot_cpu);
found_boot_cpu = 1;
+ early_map_cpu_to_node(0, of_node_to_nid(dn));
continue;
}
if (cpuid >= NR_CPUS) {
@@ -88,6 +96,7 @@ void __init setup_smp(void)
}
cpuid_to_hartid_map(cpuid) = hart;
+ early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
cpuid++;
}
@@ -153,6 +162,7 @@ asmlinkage __visible void smp_callin(void)
current->active_mm = mm;
notify_cpu_starting(curr_cpuid);
+ numa_add_cpu(curr_cpuid);
update_siblings_masks(curr_cpuid);
set_cpu_online(curr_cpuid, 1);
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 114c3966aadb..c4046e11d264 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -20,6 +20,7 @@
#include <asm/soc.h>
#include <asm/io.h>
#include <asm/ptdump.h>
+#include <asm/numa.h>
#include "../kernel/head.h"
@@ -185,7 +186,6 @@ void __init setup_bootmem(void)
early_init_fdt_scan_reserved_mem();
memblock_allow_resize();
- memblock_dump_all();
for_each_memblock(memory, reg) {
unsigned long start_pfn = memblock_region_memory_base_pfn(reg);
@@ -570,9 +570,11 @@ void __init paging_init(void)
void __init misc_mem_init(void)
{
+ arch_numa_init();
sparse_init();
zone_sizes_init();
resource_init();
+ memblock_dump_all();
}
#ifdef CONFIG_SPARSEMEM_VMEMMAP
--
2.25.1
The ARM64 NUMA implementation is generic enough that RISC-V can reuse it
with only minor cosmetic changes. This will help both ARM64 and RISC-V
in terms of maintenance and feature improvement.
Move the NUMA implementation code to a common directory so that both ISAs
can reuse it. This doesn't introduce any functional changes for ARM64.
Signed-off-by: Atish Patra <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/numa.h | 45 +----------------
arch/arm64/mm/Makefile | 1 -
drivers/base/Kconfig | 6 +++
drivers/base/Makefile | 1 +
.../mm/numa.c => drivers/base/arch_numa.c | 0
include/asm-generic/numa.h | 49 +++++++++++++++++++
7 files changed, 58 insertions(+), 45 deletions(-)
rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (100%)
create mode 100644 include/asm-generic/numa.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbee..955a0cf75b16 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -960,6 +960,7 @@ config HOTPLUG_CPU
# Common NUMA Features
config NUMA
bool "NUMA Memory Allocation and Scheduler Support"
+ select GENERIC_ARCH_NUMA
select ACPI_NUMA if ACPI
select OF_NUMA
help
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 626ad01e83bf..8c8cf4297cc3 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -3,49 +3,6 @@
#define __ASM_NUMA_H
#include <asm/topology.h>
-
-#ifdef CONFIG_NUMA
-
-#define NR_NODE_MEMBLKS (MAX_NUMNODES * 2)
-
-int __node_distance(int from, int to);
-#define node_distance(a, b) __node_distance(a, b)
-
-extern nodemask_t numa_nodes_parsed __initdata;
-
-extern bool numa_off;
-
-/* Mappings between node number and cpus on that node. */
-extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
-void numa_clear_node(unsigned int cpu);
-
-#ifdef CONFIG_DEBUG_PER_CPU_MAPS
-const struct cpumask *cpumask_of_node(int node);
-#else
-/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
-static inline const struct cpumask *cpumask_of_node(int node)
-{
- return node_to_cpumask_map[node];
-}
-#endif
-
-void __init arm64_numa_init(void);
-int __init numa_add_memblk(int nodeid, u64 start, u64 end);
-void __init numa_set_distance(int from, int to, int distance);
-void __init numa_free_distance(void);
-void __init early_map_cpu_to_node(unsigned int cpu, int nid);
-void numa_store_cpu_info(unsigned int cpu);
-void numa_add_cpu(unsigned int cpu);
-void numa_remove_cpu(unsigned int cpu);
-
-#else /* CONFIG_NUMA */
-
-static inline void numa_store_cpu_info(unsigned int cpu) { }
-static inline void numa_add_cpu(unsigned int cpu) { }
-static inline void numa_remove_cpu(unsigned int cpu) { }
-static inline void arm64_numa_init(void) { }
-static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
-
-#endif /* CONFIG_NUMA */
+#include <asm-generic/numa.h>
#endif /* __ASM_NUMA_H */
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index d91030f0ffee..928c308b044b 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -6,7 +6,6 @@ obj-y := dma-mapping.o extable.o fault.o init.o \
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_PTDUMP_CORE) += dump.o
obj-$(CONFIG_PTDUMP_DEBUGFS) += ptdump_debugfs.o
-obj-$(CONFIG_NUMA) += numa.o
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
KASAN_SANITIZE_physaddr.o += n
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 8d7001712062..c5956c8845cc 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -210,4 +210,10 @@ config GENERIC_ARCH_TOPOLOGY
appropriate scaling, sysfs interface for reading capacity values at
runtime.
+config GENERIC_ARCH_NUMA
+ bool
+ help
+ Enable support for generic NUMA implementation. Currently, RISC-V
+ and ARM64 uses it.
+
endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 157452080f3d..c3d02c644222 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_PINCTRL) += pinctrl.o
obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
+obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
obj-y += test/
diff --git a/arch/arm64/mm/numa.c b/drivers/base/arch_numa.c
similarity index 100%
rename from arch/arm64/mm/numa.c
rename to drivers/base/arch_numa.c
diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
new file mode 100644
index 000000000000..2718d5a6ff03
--- /dev/null
+++ b/include/asm-generic/numa.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_NUMA_H
+#define __ASM_GENERIC_NUMA_H
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS (MAX_NUMNODES * 2)
+
+int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+extern nodemask_t numa_nodes_parsed __initdata;
+
+extern bool numa_off;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+void numa_clear_node(unsigned int cpu);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+ return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_free_distance(void);
+void __init early_map_cpu_to_node(unsigned int cpu, int nid);
+void numa_store_cpu_info(unsigned int cpu);
+void numa_add_cpu(unsigned int cpu);
+void numa_remove_cpu(unsigned int cpu);
+
+#else /* CONFIG_NUMA */
+
+static inline void numa_store_cpu_info(unsigned int cpu) { }
+static inline void numa_add_cpu(unsigned int cpu) { }
+static inline void numa_remove_cpu(unsigned int cpu) { }
+static inline void arm64_numa_init(void) { }
+static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
+
+#endif /* CONFIG_NUMA */
+
+#endif /* __ASM_GENERIC_NUMA_H */
--
2.25.1
On Mon, Oct 5, 2020 at 5:18 PM Atish Patra <[email protected]> wrote:
>
> This series attempts to move the ARM64 numa implementation to common
> code so that RISC-V can leverage that as well instead of reimplementing
> it again.
> [...]
Ping ?
--
Regards,
Atish
On Tue, 13 Oct 2020 13:19:35 PDT (-0700), [email protected] wrote:
> On Mon, Oct 5, 2020 at 5:18 PM Atish Patra <[email protected]> wrote:
>>
>> This series attempts to move the ARM64 numa implementation to common
>> code so that RISC-V can leverage that as well instead of reimplementing
>> it again.
>> [...]
>
> Ping ?
This has been at the top of my inbox for a week or two now, I just haven't
gotten around to taking a look yet because of all the other fires going on.
Sorry.
On Tue, Oct 6, 2020 at 5:48 AM Atish Patra <[email protected]> wrote:
>
> Currently, we perform some memory init functions in paging init. But,
> that will be an issue for NUMA support where DT needs to be flattened
> before numa initialization and memblock_present can only be called
> after numa initialization.
>
> Move memory initialization related functions to a separate function.
>
> [...]
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
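
A minimal userspace sketch of the ordering constraint described in the
commit message above, assuming three boolean flags as stand-ins for kernel
state. Every sketch_* name is hypothetical and only models the call order in
setup_arch(); none of it is kernel code. The NUMA step marks where a later
patch in the series hooks arch_numa_init() into misc_mem_init().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for kernel state; not real kernel symbols. */
struct sketch_boot_state {
	bool dt_unflattened;	/* set by the stand-in for unflatten_device_tree() */
	bool numa_ready;	/* set by the stand-in for NUMA initialization */
	bool sparse_ready;	/* set by the stand-in for sparse_init() */
};

/* After the patch, paging init only sets up page tables and the zero page. */
static void sketch_paging_init(struct sketch_boot_state *s)
{
	(void)s;
}

static void sketch_unflatten_device_tree(struct sketch_boot_state *s)
{
	s->dt_unflattened = true;
}

/* NUMA init walks the device tree, so the tree must be unflattened first. */
static void sketch_numa_init(struct sketch_boot_state *s)
{
	assert(s->dt_unflattened);
	s->numa_ready = true;
}

/* The memory-model setup runs only once the NUMA information is known. */
static void sketch_misc_mem_init(struct sketch_boot_state *s)
{
	sketch_numa_init(s);
	assert(s->numa_ready);
	s->sparse_ready = true;
}

/* Models the order of the three calls in setup_arch() after the patch. */
static struct sketch_boot_state sketch_setup_arch(void)
{
	struct sketch_boot_state s = { false, false, false };

	sketch_paging_init(&s);
	sketch_unflatten_device_tree(&s);
	sketch_misc_mem_init(&s);
	return s;
}
```

Calling sketch_misc_mem_init() before sketch_unflatten_device_tree() trips
the first assertion, which is the situation the split is meant to avoid.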
On Tue, Oct 6, 2020 at 5:48 AM Atish Patra <[email protected]> wrote:
>
> From: Greentime Hu <[email protected]>
>
> These two functions are used to distinguish between PROT_NONE
> protections and NUMA hinting fault protections.
>
> Signed-off-by: Greentime Hu <[email protected]>
> ---
> arch/riscv/include/asm/pgtable.h | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 515b42f98d34..2751110675e6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -183,6 +183,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
> }
>
> +static inline pte_t pmd_pte(pmd_t pmd)
> +{
> + return __pte(pmd_val(pmd));
> +}
> +
> /* Yields the page frame number (PFN) of a page table entry */
> static inline unsigned long pte_pfn(pte_t pte)
> {
> @@ -286,6 +291,21 @@ static inline pte_t pte_mkhuge(pte_t pte)
> return pte;
> }
>
> +#ifdef CONFIG_NUMA_BALANCING
> +/*
> + * See the comment in include/asm-generic/pgtable.h
> + */
> +static inline int pte_protnone(pte_t pte)
> +{
> + return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
> +}
> +
> +static inline int pmd_protnone(pmd_t pmd)
> +{
> + return pte_protnone(pmd_pte(pmd));
> +}
> +#endif
> +
> /* Modify page protection bits */
> static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> {
> --
> 2.25.1
>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
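
As a rough illustration of the test in pte_protnone(): an entry counts as
"protnone" only when the PROT_NONE bit is set and the PRESENT bit is clear.
The sketch below is plain userspace C; the SKETCH_* bit positions are
placeholders chosen for the example and are not claimed to match the RISC-V
page table encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only. */
#define SKETCH_PAGE_PRESENT	(UINT64_C(1) << 0)
#define SKETCH_PAGE_PROT_NONE	(UINT64_C(1) << 1)

/* Same mask-and-compare shape as the pte_protnone() in the patch. */
static bool sketch_pte_protnone(uint64_t pteval)
{
	return (pteval & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROT_NONE)) ==
	       SKETCH_PAGE_PROT_NONE;
}

/* Walks the four combinations of the two bits. */
static void sketch_protnone_examples(void)
{
	/* Not present, PROT_NONE set: the only case that matches. */
	assert(sketch_pte_protnone(SKETCH_PAGE_PROT_NONE));
	/* Present entries never match, whatever the other bit says. */
	assert(!sketch_pte_protnone(SKETCH_PAGE_PRESENT));
	assert(!sketch_pte_protnone(SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROT_NONE));
	/* An empty entry does not match either. */
	assert(!sketch_pte_protnone(0));
}
```

Comparing against the PROT_NONE bit alone, rather than testing for non-zero,
is what excludes entries that have both bits set.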
On Tue, Oct 6, 2020 at 5:48 AM Atish Patra <[email protected]> wrote:
>
> Use the generic numa implementation to add NUMA support for RISC-V.
> This is based on Greentime's patch[1] but modified to use the generic NUMA
> implementation, with a few more fixes.
>
> [1] https://lkml.org/lkml/2020/1/10/233
>
> Co-developed-by: Greentime Hu <[email protected]>
> Signed-off-by: Greentime Hu <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/Kconfig | 31 ++++++++++++++++++++++++++++++-
> arch/riscv/include/asm/mmzone.h | 13 +++++++++++++
> arch/riscv/include/asm/numa.h | 8 ++++++++
> arch/riscv/include/asm/pci.h | 14 ++++++++++++++
> arch/riscv/kernel/setup.c | 10 ++++++++--
> arch/riscv/kernel/smpboot.c | 12 +++++++++++-
> arch/riscv/mm/init.c | 4 +++-
> 7 files changed, 87 insertions(+), 5 deletions(-)
> create mode 100644 arch/riscv/include/asm/mmzone.h
> create mode 100644 arch/riscv/include/asm/numa.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index df18372861d8..7beb6ddb6eb1 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -137,7 +137,7 @@ config PAGE_OFFSET
> default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
>
> config ARCH_FLATMEM_ENABLE
> - def_bool y
> + def_bool !NUMA
>
> config ARCH_SPARSEMEM_ENABLE
> def_bool y
> @@ -295,6 +295,35 @@ config TUNE_GENERIC
>
> endchoice
>
> +# Common NUMA Features
> +config NUMA
> + bool "NUMA Memory Allocation and Scheduler Support"
> + select GENERIC_ARCH_NUMA
> + select OF_NUMA
> + select ARCH_SUPPORTS_NUMA_BALANCING
> + help
> + Enable NUMA (Non-Uniform Memory Access) support.
> +
> + The kernel will try to allocate memory used by a CPU on the
> + local memory of the CPU and add some more NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> + int "Maximum NUMA Nodes (as a power of 2)"
> + range 1 10
> + default "2"
> + depends on NEED_MULTIPLE_NODES
> + help
> + Specify the maximum number of NUMA Nodes available on the target
> + system. Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> + def_bool y
> + depends on NUMA
> +
> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
> + def_bool y
> + depends on NUMA
> +
> config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/mmzone.h b/arch/riscv/include/asm/mmzone.h
> new file mode 100644
> index 000000000000..fa17e01d9ab2
> --- /dev/null
> +++ b/arch/riscv/include/asm/mmzone.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_MMZONE_H
> +#define __ASM_MMZONE_H
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +#define NODE_DATA(nid) (node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_MMZONE_H */
> diff --git a/arch/riscv/include/asm/numa.h b/arch/riscv/include/asm/numa.h
> new file mode 100644
> index 000000000000..8c8cf4297cc3
> --- /dev/null
> +++ b/arch/riscv/include/asm/numa.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NUMA_H
> +#define __ASM_NUMA_H
> +
> +#include <asm/topology.h>
> +#include <asm-generic/numa.h>
> +
> +#endif /* __ASM_NUMA_H */
> diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
> index 1c473a1bd986..658e112c3ce7 100644
> --- a/arch/riscv/include/asm/pci.h
> +++ b/arch/riscv/include/asm/pci.h
> @@ -32,6 +32,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
> /* always show the domain in /proc */
> return 1;
> }
> +
> +#ifdef CONFIG_NUMA
> +
> +static inline int pcibus_to_node(struct pci_bus *bus)
> +{
> + return dev_to_node(&bus->dev);
> +}
> +#ifndef cpumask_of_pcibus
> +#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
> + cpu_all_mask : \
> + cpumask_of_node(pcibus_to_node(bus)))
> +#endif
> +#endif /* CONFIG_NUMA */
> +
> #endif /* CONFIG_PCI */
>
> #endif /* _ASM_RISCV_PCI_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 07fa6d13367e..53a806a9cbaf 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -101,13 +101,19 @@ void __init setup_arch(char **cmdline_p)
>
> static int __init topology_init(void)
> {
> - int i;
> + int i, ret;
> +
> + for_each_online_node(i)
> + register_one_node(i);
>
> for_each_possible_cpu(i) {
> struct cpu *cpu = &per_cpu(cpu_devices, i);
>
> cpu->hotpluggable = cpu_has_hotplug(i);
> - register_cpu(cpu, i);
> + ret = register_cpu(cpu, i);
> + if (unlikely(ret))
> + pr_warn("Warning: %s: register_cpu %d failed (%d)\n",
> + __func__, i, ret);
> }
>
> return 0;
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 96167d55ed98..5e276c25646f 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -27,6 +27,7 @@
> #include <asm/cpu_ops.h>
> #include <asm/irq.h>
> #include <asm/mmu_context.h>
> +#include <asm/numa.h>
> #include <asm/tlbflush.h>
> #include <asm/sections.h>
> #include <asm/sbi.h>
> @@ -45,13 +46,18 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> {
> int cpuid;
> int ret;
> + unsigned int curr_cpuid;
> +
> + curr_cpuid = smp_processor_id();
> + numa_store_cpu_info(curr_cpuid);
> + numa_add_cpu(curr_cpuid);
>
> /* This covers non-smp usecase mandated by "nosmp" option */
> if (max_cpus == 0)
> return;
>
> for_each_possible_cpu(cpuid) {
> - if (cpuid == smp_processor_id())
> + if (cpuid == curr_cpuid)
> continue;
> if (cpu_ops[cpuid]->cpu_prepare) {
> ret = cpu_ops[cpuid]->cpu_prepare(cpuid);
> @@ -59,6 +65,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> continue;
> }
> set_cpu_present(cpuid, true);
> + numa_store_cpu_info(cpuid);
> }
> }
>
> @@ -79,6 +86,7 @@ void __init setup_smp(void)
> if (hart == cpuid_to_hartid_map(0)) {
> BUG_ON(found_boot_cpu);
> found_boot_cpu = 1;
> + early_map_cpu_to_node(0, of_node_to_nid(dn));
> continue;
> }
> if (cpuid >= NR_CPUS) {
> @@ -88,6 +96,7 @@ void __init setup_smp(void)
> }
>
> cpuid_to_hartid_map(cpuid) = hart;
> + early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> cpuid++;
> }
>
> @@ -153,6 +162,7 @@ asmlinkage __visible void smp_callin(void)
> current->active_mm = mm;
>
> notify_cpu_starting(curr_cpuid);
> + numa_add_cpu(curr_cpuid);
> update_siblings_masks(curr_cpuid);
> set_cpu_online(curr_cpuid, 1);
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 114c3966aadb..c4046e11d264 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -20,6 +20,7 @@
> #include <asm/soc.h>
> #include <asm/io.h>
> #include <asm/ptdump.h>
> +#include <asm/numa.h>
>
> #include "../kernel/head.h"
>
> @@ -185,7 +186,6 @@ void __init setup_bootmem(void)
>
> early_init_fdt_scan_reserved_mem();
> memblock_allow_resize();
> - memblock_dump_all();
>
> for_each_memblock(memory, reg) {
> unsigned long start_pfn = memblock_region_memory_base_pfn(reg);
> @@ -570,9 +570,11 @@ void __init paging_init(void)
>
> void __init misc_mem_init(void)
> {
> + arch_numa_init();
> sparse_init();
> zone_sizes_init();
> resource_init();
> + memblock_dump_all();
> }
>
> #ifdef CONFIG_SPARSEMEM_VMEMMAP
> --
> 2.25.1
>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
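
For reference, NODES_SHIFT in the Kconfig hunk above is an exponent, not a
node count: the generic headers derive MAX_NUMNODES as 1 << NODES_SHIFT. A
small sketch of that arithmetic, with a hypothetical helper name and the
"range 1 10" limit from the patch checked by an assertion:

```c
#include <assert.h>

/* Hypothetical helper: maximum node count for a given NODES_SHIFT value. */
static unsigned int sketch_max_numnodes(unsigned int nodes_shift)
{
	/* Mirrors the "range 1 10" constraint in the Kconfig entry. */
	assert(nodes_shift >= 1 && nodes_shift <= 10);
	return 1u << nodes_shift;
}
```

With the default of 2 this allows up to 4 nodes; the permitted range spans
2 nodes (shift 1) to 1024 nodes (shift 10).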
On Mon, 05 Oct 2020 17:17:50 PDT (-0700), Atish Patra wrote:
> Currently, we perform some memory init functions in paging init. But,
> that will be an issue for NUMA support where DT needs to be unflattened
> before numa initialization and memblock_present can only be called
> after numa initialization.
>
> Move memory initialization related functions to a separate function.
>
> Signed-off-by: Atish Patra <[email protected]>
> Reviewed-by: Greentime Hu <[email protected]>
> ---
> arch/riscv/include/asm/pgtable.h | 1 +
> arch/riscv/kernel/setup.c | 1 +
> arch/riscv/mm/init.c | 6 +++++-
> 3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index eaea1f717010..515b42f98d34 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -466,6 +466,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
> extern void *dtb_early_va;
> void setup_bootmem(void);
> void paging_init(void);
> +void misc_mem_init(void);
>
> #define FIRST_USER_ADDRESS 0
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 2c6dd329312b..07fa6d13367e 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -78,6 +78,7 @@ void __init setup_arch(char **cmdline_p)
> #else
> unflatten_device_tree();
> #endif
> + misc_mem_init();
>
> #ifdef CONFIG_SWIOTLB
> swiotlb_init(1);
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index ed6e83871112..114c3966aadb 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -565,8 +565,12 @@ static void __init resource_init(void)
> void __init paging_init(void)
> {
> setup_vm_final();
> - sparse_init();
> setup_zero_page();
> +}
> +
> +void __init misc_mem_init(void)
> +{
> + sparse_init();
> zone_sizes_init();
> resource_init();
> }
Reviewed-by: Palmer Dabbelt <[email protected]>
On Mon, 05 Oct 2020 17:17:51 PDT (-0700), Atish Patra wrote:
> From: Greentime Hu <[email protected]>
>
> These two functions are used to distinguish between PROT_NONE
> protections and NUMA hinting fault protections.
>
> Signed-off-by: Greentime Hu <[email protected]>
> ---
> arch/riscv/include/asm/pgtable.h | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 515b42f98d34..2751110675e6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -183,6 +183,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
> }
>
> +static inline pte_t pmd_pte(pmd_t pmd)
> +{
> + return __pte(pmd_val(pmd));
> +}
> +
> /* Yields the page frame number (PFN) of a page table entry */
> static inline unsigned long pte_pfn(pte_t pte)
> {
> @@ -286,6 +291,21 @@ static inline pte_t pte_mkhuge(pte_t pte)
> return pte;
> }
>
> +#ifdef CONFIG_NUMA_BALANCING
> +/*
> + * See the comment in include/asm-generic/pgtable.h
> + */
> +static inline int pte_protnone(pte_t pte)
> +{
> + return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == _PAGE_PROT_NONE;
> +}
> +
> +static inline int pmd_protnone(pmd_t pmd)
> +{
> + return pte_protnone(pmd_pte(pmd));
> +}
> +#endif
> +
> /* Modify page protection bits */
> static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> {
Reviewed-by: Palmer Dabbelt <[email protected]>
On Mon, 05 Oct 2020 17:17:52 PDT (-0700), Atish Patra wrote:
> Use the generic numa implementation to add NUMA support for RISC-V.
> This is based on Greentime's patch[1] but modified to use the generic NUMA
> implementation, with a few more fixes.
>
> [1] https://lkml.org/lkml/2020/1/10/233
>
> Co-developed-by: Greentime Hu <[email protected]>
> Signed-off-by: Greentime Hu <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/Kconfig | 31 ++++++++++++++++++++++++++++++-
> arch/riscv/include/asm/mmzone.h | 13 +++++++++++++
> arch/riscv/include/asm/numa.h | 8 ++++++++
> arch/riscv/include/asm/pci.h | 14 ++++++++++++++
> arch/riscv/kernel/setup.c | 10 ++++++++--
> arch/riscv/kernel/smpboot.c | 12 +++++++++++-
> arch/riscv/mm/init.c | 4 +++-
> 7 files changed, 87 insertions(+), 5 deletions(-)
> create mode 100644 arch/riscv/include/asm/mmzone.h
> create mode 100644 arch/riscv/include/asm/numa.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index df18372861d8..7beb6ddb6eb1 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -137,7 +137,7 @@ config PAGE_OFFSET
> default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
>
> config ARCH_FLATMEM_ENABLE
> - def_bool y
> + def_bool !NUMA
>
> config ARCH_SPARSEMEM_ENABLE
> def_bool y
> @@ -295,6 +295,35 @@ config TUNE_GENERIC
>
> endchoice
>
> +# Common NUMA Features
> +config NUMA
> + bool "NUMA Memory Allocation and Scheduler Support"
> + select GENERIC_ARCH_NUMA
> + select OF_NUMA
> + select ARCH_SUPPORTS_NUMA_BALANCING
> + help
> + Enable NUMA (Non-Uniform Memory Access) support.
> +
> + The kernel will try to allocate memory used by a CPU on the
> + local memory of the CPU and add some more NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> + int "Maximum NUMA Nodes (as a power of 2)"
> + range 1 10
> + default "2"
> + depends on NEED_MULTIPLE_NODES
> + help
> + Specify the maximum number of NUMA Nodes available on the target
> + system. Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> + def_bool y
> + depends on NUMA
> +
> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
> + def_bool y
> + depends on NUMA
> +
> config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/mmzone.h b/arch/riscv/include/asm/mmzone.h
> new file mode 100644
> index 000000000000..fa17e01d9ab2
> --- /dev/null
> +++ b/arch/riscv/include/asm/mmzone.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_MMZONE_H
> +#define __ASM_MMZONE_H
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +#define NODE_DATA(nid) (node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_MMZONE_H */
> diff --git a/arch/riscv/include/asm/numa.h b/arch/riscv/include/asm/numa.h
> new file mode 100644
> index 000000000000..8c8cf4297cc3
> --- /dev/null
> +++ b/arch/riscv/include/asm/numa.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NUMA_H
> +#define __ASM_NUMA_H
> +
> +#include <asm/topology.h>
> +#include <asm-generic/numa.h>
> +
> +#endif /* __ASM_NUMA_H */
> diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
> index 1c473a1bd986..658e112c3ce7 100644
> --- a/arch/riscv/include/asm/pci.h
> +++ b/arch/riscv/include/asm/pci.h
> @@ -32,6 +32,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
> /* always show the domain in /proc */
> return 1;
> }
> +
> +#ifdef CONFIG_NUMA
> +
> +static inline int pcibus_to_node(struct pci_bus *bus)
> +{
> + return dev_to_node(&bus->dev);
> +}
> +#ifndef cpumask_of_pcibus
> +#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
> + cpu_all_mask : \
> + cpumask_of_node(pcibus_to_node(bus)))
> +#endif
> +#endif /* CONFIG_NUMA */
> +
> #endif /* CONFIG_PCI */
>
> #endif /* _ASM_RISCV_PCI_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 07fa6d13367e..53a806a9cbaf 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -101,13 +101,19 @@ void __init setup_arch(char **cmdline_p)
>
> static int __init topology_init(void)
> {
> - int i;
> + int i, ret;
> +
> + for_each_online_node(i)
> + register_one_node(i);
>
> for_each_possible_cpu(i) {
> struct cpu *cpu = &per_cpu(cpu_devices, i);
>
> cpu->hotpluggable = cpu_has_hotplug(i);
> - register_cpu(cpu, i);
> + ret = register_cpu(cpu, i);
> + if (unlikely(ret))
> + pr_warn("Warning: %s: register_cpu %d failed (%d)\n",
> + __func__, i, ret);
> }
>
> return 0;
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 96167d55ed98..5e276c25646f 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -27,6 +27,7 @@
> #include <asm/cpu_ops.h>
> #include <asm/irq.h>
> #include <asm/mmu_context.h>
> +#include <asm/numa.h>
> #include <asm/tlbflush.h>
> #include <asm/sections.h>
> #include <asm/sbi.h>
> @@ -45,13 +46,18 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> {
> int cpuid;
> int ret;
> + unsigned int curr_cpuid;
> +
> + curr_cpuid = smp_processor_id();
> + numa_store_cpu_info(curr_cpuid);
> + numa_add_cpu(curr_cpuid);
>
> /* This covers non-smp usecase mandated by "nosmp" option */
> if (max_cpus == 0)
> return;
>
> for_each_possible_cpu(cpuid) {
> - if (cpuid == smp_processor_id())
> + if (cpuid == curr_cpuid)
> continue;
> if (cpu_ops[cpuid]->cpu_prepare) {
> ret = cpu_ops[cpuid]->cpu_prepare(cpuid);
> @@ -59,6 +65,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> continue;
> }
> set_cpu_present(cpuid, true);
> + numa_store_cpu_info(cpuid);
> }
> }
>
> @@ -79,6 +86,7 @@ void __init setup_smp(void)
> if (hart == cpuid_to_hartid_map(0)) {
> BUG_ON(found_boot_cpu);
> found_boot_cpu = 1;
> + early_map_cpu_to_node(0, of_node_to_nid(dn));
> continue;
> }
> if (cpuid >= NR_CPUS) {
> @@ -88,6 +96,7 @@ void __init setup_smp(void)
> }
>
> cpuid_to_hartid_map(cpuid) = hart;
> + early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
> cpuid++;
> }
>
> @@ -153,6 +162,7 @@ asmlinkage __visible void smp_callin(void)
> current->active_mm = mm;
>
> notify_cpu_starting(curr_cpuid);
> + numa_add_cpu(curr_cpuid);
> update_siblings_masks(curr_cpuid);
> set_cpu_online(curr_cpuid, 1);
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 114c3966aadb..c4046e11d264 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -20,6 +20,7 @@
> #include <asm/soc.h>
> #include <asm/io.h>
> #include <asm/ptdump.h>
> +#include <asm/numa.h>
>
> #include "../kernel/head.h"
>
> @@ -185,7 +186,6 @@ void __init setup_bootmem(void)
>
> early_init_fdt_scan_reserved_mem();
> memblock_allow_resize();
> - memblock_dump_all();
>
> for_each_memblock(memory, reg) {
> unsigned long start_pfn = memblock_region_memory_base_pfn(reg);
> @@ -570,9 +570,11 @@ void __init paging_init(void)
>
> void __init misc_mem_init(void)
> {
> + arch_numa_init();
> sparse_init();
> zone_sizes_init();
> resource_init();
> + memblock_dump_all();
> }
>
> #ifdef CONFIG_SPARSEMEM_VMEMMAP
Reviewed-by: Palmer Dabbelt <[email protected]>
On Mon, 05 Oct 2020 17:17:47 PDT (-0700), Atish Patra wrote:
> This series attempts to move the ARM64 numa implementation to common
> code so that RISC-V can leverage that as well instead of reimplementing
> it again.
>
> RISC-V specific bits are based on initial work done by Greentime Hu [1] but
> modified to reuse the common implementation to avoid duplication.
>
> [1] https://lkml.org/lkml/2020/1/10/233
>
> This series has been tested on qemu with numa enabled for both RISC-V & ARM64.
> It would be great if somebody can test it on numa capable ARM64 hardware platforms.
> This patch series doesn't modify the maintainers list for the common code (arch_numa)
> as I am not sure if somebody from ARM64 community or Greg should take up the
> maintainership. Ganapatrao was the original author of the arm64 version.
> I would be happy to update that in the next revision once it is decided.
>
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3
> node 0 size: 486 MB
> node 0 free: 470 MB
> node 1 cpus: 4 5 6 7
> node 1 size: 424 MB
> node 1 free: 408 MB
> node distances:
> node 0 1
> 0: 10 20
> 1: 20 10
> # numactl -show
> policy: default
> preferred node: current
> physcpubind: 0 1 2 3 4 5 6 7
> cpubind: 0 1
> nodebind: 0 1
> membind: 0 1
>
> The patches are also available at
> https://github.com/atishp04/linux/tree/5.10_numa_unified_v4
>
> For RISC-V, the following qemu series is a pre-requisite(already available in upstream)
> https://patchwork.kernel.org/project/qemu-devel/list/?series=303313
>
> Testing:
> RISC-V:
> Tested in Qemu and 2 socket OmniXtend FPGA.
>
> ARM64:
> 2 socket kunpeng920 (4 nodes around 250G a node)
> Tested-by: Jonathan Cameron <[email protected]>
>
> There may be some minor conflicts with Mike's cleanup series [2] depending on the
> order in which these two series are being accepted. I can rebase on top his series
> if required.
>
> [2] https://lkml.org/lkml/2020/8/18/754
>
> Changes from v3->v4:
> 1. Removed redundant duplicate header.
> 2. Added Reviewed-by tags.
>
> Changes from v2->v3:
> 1. Added Acked-by/Reviewed-by tags.
> 2. Replaced asm/acpi.h with linux/acpi.h
> 3. Defined arch_acpi_numa_init as static.
>
> Changes from v1->v2:
> 1. Replaced ARM64 specific compile time protection with ACPI specific ones.
> 2. Dropped common pcibus_to_node changes. Added required changes in RISC-V.
> 3. Fixed few typos.
>
> Atish Patra (4):
> numa: Move numa implementation to common code
> arm64, numa: Change the numa init functions name to be generic
> riscv: Separate memory init from paging init
> riscv: Add numa support for riscv64 platform
>
> Greentime Hu (1):
> riscv: Add support pte_protnone and pmd_protnone if
> CONFIG_NUMA_BALANCING
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/numa.h | 45 +----------------
> arch/arm64/kernel/acpi_numa.c | 13 -----
> arch/arm64/mm/Makefile | 1 -
> arch/arm64/mm/init.c | 4 +-
> arch/riscv/Kconfig | 31 +++++++++++-
> arch/riscv/include/asm/mmzone.h | 13 +++++
> arch/riscv/include/asm/numa.h | 8 +++
> arch/riscv/include/asm/pci.h | 14 ++++++
> arch/riscv/include/asm/pgtable.h | 21 ++++++++
> arch/riscv/kernel/setup.c | 11 ++++-
> arch/riscv/kernel/smpboot.c | 12 ++++-
> arch/riscv/mm/init.c | 10 +++-
> drivers/base/Kconfig | 6 +++
> drivers/base/Makefile | 1 +
> .../mm/numa.c => drivers/base/arch_numa.c | 30 ++++++++++--
> include/asm-generic/numa.h | 49 +++++++++++++++++++
> 17 files changed, 199 insertions(+), 71 deletions(-)
> create mode 100644 arch/riscv/include/asm/mmzone.h
> create mode 100644 arch/riscv/include/asm/numa.h
> rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (95%)
> create mode 100644 include/asm-generic/numa.h
Sorry it took me a while to get around to this, I had some work stuff to deal
with and have managed to get buried in email. This all looks fine to me, but
the way it's structured makes it kind of hard to apply -- essentially I can't
take the first two without at least some Acks from the arm64 folks, and it
smells to me like it'd be better to have those go through the arm64 tree. The
RISC-V stuff isn't that heavyweight, but I'd like it to at least land in my
for-next at some point as otherwise it'll be completely untested.
arm64 guys: do you want to try and do some sort of shared base tag sort of
thing for these, or do you want me to refactor this such that it adds the
generic stuff before removing the arm64 stuff so we can decouple that way?
On Mon, Oct 05, 2020 at 05:17:49PM -0700, Atish Patra wrote:
> diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> index 7ff800045434..96502ff92af5 100644
> --- a/arch/arm64/kernel/acpi_numa.c
> +++ b/arch/arm64/kernel/acpi_numa.c
> @@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
>
> node_set(node, numa_nodes_parsed);
> }
> -
> -int __init arm64_acpi_numa_init(void)
> -{
> - int ret;
> -
> - ret = acpi_numa_init();
> - if (ret) {
> - pr_info("Failed to initialise from firmware\n");
> - return ret;
> - }
> -
> - return srat_disabled() ? -EINVAL : 0;
> -}
I think it's better if arm64_acpi_numa_init() and arm64_numa_init()
remained in the arm64 code. It's not really much code to be shared.
> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> index 73f8b49d485c..74b4f2ddad70 100644
> --- a/drivers/base/arch_numa.c
> +++ b/drivers/base/arch_numa.c
> @@ -13,7 +13,6 @@
> #include <linux/module.h>
> #include <linux/of.h>
>
> -#include <asm/acpi.h>
> #include <asm/sections.h>
>
> struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> @@ -444,16 +443,37 @@ static int __init dummy_numa_init(void)
> return 0;
> }
>
> +#ifdef CONFIG_ACPI_NUMA
> +static int __init arch_acpi_numa_init(void)
> +{
> + int ret;
> +
> + ret = acpi_numa_init();
> + if (ret) {
> + pr_info("Failed to initialise from firmware\n");
> + return ret;
> + }
> +
> + return srat_disabled() ? -EINVAL : 0;
> +}
> +#else
> +static int __init arch_acpi_numa_init(void)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +#endif
> +
> /**
> - * arm64_numa_init() - Initialize NUMA
> + * arch_numa_init() - Initialize NUMA
> *
> * Try each configured NUMA initialization method until one succeeds. The
> - * last fallback is dummy single node config encomapssing whole memory.
> + * last fallback is dummy single node config encompassing whole memory.
> */
> -void __init arm64_numa_init(void)
> +void __init arch_numa_init(void)
> {
> if (!numa_off) {
> - if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))
> + if (!acpi_disabled && !numa_init(arch_acpi_numa_init))
> return;
> if (acpi_disabled && !numa_init(of_numa_init))
> return;
Does riscv even have an acpi_disabled variable?
--
Catalin
On Thu, Nov 05, 2020 at 10:07:00AM -0800, Palmer Dabbelt wrote:
> On Mon, 05 Oct 2020 17:17:47 PDT (-0700), Atish Patra wrote:
> > arch/arm64/Kconfig | 1 +
> > arch/arm64/include/asm/numa.h | 45 +----------------
> > arch/arm64/kernel/acpi_numa.c | 13 -----
> > arch/arm64/mm/Makefile | 1 -
> > arch/arm64/mm/init.c | 4 +-
> > arch/riscv/Kconfig | 31 +++++++++++-
> > arch/riscv/include/asm/mmzone.h | 13 +++++
> > arch/riscv/include/asm/numa.h | 8 +++
> > arch/riscv/include/asm/pci.h | 14 ++++++
> > arch/riscv/include/asm/pgtable.h | 21 ++++++++
> > arch/riscv/kernel/setup.c | 11 ++++-
> > arch/riscv/kernel/smpboot.c | 12 ++++-
> > arch/riscv/mm/init.c | 10 +++-
> > drivers/base/Kconfig | 6 +++
> > drivers/base/Makefile | 1 +
> > .../mm/numa.c => drivers/base/arch_numa.c | 30 ++++++++++--
> > include/asm-generic/numa.h | 49 +++++++++++++++++++
> > 17 files changed, 199 insertions(+), 71 deletions(-)
> > create mode 100644 arch/riscv/include/asm/mmzone.h
> > create mode 100644 arch/riscv/include/asm/numa.h
> > rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (95%)
> > create mode 100644 include/asm-generic/numa.h
[...]
> arm64 guys: do you want to try and do some sort of shared base tag sort of
> thing for these, or do you want me to refactor this such that it adds the
> generic stuff before removing the arm64 stuff so we can decouple that way?
I had a comment on the second patch (probably impacting the first) but
otherwise they look fine.
I'm happy for this series to go in via the riscv tree but, if we run
into conflicts, please provide a stable branch somewhere containing the
arm64 changes (first two patches).
--
Catalin
On Fri, Nov 6, 2020 at 9:14 AM Catalin Marinas <[email protected]> wrote:
>
> On Mon, Oct 05, 2020 at 05:17:49PM -0700, Atish Patra wrote:
> > diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> > index 7ff800045434..96502ff92af5 100644
> > --- a/arch/arm64/kernel/acpi_numa.c
> > +++ b/arch/arm64/kernel/acpi_numa.c
> > @@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
> >
> > node_set(node, numa_nodes_parsed);
> > }
> > -
> > -int __init arm64_acpi_numa_init(void)
> > -{
> > - int ret;
> > -
> > - ret = acpi_numa_init();
> > - if (ret) {
> > - pr_info("Failed to initialise from firmware\n");
> > - return ret;
> > - }
> > -
> > - return srat_disabled() ? -EINVAL : 0;
> > -}
>
> I think it's better if arm64_acpi_numa_init() and arm64_numa_init()
> remained in the arm64 code. It's not really much code to be shared.
>
RISC-V will probably support ACPI one day. The idea is to avoid repeating
this exercise in the future.
Moreover, there will be arch_numa_init, which will be used by RISC-V,
and there will be arm64_numa_init, used by arm64. However, if you feel
strongly about it, I am happy to move those two functions back to arm64.
In case we decide to go that route, can we define arm64_numa_init in
mm/init.c?
Defining numa.c just for arm64_numa_init in arm64 may be overkill.
> > diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> > index 73f8b49d485c..74b4f2ddad70 100644
> > --- a/drivers/base/arch_numa.c
> > +++ b/drivers/base/arch_numa.c
> > @@ -13,7 +13,6 @@
> > #include <linux/module.h>
> > #include <linux/of.h>
> >
> > -#include <asm/acpi.h>
> > #include <asm/sections.h>
> >
> > struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> > @@ -444,16 +443,37 @@ static int __init dummy_numa_init(void)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_ACPI_NUMA
> > +static int __init arch_acpi_numa_init(void)
> > +{
> > + int ret;
> > +
> > + ret = acpi_numa_init();
> > + if (ret) {
> > + pr_info("Failed to initialise from firmware\n");
> > + return ret;
> > + }
> > +
> > + return srat_disabled() ? -EINVAL : 0;
> > +}
> > +#else
> > +static int __init arch_acpi_numa_init(void)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +#endif
> > +
> > /**
> > - * arm64_numa_init() - Initialize NUMA
> > + * arch_numa_init() - Initialize NUMA
> > *
> > * Try each configured NUMA initialization method until one succeeds. The
> > - * last fallback is dummy single node config encomapssing whole memory.
> > + * last fallback is dummy single node config encompassing whole memory.
> > */
> > -void __init arm64_numa_init(void)
> > +void __init arch_numa_init(void)
> > {
> > if (!numa_off) {
> > - if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))
> > + if (!acpi_disabled && !numa_init(arch_acpi_numa_init))
> > return;
> > if (acpi_disabled && !numa_init(of_numa_init))
> > return;
>
> Does riscv even have an acpi_disabled variable?
>
It is defined in include/linux/acpi.h, which is included in arch_numa.c.
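
To illustrate why the shared code still behaves sensibly on a platform
without ACPI, here is a userspace sketch of the fallback order in
arch_numa_init(). It assumes that include/linux/acpi.h makes acpi_disabled a
constant 1 when CONFIG_ACPI is not set, so the ACPI branch can never be
taken there. The enum, the function names and the boolean parameters are
hypothetical stand-ins for the real init callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Which init method ended up describing the NUMA topology. */
enum sketch_numa_source {
	SKETCH_NUMA_ACPI,
	SKETCH_NUMA_DT,
	SKETCH_NUMA_DUMMY,
};

/*
 * acpi_ok and dt_ok say whether the respective init method would succeed.
 * The branch structure follows the arch_numa_init() hunk quoted above.
 */
static enum sketch_numa_source sketch_arch_numa_init(bool numa_off,
						     bool acpi_disabled,
						     bool acpi_ok, bool dt_ok)
{
	if (!numa_off) {
		if (!acpi_disabled && acpi_ok)
			return SKETCH_NUMA_ACPI;
		if (acpi_disabled && dt_ok)
			return SKETCH_NUMA_DT;
	}
	/* Last resort: a single dummy node covering all memory. */
	return SKETCH_NUMA_DUMMY;
}

/* With acpi_disabled fixed at true, only the DT and dummy paths remain. */
static void sketch_no_acpi_examples(void)
{
	assert(sketch_arch_numa_init(false, true, true, true) == SKETCH_NUMA_DT);
	assert(sketch_arch_numa_init(false, true, true, false) == SKETCH_NUMA_DUMMY);
	assert(sketch_arch_numa_init(true, true, true, true) == SKETCH_NUMA_DUMMY);
}
```

Note that when ACPI is enabled but its init fails, the device tree path is
not tried; the code falls straight through to the dummy node.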
> --
> Catalin
>
--
Regards,
Atish
On Fri, Nov 06, 2020 at 09:33:14AM -0800, Atish Patra wrote:
> On Fri, Nov 6, 2020 at 9:14 AM Catalin Marinas <[email protected]> wrote:
> > On Mon, Oct 05, 2020 at 05:17:49PM -0700, Atish Patra wrote:
> > > diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> > > index 7ff800045434..96502ff92af5 100644
> > > --- a/arch/arm64/kernel/acpi_numa.c
> > > +++ b/arch/arm64/kernel/acpi_numa.c
> > > @@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
> > >
> > > node_set(node, numa_nodes_parsed);
> > > }
> > > -
> > > -int __init arm64_acpi_numa_init(void)
> > > -{
> > > - int ret;
> > > -
> > > - ret = acpi_numa_init();
> > > - if (ret) {
> > > - pr_info("Failed to initialise from firmware\n");
> > > - return ret;
> > > - }
> > > -
> > > - return srat_disabled() ? -EINVAL : 0;
> > > -}
> >
> > I think it's better if arm64_acpi_numa_init() and arm64_numa_init()
> > remained in the arm64 code. It's not really much code to be shared.
>
> RISC-V will probably support ACPI one day. The idea is to avoid repeating
> this exercise in the future.
> Otherwise, there would be two entry points: arch_numa_init used by RISC-V
> and arm64_numa_init used by arm64. However, if you feel strongly about it,
> I am happy to move those two functions back to arm64.
I don't have a strong view on this; my only concern is the risk of the
implementations diverging at some point (e.g. quirks). We can revisit it
if that happens.
It may be worth swapping patches 1 and 2 so that you don't have an
arm64_* function in the core code after the first patch (more of a
nitpick). Either way, feel free to add my ack on both patches:
Acked-by: Catalin Marinas <[email protected]>
On Fri, Nov 6, 2020 at 11:08 AM Catalin Marinas <[email protected]> wrote:
>
> On Fri, Nov 06, 2020 at 09:33:14AM -0800, Atish Patra wrote:
> > On Fri, Nov 6, 2020 at 9:14 AM Catalin Marinas <[email protected]> wrote:
> > > On Mon, Oct 05, 2020 at 05:17:49PM -0700, Atish Patra wrote:
> > > > diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
> > > > index 7ff800045434..96502ff92af5 100644
> > > > --- a/arch/arm64/kernel/acpi_numa.c
> > > > +++ b/arch/arm64/kernel/acpi_numa.c
> > > > @@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
> > > >
> > > > node_set(node, numa_nodes_parsed);
> > > > }
> > > > -
> > > > -int __init arm64_acpi_numa_init(void)
> > > > -{
> > > > - int ret;
> > > > -
> > > > - ret = acpi_numa_init();
> > > > - if (ret) {
> > > > - pr_info("Failed to initialise from firmware\n");
> > > > - return ret;
> > > > - }
> > > > -
> > > > - return srat_disabled() ? -EINVAL : 0;
> > > > -}
> > >
> > > I think it's better if arm64_acpi_numa_init() and arm64_numa_init()
> > > remained in the arm64 code. It's not really much code to be shared.
> >
> > RISC-V will probably support ACPI one day. The idea is to avoid repeating
> > this exercise in the future.
> > Otherwise, there would be two entry points: arch_numa_init used by RISC-V
> > and arm64_numa_init used by arm64. However, if you feel strongly about it,
> > I am happy to move those two functions back to arm64.
>
> I don't have a strong view on this; my only concern is the risk of the
> implementations diverging at some point (e.g. quirks). We can revisit it
> if that happens.
>
Sure. I seriously hope we don't have to deal with arch-specific quirks
in the future.
> It may be worth swapping patches 1 and 2 so that you don't have an
> arm64_* function in the core code after the first patch (more of a
> nitpick). Either way, feel free to add my ack on both patches:
>
Sure. I will swap 1 & 2 and resend the series.
> Acked-by: Catalin Marinas <[email protected]>
Thanks.
--
Regards,
Atish