2016-04-08 22:50:43

by David Daney

Subject: [PATCH v16 0/6] arm64, numa: Add numa support for arm64 platforms

From: David Daney <[email protected]>

v16:

- No functional change.

- Rebase to v4.6-rc2 to avoid merge conflicts.

v15:

- Make the distance-map node optional (again), if it is not in
the device tree, default values are used.

- Minor cleanups to of_numa.c as suggested by Rob Herring.

v14:
- Revised patch to unflatten the device tree earlier.

- Cleanups and added EXPORT_SYMBOL to of_numa.c as suggested
by Rob Herring.

v13:
- Added patch to unflatten the device tree earlier.

- Rewrote of_numa.c to work on the unflattened device tree.

- Cleanup of EXPORTs in arch/arm64/mm/numa.c as suggested by
Will Deacon.

v12:

- Replaced 6 patches from Ard Biesheuvel with a new, simpler, and
more correct single patch, also from Ard.

v11:
- Dropped cleanup patches for other architectures, they will be
submitted as a separate set after more testing.

- Added the patch set from Ard Biesheuvel that is needed to make
the whole thing actually work. Previously this was a
separate set.

- Kconfig and other fixes and simplifications as suggested by
Rob Herring.

- Rearranged, refactored and reordered so that we don't patch
new files multiple times.

- Summary:

o 6 patches from Ard Biesheuvel to allow use of
"memory" nodes with efi stub.

o 2 patches to document and add of_numa.c

o 1 patch to add arm64 NUMA support.

o 1 patch to add NUMA balancing support for arm64.

v10:
- Incorporated review comments from Rob Herring.
- Moved numa binding and implementation to devicetree core.
- Added cleanup patch to remove redundant NODE_DATA macro from asm header files
- Include numa balancing support for arm64 patch in this series.
- Fixed tile build issue reported by the kbuild robot (patch 7).

v9:
- Added cleanup patch to reuse and avoid redefinition of cpumask_of_pcibus
as suggested by Will Deacon and Bjorn Helgaas.
- Included patch to make the pci-host-generic driver NUMA aware.
- Incorporated comment from Shannon Zhao.

v8:
- Incorporated review comments of Mark Rutland and Will Deacon.
- Added pci helper function and macro for numa.

v7:
- managing numa memory mapping using memblock.
- Incorporated review comments of Mark Rutland.

v6:
- defined and implemented the numa dt binding using
node property proximity and device node distance-map.
- renamed dt_numa to of_numa

v5:
- Created base version of numa.c which creates a dummy NUMA node without using DT
on single-socket platforms. Then added patches for DT support.
- Incorporated review comments from Hanjun Guo.

v4:
Made changes as per Arnd's review comments.

v3:
Added changes to support NUMA on arm64-based platforms.
Tested these patches on Cavium's multinode (2-node topology) platform.
In this patchset, defined and implemented DT bindings for NUMA mapping
for core and memory using the device node property arm,associativity.

v2:
Defined and implemented numa map for memory, cores to node and
proximity distance matrix of nodes.

v1:
Initial patchset to support numa on arm64 platforms.

Note: 1. This patchset has been tested with and without NUMA enabled, using DT
(both with and without NUMA bindings), on ThunderX single-socket
and dual-socket boards.

Ard Biesheuvel (1):
efi: ARM/arm64: ignore DT memory nodes instead of removing them

David Daney (2):
of, numa: Add NUMA of binding implementation.
arm64: Move unflatten_device_tree() call earlier.

Ganapatrao Kulkarni (3):
Documentation, dt, numa: dt bindings for NUMA.
arm64, numa: Add NUMA support for arm64 platforms.
arm64, mm, numa: Add NUMA balancing support for arm64.

Documentation/devicetree/bindings/numa.txt | 275 ++++++++++++++++++++
arch/arm64/Kconfig | 27 ++
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/include/asm/mmzone.h | 12 +
arch/arm64/include/asm/numa.h | 45 ++++
arch/arm64/include/asm/pgtable.h | 15 ++
arch/arm64/include/asm/topology.h | 10 +
arch/arm64/kernel/pci.c | 10 +
arch/arm64/kernel/setup.c | 17 +-
arch/arm64/kernel/smp.c | 4 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/init.c | 35 ++-
arch/arm64/mm/mm.h | 1 -
arch/arm64/mm/mmu.c | 2 -
arch/arm64/mm/numa.c | 396 +++++++++++++++++++++++++++++
drivers/firmware/efi/arm-init.c | 8 +
drivers/firmware/efi/libstub/fdt.c | 24 +-
drivers/of/Kconfig | 3 +
drivers/of/Makefile | 1 +
drivers/of/of_numa.c | 211 +++++++++++++++
include/linux/of.h | 9 +
21 files changed, 1072 insertions(+), 35 deletions(-)
create mode 100644 Documentation/devicetree/bindings/numa.txt
create mode 100644 arch/arm64/include/asm/mmzone.h
create mode 100644 arch/arm64/include/asm/numa.h
create mode 100644 arch/arm64/mm/numa.c
create mode 100644 drivers/of/of_numa.c

--
1.8.3.1


2016-04-08 22:50:46

by David Daney

Subject: [PATCH v16 3/6] of, numa: Add NUMA of binding implementation.

From: David Daney <[email protected]>

Add device tree parsing for NUMA topology using the "numa-node-id"
property in cpu and memory nodes, and the distance-map node.

This is a complete rewrite of a previous patch by:
Ganapatrao Kulkarni<[email protected]>
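
The distance-matrix handling described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch's code: the table is a fixed-size array instead of a memblock allocation, MAX_NODES is a hypothetical limit, and LOCAL_DISTANCE and REMOTE_DISTANCE are assumed to be 10 and 20, as in the kernel's generic topology header. The helper names are made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES       4   /* hypothetical limit for this sketch */
#define LOCAL_DISTANCE  10  /* assumed default for a node to itself */
#define REMOTE_DISTANCE 20  /* assumed default between different nodes */

/* Hypothetical userspace stand-in for the kernel's distance table. */
static uint8_t distance_table[MAX_NODES][MAX_NODES];

/* Fill the table with defaults, used when no distance-map node exists. */
static void distance_table_init(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		for (int j = 0; j < MAX_NODES; j++)
			distance_table[i][j] =
				(i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* Reject out-of-range nodes and distances that do not fit in a u8. */
static int set_distance(uint32_t from, uint32_t to, uint32_t distance)
{
	if (from >= MAX_NODES || to >= MAX_NODES)
		return -1;
	if (distance > UINT8_MAX ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;
	distance_table[from][to] = (uint8_t)distance;
	return 0;
}

/*
 * Walk (nodea, nodeb, distance) triples. An incomplete trailing triple
 * is ignored. When nodeb > nodea the reverse direction is mirrored.
 */
static void parse_distance_matrix(const uint32_t *cells, size_t count)
{
	for (size_t i = 0; i + 2 < count; i += 3) {
		uint32_t a = cells[i], b = cells[i + 1], d = cells[i + 2];

		set_distance(a, b, d);
		if (b > a)
			set_distance(b, a, d);
	}
}
```

Because the table is pre-filled with defaults, a device tree without a distance-map node still yields usable distances, which matches the "optional distance-map" behaviour noted in the v15 changelog.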

Signed-off-by: David Daney <[email protected]>
Acked-by: Rob Herring <[email protected]>
---
drivers/of/Kconfig | 3 +
drivers/of/Makefile | 1 +
drivers/of/of_numa.c | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/of.h | 9 +++
4 files changed, 224 insertions(+)
create mode 100644 drivers/of/of_numa.c

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index e2a4841..b3bec3a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -112,4 +112,7 @@ config OF_OVERLAY
While this option is selected automatically when needed, you can
enable it manually to improve device tree unit test coverage.

+config OF_NUMA
+ bool
+
endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index 156c072..bee3fa9 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_OF_MTD) += of_mtd.o
obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
obj-$(CONFIG_OF_RESOLVE) += resolver.o
obj-$(CONFIG_OF_OVERLAY) += overlay.o
+obj-$(CONFIG_OF_NUMA) += of_numa.o

obj-$(CONFIG_OF_UNITTEST) += unittest-data/
diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
new file mode 100644
index 0000000..0f2784b
--- /dev/null
+++ b/drivers/of/of_numa.c
@@ -0,0 +1,211 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 - 2016 Cavium Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/nodemask.h>
+
+#include <asm/numa.h>
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+
+/*
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now for all cpus.
+*/
+static void __init of_numa_parse_cpu_nodes(void)
+{
+ u32 nid;
+ int r;
+ struct device_node *cpus;
+ struct device_node *np = NULL;
+
+ cpus = of_find_node_by_path("/cpus");
+ if (!cpus)
+ return;
+
+ for_each_child_of_node(cpus, np) {
+ /* Skip things that are not CPUs */
+ if (of_node_cmp(np->type, "cpu") != 0)
+ continue;
+
+ r = of_property_read_u32(np, "numa-node-id", &nid);
+ if (r)
+ continue;
+
+ pr_debug("NUMA: CPU on %u\n", nid);
+ if (nid >= MAX_NUMNODES)
+ pr_warn("NUMA: Node id %u exceeds maximum value\n",
+ nid);
+ else
+ node_set(nid, numa_nodes_parsed);
+ }
+}
+
+static int __init of_numa_parse_memory_nodes(void)
+{
+ struct device_node *np = NULL;
+ struct resource rsrc;
+ u32 nid;
+ int r = 0;
+
+ for (;;) {
+ np = of_find_node_by_type(np, "memory");
+ if (!np)
+ break;
+
+ r = of_property_read_u32(np, "numa-node-id", &nid);
+ if (r == -EINVAL)
+ /*
+ * property doesn't exist if -EINVAL, continue
+ * looking for more memory nodes with
+ * "numa-node-id" property
+ */
+ continue;
+ else if (r)
+ /* some other error */
+ break;
+
+ r = of_address_to_resource(np, 0, &rsrc);
+ if (r) {
+ pr_err("NUMA: bad reg property in memory node\n");
+ break;
+ }
+
+ pr_debug("NUMA: base = %llx len = %llx, node = %u\n",
+ rsrc.start, rsrc.end - rsrc.start + 1, nid);
+
+ r = numa_add_memblk(nid, rsrc.start,
+ rsrc.end - rsrc.start + 1);
+ if (r)
+ break;
+ }
+ of_node_put(np);
+
+ return r;
+}
+
+static int __init of_numa_parse_distance_map_v1(struct device_node *map)
+{
+ const __be32 *matrix;
+ int entry_count;
+ int i;
+
+ pr_info("NUMA: parsing numa-distance-map-v1\n");
+
+ matrix = of_get_property(map, "distance-matrix", NULL);
+ if (!matrix) {
+ pr_err("NUMA: No distance-matrix property in distance-map\n");
+ return -EINVAL;
+ }
+
+ entry_count = of_property_count_u32_elems(map, "distance-matrix");
+ if (entry_count <= 0) {
+ pr_err("NUMA: Invalid distance-matrix\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i + 2 < entry_count; i += 3) {
+ u32 nodea, nodeb, distance;
+
+ nodea = of_read_number(matrix, 1);
+ matrix++;
+ nodeb = of_read_number(matrix, 1);
+ matrix++;
+ distance = of_read_number(matrix, 1);
+ matrix++;
+
+ numa_set_distance(nodea, nodeb, distance);
+ pr_debug("NUMA: distance[node%d -> node%d] = %d\n",
+ nodea, nodeb, distance);
+
+ /* Set default distance of node B->A same as A->B */
+ if (nodeb > nodea)
+ numa_set_distance(nodeb, nodea, distance);
+ }
+
+ return 0;
+}
+
+static int __init of_numa_parse_distance_map(void)
+{
+ int ret = 0;
+ struct device_node *np;
+
+ np = of_find_compatible_node(NULL, NULL,
+ "numa-distance-map-v1");
+ if (np)
+ ret = of_numa_parse_distance_map_v1(np);
+
+ of_node_put(np);
+ return ret;
+}
+
+int of_node_to_nid(struct device_node *device)
+{
+ struct device_node *np;
+ u32 nid;
+ int r = -ENODATA;
+
+ np = of_node_get(device);
+
+ while (np) {
+ struct device_node *parent;
+
+ r = of_property_read_u32(np, "numa-node-id", &nid);
+ /*
+ * -EINVAL indicates the property was not found, and
+ * we walk up the tree trying to find a parent with a
+ * "numa-node-id". Any other type of error indicates
+ * a bad device tree and we give up.
+ */
+ if (r != -EINVAL)
+ break;
+
+ parent = of_get_parent(np);
+ of_node_put(np);
+ np = parent;
+ }
+ if (np && r)
+ pr_warn("NUMA: Invalid \"numa-node-id\" property in node %s\n",
+ np->name);
+ of_node_put(np);
+
+ if (!r) {
+ if (nid >= MAX_NUMNODES)
+ pr_warn("NUMA: Node id %u exceeds maximum value\n",
+ nid);
+ else
+ return nid;
+ }
+
+ return NUMA_NO_NODE;
+}
+EXPORT_SYMBOL(of_node_to_nid);
+
+int __init of_numa_init(void)
+{
+ int r;
+
+ of_numa_parse_cpu_nodes();
+ r = of_numa_parse_memory_nodes();
+ if (r)
+ return r;
+ return of_numa_parse_distance_map();
+}
diff --git a/include/linux/of.h b/include/linux/of.h
index 7fcb681..76f07c8 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -685,6 +685,15 @@ static inline int of_node_to_nid(struct device_node *device)
}
#endif

+#ifdef CONFIG_OF_NUMA
+extern int of_numa_init(void);
+#else
+static inline int of_numa_init(void)
+{
+ return -ENOSYS;
+}
+#endif
+
static inline struct device_node *of_find_matching_node(
struct device_node *from,
const struct of_device_id *matches)
--
1.8.3.1
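
The parent walk performed by of_node_to_nid() in the patch above can be illustrated with a small userspace sketch. The struct sketch_node type, the SKETCH_* constants and the has_nid flag are hypothetical stand-ins for struct device_node and the "numa-node-id" property lookup; reference counting and the "malformed property" error path are left out.

```c
#include <stddef.h>

#define SKETCH_NO_NODE   (-1) /* stand-in for NUMA_NO_NODE */
#define SKETCH_MAX_NODES 4    /* stand-in for MAX_NUMNODES */

/* Hypothetical, simplified device tree node. */
struct sketch_node {
	struct sketch_node *parent; /* NULL at the root */
	int has_nid;                /* non-zero if "numa-node-id" exists */
	unsigned int nid;           /* value of the property, if present */
};

/*
 * Return the node id of the nearest ancestor (or the node itself)
 * that carries a node id, or SKETCH_NO_NODE if none is found or the
 * id is out of range.
 */
static int sketch_node_to_nid(const struct sketch_node *np)
{
	for (; np; np = np->parent) {
		if (!np->has_nid)
			continue; /* property missing: try the parent */
		if (np->nid >= SKETCH_MAX_NODES)
			return SKETCH_NO_NODE;
		return (int)np->nid;
	}
	return SKETCH_NO_NODE;
}
```

A device without its own node id therefore inherits the id of the closest enclosing bus or container node that has one.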

2016-04-08 22:50:53

by David Daney

Subject: [PATCH v16 5/6] arm64, numa: Add NUMA support for arm64 platforms.

From: Ganapatrao Kulkarni <[email protected]>

Attempt to get the memory and CPU NUMA node via of_numa. If that
fails, default to the dummy NUMA node and map all memory and CPUs to
node 0.
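
The fallback described above can be sketched in userspace C. This is a simplified model under assumptions, not the patch's implementation: dt_nodes simulates how many nodes the device tree describes, nodes_parsed replaces the numa_nodes_parsed nodemask, and every sketch_* name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

static bool numa_off;                /* mirrors the patch's numa_off flag */
static int nodes_parsed;             /* nodes found by the init method */
static int cpu_node[SKETCH_NR_CPUS]; /* hypothetical per-CPU node ids */
static int dt_nodes;                 /* simulated device tree content */

/* Stand-in for of_numa_init(): reports whatever the "DT" describes. */
static int sketch_of_numa_init(void)
{
	nodes_parsed = dt_nodes;
	return 0;
}

/* Stand-in for dummy_numa_init(): one node covering all memory. */
static int sketch_dummy_numa_init(void)
{
	nodes_parsed = 1;
	numa_off = true;
	return 0;
}

/* Run one init method; fail if it errors out or finds no nodes. */
static int sketch_numa_init(int (*init_func)(void))
{
	int ret;

	nodes_parsed = 0;
	ret = init_func();
	if (ret < 0)
		return ret;
	if (nodes_parsed == 0)
		return -EINVAL;
	return 0;
}

/* Try the device tree first, then fall back to the dummy node. */
static void sketch_arm64_numa_init(void)
{
	if (!numa_off && !sketch_numa_init(sketch_of_numa_init))
		return;
	sketch_numa_init(sketch_dummy_numa_init);
}

/* With NUMA off, every CPU is reported on node 0. */
static int sketch_cpu_to_node(int cpu)
{
	return numa_off ? 0 : cpu_node[cpu];
}
```

Setting numa_off inside the dummy path is what makes later per-CPU lookups collapse to node 0, even if earlier parsing had recorded other node ids.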

Tested-by: Shannon Zhao <[email protected]>
Reviewed-by: Robert Richter <[email protected]>
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
Signed-off-by: David Daney <[email protected]>
---
arch/arm64/Kconfig | 26 +++
arch/arm64/include/asm/mmzone.h | 12 ++
arch/arm64/include/asm/numa.h | 45 +++++
arch/arm64/include/asm/topology.h | 10 +
arch/arm64/kernel/pci.c | 10 +
arch/arm64/kernel/setup.c | 4 +
arch/arm64/kernel/smp.c | 4 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/init.c | 35 +++-
arch/arm64/mm/numa.c | 396 ++++++++++++++++++++++++++++++++++++++
10 files changed, 538 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/include/asm/mmzone.h
create mode 100644 arch/arm64/include/asm/numa.h
create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4f43622..99f9b55 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -76,6 +76,7 @@ config ARM64
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
+ select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
@@ -98,6 +99,7 @@ config ARM64
select SYSCTL_EXCEPTION_TRACE
select HAVE_CONTEXT_TRACKING
select HAVE_ARM_SMCCC
+ select OF_NUMA if NUMA && OF
help
ARM 64-bit (AArch64) Linux support.

@@ -546,6 +548,30 @@ config HOTPLUG_CPU
Say Y here to experiment with turning CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu.

+# Common NUMA Features
+config NUMA
+ bool "Numa Memory Allocation and Scheduler Support"
+ depends on SMP
+ help
+ Enable NUMA (Non Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory of the CPU and add some more
+ NUMA awareness to the kernel.
+
+config NODES_SHIFT
+ int "Maximum NUMA Nodes (as a power of 2)"
+ range 1 10
+ default "2"
+ depends on NEED_MULTIPLE_NODES
+ help
+ Specify the maximum number of NUMA Nodes available on the target
+ system. Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+ def_bool y
+ depends on NUMA
+
source kernel/Kconfig.preempt
source kernel/Kconfig.hz

diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..a0de9e6
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,12 @@
+#ifndef __ASM_MMZONE_H
+#define __ASM_MMZONE_H
+
+#ifdef CONFIG_NUMA
+
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_MMZONE_H */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..e9b4f29
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,45 @@
+#ifndef __ASM_NUMA_H
+#define __ASM_NUMA_H
+
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+/* currently, arm64 implements flat NUMA topology */
+#define parent_node(node) (node)
+
+int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+extern nodemask_t numa_nodes_parsed __initdata;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+void numa_clear_node(unsigned int cpu);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+ return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 size);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_free_distance(void);
+void __init early_map_cpu_to_node(unsigned int cpu, int nid);
+void numa_store_cpu_info(unsigned int cpu);
+
+#else /* CONFIG_NUMA */
+
+static inline void numa_store_cpu_info(unsigned int cpu) { }
+static inline void arm64_numa_init(void) { }
+static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
+
+#endif /* CONFIG_NUMA */
+
+#endif /* __ASM_NUMA_H */
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a3e9d6f..8b57339 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -22,6 +22,16 @@ void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);
const struct cpumask *cpu_coregroup_mask(int cpu);

+#ifdef CONFIG_NUMA
+
+struct pci_bus;
+int pcibus_to_node(struct pci_bus *bus);
+#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
+ cpu_all_mask : \
+ cpumask_of_node(pcibus_to_node(bus)))
+
+#endif /* CONFIG_NUMA */
+
#include <asm-generic/topology.h>

#endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index c72de66..3c4e308 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -74,6 +74,16 @@ int raw_pci_write(unsigned int domain, unsigned int bus,
return -ENXIO;
}

+#ifdef CONFIG_NUMA
+
+int pcibus_to_node(struct pci_bus *bus)
+{
+ return dev_to_node(&bus->dev);
+}
+EXPORT_SYMBOL(pcibus_to_node);
+
+#endif
+
#ifdef CONFIG_ACPI
/* Root bridge scanning */
struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9bd237e..0ad4b77 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -53,6 +53,7 @@
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
#include <asm/kasan.h>
+#include <asm/numa.h>
#include <asm/sections.h>
#include <asm/setup.h>
#include <asm/smp_plat.h>
@@ -384,6 +385,9 @@ static int __init topology_init(void)
{
int i;

+ for_each_online_node(i)
+ register_one_node(i);
+
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
cpu->hotpluggable = 1;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b2d5f4e..bebc4c6 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -45,6 +45,7 @@
#include <asm/cputype.h>
#include <asm/cpu_ops.h>
#include <asm/mmu_context.h>
+#include <asm/numa.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/processor.h>
@@ -166,6 +167,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
static void smp_store_cpu_info(unsigned int cpuid)
{
store_cpu_topology(cpuid);
+ numa_store_cpu_info(cpuid);
}

/*
@@ -595,6 +597,8 @@ static void __init of_parse_and_init_cpus(void)

pr_debug("cpu logical map 0x%llx\n", hwid);
cpu_logical_map(cpu_count) = hwid;
+
+ early_map_cpu_to_node(cpu_count, of_node_to_nid(dn));
next:
cpu_count++;
}
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 57f57fd..54bb209 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -4,6 +4,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \
context.o proc.o pageattr.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_ARM64_PTDUMP) += dump.o
+obj-$(CONFIG_NUMA) += numa.o

obj-$(CONFIG_KASAN) += kasan_init.o
KASAN_SANITIZE_kasan_init.o := n
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ea989d8..b1ff151 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -40,6 +40,7 @@
#include <asm/kasan.h>
#include <asm/kernel-pgtable.h>
#include <asm/memory.h>
+#include <asm/numa.h>
#include <asm/sections.h>
#include <asm/setup.h>
#include <asm/sizes.h>
@@ -86,6 +87,21 @@ static phys_addr_t __init max_zone_dma_phys(void)
return min(offset + (1ULL << 32), memblock_end_of_DRAM());
}

+#ifdef CONFIG_NUMA
+
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+ unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
+
+ if (IS_ENABLED(CONFIG_ZONE_DMA))
+ max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+ max_zone_pfns[ZONE_NORMAL] = max;
+
+ free_area_init_nodes(max_zone_pfns);
+}
+
+#else
+
static void __init zone_sizes_init(unsigned long min, unsigned long max)
{
struct memblock_region *reg;
@@ -126,6 +142,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
free_area_init_node(0, zone_size, min, zhole_size);
}

+#endif /* CONFIG_NUMA */
+
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
@@ -142,10 +160,15 @@ static void __init arm64_memory_present(void)
static void __init arm64_memory_present(void)
{
struct memblock_region *reg;
+ int nid = 0;

- for_each_memblock(memory, reg)
- memory_present(0, memblock_region_memory_base_pfn(reg),
- memblock_region_memory_end_pfn(reg));
+ for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+ nid = reg->nid;
+#endif
+ memory_present(nid, memblock_region_memory_base_pfn(reg),
+ memblock_region_memory_end_pfn(reg));
+ }
}
#endif

@@ -245,7 +268,6 @@ void __init arm64_memblock_init(void)
dma_contiguous_reserve(arm64_dma_phys_limit);

memblock_allow_resize();
- memblock_dump_all();
}

void __init bootmem_init(void)
@@ -257,6 +279,9 @@ void __init bootmem_init(void)

early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);

+ max_pfn = max_low_pfn = max;
+
+ arm64_numa_init();
/*
* Sparsemem tries to allocate bootmem in memory_present(), so must be
* done after the fixed reservations.
@@ -267,7 +292,7 @@ void __init bootmem_init(void)
zone_sizes_init(min, max);

high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
- max_pfn = max_low_pfn = max;
+ memblock_dump_all();
}

#ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..98dc104
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,396 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/module.h>
+#include <linux/of.h>
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+nodemask_t numa_nodes_parsed __initdata;
+static int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
+
+static int numa_distance_cnt;
+static u8 *numa_distance;
+static int numa_off;
+
+static __init int numa_parse_early_param(char *opt)
+{
+ if (!opt)
+ return -EINVAL;
+ if (!strncmp(opt, "off", 3)) {
+ pr_info("%s\n", "NUMA turned off");
+ numa_off = 1;
+ }
+ return 0;
+}
+early_param("numa", numa_parse_early_param);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+ if (WARN_ON(node >= nr_node_ids))
+ return cpu_none_mask;
+
+ if (WARN_ON(node_to_cpumask_map[node] == NULL))
+ return cpu_online_mask;
+
+ return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+#endif
+
+static void map_cpu_to_node(unsigned int cpu, int nid)
+{
+ set_cpu_numa_node(cpu, nid);
+ if (nid >= 0)
+ cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
+}
+
+void numa_clear_node(unsigned int cpu)
+{
+ int nid = cpu_to_node(cpu);
+
+ if (nid >= 0)
+ cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+ set_cpu_numa_node(cpu, NUMA_NO_NODE);
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+static void __init setup_node_to_cpumask_map(void)
+{
+ unsigned int cpu;
+ int node;
+
+ /* setup nr_node_ids if not done yet */
+ if (nr_node_ids == MAX_NUMNODES)
+ setup_nr_node_ids();
+
+ /* allocate and clear the mapping */
+ for (node = 0; node < nr_node_ids; node++) {
+ alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+ cpumask_clear(node_to_cpumask_map[node]);
+ }
+
+ for_each_possible_cpu(cpu)
+ set_cpu_numa_node(cpu, NUMA_NO_NODE);
+
+ /* cpumask_of_node() will now work */
+ pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ * Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(unsigned int cpu)
+{
+ map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
+}
+
+void __init early_map_cpu_to_node(unsigned int cpu, int nid)
+{
+ /* fallback to node 0 */
+ if (nid < 0 || nid >= MAX_NUMNODES)
+ nid = 0;
+
+ cpu_to_node_map[cpu] = nid;
+}
+
+/**
+ * numa_add_memblk - Set node id to memblk
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @size: Size of the new memblk
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_add_memblk(int nid, u64 start, u64 size)
+{
+ int ret;
+
+ ret = memblock_set_node(start, size, &memblock.memory, nid);
+ if (ret < 0) {
+ pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
+ start, (start + size - 1), nid);
+ return ret;
+ }
+
+ node_set(nid, numa_nodes_parsed);
+ pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
+ start, (start + size - 1), nid);
+ return ret;
+}
+
+/**
+ * Initialize NODE_DATA for a node on the local memory
+ */
+static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+{
+ const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
+ u64 nd_pa;
+ void *nd;
+ int tnid;
+
+ pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+ nid, start_pfn << PAGE_SHIFT,
+ (end_pfn << PAGE_SHIFT) - 1);
+
+ nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+ nd = __va(nd_pa);
+
+ /* report and initialize */
+ pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n",
+ nd_pa, nd_pa + nd_size - 1);
+ tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+ if (tnid != nid)
+ pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid);
+
+ node_data[nid] = nd;
+ memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+ NODE_DATA(nid)->node_id = nid;
+ NODE_DATA(nid)->node_start_pfn = start_pfn;
+ NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
+}
+
+/**
+ * numa_free_distance
+ *
+ * The current table is freed.
+ */
+void __init numa_free_distance(void)
+{
+ size_t size;
+
+ if (!numa_distance)
+ return;
+
+ size = numa_distance_cnt * numa_distance_cnt *
+ sizeof(numa_distance[0]);
+
+ memblock_free(__pa(numa_distance), size);
+ numa_distance_cnt = 0;
+ numa_distance = NULL;
+}
+
+/**
+ *
+ * Create a new NUMA distance table.
+ *
+ */
+static int __init numa_alloc_distance(void)
+{
+ size_t size;
+ u64 phys;
+ int i, j;
+
+ size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
+ phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
+ size, PAGE_SIZE);
+ if (WARN_ON(!phys))
+ return -ENOMEM;
+
+ memblock_reserve(phys, size);
+
+ numa_distance = __va(phys);
+ numa_distance_cnt = nr_node_ids;
+
+ /* fill with the default distances */
+ for (i = 0; i < numa_distance_cnt; i++)
+ for (j = 0; j < numa_distance_cnt; j++)
+ numa_distance[i * numa_distance_cnt + j] = i == j ?
+ LOCAL_DISTANCE : REMOTE_DISTANCE;
+
+ pr_debug("NUMA: Initialized distance table, cnt=%d\n",
+ numa_distance_cnt);
+
+ return 0;
+}
+
+/**
+ * numa_set_distance - Set inter node NUMA distance from node to node.
+ * @from: the 'from' node to set distance
+ * @to: the 'to' node to set distance
+ * @distance: NUMA distance
+ *
+ * Set the distance from node @from to @to to @distance.
+ * If distance table doesn't exist, a warning is printed.
+ *
+ * If @from or @to is higher than the highest known node or lower than zero
+ * or @distance doesn't make sense, the call is ignored.
+ *
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+ if (!numa_distance) {
+ pr_warn_once("NUMA: Warning: distance table not allocated yet\n");
+ return;
+ }
+
+ if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
+ from < 0 || to < 0) {
+ pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
+ from, to, distance);
+ return;
+ }
+
+ if ((u8)distance != distance ||
+ (from == to && distance != LOCAL_DISTANCE)) {
+ pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+ from, to, distance);
+ return;
+ }
+
+ numa_distance[from * numa_distance_cnt + to] = distance;
+}
+
+/**
+ * Return NUMA distance @from to @to
+ */
+int __node_distance(int from, int to)
+{
+ if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+ return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+ return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+static int __init numa_register_nodes(void)
+{
+ int nid;
+ struct memblock_region *mblk;
+
+ /* Check that valid nid is set to memblks */
+ for_each_memblock(memory, mblk)
+ if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES) {
+ pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+ mblk->nid, mblk->base,
+ mblk->base + mblk->size - 1);
+ return -EINVAL;
+ }
+
+ /* Finally register nodes. */
+ for_each_node_mask(nid, numa_nodes_parsed) {
+ unsigned long start_pfn, end_pfn;
+
+ get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+ setup_node_data(nid, start_pfn, end_pfn);
+ node_set_online(nid);
+ }
+
+ /* Setup online nodes to actual nodes*/
+ node_possible_map = numa_nodes_parsed;
+
+ return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+ int ret;
+
+ nodes_clear(numa_nodes_parsed);
+ nodes_clear(node_possible_map);
+ nodes_clear(node_online_map);
+ numa_free_distance();
+
+ ret = numa_alloc_distance();
+ if (ret < 0)
+ return ret;
+
+ ret = init_func();
+ if (ret < 0)
+ return ret;
+
+ if (nodes_empty(numa_nodes_parsed))
+ return -EINVAL;
+
+ ret = numa_register_nodes();
+ if (ret < 0)
+ return ret;
+
+ setup_node_to_cpumask_map();
+
+ /* init boot processor */
+ cpu_to_node_map[0] = 0;
+ map_cpu_to_node(0, 0);
+
+ return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node (node 0) and add memory blocks that cover all
+ * allowed memory. It is unlikely that this function fails.
+ */
+static int __init dummy_numa_init(void)
+{
+ int ret;
+ struct memblock_region *mblk;
+
+ pr_info("%s\n", "No NUMA configuration found");
+ pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
+ 0LLU, PFN_PHYS(max_pfn) - 1);
+
+ for_each_memblock(memory, mblk) {
+ ret = numa_add_memblk(0, mblk->base, mblk->size);
+ if (!ret)
+ continue;
+
+ pr_err("NUMA init failed\n");
+ return ret;
+ }
+
+ numa_off = 1;
+ return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds. The
+ * last fallback is dummy single node config encompassing whole memory.
+ */
+void __init arm64_numa_init(void)
+{
+ if (!numa_off) {
+ if (!numa_init(of_numa_init))
+ return;
+ }
+
+ numa_init(dummy_numa_init);
+}
--
1.8.3.1

2016-04-08 22:50:51

by David Daney

Subject: [PATCH v16 4/6] arm64: Move unflatten_device_tree() call earlier.

From: David Daney <[email protected]>

In order to extract NUMA information from the device tree, we need to
have the tree in its unflattened form.

Move the call to bootmem_init() in the tail of paging_init() into
setup_arch, and adjust header files so that its declaration is
visible.

Move the unflatten_device_tree() call between the calls to
paging_init() and bootmem_init(). Follow-on patches add NUMA handling
to bootmem_init().
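
The ordering constraint can be modelled with a tiny userspace sketch. All sketch_* functions and both flags are hypothetical; they only record whether the tree was unflattened before the step that would consume NUMA information from it.

```c
#include <stdbool.h>

static bool tree_unflattened; /* set once the tree is unflattened */
static bool numa_from_dt;     /* could bootmem init read NUMA data? */

static void sketch_paging_init(void)
{
	/* Page tables are set up here; nothing to model. */
}

static void sketch_unflatten_device_tree(void)
{
	tree_unflattened = true;
}

/* NUMA parsing can only use the device tree if it is unflattened. */
static void sketch_bootmem_init(void)
{
	numa_from_dt = tree_unflattened;
}

/* Mirrors the call order this patch establishes in setup_arch(). */
static void sketch_setup_arch(bool acpi_disabled)
{
	sketch_paging_init();
	if (acpi_disabled)
		sketch_unflatten_device_tree();
	sketch_bootmem_init();
}
```

When ACPI is in use the tree is not unflattened, so in this model the bootmem step sees no device tree NUMA data and the dummy single-node fallback would apply.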

Signed-off-by: David Daney <[email protected]>
---
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/kernel/setup.c | 13 +++++++++----
arch/arm64/mm/mm.h | 1 -
arch/arm64/mm/mmu.c | 2 --
4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 990124a..97b1d8f 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -29,6 +29,7 @@ typedef struct {
#define ASID(mm) ((mm)->context.id.counter & 0xffff)

extern void paging_init(void);
+extern void bootmem_init(void);
extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
extern void init_mem_pgprot(void);
extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9dc6776..9bd237e 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -327,6 +327,12 @@ void __init setup_arch(char **cmdline_p)
acpi_boot_table_init();

paging_init();
+
+ if (acpi_disabled)
+ unflatten_device_tree();
+
+ bootmem_init();
+
relocate_initrd();

kasan_init();
@@ -335,12 +341,11 @@ void __init setup_arch(char **cmdline_p)

early_ioremap_reset();

- if (acpi_disabled) {
- unflatten_device_tree();
+ if (acpi_disabled)
psci_dt_init();
- } else {
+ else
psci_acpi_init();
- }
+
xen_early_init();

cpu_read_bootcpu_ops();
diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
index ef47d99..71fe989 100644
--- a/arch/arm64/mm/mm.h
+++ b/arch/arm64/mm/mm.h
@@ -1,3 +1,2 @@
-extern void __init bootmem_init(void);

void fixup_init(void);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f3e5c74..267903b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -564,8 +564,6 @@ void __init paging_init(void)
*/
memblock_free(__pa(swapper_pg_dir) + PAGE_SIZE,
SWAPPER_DIR_SIZE - PAGE_SIZE);
-
- bootmem_init();
}

/*
--
1.8.3.1

2016-04-08 22:51:20

by David Daney

Subject: [PATCH v16 6/6] arm64, mm, numa: Add NUMA balancing support for arm64.

From: Ganapatrao Kulkarni <[email protected]>

Enable NUMA balancing for arm64 platforms.
Add pte, pmd protnone helpers for use by automatic NUMA balancing.

Reviewed-by: Robert Richter <[email protected]>
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
Signed-off-by: David Daney <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable.h | 15 +++++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 99f9b55..a578080 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,7 @@ config ARM64
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 989fef1..89b8f20 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -272,6 +272,21 @@ static inline pgprot_t mk_sect_prot(pgprot_t prot)
return __pgprot(pgprot_val(prot) & ~PTE_TABLE_BIT);
}

+#ifdef CONFIG_NUMA_BALANCING
+/*
+ * See the comment in include/asm-generic/pgtable.h
+ */
+static inline int pte_protnone(pte_t pte)
+{
+ return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+ return pte_protnone(pmd_pte(pmd));
+}
+#endif
+
/*
* THP definitions.
*/
--
1.8.3.1
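
For readers unfamiliar with the protnone test, the predicate in pte_protnone() above can be tried out in isolation. The sketch below is a userspace illustration only: the SKETCH_* bit positions are assumptions chosen for the example (the real definitions live in the arm64 pgtable headers), and a plain uint64_t stands in for pte_t. What it shows is that an entry counts as "protnone" only when the PROT_NONE bit is set and the valid bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not taken from the arm64 headers. */
#define SKETCH_PTE_VALID     (UINT64_C(1) << 0)
#define SKETCH_PTE_PROT_NONE (UINT64_C(1) << 58)

/*
 * Same shape as pte_protnone() in the patch: mask out the two bits of
 * interest and require that only PROT_NONE is set among them.
 */
static bool sketch_pte_protnone(uint64_t pteval)
{
	return (pteval & (SKETCH_PTE_VALID | SKETCH_PTE_PROT_NONE)) ==
	       SKETCH_PTE_PROT_NONE;
}
```

A value with only SKETCH_PTE_PROT_NONE set passes the test; a value with both bits set, only the valid bit set, or neither bit set does not.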

2016-04-08 22:51:46

by David Daney

Subject: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

From: Ard Biesheuvel <[email protected]>

There are two problems with the UEFI stub DT memory node removal
routine:
- it deletes nodes as it traverses the tree, which happens to work
but is not supported, as deletion invalidates the node iterator;
- deleting memory nodes entirely may discard annotations in the form
of additional properties on the nodes.

Since the discovery of DT memory nodes occurs strictly before the
UEFI init sequence, we can simply clear the memblock memory table
before parsing the UEFI memory map. This way, it is no longer
necessary to remove the nodes, so we can remove that logic from the
stub as well.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: David Daney <[email protected]>
---
drivers/firmware/efi/arm-init.c | 8 ++++++++
drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index aa1f743..5d6945b 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -143,6 +143,14 @@ static __init void reserve_regions(void)
if (efi_enabled(EFI_DBG))
pr_info("Processing EFI memory map:\n");

+ /*
+ * Discard memblocks discovered so far: if there are any at this
+ * point, they originate from memory nodes in the DT, and UEFI
+ * uses its own memory map instead.
+ */
+ memblock_dump_all();
+ memblock_remove(0, ULLONG_MAX);
+
for_each_efi_memory_desc(&memmap, md) {
paddr = md->phys_addr;
npages = md->num_pages;
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index 6dba78a..e58abfa 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
unsigned long map_size, unsigned long desc_size,
u32 desc_ver)
{
- int node, prev, num_rsv;
+ int node, num_rsv;
int status;
u32 fdt_val32;
u64 fdt_val64;
@@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
goto fdt_set_fail;

/*
- * Delete any memory nodes present. We must delete nodes which
- * early_init_dt_scan_memory may try to use.
- */
- prev = 0;
- for (;;) {
- const char *type;
- int len;
-
- node = fdt_next_node(fdt, prev, NULL);
- if (node < 0)
- break;
-
- type = fdt_getprop(fdt, node, "device_type", &len);
- if (type && strncmp(type, "memory", len) == 0) {
- fdt_del_node(fdt, node);
- continue;
- }
-
- prev = node;
- }
-
- /*
* Delete all memory reserve map entries. When booting via UEFI,
* kernel will use the UEFI memory map to find reserved regions.
*/
--
1.8.3.1
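
The "clear, then repopulate" idea from the commit message above can be modelled in a few lines of standalone C. This is only a toy model under stated assumptions: struct sketch_table is a hypothetical fixed-size stand-in for the memblock memory table, sketch_remove_all() plays the role of memblock_remove(0, ULLONG_MAX), and passing a NULL UEFI map represents a non-UEFI boot. It is meant to show why the DT memory nodes can stay in the tree: whatever they contributed is discarded before the UEFI memory map is parsed.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REGIONS 8

struct sketch_region {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical stand-in for the memblock memory table. */
struct sketch_table {
	struct sketch_region r[SKETCH_MAX_REGIONS];
	size_t cnt;
};

static int sketch_add(struct sketch_table *t, uint64_t base, uint64_t size)
{
	if (t->cnt == SKETCH_MAX_REGIONS)
		return -1;
	t->r[t->cnt].base = base;
	t->r[t->cnt].size = size;
	t->cnt++;
	return 0;
}

/* Models memblock_remove(0, ULLONG_MAX): forget every region seen so far. */
static void sketch_remove_all(struct sketch_table *t)
{
	t->cnt = 0;
}

/*
 * DT memory nodes are scanned first. On a UEFI boot (efi != NULL) their
 * regions are dropped and the table is rebuilt from the UEFI map alone.
 */
static void sketch_build(struct sketch_table *t,
			 const struct sketch_region *dt, size_t ndt,
			 const struct sketch_region *efi, size_t nefi)
{
	size_t i;

	t->cnt = 0;
	for (i = 0; i < ndt; i++)
		sketch_add(t, dt[i].base, dt[i].size);

	if (efi) {
		sketch_remove_all(t);
		for (i = 0; i < nefi; i++)
			sketch_add(t, efi[i].base, efi[i].size);
	}
}
```

Calling sketch_build() with both a DT list and a UEFI list leaves only the UEFI regions in the table; calling it with a NULL UEFI list keeps the DT regions.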

2016-04-08 22:52:03

by David Daney

Subject: [PATCH v16 2/6] Documentation, dt, numa: dt bindings for NUMA.

From: Ganapatrao Kulkarni <[email protected]>

Add DT bindings for numa mapping of memory, CPUs and IOs.

Reviewed-by: Robert Richter <[email protected]>
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
Signed-off-by: David Daney <[email protected]>
Acked-by: Rob Herring <[email protected]>
---
Documentation/devicetree/bindings/numa.txt | 275 +++++++++++++++++++++++++++++
1 file changed, 275 insertions(+)
create mode 100644 Documentation/devicetree/bindings/numa.txt

diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
new file mode 100644
index 0000000..21b3505
--- /dev/null
+++ b/Documentation/devicetree/bindings/numa.txt
@@ -0,0 +1,275 @@
+==============================================================================
+NUMA binding description.
+==============================================================================
+
+==============================================================================
+1 - Introduction
+==============================================================================
+
+Systems employing a Non Uniform Memory Access (NUMA) architecture contain
+collections of hardware resources including processors, memory, and I/O buses,
+that comprise what is commonly known as a NUMA node.
+Processor accesses to memory within the local NUMA node are generally faster
+than processor accesses to memory outside of the local NUMA node.
+DT defines interfaces that allow the platform to convey NUMA node
+topology information to the OS.
+
+==============================================================================
+2 - numa-node-id
+==============================================================================
+
+For the purpose of identification, each NUMA node is associated with a unique
+token known as a node id. For the purpose of this binding
+a node id is a 32-bit integer.
+
+A device node is associated with a NUMA node by the presence of a
+numa-node-id property which contains the node id of the device.
+
+Example:
+ /* numa node 0 */
+ numa-node-id = <0>;
+
+ /* numa node 1 */
+ numa-node-id = <1>;
+
+==============================================================================
+3 - distance-map
+==============================================================================
+
+The optional device tree node distance-map describes the relative
+distance (memory latency) between all numa nodes.
+
+- compatible : Should at least contain "numa-distance-map-v1".
+
+- distance-matrix
+ This property defines a matrix to describe the relative distances
+ between all numa nodes.
+ It is represented as a list of node pairs and their relative distance.
+
+ Note:
+ 1. Each entry represents distance from first node to second node.
+ The distances are equal in either direction.
+ 2. The distance from a node to self (local distance) is represented
+ with value 10 and all internode distance should be represented with
+ a value greater than 10.
+ 3. distance-matrix should have entries in lexicographical ascending
+ order of nodes.
+ 4. There must be only one device node distance-map which must
+ reside in the root node.
+ 5. If the distance-map node is not present, a default
+ distance-matrix is used.
+
+Example:
+ 4 nodes connected in mesh/ring topology as below,
+
+ 0_______20______1
+ | |
+ | |
+ 20 20
+ | |
+ | |
+ |_______________|
+ 3 20 2
+
+ if relative distance for each hop is 20,
+ then internode distance would be,
+ 0 -> 1 = 20
+ 1 -> 2 = 20
+ 2 -> 3 = 20
+ 3 -> 0 = 20
+ 0 -> 2 = 40
+ 1 -> 3 = 40
+
+ and dt presentation for this distance matrix is,
+
+ distance-map {
+ compatible = "numa-distance-map-v1";
+ distance-matrix = <0 0 10>,
+ <0 1 20>,
+ <0 2 40>,
+ <0 3 20>,
+ <1 0 20>,
+ <1 1 10>,
+ <1 2 20>,
+ <1 3 40>,
+ <2 0 40>,
+ <2 1 20>,
+ <2 2 10>,
+ <2 3 20>,
+ <3 0 20>,
+ <3 1 40>,
+ <3 2 20>,
+ <3 3 10>;
+ };
+
+==============================================================================
+4 - Example dts
+==============================================================================
+
+Dual socket system consists of 2 boards connected through ccn bus and
+each board having one socket/soc of 8 cpus, memory and pci bus.
+
+ memory@c00000 {
+ device_type = "memory";
+ reg = <0x0 0xc00000 0x0 0x80000000>;
+ /* node 0 */
+ numa-node-id = <0>;
+ };
+
+ memory@10000000000 {
+ device_type = "memory";
+ reg = <0x100 0x0 0x0 0x80000000>;
+ /* node 1 */
+ numa-node-id = <1>;
+ };
+
+ cpus {
+ #address-cells = <2>;
+ #size-cells = <0>;
+
+ cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x0>;
+ enable-method = "psci";
+ /* node 0 */
+ numa-node-id = <0>;
+ };
+ cpu@1 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x1>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@2 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x2>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@3 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x3>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@4 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x4>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@5 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x5>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@6 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x6>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@7 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x7>;
+ enable-method = "psci";
+ numa-node-id = <0>;
+ };
+ cpu@8 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x8>;
+ enable-method = "psci";
+ /* node 1 */
+ numa-node-id = <1>;
+ };
+ cpu@9 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0x9>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@a {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xa>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@b {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xb>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@c {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xc>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@d {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xd>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@e {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xe>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ cpu@f {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0x0 0xf>;
+ enable-method = "psci";
+ numa-node-id = <1>;
+ };
+ };
+
+ pcie0: pcie0@848000000000 {
+ compatible = "arm,armv8";
+ device_type = "pci";
+ bus-range = <0 255>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ reg = <0x8480 0x00000000 0 0x10000000>; /* Configuration space */
+ ranges = <0x03000000 0x8010 0x00000000 0x8010 0x00000000 0x70 0x00000000>;
+ /* node 0 */
+ numa-node-id = <0>;
+ };
+
+ pcie1: pcie1@948000000000 {
+ compatible = "arm,armv8";
+ device_type = "pci";
+ bus-range = <0 255>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ reg = <0x9480 0x00000000 0 0x10000000>; /* Configuration space */
+ ranges = <0x03000000 0x9010 0x00000000 0x9010 0x00000000 0x70 0x00000000>;
+ /* node 1 */
+ numa-node-id = <1>;
+ };
+
+ distance-map {
+ compatible = "numa-distance-map-v1";
+ distance-matrix = <0 0 10>,
+ <0 1 20>,
+ <1 1 10>;
+ };
--
1.8.3.1
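
As an illustration of how a consumer might interpret the distance-matrix property described in the binding above, here is a standalone C sketch. It is not the of_numa.c implementation: all sketch_* names are hypothetical, the node count is fixed at 4, and the default remote distance of 20 is an assumption for the example. It follows the binding's notes: entries are <from to distance> triples, local distance is 10, internode distance must be greater than 10, and distances are equal in either direction, so each entry is mirrored.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NODES           4
#define SKETCH_LOCAL_DISTANCE  10
#define SKETCH_REMOTE_DISTANCE 20 /* assumed default when no entry is given */

static uint8_t sketch_dist[SKETCH_NODES][SKETCH_NODES];

/* Default matrix, as used when no distance-map node is present. */
static void sketch_default_distances(void)
{
	size_t a, b;

	for (a = 0; a < SKETCH_NODES; a++)
		for (b = 0; b < SKETCH_NODES; b++)
			sketch_dist[a][b] = (a == b) ? SKETCH_LOCAL_DISTANCE
						     : SKETCH_REMOTE_DISTANCE;
}

/*
 * cells holds the flattened property value: <from to distance> triples.
 * Returns 0 on success, -1 if the property is malformed. On failure the
 * matrix may be partially updated; a caller would reset it to defaults.
 */
static int sketch_parse_distance_matrix(const uint32_t *cells, size_t ncells)
{
	size_t i;

	if (ncells % 3)
		return -1;

	sketch_default_distances();

	for (i = 0; i < ncells; i += 3) {
		uint32_t a = cells[i];
		uint32_t b = cells[i + 1];
		uint32_t d = cells[i + 2];

		if (a >= SKETCH_NODES || b >= SKETCH_NODES || d > UINT8_MAX)
			return -1;
		/* Local distance must be 10, internode distance above 10. */
		if ((a == b && d != SKETCH_LOCAL_DISTANCE) ||
		    (a != b && d <= SKETCH_LOCAL_DISTANCE))
			return -1;

		sketch_dist[a][b] = (uint8_t)d;
		/* Distances are equal in either direction, so mirror. */
		sketch_dist[b][a] = (uint8_t)d;
	}

	return 0;
}
```

Feeding it the three entries from the dual-socket example gives a distance of 20 in both directions between nodes 0 and 1, even though only <0 1 20> is listed.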

2016-04-13 15:59:34

by Steve Capper

Subject: Re: [PATCH v16 6/6] arm64, mm, numa: Add NUMA balancing support for arm64.

On Fri, Apr 08, 2016 at 03:50:28PM -0700, David Daney wrote:
> From: Ganapatrao Kulkarni <[email protected]>
>
> Enable NUMA balancing for arm64 platforms.
> Add pte, pmd protnone helpers for use by automatic NUMA balancing.
>
> Reviewed-by: Robert Richter <[email protected]>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> Signed-off-by: David Daney <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/pgtable.h | 15 +++++++++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 99f9b55..a578080 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -11,6 +11,7 @@ config ARM64
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_USE_CMPXCHG_LOCKREF
> select ARCH_SUPPORTS_ATOMIC_RMW
> + select ARCH_SUPPORTS_NUMA_BALANCING
> select ARCH_WANT_OPTIONAL_GPIOLIB
> select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
> select ARCH_WANT_FRAME_POINTERS
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 989fef1..89b8f20 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -272,6 +272,21 @@ static inline pgprot_t mk_sect_prot(pgprot_t prot)
> return __pgprot(pgprot_val(prot) & ~PTE_TABLE_BIT);
> }
>
> +#ifdef CONFIG_NUMA_BALANCING
> +/*
> + * See the comment in include/asm-generic/pgtable.h
> + */
> +static inline int pte_protnone(pte_t pte)
> +{
> + return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> +}
> +
> +static inline int pmd_protnone(pmd_t pmd)
> +{
> + return pte_protnone(pmd_pte(pmd));
> +}
> +#endif
> +

Okay, this looks good to me. If we have a PROT_NONE VMA then this is
caught before going into do_numa_page or do_huge_pmd_numa_page (and
there is a BUG_ON inside these functions to catch stragglers).

I've given this a quick test with a PROT_NONE THP and everything worked
as expected (i.e. NUMA didn't trip up).

Reviewed-by: Steve Capper <[email protected]>

> /*
> * THP definitions.
> */
> --
> 1.8.3.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2016-04-14 11:02:22

by Steve Capper

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
> From: Ard Biesheuvel <[email protected]>
>
> There are two problems with the UEFI stub DT memory node removal
> routine:
> - it deletes nodes as it traverses the tree, which happens to work
> but is not supported, as deletion invalidates the node iterator;
> - deleting memory nodes entirely may discard annotations in the form
> of additional properties on the nodes.
>
> Since the discovery of DT memory nodes occurs strictly before the
> UEFI init sequence, we can simply clear the memblock memory table
> before parsing the UEFI memory map. This way, it is no longer
> necessary to remove the nodes, so we can remove that logic from the
> stub as well.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: David Daney <[email protected]>
> ---
> drivers/firmware/efi/arm-init.c | 8 ++++++++
> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
> 2 files changed, 9 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index aa1f743..5d6945b 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
> if (efi_enabled(EFI_DBG))
> pr_info("Processing EFI memory map:\n");
>
> + /*
> + * Discard memblocks discovered so far: if there are any at this
> + * point, they originate from memory nodes in the DT, and UEFI
> + * uses its own memory map instead.
> + */
> + memblock_dump_all();
> + memblock_remove(0, ULLONG_MAX);
> +

Does this change need to be applied to any other architectures given
that deletion code has been removed from libstub below?

Cheers,
--
Steve

> for_each_efi_memory_desc(&memmap, md) {
> paddr = md->phys_addr;
> npages = md->num_pages;
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index 6dba78a..e58abfa 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> unsigned long map_size, unsigned long desc_size,
> u32 desc_ver)
> {
> - int node, prev, num_rsv;
> + int node, num_rsv;
> int status;
> u32 fdt_val32;
> u64 fdt_val64;
> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> goto fdt_set_fail;
>
> /*
> - * Delete any memory nodes present. We must delete nodes which
> - * early_init_dt_scan_memory may try to use.
> - */
> - prev = 0;
> - for (;;) {
> - const char *type;
> - int len;
> -
> - node = fdt_next_node(fdt, prev, NULL);
> - if (node < 0)
> - break;
> -
> - type = fdt_getprop(fdt, node, "device_type", &len);
> - if (type && strncmp(type, "memory", len) == 0) {
> - fdt_del_node(fdt, node);
> - continue;
> - }
> -
> - prev = node;
> - }
> -
> - /*
> * Delete all memory reserve map entries. When booting via UEFI,
> * kernel will use the UEFI memory map to find reserved regions.
> */
> --
> 1.8.3.1
>
>
>

2016-04-14 11:10:41

by Ard Biesheuvel

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On 14 April 2016 at 13:02, Steve Capper <[email protected]> wrote:
> On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
>> From: Ard Biesheuvel <[email protected]>
>>
>> There are two problems with the UEFI stub DT memory node removal
>> routine:
>> - it deletes nodes as it traverses the tree, which happens to work
>> but is not supported, as deletion invalidates the node iterator;
>> - deleting memory nodes entirely may discard annotations in the form
>> of additional properties on the nodes.
>>
>> Since the discovery of DT memory nodes occurs strictly before the
>> UEFI init sequence, we can simply clear the memblock memory table
>> before parsing the UEFI memory map. This way, it is no longer
>> necessary to remove the nodes, so we can remove that logic from the
>> stub as well.
>>
>> Signed-off-by: Ard Biesheuvel <[email protected]>
>> Signed-off-by: David Daney <[email protected]>
>> ---
>> drivers/firmware/efi/arm-init.c | 8 ++++++++
>> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
>> 2 files changed, 9 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
>> index aa1f743..5d6945b 100644
>> --- a/drivers/firmware/efi/arm-init.c
>> +++ b/drivers/firmware/efi/arm-init.c
>> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
>> if (efi_enabled(EFI_DBG))
>> pr_info("Processing EFI memory map:\n");
>>
>> + /*
>> + * Discard memblocks discovered so far: if there are any at this
>> + * point, they originate from memory nodes in the DT, and UEFI
>> + * uses its own memory map instead.
>> + */
>> + memblock_dump_all();
>> + memblock_remove(0, ULLONG_MAX);
>> +
>
> Does this change need to be applied to any other architectures given
> that deletion code has been removed from libstub below?
>

The 'generic' libstub code below is only used by ARM, so we're safe
here in that regard.


>> for_each_efi_memory_desc(&memmap, md) {
>> paddr = md->phys_addr;
>> npages = md->num_pages;
>> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
>> index 6dba78a..e58abfa 100644
>> --- a/drivers/firmware/efi/libstub/fdt.c
>> +++ b/drivers/firmware/efi/libstub/fdt.c
>> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>> unsigned long map_size, unsigned long desc_size,
>> u32 desc_ver)
>> {
>> - int node, prev, num_rsv;
>> + int node, num_rsv;
>> int status;
>> u32 fdt_val32;
>> u64 fdt_val64;
>> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>> goto fdt_set_fail;
>>
>> /*
>> - * Delete any memory nodes present. We must delete nodes which
>> - * early_init_dt_scan_memory may try to use.
>> - */
>> - prev = 0;
>> - for (;;) {
>> - const char *type;
>> - int len;
>> -
>> - node = fdt_next_node(fdt, prev, NULL);
>> - if (node < 0)
>> - break;
>> -
>> - type = fdt_getprop(fdt, node, "device_type", &len);
>> - if (type && strncmp(type, "memory", len) == 0) {
>> - fdt_del_node(fdt, node);
>> - continue;
>> - }
>> -
>> - prev = node;
>> - }
>> -
>> - /*
>> * Delete all memory reserve map entries. When booting via UEFI,
>> * kernel will use the UEFI memory map to find reserved regions.
>> */
>> --
>> 1.8.3.1
>>
>>
>>

2016-04-14 12:09:41

by Steve Capper

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On Thu, Apr 14, 2016 at 01:10:35PM +0200, Ard Biesheuvel wrote:
> On 14 April 2016 at 13:02, Steve Capper <[email protected]> wrote:
> > On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
> >> From: Ard Biesheuvel <[email protected]>
> >>
> >> There are two problems with the UEFI stub DT memory node removal
> >> routine:
> >> - it deletes nodes as it traverses the tree, which happens to work
> >> but is not supported, as deletion invalidates the node iterator;
> >> - deleting memory nodes entirely may discard annotations in the form
> >> of additional properties on the nodes.
> >>
> >> Since the discovery of DT memory nodes occurs strictly before the
> >> UEFI init sequence, we can simply clear the memblock memory table
> >> before parsing the UEFI memory map. This way, it is no longer
> >> necessary to remove the nodes, so we can remove that logic from the
> >> stub as well.
> >>
> >> Signed-off-by: Ard Biesheuvel <[email protected]>
> >> Signed-off-by: David Daney <[email protected]>
> >> ---
> >> drivers/firmware/efi/arm-init.c | 8 ++++++++
> >> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
> >> 2 files changed, 9 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> >> index aa1f743..5d6945b 100644
> >> --- a/drivers/firmware/efi/arm-init.c
> >> +++ b/drivers/firmware/efi/arm-init.c
> >> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
> >> if (efi_enabled(EFI_DBG))
> >> pr_info("Processing EFI memory map:\n");
> >>
> >> + /*
> >> + * Discard memblocks discovered so far: if there are any at this
> >> + * point, they originate from memory nodes in the DT, and UEFI
> >> + * uses its own memory map instead.
> >> + */
> >> + memblock_dump_all();
> >> + memblock_remove(0, ULLONG_MAX);
> >> +
> >
> > Does this change need to be applied to any other architectures given
> > that deletion code has been removed from libstub below?
> >
>
> The 'generic' libstub code below is only used by ARM, so we're safe
> here in that regard.

Thanks Ard,
In that case, FWIW:
Acked-by: Steve Capper <[email protected]>

Cheers,
--
Steve

>
>
> >> for_each_efi_memory_desc(&memmap, md) {
> >> paddr = md->phys_addr;
> >> npages = md->num_pages;
> >> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> >> index 6dba78a..e58abfa 100644
> >> --- a/drivers/firmware/efi/libstub/fdt.c
> >> +++ b/drivers/firmware/efi/libstub/fdt.c
> >> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> >> unsigned long map_size, unsigned long desc_size,
> >> u32 desc_ver)
> >> {
> >> - int node, prev, num_rsv;
> >> + int node, num_rsv;
> >> int status;
> >> u32 fdt_val32;
> >> u64 fdt_val64;
> >> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> >> goto fdt_set_fail;
> >>
> >> /*
> >> - * Delete any memory nodes present. We must delete nodes which
> >> - * early_init_dt_scan_memory may try to use.
> >> - */
> >> - prev = 0;
> >> - for (;;) {
> >> - const char *type;
> >> - int len;
> >> -
> >> - node = fdt_next_node(fdt, prev, NULL);
> >> - if (node < 0)
> >> - break;
> >> -
> >> - type = fdt_getprop(fdt, node, "device_type", &len);
> >> - if (type && strncmp(type, "memory", len) == 0) {
> >> - fdt_del_node(fdt, node);
> >> - continue;
> >> - }
> >> -
> >> - prev = node;
> >> - }
> >> -
> >> - /*
> >> * Delete all memory reserve map entries. When booting via UEFI,
> >> * kernel will use the UEFI memory map to find reserved regions.
> >> */
> >> --
> >> 1.8.3.1
> >>
> >>
> >>
>

2016-04-15 14:04:03

by Will Deacon

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
> From: Ard Biesheuvel <[email protected]>
>
> There are two problems with the UEFI stub DT memory node removal
> routine:
> - it deletes nodes as it traverses the tree, which happens to work
> but is not supported, as deletion invalidates the node iterator;
> - deleting memory nodes entirely may discard annotations in the form
> of additional properties on the nodes.
>
> Since the discovery of DT memory nodes occurs strictly before the
> UEFI init sequence, we can simply clear the memblock memory table
> before parsing the UEFI memory map. This way, it is no longer
> necessary to remove the nodes, so we can remove that logic from the
> stub as well.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: David Daney <[email protected]>
> ---
> drivers/firmware/efi/arm-init.c | 8 ++++++++
> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
> 2 files changed, 9 insertions(+), 23 deletions(-)

Matt, are you ok with me taking this through the arm64 tree? (since the
NUMA patches depend on it). If so, please can I have your ack?

Will

> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index aa1f743..5d6945b 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
> if (efi_enabled(EFI_DBG))
> pr_info("Processing EFI memory map:\n");
>
> + /*
> + * Discard memblocks discovered so far: if there are any at this
> + * point, they originate from memory nodes in the DT, and UEFI
> + * uses its own memory map instead.
> + */
> + memblock_dump_all();
> + memblock_remove(0, ULLONG_MAX);
> +
> for_each_efi_memory_desc(&memmap, md) {
> paddr = md->phys_addr;
> npages = md->num_pages;
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index 6dba78a..e58abfa 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> unsigned long map_size, unsigned long desc_size,
> u32 desc_ver)
> {
> - int node, prev, num_rsv;
> + int node, num_rsv;
> int status;
> u32 fdt_val32;
> u64 fdt_val64;
> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
> goto fdt_set_fail;
>
> /*
> - * Delete any memory nodes present. We must delete nodes which
> - * early_init_dt_scan_memory may try to use.
> - */
> - prev = 0;
> - for (;;) {
> - const char *type;
> - int len;
> -
> - node = fdt_next_node(fdt, prev, NULL);
> - if (node < 0)
> - break;
> -
> - type = fdt_getprop(fdt, node, "device_type", &len);
> - if (type && strncmp(type, "memory", len) == 0) {
> - fdt_del_node(fdt, node);
> - continue;
> - }
> -
> - prev = node;
> - }
> -
> - /*
> * Delete all memory reserve map entries. When booting via UEFI,
> * kernel will use the UEFI memory map to find reserved regions.
> */
> --
> 1.8.3.1
>

2016-04-15 14:06:12

by Ard Biesheuvel

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On 15 April 2016 at 16:03, Will Deacon <[email protected]> wrote:
> On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
>> From: Ard Biesheuvel <[email protected]>
>>
>> There are two problems with the UEFI stub DT memory node removal
>> routine:
>> - it deletes nodes as it traverses the tree, which happens to work
>> but is not supported, as deletion invalidates the node iterator;
>> - deleting memory nodes entirely may discard annotations in the form
>> of additional properties on the nodes.
>>
>> Since the discovery of DT memory nodes occurs strictly before the
>> UEFI init sequence, we can simply clear the memblock memory table
>> before parsing the UEFI memory map. This way, it is no longer
>> necessary to remove the nodes, so we can remove that logic from the
>> stub as well.
>>
>> Signed-off-by: Ard Biesheuvel <[email protected]>
>> Signed-off-by: David Daney <[email protected]>
>> ---
>> drivers/firmware/efi/arm-init.c | 8 ++++++++
>> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
>> 2 files changed, 9 insertions(+), 23 deletions(-)
>
> Matt, are you ok with me taking this through the arm64 tree? (since the
> NUMA patches depend on it). If so, please can I have your ack?
>

Matt gave his Reviewed-by for v15

http://www.gossamer-threads.com/lists/linux/kernel/2390242

2016-04-15 14:08:23

by Matt Fleming

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On Fri, 15 Apr, at 04:06:08PM, Ard Biesheuvel wrote:
> On 15 April 2016 at 16:03, Will Deacon <[email protected]> wrote:
> > On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
> >> From: Ard Biesheuvel <[email protected]>
> >>
> >> There are two problems with the UEFI stub DT memory node removal
> >> routine:
> >> - it deletes nodes as it traverses the tree, which happens to work
> >>   but is not supported, as deletion invalidates the node iterator;
> >> - deleting memory nodes entirely may discard annotations in the form
> >>   of additional properties on the nodes.
> >>
> >> Since the discovery of DT memory nodes occurs strictly before the
> >> UEFI init sequence, we can simply clear the memblock memory table
> >> before parsing the UEFI memory map. This way, it is no longer
> >> necessary to remove the nodes, so we can remove that logic from the
> >> stub as well.
> >>
> >> Signed-off-by: Ard Biesheuvel <[email protected]>
> >> Signed-off-by: David Daney <[email protected]>
> >> ---
> >> drivers/firmware/efi/arm-init.c | 8 ++++++++
> >> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
> >> 2 files changed, 9 insertions(+), 23 deletions(-)
> >
> > Matt, are you ok with me taking this through the arm64 tree? (since the
> > NUMA patches depend on it). If so, please can I have your ack?
> >
>
> Matt gave his Reviewed-by for v15
>
> http://www.gossamer-threads.com/lists/linux/kernel/2390242

Heh, you beat me to it!

Will, go ahead and take this through the arm64 tree.

2016-04-15 14:08:43

by Will Deacon

Subject: Re: [PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

On Fri, Apr 15, 2016 at 04:06:08PM +0200, Ard Biesheuvel wrote:
> On 15 April 2016 at 16:03, Will Deacon <[email protected]> wrote:
> > On Fri, Apr 08, 2016 at 03:50:23PM -0700, David Daney wrote:
> >> From: Ard Biesheuvel <[email protected]>
> >>
> >> There are two problems with the UEFI stub DT memory node removal
> >> routine:
> >> - it deletes nodes as it traverses the tree, which happens to work
> >>   but is not supported, as deletion invalidates the node iterator;
> >> - deleting memory nodes entirely may discard annotations in the form
> >>   of additional properties on the nodes.
> >>
> >> Since the discovery of DT memory nodes occurs strictly before the
> >> UEFI init sequence, we can simply clear the memblock memory table
> >> before parsing the UEFI memory map. This way, it is no longer
> >> necessary to remove the nodes, so we can remove that logic from the
> >> stub as well.
> >>
> >> Signed-off-by: Ard Biesheuvel <[email protected]>
> >> Signed-off-by: David Daney <[email protected]>
> >> ---
> >> drivers/firmware/efi/arm-init.c | 8 ++++++++
> >> drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
> >> 2 files changed, 9 insertions(+), 23 deletions(-)
> >
> > Matt, are you ok with me taking this through the arm64 tree? (since the
> > NUMA patches depend on it). If so, please can I have your ack?
> >
>
> Matt gave his Reviewed-by for v15
>
> http://www.gossamer-threads.com/lists/linux/kernel/2390242

Brill, thanks. Looks like it got dropped by accident for the latest posting.

Will