2014-12-18 18:44:50

by Alex Thorlton

Subject: [PATCHv2] Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

When CPUMASK_OFFSTACK is set, sched_init() allocates the space for every
CPU's load_balance_mask in a single kzalloc() call. On our 6144 core machine,
that allocation has managed to spill over KMALLOC_MAX_SIZE. The patch below
breaks it up into one allocation per CPU so that none of them overflows the max
alloc size. It also allocates each mask on the node from which it will most
commonly be accessed, to minimize remote accesses on NUMA machines.

v2 - Replace a missing #endif to fix some build issues on certain configs

Any input is appreciated!

- Alex

Signed-off-by: Alex Thorlton <[email protected]>
Suggested-by: George Beshers <[email protected]>
Cc: George Beshers <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]

---
kernel/sched/core.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5797b7..80a753f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7113,9 +7113,6 @@ void __init sched_init(void)
 #ifdef CONFIG_RT_GROUP_SCHED
 	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
 #endif
-#ifdef CONFIG_CPUMASK_OFFSTACK
-	alloc_size += num_possible_cpus() * cpumask_size();
-#endif
 	if (alloc_size) {
 		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);

@@ -7135,13 +7132,13 @@ void __init sched_init(void)
 		ptr += nr_cpu_ids * sizeof(void **);

 #endif /* CONFIG_RT_GROUP_SCHED */
+	}
 #ifdef CONFIG_CPUMASK_OFFSTACK
-		for_each_possible_cpu(i) {
-			per_cpu(load_balance_mask, i) = (void *)ptr;
-			ptr += cpumask_size();
-		}
-#endif /* CONFIG_CPUMASK_OFFSTACK */
+	for_each_possible_cpu(i) {
+		per_cpu(load_balance_mask, i) = (cpumask_var_t) kzalloc_node(
+			cpumask_size(), GFP_KERNEL, cpu_to_node(i));
 	}
+#endif /* CONFIG_CPUMASK_OFFSTACK */

 	init_rt_bandwidth(&def_rt_bandwidth,
 			global_rt_period(), global_rt_runtime());
--
1.8.2.1


Subject: [tip:sched/urgent] sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

Commit-ID: b74e6278fd6db5848163ccdc6e9d8eb6efdee9bd
Gitweb: http://git.kernel.org/tip/b74e6278fd6db5848163ccdc6e9d8eb6efdee9bd
Author: Alex Thorlton <[email protected]>
AuthorDate: Thu, 18 Dec 2014 12:44:30 -0600
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 23 Dec 2014 11:43:48 +0100

sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

When allocating space for load_balance_mask, in sched_init, when
CPUMASK_OFFSTACK is set, we've managed to spill over
KMALLOC_MAX_SIZE on our 6144 core machine. The patch below
breaks up the allocations so that they don't overflow the max
alloc size. It also allocates the masks on the node from
which they'll most commonly be accessed, to minimize remote
accesses on NUMA machines.

Suggested-by: George Beshers <[email protected]>
Signed-off-by: Alex Thorlton <[email protected]>
Cc: George Beshers <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5797b7..c0accc0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7113,9 +7113,6 @@ void __init sched_init(void)
 #ifdef CONFIG_RT_GROUP_SCHED
 	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
 #endif
-#ifdef CONFIG_CPUMASK_OFFSTACK
-	alloc_size += num_possible_cpus() * cpumask_size();
-#endif
 	if (alloc_size) {
 		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);

@@ -7135,13 +7132,13 @@ void __init sched_init(void)
 		ptr += nr_cpu_ids * sizeof(void **);

 #endif /* CONFIG_RT_GROUP_SCHED */
+	}
 #ifdef CONFIG_CPUMASK_OFFSTACK
-		for_each_possible_cpu(i) {
-			per_cpu(load_balance_mask, i) = (void *)ptr;
-			ptr += cpumask_size();
-		}
-#endif /* CONFIG_CPUMASK_OFFSTACK */
+	for_each_possible_cpu(i) {
+		per_cpu(load_balance_mask, i) = (cpumask_var_t)kzalloc_node(
+			cpumask_size(), GFP_KERNEL, cpu_to_node(i));
 	}
+#endif /* CONFIG_CPUMASK_OFFSTACK */

 	init_rt_bandwidth(&def_rt_bandwidth,
 			global_rt_period(), global_rt_runtime());