2014-12-02 17:18:08

by Alex Thorlton

Subject: [PATCH] Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

When CONFIG_CPUMASK_OFFSTACK is set, sched_init() allocates the space for
load_balance_mask as part of a single kzalloc() call. On our 6144 core machine
that allocation has managed to spill over KMALLOC_MAX_SIZE. The patch below
gives each mask its own allocation so that no single request overflows the max
alloc size. It also allocates each mask on the node from which it will most
commonly be accessed, to minimize remote accesses on NUMA machines.
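
To make the overflow concrete, here is a rough user-space sketch of the
arithmetic. It assumes that nr_cpu_ids is 6144 on this machine, that each mask
holds one bit per CPU rounded up to whole longs, and that KMALLOC_MAX_SIZE is
4 MiB (4 KiB pages, MAX_ORDER of 11). None of those numbers are stated above,
and the helper names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 4 MiB kmalloc() ceiling (4 KiB pages, MAX_ORDER of 11). */
#define ASSUMED_KMALLOC_MAX_SIZE ((size_t)1 << 22)

/* Bytes in one mask: one bit per CPU, rounded up to whole longs. */
static size_t mask_bytes(size_t nr_cpus)
{
	size_t bits_per_long = sizeof(long) * CHAR_BIT;
	size_t longs;

	assert(nr_cpus > 0);	/* a mask for zero CPUs is meaningless */
	longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
	return longs * sizeof(long);
}

/* Bytes the old code added to alloc_size: one mask per possible CPU. */
static size_t all_masks_bytes(size_t nr_cpus)
{
	return nr_cpus * mask_bytes(nr_cpus);
}

/* Nonzero if a single request of this size exceeds the assumed ceiling. */
static int exceeds_assumed_max(size_t bytes)
{
	return bytes > ASSUMED_KMALLOC_MAX_SIZE;
}
```

Under those assumptions mask_bytes(6144) is 768 and all_masks_bytes(6144) is
4718592 bytes (4.5 MiB), so the masks alone exceed the assumed 4 MiB ceiling,
while a single 768 byte mask is nowhere near it.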

Any input is appreciated!

- Alex

Signed-off-by: Alex Thorlton <[email protected]>
Suggested-by: George Beshers <[email protected]>
Cc: George Beshers <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]

---
kernel/sched/core.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4499950..8f06655 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6981,9 +6981,6 @@ void __init sched_init(void)
 #ifdef CONFIG_RT_GROUP_SCHED
 	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
 #endif
-#ifdef CONFIG_CPUMASK_OFFSTACK
-	alloc_size += num_possible_cpus() * cpumask_size();
-#endif
 	if (alloc_size) {
 		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);

@@ -7003,12 +7000,10 @@ void __init sched_init(void)
 		ptr += nr_cpu_ids * sizeof(void **);

 #endif /* CONFIG_RT_GROUP_SCHED */
-#ifdef CONFIG_CPUMASK_OFFSTACK
-		for_each_possible_cpu(i) {
-			per_cpu(load_balance_mask, i) = (void *)ptr;
-			ptr += cpumask_size();
-		}
-#endif /* CONFIG_CPUMASK_OFFSTACK */
+	}
+	for_each_possible_cpu(i) {
+		per_cpu(load_balance_mask, i) = kzalloc_node(
+				cpumask_size(), GFP_KERNEL, cpu_to_node(i));
 	}

 	init_rt_bandwidth(&def_rt_bandwidth,
--
1.7.12.4


2014-12-08 10:42:20

by Ingo Molnar

Subject: Re: [PATCH] Fix KMALLOC_MAX_SIZE overflow during cpumask allocation


* Alex Thorlton <[email protected]> wrote:

> When allocating space for load_balance_mask, in sched_init, when
> CPUMASK_OFFSTACK is set, we've managed to spill over KMALLOC_MAX_SIZE on our
> 6144 core machine. The patch below breaks up the allocations so that they don't
> overflow the max alloc size. It also allocates the masks on the node from
> which they'll most commonly be accessed, to minimize remote accesses on NUMA
> machines.
>
> Any input is appreciated!
>
> - Alex
>
> Signed-off-by: Alex Thorlton <[email protected]>
> Suggested-by: George Beshers <[email protected]>
> Cc: George Beshers <[email protected]>
> Cc: Russ Anderson <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: [email protected]
>
> ---
> kernel/sched/core.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4499950..8f06655 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6981,9 +6981,6 @@ void __init sched_init(void)
>  #ifdef CONFIG_RT_GROUP_SCHED
>  	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
>  #endif
> -#ifdef CONFIG_CPUMASK_OFFSTACK
> -	alloc_size += num_possible_cpus() * cpumask_size();
> -#endif
>  	if (alloc_size) {
>  		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);
>
> @@ -7003,12 +7000,10 @@ void __init sched_init(void)
>  		ptr += nr_cpu_ids * sizeof(void **);
>
>  #endif /* CONFIG_RT_GROUP_SCHED */
> -#ifdef CONFIG_CPUMASK_OFFSTACK
> -		for_each_possible_cpu(i) {
> -			per_cpu(load_balance_mask, i) = (void *)ptr;
> -			ptr += cpumask_size();
> -		}
> -#endif /* CONFIG_CPUMASK_OFFSTACK */
> +	}
> +	for_each_possible_cpu(i) {
> +		per_cpu(load_balance_mask, i) = kzalloc_node(
> +				cpumask_size(), GFP_KERNEL, cpu_to_node(i));
>  	}
>
>  	init_rt_bandwidth(&def_rt_bandwidth,

This patch fails to build with certain configs:

kernel/sched/core.c:7130:33: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’
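
For context, the error comes from cpumask_var_t having two definitions in
include/linux/cpumask.h: a struct cpumask pointer when CONFIG_CPUMASK_OFFSTACK
is set, and a one-element struct cpumask array otherwise. An array cannot be
assigned from a void *. The user-space sketch below mimics the two typedefs.
SKETCH_CPUMASK_OFFSTACK, NR_CPUS_ASSUMED and the sketch_* helpers are
hypothetical stand-ins, not kernel APIs, and this is not necessarily how the
patch should be fixed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS_ASSUMED 64	/* hypothetical; stands in for NR_CPUS */

struct cpumask {
	unsigned long bits[NR_CPUS_ASSUMED / (8 * sizeof(unsigned long))];
};

#ifdef SKETCH_CPUMASK_OFFSTACK
/* Off-stack variant: the mask is a pointer to heap storage. */
typedef struct cpumask *cpumask_var_t;

static int sketch_zalloc_mask(cpumask_var_t *mask)
{
	*mask = calloc(1, sizeof(struct cpumask));
	return *mask != NULL;
}

static void sketch_free_mask(cpumask_var_t mask)
{
	free(mask);
}
#else
/* On-stack variant: the mask is an array, so its storage is inline. */
typedef struct cpumask cpumask_var_t[1];

static int sketch_zalloc_mask(cpumask_var_t *mask)
{
	/* Nothing to allocate; just clear the inline storage. */
	memset(*mask, 0, sizeof(struct cpumask));
	return 1;
}

static void sketch_free_mask(cpumask_var_t mask)
{
	(void)mask;	/* nothing was allocated */
}
#endif

/* Compiles with either typedef because it never assigns to the mask. */
static int sketch_demo(void)
{
	cpumask_var_t mask;

	if (!sketch_zalloc_mask(&mask))
		return 0;
	assert(mask->bits[0] == 0);	/* both variants hand back a cleared mask */
	sketch_free_mask(mask);
	return 1;
}
```

Built without -DSKETCH_CPUMASK_OFFSTACK, writing "mask = calloc(...)" inside
sketch_demo() would be rejected by the compiler, much like the error above.
Going through a helper that hides the difference, as zalloc_cpumask_var() and
friends do in the kernel, builds either way.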

Thanks,

Ingo

2014-12-08 20:03:00

by Alex Thorlton

Subject: Re: [PATCH] Fix KMALLOC_MAX_SIZE overflow during cpumask allocation

On Mon, Dec 08, 2014 at 11:42:14AM +0100, Ingo Molnar wrote:
> This patch fails to build with certain configs:
>
> kernel/sched/core.c:7130:33: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘void *’

Thanks for letting us know, Ingo. I believe George has something in
mind to fix this. We'll take a look and get another version out ASAP!

- Alex