2014-12-09 16:03:19

by Pranith Kumar

[permalink] [raw]
Subject: [RFC PATCH] smpboot: Check for successfull allocation of cpumask vars

zalloc_cpumask_var() can return 0 on allocation failure when
CONFIG_CPUMASK_OFFSTACK is set. Check for the return value and WARN() on failure
of an allocation in such cases.

Signed-off-by: Pranith Kumar <[email protected]>
---
arch/x86/kernel/smpboot.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7a8f584..9b1df21 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1083,6 +1083,7 @@ static void __init smp_cpu_index_default(void)
void __init native_smp_prepare_cpus(unsigned int max_cpus)
{
unsigned int i;
+ bool ret = true;

preempt_disable();
smp_cpu_index_default();
@@ -1096,9 +1097,12 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

current_thread_info()->cpu = 0; /* needed? */
for_each_possible_cpu(i) {
- zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
- zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
- zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
+ ret &= zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
+ ret &= zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+ ret &= zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
+
+ if (WARN(!ret, "cpumask alloc for cpu %d failed!", i))
+ break;
}
set_cpu_sibling_map(0);

--
1.9.1


2014-12-09 20:10:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] smpboot: Check for successfull allocation of cpumask vars

On Tue, 9 Dec 2014, Pranith Kumar wrote:
> zalloc_cpumask_var() can return 0 on allocation failure when
> CONFIG_CPUMASK_OFFSTACK is set. Check for the return value and WARN() on failure
> of an allocation in such cases.

And that warning helps in which way?

It just prints a completely useless backtrace and breaks out of the
loop, but it does not prevent that later on code will trip over the
non allocated per cpu data.

Thanks,

tglx

2014-12-09 20:36:59

by Pranith Kumar

[permalink] [raw]
Subject: Re: [RFC PATCH] smpboot: Check for successfull allocation of cpumask vars

On Tue, Dec 9, 2014 at 3:10 PM, Thomas Gleixner <[email protected]> wrote:
> On Tue, 9 Dec 2014, Pranith Kumar wrote:
>> zalloc_cpumask_var() can return 0 on allocation failure when
>> CONFIG_CPUMASK_OFFSTACK is set. Check for the return value and WARN() on failure
>> of an allocation in such cases.
>
> And that warning helps in which way?
>
> It just prints a completely useless backtrace and breaks out of the
> loop, but it does not prevent that later on code will trip over the
> non allocated per cpu data.
>

I agree. May be just a pr_warn() saying that an allocation failed perhaps?

To prevent further accesses, we can clear the cpu bit from the cpu
masks(online/possible/present) for the failed cpu and continue trying
to allocate for other cpus. We don't break out of the loop. Removing
the cpu from the cpu masks will disable accesses of the non allocated
per cpu data.

What do you suggest we do in such cases?
--
Pranith

2014-12-09 20:52:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] smpboot: Check for successfull allocation of cpumask vars

On Tue, 9 Dec 2014, Pranith Kumar wrote:
> On Tue, Dec 9, 2014 at 3:10 PM, Thomas Gleixner <[email protected]> wrote:
> > On Tue, 9 Dec 2014, Pranith Kumar wrote:
> >> zalloc_cpumask_var() can return 0 on allocation failure when
> >> CONFIG_CPUMASK_OFFSTACK is set. Check for the return value and WARN() on failure
> >> of an allocation in such cases.
> >
> > And that warning helps in which way?
> >
> > It just prints a completely useless backtrace and breaks out of the
> > loop, but it does not prevent that later on code will trip over the
> > non allocated per cpu data.
> >
>
> I agree. May be just a pr_warn() saying that an allocation failed perhaps?

Yep.

> To prevent further accesses, we can clear the cpu bit from the cpu
> masks(online/possible/present) for the failed cpu and continue trying
> to allocate for other cpus. We don't break out of the loop. Removing
> the cpu from the cpu masks will disable accesses of the non allocated
> per cpu data.
>
> What do you suggest we do in such cases?

Pretty much what you said, but we should definitely break out of the
loop. There is no point to try more allocations if we failed the first
one.

Thanks,

tglx