2015-12-03 22:24:32

by Steven Rostedt

[permalink] [raw]
Subject: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero

Xunlei Pang reported a bug in the scheduler code when
CONFIG_CPUMASK_OFFSTACK is set, several of the cpumasks used by the
root domains can contain garbage. The code does the following:

memset(rd, 0, sizeof(*rd));

if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
goto out;
if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
goto free_span;
if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
goto free_online;
if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
goto free_dlo_mask;

When CONFIG_CPUMASK_OFFSTACK is not defined, the four cpumasks are part
of the 'rd' structure, and the memset() will zero them all out. But
when CONFIG_CPUMASK_OFFSTACK is enabled, those cpumasks are no longer
set by the memset() but are allocated independently. That allocation
may contain garbage.

In order to make alloc_cpumask_var() and friends behave the same with
respect to being zero or not whether or not CONFIG_CPUMASK_OFFSTACK is
defined, a check is made to the contents of the mask pointer. If the
contents of the mask pointer is zero, it is assumed that the value was
zeroed out before and __GFP_ZERO is added to the flags for allocation
to make the returned cpumasks already zeroed.

Calls to alloc_cpumask_var() are not done in performance critical
paths, and even if they are, zeroing them out shouldn't add much
overhead to it. The up side to this change is that we remove subtle
bugs when enabling CONFIG_CPUMASK_OFFSTACK with cpumask logic that
worked fined when that config was not enabled.

Signed-off-by: Steven Rostedt <[email protected]>
---
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 5a70f6196f57..c0d68752a8b9 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -60,6 +60,19 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
*/
bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node)
{
+ /*
+ * When CONFIG_CPUMASK_OFFSTACK is not set, the cpumask may
+ * be zeroed by a memset of the structure that contains the
+ * mask. But if CONFIG_CPUMASK_OFFSTACK is then enabled,
+ * the mask may end up containing garbage. By checking
+ * if the pointer of the mask is already zero, we can assume
+ * that the mask itself should be allocated to contain all
+ * zeros as well. This will prevent subtle bugs by the
+ * inconsistency of the config being set or not.
+ */
+ if ((long)*mask == 0)
+ flags |= __GFP_ZERO;
+
*mask = kmalloc_node(cpumask_size(), flags, node);

#ifdef CONFIG_DEBUG_PER_CPU_MAPS


2015-12-04 01:59:09

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero

Steven Rostedt <[email protected]> writes:
> Xunlei Pang reported a bug in the scheduler code when
> CONFIG_CPUMASK_OFFSTACK is set, several of the cpumasks used by the
> root domains can contain garbage. The code does the following:
>
> memset(rd, 0, sizeof(*rd));
>
> if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
> goto out;
> if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
> goto free_span;
> if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
> goto free_online;
> if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
> goto free_dlo_mask;
>
> When CONFIG_CPUMASK_OFFSTACK is not defined, the four cpumasks are part
> of the 'rd' structure, and the memset() will zero them all out. But
> when CONFIG_CPUMASK_OFFSTACK is enabled, those cpumasks are no longer
> set by the memset() but are allocated independently. That allocation
> may contain garbage.
>
> In order to make alloc_cpumask_var() and friends behave the same with
> respect to being zero or not whether or not CONFIG_CPUMASK_OFFSTACK is
> defined, a check is made to the contents of the mask pointer. If the
> contents of the mask pointer is zero, it is assumed that the value was
> zeroed out before and __GFP_ZERO is added to the flags for allocation
> to make the returned cpumasks already zeroed.
>
> Calls to alloc_cpumask_var() are not done in performance critical
> paths, and even if they are, zeroing them out shouldn't add much
> overhead to it. The up side to this change is that we remove subtle
> bugs when enabling CONFIG_CPUMASK_OFFSTACK with cpumask logic that
> worked fined when that config was not enabled.

This is clever, but I would advise against such subtle code. We will
never be able to remove this code once it is in.

Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into
the cpumasks instead, iff !(flags & __GFP_ZERO).

Cheers,
Rusty.




>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 5a70f6196f57..c0d68752a8b9 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -60,6 +60,19 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> */
> bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node)
> {
> + /*
> + * When CONFIG_CPUMASK_OFFSTACK is not set, the cpumask may
> + * be zeroed by a memset of the structure that contains the
> + * mask. But if CONFIG_CPUMASK_OFFSTACK is then enabled,
> + * the mask may end up containing garbage. By checking
> + * if the pointer of the mask is already zero, we can assume
> + * that the mask itself should be allocated to contain all
> + * zeros as well. This will prevent subtle bugs by the
> + * inconsistency of the config being set or not.
> + */
> + if ((long)*mask == 0)
> + flags |= __GFP_ZERO;
> +
> *mask = kmalloc_node(cpumask_size(), flags, node);
>
> #ifdef CONFIG_DEBUG_PER_CPU_MAPS

2015-12-04 02:37:49

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero

On Fri, 04 Dec 2015 12:05:12 +1030
Rusty Russell <[email protected]> wrote:

> This is clever, but I would advise against such subtle code. We will
> never be able to remove this code once it is in.
>
> Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into
> the cpumasks instead, iff !(flags & __GFP_ZERO).
>
>
I actually thought of the same thing, but thought it was a bit harsh.
If others think that's a better solution, then I'll submit a patch to
do that.

-- Steve

2015-12-04 07:34:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero


* Steven Rostedt <[email protected]> wrote:

> On Fri, 04 Dec 2015 12:05:12 +1030
> Rusty Russell <[email protected]> wrote:
>
> > This is clever, but I would advise against such subtle code. We will never be
> > able to remove this code once it is in.
> >
> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the
> > cpumasks instead, iff !(flags & __GFP_ZERO).
>
> I actually thought of the same thing, but thought it was a bit harsh. If others
> think that's a better solution, then I'll submit a patch to do that.

That just makes things more fragile - 'garbage' will spread the breakage, and if
the breakage is subtle, it will spread subtle breakage.

So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to
make sure it's all zeroed instead of uninitialized?

Thanks,

Ingo

2015-12-06 02:32:50

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero

Ingo Molnar <[email protected]> writes:
> * Steven Rostedt <[email protected]> wrote:
>
>> On Fri, 04 Dec 2015 12:05:12 +1030
>> Rusty Russell <[email protected]> wrote:
>>
>> > This is clever, but I would advise against such subtle code. We will never be
>> > able to remove this code once it is in.
>> >
>> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the
>> > cpumasks instead, iff !(flags & __GFP_ZERO).
>>
>> I actually thought of the same thing, but thought it was a bit harsh. If others
>> think that's a better solution, then I'll submit a patch to do that.
>
> That just makes things more fragile - 'garbage' will spread the breakage, and if
> the breakage is subtle, it will spread subtle breakage.
>
> So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to
> make sure it's all zeroed instead of uninitialized?

OTOH, why not make *every* kmalloc a kzmalloc?

The issue here is not that the issue is subtle (not using a zeroing
allocator is a pretty clear bug), it's that it's papered over by the
normal config.

If we had a config option already to garbage-fill allocations, it'd be a
simple solution.

I don't think there are great answers here. But adding more subtle
zeroing semantics feels wrong, even if it will mostly Just Work.

Cheers,
Rusty.

2015-12-06 17:29:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero


* Rusty Russell <[email protected]> wrote:

> Ingo Molnar <[email protected]> writes:
> > * Steven Rostedt <[email protected]> wrote:
> >
> >> On Fri, 04 Dec 2015 12:05:12 +1030
> >> Rusty Russell <[email protected]> wrote:
> >>
> >> > This is clever, but I would advise against such subtle code. We will never be
> >> > able to remove this code once it is in.
> >> >
> >> > Would suggest making the non-CPUMASK_OFFSTACK stubs write garbage into the
> >> > cpumasks instead, iff !(flags & __GFP_ZERO).
> >>
> >> I actually thought of the same thing, but thought it was a bit harsh. If others
> >> think that's a better solution, then I'll submit a patch to do that.
> >
> > That just makes things more fragile - 'garbage' will spread the breakage, and if
> > the breakage is subtle, it will spread subtle breakage.
> >
> > So why not use a kzmalloc_node() [equivalent] call instead of kmalloc_node(), to
> > make sure it's all zeroed instead of uninitialized?
>
> OTOH, why not make *every* kmalloc a kzmalloc?

The big difference to alloc_cpumask_var_node() is that kmalloc() is well-defined
in the sense that it will return uninitialized buffers (sometimes even poisoned
ones), all the time.

But alloc_cpumask_var_node() will return a zeroed cpumask 99.9% of the time when
the kernel being run is using on-stack cpumasks. So it's very easy to not
initialize and not discover it for extended periods of time.

As it happened here, and as was fixed with the patch. Hence my suggestion.

> The issue here is not that the issue is subtle (not using a zeroing allocator is
> a pretty clear bug), it's that it's papered over by the normal config.

Exactly.

> If we had a config option already to garbage-fill allocations, it'd be a simple
> solution.
>
> I don't think there are great answers here. But adding more subtle zeroing
> semantics feels wrong, even if it will mostly Just Work.

It's not subtle if the naming clearly reflects it (hence my suggestion to rename
the API) - and the status quo for on-stack allocations is zeroing anyway, so it's
not a big jump...

Thanks,

Ingo

2015-12-07 03:14:07

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero

Ingo Molnar <[email protected]> writes:
> * Rusty Russell <[email protected]> wrote:
>> I don't think there are great answers here. But adding more subtle zeroing
>> semantics feels wrong, even if it will mostly Just Work.
>
> It's not subtle if the naming clearly reflects it (hence my suggestion to rename
> the API) - and the status quo for on-stack allocations is zeroing anyway, so it's
> not a big jump...

True, but we already have zalloc_cpumask_var() though if we want that?

It probably makes sense to just switch everyone to that and get rid of
the non-z one?

Cheers,
Rusty.

2015-12-07 08:23:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add __GFP_ZERO to alloc_cpumask_var_node() if ptr is zero


* Rusty Russell <[email protected]> wrote:

> Ingo Molnar <[email protected]> writes:
> > * Rusty Russell <[email protected]> wrote:
> >> I don't think there are great answers here. But adding more subtle zeroing
> >> semantics feels wrong, even if it will mostly Just Work.
> >
> > It's not subtle if the naming clearly reflects it (hence my suggestion to rename
> > the API) - and the status quo for on-stack allocations is zeroing anyway, so it's
> > not a big jump...
>
> True, but we already have zalloc_cpumask_var() though if we want that?

Indeed, didn't realize that.

> It probably makes sense to just switch everyone to that and get rid of the non-z
> one?

Yeah, I think this long-lived bug is a proper trigger for that. Lemme send a
2-patch series.

Thanks,

Ingo