2015-12-02 11:53:27

by Xunlei Pang

Subject: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

root_domain::rto_mask allocated through alloc_cpumask_var()
contains garbage data, which may cause problems. For instance,
when doing pull_rt_task(), it may do useless iterations if
rto_mask retains some extra garbage bits. Worse still, this
violates the isolated domain rule for clustered scheduling
using cpuset, because tasks (with all CPUs allowed) belonging
to one root domain can be pulled away into another root
domain.

The patch cleans the garbage by using zalloc_cpumask_var()
instead of alloc_cpumask_var() for root_domain::rto_mask
allocation, thereby addressing the issues.

Do the same thing for root_domain's other cpumask members:
dlo_mask, span, and online.

Signed-off-by: Xunlei Pang <[email protected]>
---
kernel/sched/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b420d2..5691953 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5858,13 +5858,13 @@ static int init_rootdomain(struct root_domain *rd)
 {
 	memset(rd, 0, sizeof(*rd));
 
-	if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))
 		goto out;
-	if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->online, GFP_KERNEL))
 		goto free_span;
-	if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
 		goto free_online;
-	if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
 		goto free_dlo_mask;
 
 	init_dl_bw(&rd->dl_bw);
--
2.5.0


2015-12-02 12:34:22

by Peter Zijlstra

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
> The patch cleans the garbage by using zalloc_cpumask_var()
> instead of alloc_cpumask_var() for root_domain::rto_mask
> allocation, thereby addressing the issues.

How did you notice this? Also do we want to do the same for the kmalloc
in alloc_rootdomain() ?

2015-12-02 13:12:37

by Xunlei Pang

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

Hi Peter,

On 12/02/2015 at 08:34 PM, Peter Zijlstra wrote:
> On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
>> The patch cleans the garbage by using zalloc_cpumask_var()
>> instead of alloc_cpumask_var() for root_domain::rto_mask
>> allocation, thereby addressing the issues.
> How did you notice this? Also do we want to do the same for the kmalloc

When doing review.

> in alloc_rootdomain() ?

There is a "memset(rd, 0, sizeof(*rd))" in init_rootdomain(),
so I don't think we need to do this in alloc_rootdomain().
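
For reference, alloc_rootdomain() looks roughly like this (going from
memory of kernel/sched/core.c of that era, so details may differ):

/* The kmalloc()'d struct goes straight into init_rootdomain(), whose
 * memset() zeroes every embedded field, so a kzalloc() here would only
 * duplicate that work. */
static struct root_domain *alloc_rootdomain(void)
{
	struct root_domain *rd;

	rd = kmalloc(sizeof(*rd), GFP_KERNEL);
	if (!rd)
		return NULL;

	if (init_rootdomain(rd) != 0) {
		kfree(rd);
		return NULL;
	}

	return rd;
}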

Regards,
Xunlei

2015-12-02 16:25:48

by Peter Zijlstra

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

On Wed, Dec 02, 2015 at 09:12:30PM +0800, Xunlei Pang wrote:
> Hi Peter,
>
> On 12/02/2015 at 08:34 PM, Peter Zijlstra wrote:
> > On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
> >> The patch cleans the garbage by using zalloc_cpumask_var()
> >> instead of alloc_cpumask_var() for root_domain::rto_mask
> >> allocation, thereby addressing the issues.
> > How did you notice this? Also do we want to do the same for the kmalloc
>
> When doing review.

Nice, will you be looking for similar issues elsewhere in the scheduler
too?

> > in alloc_rootdomain() ?
>
> There is a "memset(rd, 0, sizeof(*rd))" in init_rootdomain(),
> so I don't think we need to do this in alloc_rootdomain().

Ah, right there is. Which also clears the mask for small systems.

2015-12-03 02:44:16

by Xunlei Pang

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

Hi Peter,

On 12/03/2015 at 12:25 AM, Peter Zijlstra wrote:
> On Wed, Dec 02, 2015 at 09:12:30PM +0800, Xunlei Pang wrote:
>> Hi Peter,
>>
>> On 12/02/2015 at 08:34 PM, Peter Zijlstra wrote:
>>> On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
>>>> The patch cleans the garbage by using zalloc_cpumask_var()
>>>> instead of alloc_cpumask_var() for root_domain::rto_mask
>>>> allocation, thereby addressing the issues.
>>> How did you notice this? Also do we want to do the same for the kmalloc
>> When doing review.
> Nice, will you be looking for similar issues elsewhere in the scheduler
> too?

Sure :-)

>>> in alloc_rootdomain() ?
>> There is a "memset(rd, 0, sizeof(*rd))" in init_rootdomain(),
>> so I don't think we need to do this in alloc_rootdomain().
> Ah, right there is. Which also clears the mask for small systems.

Yeah, maybe we can improve it by using alloc_cpumask_var() with
__GFP_ZERO instead of zalloc_cpumask_var(), to avoid the duplicate
clearing for small systems.
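
To make the difference concrete, roughly (going from memory of
include/linux/cpumask.h and lib/cpumask.c, so details may differ):

/* CONFIG_CPUMASK_OFFSTACK=y: the mask is a separate heap allocation,
 * and __GFP_ZERO decides whether it comes back zeroed. */
bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
{
	return alloc_cpumask_var(mask, flags | __GFP_ZERO);
}

/* CONFIG_CPUMASK_OFFSTACK=n: the mask is embedded in the enclosing
 * object, alloc_cpumask_var() is a no-op, and zalloc_cpumask_var()
 * clears it explicitly -- the duplicate clearing mentioned above,
 * since init_rootdomain() has already memset() the whole struct. */
static inline bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
{
	cpumask_clear(*mask);
	return true;
}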

Regards,
Xunlei

2015-12-03 08:28:08

by Ingo Molnar

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()


* Xunlei Pang <[email protected]> wrote:

> Hi Peter,
>
> On 12/03/2015 at 12:25 AM, Peter Zijlstra wrote:
> > On Wed, Dec 02, 2015 at 09:12:30PM +0800, Xunlei Pang wrote:
> >> Hi Peter,
> >>
> >> On 12/02/2015 at 08:34 PM, Peter Zijlstra wrote:
> >>> On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
> >>>> The patch cleans the garbage by using zalloc_cpumask_var()
> >>>> instead of alloc_cpumask_var() for root_domain::rto_mask
> >>>> allocation, thereby addressing the issues.
> >>> How did you notice this? Also do we want to do the same for the kmalloc
> >> When doing review.
> > Nice, will you be looking for similar issues elsewhere in the scheduler
> > too?
>
> Sure :-)

Hm, is the alloc_cpumask_var() done in alloc_sched_domains() safe?

At least the usage pattern in init_sched_domains() looks unsafe:

	doms_cur = alloc_sched_domains(ndoms_cur);
	if (!doms_cur)
		doms_cur = &fallback_doms;
	cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);

I think alloc_cpumask_var() is a fundamentally unsafe or at least fragile
operation, because the uninitialized variable bug will only happen on large CPU
count kernels AFAICS - so it's inviting such bugs.
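
To spell out why it is config dependent (roughly, from
include/linux/cpumask.h, going from memory):

#ifdef CONFIG_CPUMASK_OFFSTACK
/* Large NR_CPUS: the mask is a separate heap allocation, so
 * alloc_cpumask_var() hands back uninitialized memory. */
typedef struct cpumask *cpumask_var_t;
#else
/* Small NR_CPUS: the mask is embedded in the enclosing object and is
 * only as (un)initialized as that object is -- e.g. already zeroed by
 * the memset() in init_rootdomain(). */
typedef struct cpumask cpumask_var_t[1];
#endif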

How about we rename alloc_cpumask_var() to alloc_cpumask_var_noinit() or at least
__alloc_cpumask_var(), to make this property easier to see?

Thanks,

Ingo

2015-12-03 09:35:34

by Peter Zijlstra

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

On Thu, Dec 03, 2015 at 10:44:08AM +0800, Xunlei Pang wrote:

> > Nice, will you be looking for similar issues elsewhere in the scheduler
> > too?
>
> Sure :-)

Thanks!

> >>> in alloc_rootdomain() ?
> >> There is a "memset(rd, 0, sizeof(*rd))" in init_rootdomain(),
> >> so I don't think we need to do this in alloc_rootdomain().
> > Ah, right there is. Which also clears the mask for small systems.
>
> Yeah, maybe we can improve it by using alloc_cpumask_var() with
> __GFP_ZERO instead of zalloc_cpumask_var(), to avoid the duplicate
> clearing for small systems.

This isn't a performance critical path, so clarity and correctness are
more important than performance.

2015-12-03 11:53:03

by Xunlei Pang

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

Hi Ingo,

On 12/03/2015 at 04:28 PM, Ingo Molnar wrote:
> * Xunlei Pang <[email protected]> wrote:
>
>> Hi Peter,
>>
>> On 12/03/2015 at 12:25 AM, Peter Zijlstra wrote:
>>> On Wed, Dec 02, 2015 at 09:12:30PM +0800, Xunlei Pang wrote:
>>>> Hi Peter,
>>>>
>>>> On 12/02/2015 at 08:34 PM, Peter Zijlstra wrote:
>>>>> On Wed, Dec 02, 2015 at 07:52:59PM +0800, Xunlei Pang wrote:
>>>>>> The patch cleans the garbage by using zalloc_cpumask_var()
>>>>>> instead of alloc_cpumask_var() for root_domain::rto_mask
>>>>>> allocation, thereby addressing the issues.
>>>>> How did you notice this? Also do we want to do the same for the kmalloc
>>>> When doing review.
>>> Nice, will you be looking for similar issues elsewhere in the scheduler
>>> too?
>> Sure :-)
> Hm, is the alloc_cpumask_var() done in alloc_sched_domains() safe?

So far, I haven't found any other similar issues, but I will check further.

>
> At least the usage pattern in init_sched_domains() looks unsafe:
>
> 	doms_cur = alloc_sched_domains(ndoms_cur);
> 	if (!doms_cur)
> 		doms_cur = &fallback_doms;
> 	cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
>
> I think alloc_cpumask_var() is a fundamentally unsafe or at least fragile
> operation, because the uninitialized variable bug will only happen on large CPU
> count kernels AFAICS - so it's inviting such bugs.
>
> How about we rename alloc_cpumask_var() to alloc_cpumask_var_noinit() or at least
> __alloc_cpumask_var(), to make this property easier to see?

There are already many call sites of it in the kernel, and at least we still
have zalloc_cpumask_var(). Maybe we could add some function comments
reminding people to consider zalloc_cpumask_var() for their cases.
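
Something along these lines, say (just an illustration, not proposed
final wording):

/**
 * alloc_cpumask_var - allocate a cpumask_var_t
 *
 * Note: with CONFIG_CPUMASK_OFFSTACK=y the returned mask is NOT
 * initialized; unless every bit gets written before it is read,
 * use zalloc_cpumask_var() instead.
 */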

Regards,
Xunlei

>
> Thanks,
>
> Ingo

2015-12-04 08:09:07

by Ingo Molnar

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()


* Xunlei Pang <[email protected]> wrote:

> > Hm, is the alloc_cpumask_var() done in alloc_sched_domains() safe?
>
> So far, I haven't found any other similar issues, but I will check further.
>
> >
> > At least the usage pattern in init_sched_domains() looks unsafe:
> >
> > 	doms_cur = alloc_sched_domains(ndoms_cur);
> > 	if (!doms_cur)
> > 		doms_cur = &fallback_doms;
> > 	cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);

So is this pattern in init_sched_domains() correct, for OFFSTACK=y?

It looks wrong to me, as alloc_sched_domains() allocates an uninitialized cpumask
via alloc_cpumask_var() and returns it:

cpumask_var_t *alloc_sched_domains(unsigned int ndoms)
{
	int i;
	cpumask_var_t *doms;

	doms = kmalloc(sizeof(*doms) * ndoms, GFP_KERNEL);
	if (!doms)
		return NULL;
	for (i = 0; i < ndoms; i++) {
		if (!alloc_cpumask_var(&doms[i], GFP_KERNEL)) {
			free_sched_domains(doms, i);
			return NULL;
		}
	}
	return doms;
}

and then this code:

> > cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);

uses it without first clearing it.

So is this another such bug, or am I missing something?

Thanks,

Ingo

2015-12-04 08:27:08

by Peter Zijlstra

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

On Fri, Dec 04, 2015 at 09:09:01AM +0100, Ingo Molnar wrote:
> and then this code:
>
> > > cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
>
> uses it without first clearing it.
>
> So is this another such bug, or am I missing something?

It uses it as destination, it does a complete write of the mask.
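
I.e., roughly (going from memory of include/linux/cpumask.h, details
may differ):

/* bitmap_andnot() writes every destination bit up to nr_cpumask_bits,
 * so whatever was in dstp beforehand is irrelevant. */
static inline int cpumask_andnot(struct cpumask *dstp,
				 const struct cpumask *src1p,
				 const struct cpumask *src2p)
{
	return bitmap_andnot(cpumask_bits(dstp), cpumask_bits(src1p),
			     cpumask_bits(src2p), nr_cpumask_bits);
}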

2015-12-04 08:28:37

by Xunlei Pang

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()

Hi Ingo,

On 12/04/2015 at 04:09 PM, Ingo Molnar wrote:
> * Xunlei Pang <[email protected]> wrote:
>
>>> Hm, is the alloc_cpumask_var() done in alloc_sched_domains() safe?
>> So far, I haven't found any other similar issues, but I will check further.
>>
>>> At least the usage pattern in init_sched_domains() looks unsafe:
>>>
>>> 	doms_cur = alloc_sched_domains(ndoms_cur);
>>> 	if (!doms_cur)
>>> 		doms_cur = &fallback_doms;
>>> 	cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> So is this pattern in init_sched_domains() correct, for OFFSTACK=y?
>
> It looks wrong to me, as alloc_sched_domains() allocates an uninitialized cpumask
> via alloc_cpumask_var() and returns it:
>
> cpumask_var_t *alloc_sched_domains(unsigned int ndoms)
> {
> 	int i;
> 	cpumask_var_t *doms;
>
> 	doms = kmalloc(sizeof(*doms) * ndoms, GFP_KERNEL);
> 	if (!doms)
> 		return NULL;
> 	for (i = 0; i < ndoms; i++) {
> 		if (!alloc_cpumask_var(&doms[i], GFP_KERNEL)) {
> 			free_sched_domains(doms, i);
> 			return NULL;
> 		}
> 	}
> 	return doms;
> }
>
> and then this code:
>
>>> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> uses it without first clearing it.
>
> So is this another such bug, or am I missing something?

Yeah, I noticed that as well. But fortunately cpumask_andnot(),
cpumask_and() and the like will clear doms_cur[] indirectly; also,
cpu_isolated_map, cpu_active_mask, etc. don't contain any
garbage bits. I also checked the use of it by cpuset and found no
other such bug so far.

Regards,
Xunlei

>
> Thanks,
>
> Ingo

2015-12-04 08:30:32

by Ingo Molnar

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()


* Peter Zijlstra <[email protected]> wrote:

> On Fri, Dec 04, 2015 at 09:09:01AM +0100, Ingo Molnar wrote:
> > and then this code:
> >
> > > > cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> >
> > uses it without first clearing it.
> >
> > So is this another such bug, or am I missing something?
>
> It uses it as destination, it does a complete write of the mask.

ah, indeed! The cpumask primitive confused me.

Thanks,

Ingo

2015-12-04 08:30:59

by Ingo Molnar

Subject: Re: [PATCH] sched/core: Clear the root_domain cpumasks in init_rootdomain()


* Xunlei Pang <[email protected]> wrote:

> Hi Ingo,
>
> On 12/04/2015 at 04:09 PM, Ingo Molnar wrote:
> > * Xunlei Pang <[email protected]> wrote:
> >
> >>> Hm, is the alloc_cpumask_var() done in alloc_sched_domains() safe?
> >> So far, I haven't found any other similar issues, but I will check further.
> >>
> >>> At least the usage pattern in init_sched_domains() looks unsafe:
> >>>
> >>> 	doms_cur = alloc_sched_domains(ndoms_cur);
> >>> 	if (!doms_cur)
> >>> 		doms_cur = &fallback_doms;
> >>> 	cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> > So is this pattern in init_sched_domains() correct, for OFFSTACK=y?
> >
> > It looks wrong to me, as alloc_sched_domains() allocates an uninitialized cpumask
> > via alloc_cpumask_var() and returns it:
> >
> > cpumask_var_t *alloc_sched_domains(unsigned int ndoms)
> > {
> > 	int i;
> > 	cpumask_var_t *doms;
> >
> > 	doms = kmalloc(sizeof(*doms) * ndoms, GFP_KERNEL);
> > 	if (!doms)
> > 		return NULL;
> > 	for (i = 0; i < ndoms; i++) {
> > 		if (!alloc_cpumask_var(&doms[i], GFP_KERNEL)) {
> > 			free_sched_domains(doms, i);
> > 			return NULL;
> > 		}
> > 	}
> > 	return doms;
> > }
> >
> > and then this code:
> >
> >>> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> > uses it without first clearing it.
> >
> > So is this another such bug, or am I missing something?
>
> Yeah, I noticed that as well. But fortunately cpumask_andnot(),
> cpumask_and() and the like will clear doms_cur[] indirectly; also,
> cpu_isolated_map, cpu_active_mask, etc. don't contain any
> garbage bits. I also checked the use of it by cpuset and found no
> other such bug so far.

Great, thanks for double checking!

Ingo

Subject: [tip:locking/core] sched/core: Clear the root_domain cpumasks in init_rootdomain()

Commit-ID: 8295c69925ad53ec32ca54ac9fc194ff21bc40e2
Gitweb: http://git.kernel.org/tip/8295c69925ad53ec32ca54ac9fc194ff21bc40e2
Author: Xunlei Pang <[email protected]>
AuthorDate: Wed, 2 Dec 2015 19:52:59 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:16:21 +0100

sched/core: Clear the root_domain cpumasks in init_rootdomain()

root_domain::rto_mask allocated through alloc_cpumask_var()
contains garbage data, which may cause problems. For instance,
when doing pull_rt_task(), it may do useless iterations if
rto_mask retains some extra garbage bits. Worse still, this
violates the isolated domain rule for clustered scheduling
using cpuset, because tasks (with all CPUs allowed) belonging
to one root domain can be pulled away into another root
domain.

The patch cleans the garbage by using zalloc_cpumask_var()
instead of alloc_cpumask_var() for root_domain::rto_mask
allocation, thereby addressing the issues.

Do the same thing for root_domain's other cpumask members:
dlo_mask, span, and online.

Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fc8c987..eee4ee6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5846,13 +5846,13 @@ static int init_rootdomain(struct root_domain *rd)
 {
 	memset(rd, 0, sizeof(*rd));
 
-	if (!alloc_cpumask_var(&rd->span, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))
 		goto out;
-	if (!alloc_cpumask_var(&rd->online, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->online, GFP_KERNEL))
 		goto free_span;
-	if (!alloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL))
 		goto free_online;
-	if (!alloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
+	if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL))
 		goto free_dlo_mask;
 
 	init_dl_bw(&rd->dl_bw);