2008-10-18 20:03:34

by Cyrill Gorcunov

[permalink] [raw]
Subject: CFS related question

Hi Ingo, Peter,

I just curious, look we have the following

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
struct sched_entity *se = NULL;

if (first_fair(cfs_rq)) {
se = __pick_next_entity(cfs_rq);
se = pick_next(cfs_rq, se);
set_next_entity(cfs_rq, se);
}

return se;
}

which I presume may return NULL so the following piece
could fail

static struct task_struct *pick_next_task_fair(struct rq *rq)
{
struct task_struct *p;
struct cfs_rq *cfs_rq = &rq->cfs;
struct sched_entity *se;

if (unlikely(!cfs_rq->nr_running))
return NULL;

do {
--> se = pick_next_entity(cfs_rq);
--> OOPs cfs_rq = group_cfs_rq(se);
} while (cfs_rq);

p = task_of(se);
hrtick_start_fair(rq, p);

return p;
}

Did I miss something? Or it comepletely can NOT happen?

- Cyrill -


2008-10-18 20:28:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: CFS related question

On Sun, 2008-10-19 at 00:03 +0400, Cyrill Gorcunov wrote:
> Hi Ingo, Peter,
>
> I just curious, look we have the following
>
> static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
> {
> struct sched_entity *se = NULL;
>
> if (first_fair(cfs_rq)) {
> se = __pick_next_entity(cfs_rq);
> se = pick_next(cfs_rq, se);
> set_next_entity(cfs_rq, se);
> }
>
> return se;
> }
>
> which I presume may return NULL so the following piece
> could fail
>
> static struct task_struct *pick_next_task_fair(struct rq *rq)
> {
> struct task_struct *p;
> struct cfs_rq *cfs_rq = &rq->cfs;
> struct sched_entity *se;
>
> if (unlikely(!cfs_rq->nr_running))
> return NULL;
>
> do {
> --> se = pick_next_entity(cfs_rq);
> --> OOPs cfs_rq = group_cfs_rq(se);
> } while (cfs_rq);
>
> p = task_of(se);
> hrtick_start_fair(rq, p);
>
> return p;
> }
>
> Did I miss something? Or it comepletely can NOT happen?

pick_next_entity() only returns NULL when !first_fair(), which is when !
nr_running.

So the initial !nr_running check in pick_next_task_fair() will catch
that. Further nested RQs will never have !nr_running because then they
get dequeued.


2008-10-19 05:30:20

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: CFS related question

[Peter Zijlstra - Sat, Oct 18, 2008 at 10:28:18PM +0200]
| On Sun, 2008-10-19 at 00:03 +0400, Cyrill Gorcunov wrote:
| > Hi Ingo, Peter,
| >
| > I just curious, look we have the following
| >
| > static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
| > {
| > struct sched_entity *se = NULL;
| >
| > if (first_fair(cfs_rq)) {
| > se = __pick_next_entity(cfs_rq);
| > se = pick_next(cfs_rq, se);
| > set_next_entity(cfs_rq, se);
| > }
| >
| > return se;
| > }
| >
| > which I presume may return NULL so the following piece
| > could fail
| >
| > static struct task_struct *pick_next_task_fair(struct rq *rq)
| > {
| > struct task_struct *p;
| > struct cfs_rq *cfs_rq = &rq->cfs;
| > struct sched_entity *se;
| >
| > if (unlikely(!cfs_rq->nr_running))
| > return NULL;
| >
| > do {
| > --> se = pick_next_entity(cfs_rq);
| > --> OOPs cfs_rq = group_cfs_rq(se);
| > } while (cfs_rq);
| >
| > p = task_of(se);
| > hrtick_start_fair(rq, p);
| >
| > return p;
| > }
| >
| > Did I miss something? Or it comepletely can NOT happen?
|
| pick_next_entity() only returns NULL when !first_fair(), which is when !
| nr_running.
|
| So the initial !nr_running check in pick_next_task_fair() will catch
| that. Further nested RQs will never have !nr_running because then they
| get dequeued.
|
|
|

Thanks Peter!

- Cyrill -

2008-10-20 18:12:46

by Hiroshi Shimamoto

[permalink] [raw]
Subject: Re: CFS related question

Peter Zijlstra wrote:
> On Sun, 2008-10-19 at 00:03 +0400, Cyrill Gorcunov wrote:
>> Hi Ingo, Peter,
>>
>> I just curious, look we have the following
>>
>> static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
>> {
>> struct sched_entity *se = NULL;
>>
>> if (first_fair(cfs_rq)) {
>> se = __pick_next_entity(cfs_rq);
>> se = pick_next(cfs_rq, se);
>> set_next_entity(cfs_rq, se);
>> }
>>
>> return se;
>> }
>>
>> which I presume may return NULL so the following piece
>> could fail
>>
>> static struct task_struct *pick_next_task_fair(struct rq *rq)
>> {
>> struct task_struct *p;
>> struct cfs_rq *cfs_rq = &rq->cfs;
>> struct sched_entity *se;
>>
>> if (unlikely(!cfs_rq->nr_running))
>> return NULL;
>>
>> do {
>> --> se = pick_next_entity(cfs_rq);
>> --> OOPs cfs_rq = group_cfs_rq(se);
>> } while (cfs_rq);
>>
>> p = task_of(se);
>> hrtick_start_fair(rq, p);
>>
>> return p;
>> }
>>
>> Did I miss something? Or it comepletely can NOT happen?
>
> pick_next_entity() only returns NULL when !first_fair(), which is when !
> nr_running.
>
> So the initial !nr_running check in pick_next_task_fair() will catch
> that. Further nested RQs will never have !nr_running because then they
> get dequeued.

Hi Peter,

pick_next_entity() is used in pick_next_task_fair() only.
So, checking first_fair() never fail, and if fails it means bug. Right?

How about the below patch?
--------
From: Hiroshi Shimamoto <[email protected]>
Subject: [PATCH] sched: replace check with BUG_ON in pick_next_entity()

BUG_ON instead of returning NULL in pick_next_entity() when !first_fair().
Basically first_fair() is always true, and returning NULL will cause oops later.

Signed-off-by: Hiroshi Shimamoto <[email protected]>
---
kernel/sched_fair.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9573c33..3ce7c25 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -758,13 +758,13 @@ pick_next(struct cfs_rq *cfs_rq, struct sched_entity *se)

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
- struct sched_entity *se = NULL;
+ struct sched_entity *se;

- if (first_fair(cfs_rq)) {
- se = __pick_next_entity(cfs_rq);
- se = pick_next(cfs_rq, se);
- set_next_entity(cfs_rq, se);
- }
+ BUG_ON(!first_fair(cfs_rq));
+
+ se = __pick_next_entity(cfs_rq);
+ se = pick_next(cfs_rq, se);
+ set_next_entity(cfs_rq, se);

return se;
}
--
1.5.6

2008-10-20 18:20:53

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: CFS related question

[Hiroshi Shimamoto - Mon, Oct 20, 2008 at 11:12:36AM -0700]
| Peter Zijlstra wrote:
| > On Sun, 2008-10-19 at 00:03 +0400, Cyrill Gorcunov wrote:
| >> Hi Ingo, Peter,
| >>
| >> I just curious, look we have the following
| >>
| >> static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
| >> {
| >> struct sched_entity *se = NULL;
| >>
| >> if (first_fair(cfs_rq)) {
| >> se = __pick_next_entity(cfs_rq);
| >> se = pick_next(cfs_rq, se);
| >> set_next_entity(cfs_rq, se);
| >> }
| >>
| >> return se;
| >> }
| >>
| >> which I presume may return NULL so the following piece
| >> could fail
| >>
| >> static struct task_struct *pick_next_task_fair(struct rq *rq)
| >> {
| >> struct task_struct *p;
| >> struct cfs_rq *cfs_rq = &rq->cfs;
| >> struct sched_entity *se;
| >>
| >> if (unlikely(!cfs_rq->nr_running))
| >> return NULL;
| >>
| >> do {
| >> --> se = pick_next_entity(cfs_rq);
| >> --> OOPs cfs_rq = group_cfs_rq(se);
| >> } while (cfs_rq);
| >>
| >> p = task_of(se);
| >> hrtick_start_fair(rq, p);
| >>
| >> return p;
| >> }
| >>
| >> Did I miss something? Or it comepletely can NOT happen?
| >
| > pick_next_entity() only returns NULL when !first_fair(), which is when !
| > nr_running.
| >
| > So the initial !nr_running check in pick_next_task_fair() will catch
| > that. Further nested RQs will never have !nr_running because then they
| > get dequeued.
|
| Hi Peter,
|
| pick_next_entity() is used in pick_next_task_fair() only.
| So, checking first_fair() never fail, and if fails it means bug. Right?
|
| How about the below patch?
| --------
| From: Hiroshi Shimamoto <[email protected]>
| Subject: [PATCH] sched: replace check with BUG_ON in pick_next_entity()
|
| BUG_ON instead of returning NULL in pick_next_entity() when !first_fair().
| Basically first_fair() is always true, and returning NULL will cause oops later.
|
| Signed-off-by: Hiroshi Shimamoto <[email protected]>
| ---
| kernel/sched_fair.c | 12 ++++++------
| 1 files changed, 6 insertions(+), 6 deletions(-)
|
| diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
| index 9573c33..3ce7c25 100644
| --- a/kernel/sched_fair.c
| +++ b/kernel/sched_fair.c
| @@ -758,13 +758,13 @@ pick_next(struct cfs_rq *cfs_rq, struct sched_entity *se)
|
| static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
| {
| - struct sched_entity *se = NULL;
| + struct sched_entity *se;
|
| - if (first_fair(cfs_rq)) {
| - se = __pick_next_entity(cfs_rq);
| - se = pick_next(cfs_rq, se);
| - set_next_entity(cfs_rq, se);
| - }
| + BUG_ON(!first_fair(cfs_rq));
| +
| + se = __pick_next_entity(cfs_rq);
| + se = pick_next(cfs_rq, se);
| + set_next_entity(cfs_rq, se);
|
| return se;
| }
| --
| 1.5.6
|

Technically it should not happen (as Peter explained). But it
seems it happens sometime -- and most probably 'cause of error
in another part of kernel (ie indirect error). So if Peter would
not object against -- I would really like to better have BUG_ON
check there :) Peter?

http://lkml.org/lkml/2008/8/3/269
http://lists.openwall.net/linux-kernel/2008/05/14/130

- Cyrill -