2009-09-18 19:51:42

by Langsdorf, Mark

Subject: [PATCH] Prevent immediate process rescheduling

Prevent the scheduler from immediately rescheduling a process
that just yielded if there is another process available.

Originally suggested by Mike Galbraith ([email protected]).

Signed-off-by: Mark Langsdorf <[email protected]>
---
kernel/sched_fair.c | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 652e8bd..4fad08f 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *left = cfs_rq->rb_leftmost;
+	struct sched_entity *se, *curr;
 
 	if (!left)
 		return NULL;
 
-	return rb_entry(left, struct sched_entity, run_node);
+	se = rb_entry(left, struct sched_entity, run_node);
+	curr = &current->se;
+
+	/*
+	 * Don't select the entity who just tried to schedule away
+	 * if there's another entity available.
+	 */
+	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
+		struct rb_node *next_node = rb_next(&curr->run_node);
+		if (next_node)
+			se = rb_entry(next_node, struct sched_entity, run_node);
+	}
+
+	return se;
 }
 
 static struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
--
1.6.0.2


2009-09-18 19:55:05

by Ingo Molnar

Subject: Re: [PATCH] Prevent immediate process rescheduling


(fixed the Cc: lines)

* Mark Langsdorf <[email protected]> wrote:

> Prevent the scheduler from immediately rescheduling a process
> that just yielded if there is another process available.
>
> Originally suggested by Mike Galbraith ([email protected]).
>
> Signed-off-by: Mark Langsdorf <[email protected]>
> ---
> kernel/sched_fair.c | 16 +++++++++++++++-
> 1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 652e8bd..4fad08f 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
>  {
>  	struct rb_node *left = cfs_rq->rb_leftmost;
> +	struct sched_entity *se, *curr;
>
>  	if (!left)
>  		return NULL;
>
> -	return rb_entry(left, struct sched_entity, run_node);
> +	se = rb_entry(left, struct sched_entity, run_node);
> +	curr = &current->se;
> +
> +	/*
> +	 * Don't select the entity who just tried to schedule away
> +	 * if there's another entity available.
> +	 */
> +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> +		struct rb_node *next_node = rb_next(&curr->run_node);
> +		if (next_node)
> +			se = rb_entry(next_node, struct sched_entity, run_node);
> +	}
> +
> +	return se;
>  }

I suspect some real workload is the motivation of this - what is that
workload?

Ingo

2009-09-18 20:02:14

by Langsdorf, Mark

Subject: RE: [PATCH] Prevent immediate process rescheduling



> -----Original Message-----
> From: Ingo Molnar [mailto:[email protected]]
> Sent: Friday, September 18, 2009 2:55 PM
> To: Langsdorf, Mark; Peter Zijlstra; Mike Galbraith
> Cc: [email protected]
> Subject: Re: [PATCH] Prevent immediate process rescheduling
>
>
> (fixed the Cc: lines)
>
> * Mark Langsdorf <[email protected]> wrote:
>
> > Prevent the scheduler from immediately rescheduling a process
> > that just yielded if there is another process available.
> >
> > Originally suggested by Mike Galbraith ([email protected]).
> >
> > Signed-off-by: Mark Langsdorf <[email protected]>
> > ---
> > kernel/sched_fair.c | 16 +++++++++++++++-
> > 1 files changed, 15 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 652e8bd..4fad08f 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
> >  {
> >  	struct rb_node *left = cfs_rq->rb_leftmost;
> > +	struct sched_entity *se, *curr;
> >
> >  	if (!left)
> >  		return NULL;
> >
> > -	return rb_entry(left, struct sched_entity, run_node);
> > +	se = rb_entry(left, struct sched_entity, run_node);
> > +	curr = &current->se;
> > +
> > +	/*
> > +	 * Don't select the entity who just tried to schedule away
> > +	 * if there's another entity available.
> > +	 */
> > +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> > +		struct rb_node *next_node = rb_next(&curr->run_node);
> > +		if (next_node)
> > +			se = rb_entry(next_node, struct sched_entity, run_node);
> > +	}
> > +
> > +	return se;
> >  }
>
> I suspect some real workload is the motivation of this - what is that
> workload?

Thanks for fixing the cc lines.

The next patch I submitted enables the Pause Filter feature
of the most recent AMD Opteron processors. Pause Filter
lets us detect when a KVM guest VCPU is spinning uselessly on
a contended lock because the lockholder VCPU isn't scheduled.
Ideally, we'd like to switch to that VCPU thread, but at a
minimum, we'd like the contending VCPU to yield meaningfully.
This patch prevents schedule() from simply rescheduling the
thread that is trying to yield, and it showed no performance
regressions in my tests.
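
A minimal sketch of how such a Pause Filter intercept could feed into
this on the KVM side (illustrative only; the handler body here is an
assumption, not the submitted patch):

/*
 * Hypothetical Pause Filter intercept handler: the hardware reports
 * that the guest has spun on PAUSE past a threshold, so this vcpu is
 * probably waiting on a lock whose holder isn't running.  Give up the
 * CPU.  Without the scheduler change above, schedule() may simply
 * pick this same task again, making the yield a no-op.
 */
static int pause_interception(struct vcpu_svm *svm)
{
	schedule();
	return 1;	/* resume guest execution afterwards */
}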

-Mark Langsdorf
Operating System Research Center
AMD

2009-09-18 20:03:30

by Peter Zijlstra

Subject: Re: [PATCH] Prevent immediate process rescheduling

On Fri, 2009-09-18 at 21:54 +0200, Ingo Molnar wrote:

> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 652e8bd..4fad08f 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
> >  {
> >  	struct rb_node *left = cfs_rq->rb_leftmost;
> > +	struct sched_entity *se, *curr;
> >
> >  	if (!left)
> >  		return NULL;
> >
> > -	return rb_entry(left, struct sched_entity, run_node);
> > +	se = rb_entry(left, struct sched_entity, run_node);
> > +	curr = &current->se;
> > +
> > +	/*
> > +	 * Don't select the entity who just tried to schedule away
> > +	 * if there's another entity available.
> > +	 */
> > +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> > +		struct rb_node *next_node = rb_next(&curr->run_node);
> > +		if (next_node)
> > +			se = rb_entry(next_node, struct sched_entity, run_node);
> > +	}
> > +
> > +	return se;
> >  }

Really hate this change though, it doesn't seem right to not pick the
same task again if it's runnable. Bad for cache footprint.

The scenario is quite common for stuff like:

CPU0					CPU1

set_task_state(TASK_INTERRUPTIBLE)

if (cond)
	goto out;
					<--- ttwu()
schedule();
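
For reference, the canonical form of that wait loop is sketched below;
if the ttwu() lands between the cond check and schedule(), the task is
already runnable again when schedule() runs, and picking it straight
back is the cheap, cache-friendly outcome (sketch only):

/*
 * Canonical sleep/wakeup loop.  A wakeup racing with the cond check
 * leaves this task runnable, so schedule() returning to the same
 * task immediately is correct and cheap - exactly what the proposed
 * change would forbid.
 */
for (;;) {
	set_current_state(TASK_INTERRUPTIBLE);
	if (cond)
		break;
	schedule();
}
__set_current_state(TASK_RUNNING);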


2009-09-19 02:16:51

by Mike Galbraith

Subject: Re: [PATCH] Prevent immediate process rescheduling

On Fri, 2009-09-18 at 22:03 +0200, Peter Zijlstra wrote:

> Really hate this change though, it doesn't seem right to not pick the
> same task again if it's runnable. Bad for cache footprint.
>
> The scenario is quite common for stuff like:
>
> CPU0					CPU1
>
> set_task_state(TASK_INTERRUPTIBLE)
>
> if (cond)
> 	goto out;
> 					<--- ttwu()
> schedule();

It also resists the scheduler's built-in need to close spread, too much
of which can seriously damage latency for others when the friendly task
later decides it wants to run hard. OTOH, when it is a voluntary
schedule, not selecting another task is resisting the programmer.

Conundrum.

Another problem with that change is that it doesn't do diddly spit if
the top two are trying to yield: they'll just spin together.

Sounds like Mark's case really needs a gentle_yield() that moves the
task behind next, and slightly beyond if they are too close, but which
also ignores the request entirely if the gap is too large.
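
A rough sketch of what such a gentle_yield() could look like (entirely
hypothetical; the granularity threshold and the placement bump are
invented here, and requeueing/locking are omitted):

/*
 * Hypothetical gentle_yield(): slot the yielder in just behind the
 * next entity on the timeline, but only when the vruntime gap is
 * small; a large gap means honoring the yield would cost the task
 * too much later, so the request is ignored.  The caller would need
 * to dequeue/requeue se around the vruntime change.
 */
static void gentle_yield(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	struct sched_entity *next = __pick_next_entity(cfs_rq);
	s64 gap;

	if (!next || next == se)
		return;

	gap = (s64)(next->vruntime - se->vruntime);
	if (gap < 0 || gap > (s64)sysctl_sched_min_granularity)
		return;		/* too far apart: ignore the request */

	se->vruntime = next->vruntime + 1;	/* just behind next */
}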

-Mike

2009-09-19 08:31:55

by Avi Kivity

Subject: Re: [PATCH] Prevent immediate process rescheduling

On 09/18/2009 11:03 PM, Peter Zijlstra wrote:
> On Fri, 2009-09-18 at 21:54 +0200, Ingo Molnar wrote:
>
>
>>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>>> index 652e8bd..4fad08f 100644
>>> --- a/kernel/sched_fair.c
>>> +++ b/kernel/sched_fair.c
>>> @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
>>>  {
>>>  	struct rb_node *left = cfs_rq->rb_leftmost;
>>> +	struct sched_entity *se, *curr;
>>>
>>>  	if (!left)
>>>  		return NULL;
>>>
>>> -	return rb_entry(left, struct sched_entity, run_node);
>>> +	se = rb_entry(left, struct sched_entity, run_node);
>>> +	curr = &current->se;
>>> +
>>> +	/*
>>> +	 * Don't select the entity who just tried to schedule away
>>> +	 * if there's another entity available.
>>> +	 */
>>> +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
>>> +		struct rb_node *next_node = rb_next(&curr->run_node);
>>> +		if (next_node)
>>> +			se = rb_entry(next_node, struct sched_entity, run_node);
>>> +	}
>>> +
>>> +	return se;
>>>  }
>>>
> Really hate this change though, it doesn't seem right to not pick the
> same task again if it's runnable. Bad for cache footprint.
>
> The scenario is quite common for stuff like:
>
> CPU0					CPU1
>
> set_task_state(TASK_INTERRUPTIBLE)
>
> if (cond)
> 	goto out;
> 					<--- ttwu()
> schedule();
>
>

I agree, yielding should be explicitly requested.

Also, on a heavily overcommitted box an undirected yield might take
quite a long time to find the thread that's holding the lock. I think a
yield_to() will be a lot better:

- we can pick one of the vcpus belonging to the same guest to improve
the probability that the lock actually gets released
- we avoid an issue when the other vcpus are on different runqueues (in
which case the current patch does nothing)
- we can fix the accounting by donating vruntime from current to the
yielded-to vcpu
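
The vruntime donation might be sketched like this (purely illustrative;
no yield_to() existed in mainline at this point, and requeueing,
locking, and cross-runqueue moves are all omitted):

/*
 * Illustrative vruntime donation for a directed yield: the yielder
 * pays, moving right on the timeline, and the target gains, moving
 * left, so the fairness cost of the yield lands on the yielder.
 * Clamping against cfs_rq->min_vruntime is omitted.
 */
static void donate_vruntime(struct sched_entity *from,
			    struct sched_entity *to, u64 amount)
{
	from->vruntime += amount;
	to->vruntime -= amount;
}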


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-09-19 09:03:37

by Mike Galbraith

Subject: Re: [PATCH] Prevent immediate process rescheduling

On Sat, 2009-09-19 at 04:16 +0200, Mike Galbraith wrote:

> Sounds like Mark's case really needs a gentle_yield() that moves the
> task behind next, and slightly beyond if they are too close, but which
> also ignores the request entirely if the gap is too large.

Did that, it sucks. (not surprising really, yield _always_ sucked;)

-Mike

2009-09-20 21:03:38

by Langsdorf, Mark

Subject: RE: [PATCH] Prevent immediate process rescheduling

> > Really hate this change though, it doesn't seem right to not pick
> > the same task again if it's runnable. Bad for cache footprint.
>
> I agree, yielding should be explicitly requested.
>
> Also, on a heavily overcommitted box an undirected yield might take
> quite a long time to find the thread that's holding the lock. I think
> a yield_to() will be a lot better:
>
> - we can pick one of the vcpus belonging to the same guest to improve
> the probability that the lock actually gets released

Is there a way to find the other vcpus belonging to the
same guest? I poked around at that, but couldn't find
one.

> - we avoid an issue when the other vcpus are on different
> runqueues (in which case the current patch does nothing)
> - we can fix the accounting by donating vruntime from current to the
> yielded-to vcpu

I may need someone to walk me through the vruntime donation,
but that's secondary to finding the other vcpus.

-Mark Langsdorf
Operating System Research Center
AMD

2009-09-20 21:56:18

by Avi Kivity

Subject: Re: [PATCH] Prevent immediate process rescheduling

On 09/21/2009 12:03 AM, Langsdorf, Mark wrote:
>> - we can pick one of the vcpus belonging to the same guest to improve
>> the probability that the lock actually gets released
>>
> Is there a way to find the other vcpus belonging to the
> same guest? I poked around at that, but couldn't find
> one.
>

vcpu->kvm->vcpus[]. It doesn't give you the task_struct, but we can
easily save it in vcpu_load() and release it in vcpu_put().
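
A sketch of that bookkeeping (the vcpu->task field is a hypothetical
addition to struct kvm_vcpu; the elided lines stand for the existing
load/put work):

void vcpu_load(struct kvm_vcpu *vcpu)
{
	mutex_lock(&vcpu->mutex);
	vcpu->task = current;	/* remember which task runs this vcpu */
	/* ... existing preempt-notifier and arch vcpu load ... */
}

void vcpu_put(struct kvm_vcpu *vcpu)
{
	/* ... existing arch vcpu put ... */
	vcpu->task = NULL;
	mutex_unlock(&vcpu->mutex);
}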


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.