2007-08-30 08:37:26

by Michal Schmidt

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

Vitaly Mayatskikh skrev:
> Short-living process returns its timeslice to the parent, this
> affects process that creates a lot of such short-living threads,
> because its not a parent for new threads.

I don't see the point of sending patches for old Linux versions such as
2.6.21, unless it's something applicable to the -stable tree.
Do recent kernels with CFS have the same problem?

> Patch fixes this issue and
> doesn't break kabi as does the patch from reporter:
> http://lkml.org/lkml/2007/4/7/21

There's no kabi.

Michal

2007-08-30 09:11:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

On Thu, 2007-08-30 at 09:50 +0200, Vitaly Mayatskikh wrote:
> Short-living process returns its timeslice to the parent, this affects
> process that creates a lot of such short-living threads, because its
> not a parent for new threads. Patch fixes this issue and doesn't break
> kabi as does the patch from reporter: http://lkml.org/lkml/2007/4/7/21

> plain text document attachment (2.6.21-timeslice.patch), "proposed
> patch"
> diff -up -bB ./include/linux/sched.h.orig ./include/linux/sched.h
> --- ./include/linux/sched.h.orig 2007-08-21 09:20:22.000000000 +0200
> +++ ./include/linux/sched.h 2007-08-27 10:14:06.000000000 +0200
> @@ -827,7 +827,9 @@ struct task_struct {
>
> unsigned long policy;
> cpumask_t cpus_allowed;
> - unsigned int time_slice, first_time_slice;
> + unsigned int time_slice;
> + /* Pid of creator */
> + unsigned int cpid;

might as well make that pid_t, or maybe even a struct pid* and keep a
reference on it - the struct pid police might have an opinion.

> #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> struct sched_info sched_info;
> diff -up -bB ./kernel/sched.c.orig ./kernel/sched.c
> --- ./kernel/sched.c.orig 2007-08-21 09:20:22.000000000 +0200
> +++ ./kernel/sched.c 2007-08-27 10:18:44.000000000 +0200
> @@ -1626,9 +1626,9 @@ void fastcall sched_fork(struct task_str
> p->time_slice = (current->time_slice + 1) >> 1;
> /*
> * The remainder of the first timeslice might be recovered by
> - * the parent if the child exits early enough.
> + * the creator (not parent!) if the child exits early enough.
> */
> - p->first_time_slice = 1;
> + p->cpid = current->pid;
> current->time_slice >>= 1;
> p->timestamp = sched_clock();
> if (unlikely(!current->time_slice)) {
> @@ -1728,33 +1728,46 @@ void fastcall wake_up_new_task(struct ta
>
> /*
> * Potentially available exiting-child timeslices are
> - * retrieved here - this way the parent does not get
> + * retrieved here - this way the creator does not get
> * penalized for creating too many threads.
> *
> * (this cannot be used to 'generate' timeslices
> * artificially, because any timeslice recovered here
> - * was given away by the parent in the first place.)
> + * was given away by the creator in the first place.)
> */
> void fastcall sched_exit(struct task_struct *p)
> {
> unsigned long flags;
> struct rq *rq;
> -
> + struct task_struct* creator = NULL;
> /*
> * If the child was a (relative-) CPU hog then decrease
> - * the sleep_avg of the parent as well.
> + * the sleep_avg of the creator as well.
> */
> - rq = task_rq_lock(p->parent, &flags);
> - if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
> - p->parent->time_slice += p->time_slice;
> - if (unlikely(p->parent->time_slice > task_timeslice(p)))
> - p->parent->time_slice = task_timeslice(p);
> + if (p->cpid) {
> + struct pid *pid = find_get_pid((pid_t)p->cpid);
> + if (pid) {
> + creator = get_pid_task(pid, PIDTYPE_PID);
> + put_pid(pid);
> }
> - if (p->sleep_avg < p->parent->sleep_avg)
> - p->parent->sleep_avg = p->parent->sleep_avg /
> +
> + if (creator) {
> + if (task_cpu(p) == task_cpu(creator)) {
> + rq = task_rq_lock(creator, &flags);
> +
> + creator->time_slice += p->time_slice;
> + if (unlikely(creator->time_slice > task_timeslice(p)))
> + creator->time_slice = task_timeslice(p);
> +
> + if (p->sleep_avg < creator->sleep_avg)
> + creator->sleep_avg = creator->sleep_avg /
> (EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
> (EXIT_WEIGHT + 1);
> task_rq_unlock(rq, &flags);
> + }
> + put_task_struct(creator);
> + }
> + }
> }
>
> /**
> @@ -3153,7 +3166,7 @@ static void task_running_tick(struct rq
> */
> if ((p->policy == SCHED_RR) && !--p->time_slice) {
> p->time_slice = task_timeslice(p);
> - p->first_time_slice = 0;
> + p->cpid = 0;
> set_tsk_need_resched(p);
>
> /* put it at the end of the queue: */
> @@ -3166,7 +3179,7 @@ static void task_running_tick(struct rq
> set_tsk_need_resched(p);
> p->prio = effective_prio(p);
> p->time_slice = task_timeslice(p);
> - p->first_time_slice = 0;
> + p->cpid = 0;
>
> if (!rq->expired_timestamp)
> rq->expired_timestamp = jiffies;

Other than that it looks good, pretty much what I suggested :-)

2007-08-30 09:15:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

On Thu, 2007-08-30 at 10:37 +0200, Michal Schmidt wrote:
> Vitaly Mayatskikh skrev:
> > Short-living process returns its timeslice to the parent, this
> > affects process that creates a lot of such short-living threads,
> > because its not a parent for new threads.
>
> I don't see the point of sending patches for old Linux versions such as
> 2.6.21, unless it's something applicable to the -stable tree.

The older trees might want to have this, perhaps the .16 by Adrian,
certainly distros still care.

> Do recent kernels with CFS have the same problem?

Very much not comparable, as you probably guessed :-)

> > Patch fixes this issue and
> > doesn't break kabi as does the patch from reporter:
> > http://lkml.org/lkml/2007/4/7/21
>
> There's no kabi.

True.

2007-08-30 09:49:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

On 08/30, Peter Zijlstra wrote:
>
> On Thu, 2007-08-30 at 09:50 +0200, Vitaly Mayatskikh wrote:
> > Short-living process returns its timeslice to the parent, this affects
> > process that creates a lot of such short-living threads, because its
> > not a parent for new threads. Patch fixes this issue and doesn't break
> > kabi as does the patch from reporter: http://lkml.org/lkml/2007/4/7/21
>
> > plain text document attachment (2.6.21-timeslice.patch), "proposed
> > patch"
> > diff -up -bB ./include/linux/sched.h.orig ./include/linux/sched.h
> > --- ./include/linux/sched.h.orig 2007-08-21 09:20:22.000000000 +0200
> > +++ ./include/linux/sched.h 2007-08-27 10:14:06.000000000 +0200
> > @@ -827,7 +827,9 @@ struct task_struct {
> >
> > unsigned long policy;
> > cpumask_t cpus_allowed;
> > - unsigned int time_slice, first_time_slice;
> > + unsigned int time_slice;
> > + /* Pid of creator */
> > + unsigned int cpid;
>
> might as well make that pid_t, or maybe even a struct pid* and keep a
> reference on it - the struct pid police might have an opinion.

I agree, "struct pid*" is better, because

1. we don't need a costly find_pid() in sched_exit()
2. we don't suffer from pid re-use problem

> > void fastcall sched_exit(struct task_struct *p)
> > {
> > unsigned long flags;
> > struct rq *rq;
> > -
> > + struct task_struct* creator = NULL;
> > /*
> > * If the child was a (relative-) CPU hog then decrease
> > - * the sleep_avg of the parent as well.
> > + * the sleep_avg of the creator as well.
> > */
> > - rq = task_rq_lock(p->parent, &flags);
> > - if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
> > - p->parent->time_slice += p->time_slice;
> > - if (unlikely(p->parent->time_slice > task_timeslice(p)))
> > - p->parent->time_slice = task_timeslice(p);
> > + if (p->cpid) {
> > + struct pid *pid = find_get_pid((pid_t)p->cpid);
> > + if (pid) {
> > + creator = get_pid_task(pid, PIDTYPE_PID);
> > + put_pid(pid);

sched_exit() was removed in 2.6.23-rc.

If you are going to re-introduce this logic, please don't do sched_exit()
from release_task(). It was done this way just because we can't access
->parent after release_task(). But release_task() is called either too
early, or too late for timeslice accounting, depending on ->exit_signal == -1.

I'd suggest to do this in do_exit(), before the last schedule(). Without
write_unlock_irq() the code above needs a couple of rcu_read_lock()'s.

I am not sure Ingo will like this change though...

Oleg.

2007-08-30 09:52:22

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

Peter Zijlstra wrote:
> On Thu, 2007-08-30 at 09:50 +0200, Vitaly Mayatskikh wrote:
>> Short-living process returns its timeslice to the parent, this affects
>> process that creates a lot of such short-living threads, because its
>> not a parent for new threads. Patch fixes this issue and doesn't break
>> kabi as does the patch from reporter: http://lkml.org/lkml/2007/4/7/21
>
>> plain text document attachment (2.6.21-timeslice.patch), "proposed
>> patch"
>> diff -up -bB ./include/linux/sched.h.orig ./include/linux/sched.h
>> --- ./include/linux/sched.h.orig 2007-08-21 09:20:22.000000000 +0200
>> +++ ./include/linux/sched.h 2007-08-27 10:14:06.000000000 +0200
>> @@ -827,7 +827,9 @@ struct task_struct {
>>
>> unsigned long policy;
>> cpumask_t cpus_allowed;
>> - unsigned int time_slice, first_time_slice;
>> + unsigned int time_slice;
>> + /* Pid of creator */
>> + unsigned int cpid;
>
> might as well make that pid_t, or maybe even a struct pid* and keep a
> reference on it - the struct pid police might have an opinion.

Store the struct pid reference itself. In any case you make
the find_get_pid() later to obtain the struct pid itself. This
will even make the sched_exit() work faster.

>> #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
>> struct sched_info sched_info;
>> diff -up -bB ./kernel/sched.c.orig ./kernel/sched.c
>> --- ./kernel/sched.c.orig 2007-08-21 09:20:22.000000000 +0200
>> +++ ./kernel/sched.c 2007-08-27 10:18:44.000000000 +0200
>> @@ -1626,9 +1626,9 @@ void fastcall sched_fork(struct task_str
>> p->time_slice = (current->time_slice + 1) >> 1;
>> /*
>> * The remainder of the first timeslice might be recovered by
>> - * the parent if the child exits early enough.
>> + * the creator (not parent!) if the child exits early enough.
>> */
>> - p->first_time_slice = 1;
>> + p->cpid = current->pid;
>> current->time_slice >>= 1;
>> p->timestamp = sched_clock();
>> if (unlikely(!current->time_slice)) {
>> @@ -1728,33 +1728,46 @@ void fastcall wake_up_new_task(struct ta
>>
>> /*
>> * Potentially available exiting-child timeslices are
>> - * retrieved here - this way the parent does not get
>> + * retrieved here - this way the creator does not get
>> * penalized for creating too many threads.
>> *
>> * (this cannot be used to 'generate' timeslices
>> * artificially, because any timeslice recovered here
>> - * was given away by the parent in the first place.)
>> + * was given away by the creator in the first place.)
>> */
>> void fastcall sched_exit(struct task_struct *p)
>> {
>> unsigned long flags;
>> struct rq *rq;
>> -
>> + struct task_struct* creator = NULL;
>> /*
>> * If the child was a (relative-) CPU hog then decrease
>> - * the sleep_avg of the parent as well.
>> + * the sleep_avg of the creator as well.
>> */
>> - rq = task_rq_lock(p->parent, &flags);
>> - if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
>> - p->parent->time_slice += p->time_slice;
>> - if (unlikely(p->parent->time_slice > task_timeslice(p)))
>> - p->parent->time_slice = task_timeslice(p);
>> + if (p->cpid) {
>> + struct pid *pid = find_get_pid((pid_t)p->cpid);
>> + if (pid) {
>> + creator = get_pid_task(pid, PIDTYPE_PID);
>> + put_pid(pid);
>> }
>> - if (p->sleep_avg < p->parent->sleep_avg)
>> - p->parent->sleep_avg = p->parent->sleep_avg /
>> +
>> + if (creator) {
>> + if (task_cpu(p) == task_cpu(creator)) {
>> + rq = task_rq_lock(creator, &flags);
>> +
>> + creator->time_slice += p->time_slice;
>> + if (unlikely(creator->time_slice > task_timeslice(p)))
>> + creator->time_slice = task_timeslice(p);
>> +
>> + if (p->sleep_avg < creator->sleep_avg)
>> + creator->sleep_avg = creator->sleep_avg /
>> (EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
>> (EXIT_WEIGHT + 1);
>> task_rq_unlock(rq, &flags);
>> + }
>> + put_task_struct(creator);
>> + }
>> + }
>> }
>>
>> /**
>> @@ -3153,7 +3166,7 @@ static void task_running_tick(struct rq
>> */
>> if ((p->policy == SCHED_RR) && !--p->time_slice) {
>> p->time_slice = task_timeslice(p);
>> - p->first_time_slice = 0;
>> + p->cpid = 0;
>> set_tsk_need_resched(p);
>>
>> /* put it at the end of the queue: */
>> @@ -3166,7 +3179,7 @@ static void task_running_tick(struct rq
>> set_tsk_need_resched(p);
>> p->prio = effective_prio(p);
>> p->time_slice = task_timeslice(p);
>> - p->first_time_slice = 0;
>> + p->cpid = 0;
>>
>> if (!rq->expired_timestamp)
>> rq->expired_timestamp = jiffies;
>
> Other than that it looks good, pretty much what I suggested :-)
>
>

2007-08-30 09:57:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

On Thu, 2007-08-30 at 13:49 +0400, Oleg Nesterov wrote:
> On 08/30, Peter Zijlstra wrote:
> >
> > On Thu, 2007-08-30 at 09:50 +0200, Vitaly Mayatskikh wrote:
> > > Short-living process returns its timeslice to the parent, this affects
> > > process that creates a lot of such short-living threads, because its
> > > not a parent for new threads. Patch fixes this issue and doesn't break
> > > kabi as does the patch from reporter: http://lkml.org/lkml/2007/4/7/21
> >
> > > plain text document attachment (2.6.21-timeslice.patch), "proposed
> > > patch"
> > > diff -up -bB ./include/linux/sched.h.orig ./include/linux/sched.h
> > > --- ./include/linux/sched.h.orig 2007-08-21 09:20:22.000000000 +0200
> > > +++ ./include/linux/sched.h 2007-08-27 10:14:06.000000000 +0200
> > > @@ -827,7 +827,9 @@ struct task_struct {
> > >
> > > unsigned long policy;
> > > cpumask_t cpus_allowed;
> > > - unsigned int time_slice, first_time_slice;
> > > + unsigned int time_slice;
> > > + /* Pid of creator */
> > > + unsigned int cpid;
> >
> > might as well make that pid_t, or maybe even a struct pid* and keep a
> > reference on it - the struct pid police might have an opinion.
>
> I agree, "struct pid*" is better, because
>
> 1. we don't need a costly find_pid() in sched_exit()
> 2. we don't suffer from pid re-use problem

Ah, good arguments indeed.

> > > void fastcall sched_exit(struct task_struct *p)
> > > {
> > > unsigned long flags;
> > > struct rq *rq;
> > > -
> > > + struct task_struct* creator = NULL;
> > > /*
> > > * If the child was a (relative-) CPU hog then decrease
> > > - * the sleep_avg of the parent as well.
> > > + * the sleep_avg of the creator as well.
> > > */
> > > - rq = task_rq_lock(p->parent, &flags);
> > > - if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
> > > - p->parent->time_slice += p->time_slice;
> > > - if (unlikely(p->parent->time_slice > task_timeslice(p)))
> > > - p->parent->time_slice = task_timeslice(p);
> > > + if (p->cpid) {
> > > + struct pid *pid = find_get_pid((pid_t)p->cpid);
> > > + if (pid) {
> > > + creator = get_pid_task(pid, PIDTYPE_PID);
> > > + put_pid(pid);
>
> sched_exit() was removed in 2.6.23-rc.
>
> If you are going to re-introduce this logic, please don't do sched_exit()
> from release_task(). It was done this way just because we can't access
> ->parent after release_task(). But release_task() is called either too
> early, or too late for timeslice accounting, depending on ->exit_signal == -1.
>
> I'd suggest to do this in do_exit(), before the last schedule(). Without
> write_unlock_irq() the code above needs a couple of rcu_read_lock()'s.
>
> I am not sure Ingo will like this change though...

This is not intended as re-introduction of the feature, this stems from
fixing this issue in older (read distro) kernels.

2007-08-30 10:09:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2.6.21] Return available first timeslice to the creator, not parent

On 08/30, Peter Zijlstra wrote:
>
> On Thu, 2007-08-30 at 13:49 +0400, Oleg Nesterov wrote:
> >
> > sched_exit() was removed in 2.6.23-rc.
> >
> > If you are going to re-introduce this logic, please don't do sched_exit()
> > from release_task(). It was done this way just because we can't access
> > ->parent after release_task(). But release_task() is called either too
> > early, or too late for timeslice accounting, depending on ->exit_signal == -1.
> >
> > I'd suggest to do this in do_exit(), before the last schedule(). Without
> > write_unlock_irq() the code above needs a couple of rcu_read_lock()'s.
> >
> > I am not sure Ingo will like this change though...
>
> This is not intended as re-introduction of the feature, this stems from
> fixing this issue in older (read distro) kernels.

Ah, good, sorry for noise then.

In that case I don't think it makes sense to move sched_exit() to do_exit(),
of course. This doesn't look suitable for the -stable tree.

Oleg.