2007-09-25 14:45:14

by Ingo Molnar

Subject: [git] CFS-devel, latest code


The latest sched-devel.git tree can be pulled from:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

This is a quick iteration after yesterday's: a couple of group
scheduling bugs were found/debugged and fixed by Srivatsa Vaddagiri and
Mike Galbraith. There's also a yield fix from Dmitry Adamushko, a build
fix from S.Ceglar Onur and Andrew Morton, a cleanup from Hiroshi
Shimamoto and the usual stream of goodies from Peter Zijlstra. Rebased
it to -rc8 as well.

there are no known regressions at the moment in the sched-devel.git
codebase. (yay :)

Ingo

----------------------------------------->
the shortlog relative to 2.6.23-rc8:

Dmitry Adamushko (9):
sched: clean up struct load_stat
sched: clean up schedstat block in dequeue_entity()
sched: sched_setscheduler() fix
sched: add set_curr_task() calls
sched: do not keep current in the tree and get rid of sched_entity::fair_key
sched: optimize task_new_fair()
sched: simplify sched_class::yield_task()
sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
sched: yield fix

Hiroshi Shimamoto (1):
sched: clean up sched_fork()

Ingo Molnar (44):
sched: fix new-task method
sched: resched task in task_new_fair()
sched: small sched_debug cleanup
sched: debug: track maximum 'slice'
sched: uniform tunings
sched: use constants if !CONFIG_SCHED_DEBUG
sched: remove stat_gran
sched: remove precise CPU load
sched: remove precise CPU load calculations #2
sched: track cfs_rq->curr on !group-scheduling too
sched: cleanup: simplify cfs_rq_curr() methods
sched: uninline __enqueue_entity()/__dequeue_entity()
sched: speed up update_load_add/_sub()
sched: clean up calc_weighted()
sched: introduce se->vruntime
sched: move sched_feat() definitions
sched: optimize vruntime based scheduling
sched: simplify check_preempt() methods
sched: wakeup granularity fix
sched: add se->vruntime debugging
sched: add more vruntime statistics
sched: debug: update exec_clock only when SCHED_DEBUG
sched: remove wait_runtime limit
sched: remove wait_runtime fields and features
sched: x86: allow single-depth wchan output
sched: fix delay accounting performance regression
sched: prettify /proc/sched_debug output
sched: enhance debug output
sched: kernel/sched_fair.c whitespace cleanups
sched: fair-group sched, cleanups
sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
sched debug: BKL usage statistics
sched: remove unneeded tunables
sched debug: print settings
sched debug: more width for parameter printouts
sched: entity_key() fix
sched: remove condition from set_task_cpu()
sched: remove last_min_vruntime effect
sched: undo some of the recent changes
sched: fix place_entity()
sched: fix sched_fork()
sched: remove set_leftmost()
sched: clean up schedstats, cnt -> count
sched: cleanup, remove stale comment

Matthias Kaehlcke (1):
sched: use list_for_each_entry_safe() in __wake_up_common()

Mike Galbraith (2):
sched: fix SMP migration latencies
sched: fix formatting of /proc/sched_debug

Peter Zijlstra (12):
sched: simplify SCHED_FEAT_* code
sched: new task placement for vruntime
sched: simplify adaptive latency
sched: clean up new task placement
sched: add tree based averages
sched: handle vruntime overflow
sched: better min_vruntime tracking
sched: add vslice
sched debug: check spread
sched: max_vruntime() simplification
sched: clean up min_vruntime use
sched: speed up and simplify vslice calculations

S.Ceglar Onur (1):
sched debug: BKL usage statistics, fix

Srivatsa Vaddagiri (9):
sched: group-scheduler core
sched: revert recent removal of set_curr_task()
sched: fix minor bug in yield
sched: print nr_running and load in /proc/sched_debug
sched: print &rq->cfs stats
sched: clean up code under CONFIG_FAIR_GROUP_SCHED
sched: add fair-user scheduler
sched: group scheduler wakeup latency fix
sched: group scheduler SMP migration fix

arch/i386/Kconfig | 11
fs/proc/base.c | 2
include/linux/sched.h | 55 ++-
init/Kconfig | 21 +
kernel/delayacct.c | 2
kernel/sched.c | 577 +++++++++++++++++++++++++-------------
kernel/sched_debug.c | 250 +++++++++++-----
kernel/sched_fair.c | 718 +++++++++++++++++-------------------------------
kernel/sched_idletask.c | 5
kernel/sched_rt.c | 12
kernel/sched_stats.h | 28 -
kernel/sysctl.c | 31 --
kernel/user.c | 43 ++
13 files changed, 954 insertions(+), 801 deletions(-)


2007-09-25 15:54:04

by Srivatsa Vaddagiri

Subject: Re: [git] CFS-devel, latest code

On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
>
> The latest sched-devel.git tree can be pulled from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>

This is required for it to compile.

---
include/linux/sched.h | 1 +
1 files changed, 1 insertion(+)

Index: current/include/linux/sched.h
===================================================================
--- current.orig/include/linux/sched.h
+++ current/include/linux/sched.h
@@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
extern unsigned int sysctl_sched_batch_wakeup_granularity;
extern unsigned int sysctl_sched_child_runs_first;
extern unsigned int sysctl_sched_features;
+extern unsigned int sysctl_sched_nr_latency;
#endif

extern unsigned int sysctl_sched_compat_yield;

--
Regards,
vatsa

2007-09-25 15:57:38

by Srivatsa Vaddagiri

Subject: Re: [git] CFS-devel, latest code

On Tue, Sep 25, 2007 at 09:34:20PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> >
> > The latest sched-devel.git tree can be pulled from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
> >
>
> This is required for it to compile.
>
> ---
> include/linux/sched.h | 1 +
> 1 files changed, 1 insertion(+)
>
> Index: current/include/linux/sched.h
> ===================================================================
> --- current.orig/include/linux/sched.h
> +++ current/include/linux/sched.h
> @@ -1404,6 +1404,7 @@ extern unsigned int sysctl_sched_wakeup_
> extern unsigned int sysctl_sched_batch_wakeup_granularity;
> extern unsigned int sysctl_sched_child_runs_first;
> extern unsigned int sysctl_sched_features;
> +extern unsigned int sysctl_sched_nr_latency;
> #endif
>
> extern unsigned int sysctl_sched_compat_yield;

and this:

---
kernel/sched_debug.c | 1 -
1 files changed, 1 deletion(-)

Index: current/kernel/sched_debug.c
===================================================================
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -210,7 +210,6 @@ static int sched_debug_show(struct seq_f
#define PN(x) \
SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
PN(sysctl_sched_latency);
- PN(sysctl_sched_min_granularity);
PN(sysctl_sched_wakeup_granularity);
PN(sysctl_sched_batch_wakeup_granularity);
PN(sysctl_sched_child_runs_first);

--
Regards,
vatsa

2007-09-25 16:14:43

by Srivatsa Vaddagiri

Subject: [PATCH 0/3] More group scheduler related fixes

On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
>
> The latest sched-devel.git tree can be pulled from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>

Ingo,
A few more patches follow on top of this latest sched-devel tree.

Pls consider for inclusion.

--
Regards,
vatsa

2007-09-25 16:18:15

by Srivatsa Vaddagiri

Subject: [PATCH 1/3] Fix coding style

Fix coding style issues reported by Randy Dunlap and others

Signed-off-by : Dhaval Giani <[email protected]>
Signed-off-by : Srivatsa Vaddagiri <[email protected]>

---
init/Kconfig | 14 +++++++-------
kernel/sched_debug.c | 8 ++------
2 files changed, 9 insertions(+), 13 deletions(-)

Index: current/init/Kconfig
===================================================================
--- current.orig/init/Kconfig
+++ current/init/Kconfig
@@ -282,11 +282,11 @@ config CPUSETS
Say N if unsure.

config FAIR_GROUP_SCHED
- bool "Fair group cpu scheduler"
+ bool "Fair group CPU scheduler"
default y
depends on EXPERIMENTAL
help
- This feature lets cpu scheduler recognize task groups and control cpu
+ This feature lets CPU scheduler recognize task groups and control CPU
bandwidth allocation to such task groups.

choice
@@ -294,11 +294,11 @@ choice
prompt "Basis for grouping tasks"
default FAIR_USER_SCHED

- config FAIR_USER_SCHED
- bool "user id"
- help
- This option will choose userid as the basis for grouping
- tasks, thus providing equal cpu bandwidth to each user.
+config FAIR_USER_SCHED
+ bool "user id"
+ help
+ This option will choose userid as the basis for grouping
+ tasks, thus providing equal CPU bandwidth to each user.

endchoice

Index: current/kernel/sched_debug.c
===================================================================
--- current.orig/kernel/sched_debug.c
+++ current/kernel/sched_debug.c
@@ -239,11 +239,7 @@ static int
root_user_share_read_proc(char *page, char **start, off_t off, int count,
int *eof, void *data)
{
- int len;
-
- len = sprintf(page, "%d\n", init_task_grp_load);
-
- return len;
+ return sprintf(page, "%d\n", init_task_grp_load);
}

static int
@@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
pe->proc_fops = &sched_debug_fops;

#ifdef CONFIG_FAIR_USER_SCHED
- pe = create_proc_entry("root_user_share", 0644, NULL);
+ pe = create_proc_entry("root_user_cpu_share", 0644, NULL);
if (!pe)
return -ENOMEM;



--
Regards,
vatsa

2007-09-25 16:22:28

by Srivatsa Vaddagiri

Subject: [PATCH 2/3] Fix size bloat for !CONFIG_FAIR_GROUP_SCHED

A recent fix to check_preempt_wakeup(), which made it check for preemption
at higher levels of the entity hierarchy, caused a size bloat for
!CONFIG_FAIR_GROUP_SCHED.

Fix the problem.

 text  data  bss   dec  hex  filename
42277 10598 320 53195 cfcb kernel/sched.o-before_this_patch
42216 10598 320 53134 cf8e kernel/sched.o-after_this_patch


Signed-off-by : Srivatsa Vaddagiri <[email protected]>


---
kernel/sched_fair.c | 43 +++++++++++++++++++++++++------------------
1 files changed, 25 insertions(+), 18 deletions(-)

Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -640,15 +640,21 @@ static inline struct cfs_rq *cpu_cfs_rq(
#define for_each_leaf_cfs_rq(rq, cfs_rq) \
list_for_each_entry(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)

-/* Do the two (enqueued) tasks belong to the same group ? */
-static inline int is_same_group(struct task_struct *curr, struct task_struct *p)
+/* Do the two (enqueued) entities belong to the same group ? */
+static inline int
+is_same_group(struct sched_entity *se, struct sched_entity *pse)
{
- if (curr->se.cfs_rq == p->se.cfs_rq)
+ if (se->cfs_rq == pse->cfs_rq)
return 1;

return 0;
}

+static inline struct sched_entity *parent_entity(struct sched_entity *se)
+{
+ return se->parent;
+}
+
#else /* CONFIG_FAIR_GROUP_SCHED */

#define for_each_sched_entity(se) \
@@ -681,11 +687,17 @@ static inline struct cfs_rq *cpu_cfs_rq(
#define for_each_leaf_cfs_rq(rq, cfs_rq) \
for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)

-static inline int is_same_group(struct task_struct *curr, struct task_struct *p)
+static inline int
+is_same_group(struct sched_entity *se, struct sched_entity *pse)
{
return 1;
}

+static inline struct sched_entity *parent_entity(struct sched_entity *se)
+{
+ return NULL;
+}
+
#endif /* CONFIG_FAIR_GROUP_SCHED */

/*
@@ -775,8 +787,9 @@ static void yield_task_fair(struct rq *r
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
{
struct task_struct *curr = rq->curr;
- struct cfs_rq *cfs_rq = task_cfs_rq(curr), *pcfs_rq;
+ struct cfs_rq *cfs_rq = task_cfs_rq(curr);
struct sched_entity *se = &curr->se, *pse = &p->se;
+ s64 delta;

if (unlikely(rt_prio(p->prio))) {
update_rq_clock(rq);
@@ -785,21 +798,15 @@ static void check_preempt_wakeup(struct
return;
}

- for_each_sched_entity(se) {
- cfs_rq = cfs_rq_of(se);
- pcfs_rq = cfs_rq_of(pse);
+ while (!is_same_group(se, pse)) {
+ se = parent_entity(se);
+ pse = parent_entity(pse);
+ }

- if (cfs_rq == pcfs_rq) {
- s64 delta = se->vruntime - pse->vruntime;
+ delta = se->vruntime - pse->vruntime;

- if (delta > (s64)sysctl_sched_wakeup_granularity)
- resched_task(curr);
- break;
- }
-#ifdef CONFIG_FAIR_GROUP_SCHED
- pse = pse->parent;
-#endif
- }
+ if (delta > (s64)sysctl_sched_wakeup_granularity)
+ resched_task(curr);
}

static struct task_struct *pick_next_task_fair(struct rq *rq)


--
Regards,
vatsa

2007-09-25 16:27:21

by Srivatsa Vaddagiri

Subject: [PATCH 3/3] Fix other possible sources of latency issues

There is a possibility that, because of a group's task moving from one
CPU to another, the group may gain more CPU time than desired. See
http://marc.info/?l=linux-kernel&m=119073197730334 for details.

This is an attempt to fix that problem. Basically it simulates dequeue
of higher-level entities as if they were going to sleep. Similarly, it
simulates wakeup of higher-level entities as if they were waking up from
sleep.

Signed-off-by : Srivatsa Vaddagiri <[email protected]>

---
kernel/sched_fair.c | 2 ++
1 files changed, 2 insertions(+)

Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -715,6 +715,7 @@ static void enqueue_task_fair(struct rq
break;
cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, wakeup);
+ wakeup = 1;
}
}

@@ -734,6 +735,7 @@ static void dequeue_task_fair(struct rq
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight)
break;
+ sleep = 1;
}
}

--
Regards,
vatsa

2007-09-25 18:33:43

by Ingo Molnar

Subject: Re: [PATCH 0/3] More group scheduler related fixes


* Srivatsa Vaddagiri <[email protected]> wrote:

> On Tue, Sep 25, 2007 at 04:44:43PM +0200, Ingo Molnar wrote:
> >
> > The latest sched-devel.git tree can be pulled from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
> >
>
> Ingo,
> A few more patches follow on top of this latest sched-devel tree.
>
> Pls consider for inclusion.

thanks, applied.

Ingo

2007-09-25 19:14:13

by Ingo Oeser

Subject: Re: [PATCH 1/3] Fix coding style

On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote:
> Index: current/kernel/sched_debug.c
> ===================================================================
> --- current.orig/kernel/sched_debug.c
> +++ current/kernel/sched_debug.c
> @@ -239,11 +239,7 @@ static int
> root_user_share_read_proc(char *page, char **start, off_t off, int count,
> int *eof, void *data)
> {
> - int len;
> -
> - len = sprintf(page, "%d\n", init_task_grp_load);
> -
> - return len;
> + return sprintf(page, "%d\n", init_task_grp_load);
> }
>
> static int
> @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
> pe->proc_fops = &sched_debug_fops;
>
> #ifdef CONFIG_FAIR_USER_SCHED
> - pe = create_proc_entry("root_user_share", 0644, NULL);
> + pe = create_proc_entry("root_user_cpu_share", 0644, NULL);
> if (!pe)
> return -ENOMEM;

What about moving this debug stuff under debugfs?
Please consider using the functions in <linux/debugfs.h> .
They compile into nothing, if DEBUGFS is not compiled in
and have already useful functions for reading/writing integers
and booleans.
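
Something along these lines, perhaps (an untested sketch just to show the
debugfs helpers; the names and the init hook are made up, and
init_task_grp_load would have to be a u32 for debugfs_create_u32()):

#include <linux/debugfs.h>
#include <linux/init.h>

static u32 root_user_cpu_share = 1024;	/* stand-in for init_task_grp_load */

static int __init sched_debugfs_init(void)
{
	/* shows up as <debugfs>/sched/root_user_cpu_share, readable and
	 * writable as a plain decimal value */
	struct dentry *d = debugfs_create_dir("sched", NULL);

	debugfs_create_u32("root_user_cpu_share", 0644, d,
			   &root_user_cpu_share);
	return 0;
}
late_initcall(sched_debugfs_init);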

Best Regards

Ingo Oeser

2007-09-25 20:47:57

by Kyle Moffett

Subject: Re: [PATCH 1/3] Fix coding style

On Sep 25, 2007, at 15:16:20, Ingo Oeser wrote:
> On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote:
>> @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
>> pe->proc_fops = &sched_debug_fops;
>>
>> #ifdef CONFIG_FAIR_USER_SCHED
>> - pe = create_proc_entry("root_user_share", 0644, NULL);
>> + pe = create_proc_entry("root_user_cpu_share", 0644, NULL);
>> if (!pe)
>> return -ENOMEM;
>
> What about moving this debug stuff under debugfs? Please consider
> using the functions in <linux/debugfs.h>. They compile into
> nothing, if DEBUGFS is not compiled in and have already useful
> functions for reading/writing integers and booleans.

Umm, that's not a debugging thing. It appears to be a tunable
allowing you to configure what percentage of the total CPU that UID 0
gets which is likely to be useful to configure on production systems;
at least until better group-scheduling tools are produced.

Cheers,
Kyle Moffett

2007-09-25 21:35:44

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code


humm... I think it'd be safer to have something like the following
change in place.

The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).

The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree. Generic paths are ok (e.g. schedule()
as put_prev_task() is called previously)... I'm more worried about e.g.
migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if
'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL
tasks (or imagine, some other use-cases in the future -- i.e. we should
not make outer world dependent on internal details of sched_fair class)
-- it may be "Houston, we've got a problem" case.
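
(to make the failure mode concrete -- a small userspace mock-up of the
rb_entry()/container_of() arithmetic, not the real structures:)

#include <stddef.h>
#include <stdio.h>

/* the shape matters: like sched_entity, 'run_node' is _not_ the first member */
struct entity {
	unsigned long load;
	struct { void *left, *right, *parent; } run_node;
};

/* rb_entry() is container_of(): subtract the member offset from the node */
#define rb_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
	/* what __pick_next_entity() computes if first_fair(cfs_rq) == NULL */
	struct entity *se = rb_entry((void *)0, struct entity, run_node);

	/* prints a small non-NULL address (-offsetof(struct entity, run_node)),
	 * so the caller cannot even notice by comparing against NULL, and
	 * set_next_entity() would go on to dereference garbage */
	printf("bogus se = %p\n", (void *)se);
	return 0;
}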

it's +16 bytes to the ".text". Another variant is to make 'run_node' the
first data member of 'struct sched_entity', but an additional check
(se != NULL) is still needed in pick_next_entity().

what do you think?


---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index dae714a..33b2376 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -563,9 +563,12 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
- struct sched_entity *se = __pick_next_entity(cfs_rq);
-
- set_next_entity(cfs_rq, se);
+ struct sched_entity *se = NULL;
+
+ if (first_fair(cfs_rq)) {
+ se = __pick_next_entity(cfs_rq);
+ set_next_entity(cfs_rq, se);
+ }

return se;
}

---


2007-09-26 02:06:46

by Dhaval Giani

Subject: Re: [PATCH 1/3] Fix coding style

On Tue, Sep 25, 2007 at 09:16:20PM +0200, Ingo Oeser wrote:
> On Tuesday 25 September 2007, Srivatsa Vaddagiri wrote:
> > Index: current/kernel/sched_debug.c
> > ===================================================================
> > --- current.orig/kernel/sched_debug.c
> > +++ current/kernel/sched_debug.c
> > @@ -239,11 +239,7 @@ static int
> > root_user_share_read_proc(char *page, char **start, off_t off, int count,
> > int *eof, void *data)
> > {
> > - int len;
> > -
> > - len = sprintf(page, "%d\n", init_task_grp_load);
> > -
> > - return len;
> > + return sprintf(page, "%d\n", init_task_grp_load);
> > }
> >
> > static int
> > @@ -297,7 +293,7 @@ static int __init init_sched_debug_procf
> > pe->proc_fops = &sched_debug_fops;
> >
> > #ifdef CONFIG_FAIR_USER_SCHED
> > - pe = create_proc_entry("root_user_share", 0644, NULL);
> > + pe = create_proc_entry("root_user_cpu_share", 0644, NULL);
> > if (!pe)
> > return -ENOMEM;
>
> What about moving this debug stuff under debugfs?
> Please consider using the functions in <linux/debugfs.h> .
> They compile into nothing, if DEBUGFS is not compiled in
> and have already useful functions for reading/writing integers
> and booleans.
>
Hi Ingo,

This is not debug stuff. It is a tunable to give the root user more
weight with respect to the other users.

--
regards,
Dhaval

2007-09-27 07:56:58

by Ingo Molnar

Subject: Re: [git] CFS-devel, latest code


* Dmitry Adamushko <[email protected]> wrote:

> humm... I think, it'd be safer to have something like the following
> change in place.
>
> The thing is that __pick_next_entity() must never be called when
> first_fair(cfs_rq) == NULL. It wouldn't be a problem, should
> 'run_node' be the very first field of 'struct sched_entity' (and it's
> the second).
>
> The 'nr_running != 0' check is _not_ enough, due to the fact that
> 'current' is not within the tree. Generic paths are ok (e.g.
> schedule() as put_prev_task() is called previously)... I'm more
> worried about e.g. migration_call() -> CPU_DEAD_FROZEN ->
> migrate_dead_tasks()... if 'current' == rq->idle, no problems.. if
> it's one of the SCHED_NORMAL tasks (or imagine, some other use-cases
> in the future -- i.e. we should not make outer world dependent on
> internal details of sched_fair class) -- it may be "Houston, we've got
> a problem" case.
>
> it's +16 bytes to the ".text". Another variant is to make 'run_node'
> the first data member of 'struct sched_entity', but an additional check
> (se != NULL) is still needed in pick_next_entity().

looks good to me - and we already have something similar in sched_rt.c.
I've added your patch to the queue. (Can i add your SoB line too?)

Ingo

2007-09-30 19:13:28

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code



here are a few patches on top of the recent 'sched-dev':

(1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio;

(2) [ cleanup ] calc_weighted() is obsolete, remove it;

(3) [ refactoring ] make dequeue_entity() / enqueue_entity()
and update_stats_dequeue() / update_stats_enqueue() look similar, structure-wise.

-----------------------------------

(1)

- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.

[1] according to the following link, the current behavior is not compliant
with SUSv3 (not sure, though, whether that is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656

[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".

all in all, the code size doesn't increase:

text data bss dec hex filename
46585 5102 40 51727 ca0f ../build/kernel/sched.o.before
46553 5102 40 51695 c9ef ../build/kernel/sched.o

yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.

what do you think?

(compiles well, not functionally tested yet)

Almost-Signed-off-by: Dmitry Adamushko <[email protected]>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 0abed89..eba7827 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -104,11 +104,9 @@ unsigned long long __attribute__((weak)) sched_clock(void)
/*
* These are the 'tuning knobs' of the scheduler:
*
- * Minimum timeslice is 5 msecs (or 1 jiffy, whichever is larger),
- * default timeslice is 100 msecs, maximum timeslice is 800 msecs.
+ * default timeslice is 100 msecs (used only for SCHED_RR tasks).
* Timeslices get refilled after they expire.
*/
-#define MIN_TIMESLICE max(5 * HZ / 1000, 1)
#define DEF_TIMESLICE (100 * HZ / 1000)

#ifdef CONFIG_SMP
@@ -132,24 +130,6 @@ static inline void sg_inc_cpu_power(struct sched_group *sg, u32 val)
}
#endif

-#define SCALE_PRIO(x, prio) \
- max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)
-
-/*
- * static_prio_timeslice() scales user-nice values [ -20 ... 0 ... 19 ]
- * to time slice values: [800ms ... 100ms ... 5ms]
- */
-static unsigned int static_prio_timeslice(int static_prio)
-{
- if (static_prio == NICE_TO_PRIO(19))
- return 1;
-
- if (static_prio < NICE_TO_PRIO(0))
- return SCALE_PRIO(DEF_TIMESLICE * 4, static_prio);
- else
- return SCALE_PRIO(DEF_TIMESLICE, static_prio);
-}
-
static inline int rt_policy(int policy)
{
if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
@@ -4759,6 +4739,7 @@ asmlinkage
long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
{
struct task_struct *p;
+ unsigned int time_slice;
int retval = -EINVAL;
struct timespec t;

@@ -4775,9 +4756,20 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
if (retval)
goto out_unlock;

- jiffies_to_timespec(p->policy == SCHED_FIFO ?
- 0 : static_prio_timeslice(p->static_prio), &t);
+ if (p->policy == SCHED_FIFO)
+ time_slice = 0;
+ else if (p->policy == SCHED_RR)
+ time_slice = DEF_TIMESLICE;
+ else {
+ unsigned long flags;
+ struct rq *rq;
+
+ rq = task_rq_lock(p, &flags);
+ time_slice = sched_slice(&rq->cfs, &p->se);
+ task_rq_unlock(rq, &flags);
+ }
read_unlock(&tasklist_lock);
+ jiffies_to_timespec(time_slice, &t);
retval = copy_to_user(interval, &t, sizeof(t)) ? -EFAULT : 0;
out_nounlock:
return retval;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index dbe4d8c..5c52881 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -206,7 +206,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p)
if (--p->time_slice)
return;

- p->time_slice = static_prio_timeslice(p->static_prio);
+ p->time_slice = DEF_TIMESLICE;

/*
* Requeue to the end of queue if we are not the only element

---

2007-09-30 19:15:49

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code



remove obsolete code -- calc_weighted()


Signed-off-by: Dmitry Adamushko <[email protected]>


---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index fe4003d..2674e27 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -342,17 +342,6 @@ update_stats_wait_start(struct cfs_rq *cfs_rq,
struct sched_entity *se)
schedstat_set(se->wait_start, rq_of(cfs_rq)->clock);
}

-static inline unsigned long
-calc_weighted(unsigned long delta, struct sched_entity *se)
-{
- unsigned long weight = se->load.weight;
-
- if (unlikely(weight != NICE_0_LOAD))
- return (u64)delta * se->load.weight >> NICE_0_SHIFT;
- else
- return delta;
-}
-
/*
* Task is being enqueued - update stats:
*/

---

2007-09-30 19:18:29

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code


and this one,

make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.

zero effect, functionally-wise.

Signed-off-by: Dmitry Adamushko <[email protected]>

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 2674e27..ed75a04 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -366,7 +366,6 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
static inline void
update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- update_curr(cfs_rq);
/*
* Mark the end of the wait period if dequeueing a
* waiting task:
@@ -493,7 +492,7 @@ static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
{
/*
- * Update the fair clock.
+ * Update run-time statistics of the 'current'.
*/
update_curr(cfs_rq);

@@ -512,6 +511,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
{
+ /*
+ * Update run-time statistics of the 'current'.
+ */
+ update_curr(cfs_rq);
+
update_stats_dequeue(cfs_rq, se);
if (sleep) {
#ifdef CONFIG_SCHEDSTATS
@@ -775,8 +779,7 @@ static void yield_task_fair(struct rq *rq)
if (likely(!sysctl_sched_compat_yield)) {
__update_rq_clock(rq);
/*
- * Dequeue and enqueue the task to update its
- * position within the tree:
+ * Update run-time statistics of the 'current'.
*/
update_curr(cfs_rq);


---

2007-10-01 05:57:38

by Mike Galbraith

Subject: Re: [git] CFS-devel, latest code

On Sun, 2007-09-30 at 21:15 +0200, Dmitry Adamushko wrote:
>
> remove obsolete code -- calc_weighted()
>

Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.

Signed-off-by: Mike Galbraith <[email protected]>

diff -uprNX /root/dontdiff git/linux-2.6.sched-devel/fs/pipe.c linux-2.6.23-rc8.d/fs/pipe.c
--- git/linux-2.6.sched-devel/fs/pipe.c 2007-10-01 06:59:51.000000000 +0200
+++ linux-2.6.23-rc8.d/fs/pipe.c 2007-10-01 07:41:17.000000000 +0200
@@ -45,8 +45,7 @@ void pipe_wait(struct pipe_inode_info *p
* Pipes are system-local resources, so sleeping on them
* is considered a noninteractive wait:
*/
- prepare_to_wait(&pipe->wait, &wait,
- TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);
+ prepare_to_wait(&pipe->wait, &wait, TASK_INTERRUPTIBLE);
if (pipe->inode)
mutex_unlock(&pipe->inode->i_mutex);
schedule();
diff -uprNX /root/dontdiff git/linux-2.6.sched-devel/include/linux/sched.h linux-2.6.23-rc8.d/include/linux/sched.h
--- git/linux-2.6.sched-devel/include/linux/sched.h 2007-10-01 07:00:25.000000000 +0200
+++ linux-2.6.23-rc8.d/include/linux/sched.h 2007-10-01 07:25:25.000000000 +0200
@@ -174,8 +174,7 @@ print_cfs_rq(struct seq_file *m, int cpu
#define EXIT_ZOMBIE 16
#define EXIT_DEAD 32
/* in tsk->state again */
-#define TASK_NONINTERACTIVE 64
-#define TASK_DEAD 128
+#define TASK_DEAD 64

#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)


2007-10-01 05:58:39

by Ingo Molnar

Subject: Re: [git] CFS-devel, latest code


* Mike Galbraith <[email protected]> wrote:

> On Sun, 2007-09-30 at 21:15 +0200, Dmitry Adamushko wrote:
> >
> > remove obsolete code -- calc_weighted()
> >
>
> Here's another piece of low hanging obsolete fruit.
>
> Remove obsolete TASK_NONINTERACTIVE.
>
> Signed-off-by: Mike Galbraith <[email protected]>

thanks, applied.

Ingo

2007-10-01 06:11:46

by Ingo Molnar

Subject: Re: [git] CFS-devel, latest code


* Dmitry Adamushko <[email protected]> wrote:

> here are a few patches on top of the recent 'sched-dev':
>
> (1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
> dependent on task's static_prio;
>
> (2) [ cleanup ] calc_weighted() is obsolete, remove it;
>
> (3) [ refactoring ] make dequeue_entity() / enqueue_entity()
> and update_stats_dequeue() / update_stats_enqueue() look similar, structure-wise.

thanks - i've applied all 3 patches of yours.

> (compiles well, not functionally tested yet)

(it boots fine here and SCHED_RR seems to work - but i've not tested
getinterval.)

Ingo

2007-10-02 19:49:40

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code


On 01/10/2007, Ingo Molnar <[email protected]> wrote:
>
> * Dmitry Adamushko <[email protected]> wrote:
>
> > here are a few patches on top of the recent 'sched-dev':
> >
> > (1) [ proposal ] make timeslices of SCHED_RR tasks constant and not
> > dependent on task's static_prio;
> >
> > (2) [ cleanup ] calc_weighted() is obsolete, remove it;
> >
> > (3) [ refactoring ] make dequeue_entity() / enqueue_entity()
> > and update_stats_dequeue() / update_stats_enqueue() look similar, structure-wise.
>
> thanks - i've applied all 3 patches of yours.
>
> > (compiles well, not functionally tested yet)
>
> (it boots fine here and SCHED_RR seems to work - but i've not tested
> getinterval.)

/me is guilty... it was a bit broken :-/ here is the fix.

results:

(SCHED_FIFO)

dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0

(SCHED_RR)

dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 : 99984800

(SCHED_NORMAL)

dimm@earth:~/storage/prog$ ./rr_interval
time_slice: 0 : 19996960

(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so it should be half of the previous result)

dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 : 9998480
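
(rr_interval is just a trivial wrapper around the syscall -- roughly the
following; a reconstruction, since the actual test isn't posted here:)

#include <sched.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec ts;

	/* 0 == the calling task; the kernel fills in its timeslice */
	if (sched_rr_get_interval(0, &ts)) {
		perror("sched_rr_get_interval");
		return 1;
	}

	printf("time_slice: %ld : %ld\n", (long)ts.tv_sec, ts.tv_nsec);
	return 0;
}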


Signed-off-by: Dmitry Adamushko <[email protected]>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index d835cd2..cce22ff 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4745,11 +4745,12 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
else if (p->policy == SCHED_RR)
time_slice = DEF_TIMESLICE;
else {
+ struct sched_entity *se = &p->se;
unsigned long flags;
struct rq *rq;

rq = task_rq_lock(p, &flags);
- time_slice = sched_slice(&rq->cfs, &p->se);
+ time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
task_rq_unlock(rq, &flags);
}
read_unlock(&tasklist_lock);

---



2007-10-02 19:59:18

by Dmitry Adamushko

Subject: Re: [git] CFS-devel, latest code


The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. It may be 'oops'-less, though, due to the
possibility of 'p' always being a valid address.


Signed-off-by: Dmitry Adamushko <[email protected]>

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8727d17..a379456 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -473,9 +473,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
vruntime += sched_vslice_add(cfs_rq, se);

if (!initial) {
- struct task_struct *p = container_of(se, struct task_struct, se);
-
- if (sched_feat(NEW_FAIR_SLEEPERS) && p->policy != SCHED_BATCH)
+ if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
+ task_of(se)->policy != SCHED_BATCH)
vruntime -= sysctl_sched_latency;

vruntime = max_t(s64, vruntime, se->vruntime);

---


2007-10-03 04:04:40

by Srivatsa Vaddagiri

Subject: Re: [git] CFS-devel, latest code

On Tue, Oct 02, 2007 at 09:59:04PM +0200, Dmitry Adamushko wrote:
> The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
> seems to break GROUP_SCHED. It may be 'oops'-less, though, due to the
> possibility of 'p' always being a valid address.

Thanks for catching it! Patch below looks good to me.

Acked-by : Srivatsa Vaddagiri <[email protected]>

> Signed-off-by: Dmitry Adamushko <[email protected]>
>
> ---
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 8727d17..a379456 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -473,9 +473,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime += sched_vslice_add(cfs_rq, se);
>
> if (!initial) {
> - struct task_struct *p = container_of(se, struct task_struct, se);
> -
> - if (sched_feat(NEW_FAIR_SLEEPERS) && p->policy != SCHED_BATCH)
> + if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
> + task_of(se)->policy != SCHED_BATCH)
> vruntime -= sysctl_sched_latency;
>
> vruntime = max_t(s64, vruntime, se->vruntime);
>
> ---
>

--
Regards,
vatsa

2007-10-04 07:41:20

by Ingo Molnar

Subject: Re: [git] CFS-devel, latest code


* Dmitry Adamushko <[email protected]> wrote:

> The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
> seems to break GROUP_SCHED. It may be 'oops'-less, though, due to the
> possibility of 'p' always being a valid address.

thanks, applied.

Ingo

2007-10-04 07:41:56

by Ingo Molnar

Subject: Re: [git] CFS-devel, latest code


* Dmitry Adamushko <[email protected]> wrote:

> results:
>
> (SCHED_FIFO)
>
> dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
> time_slice: 0 : 0
>
> (SCHED_RR)
>
> dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
> time_slice: 0 : 99984800
>
> (SCHED_NORMAL)
>
> dimm@earth:~/storage/prog$ ./rr_interval
> time_slice: 0 : 19996960
>
> (SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so it should be half of the previous result)
>
> dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
> time_slice: 0 : 9998480

thanks, applied.

Ingo