Hello,
We will need a 64-bit counter of slow context switches,
one for each newly created task (e.g. u64 ctxt_switch_counts;).
We will only need it during the lifetime of the task.
The task's 64-bit counter is incremented by one (a fast operation)
on each slow context switch.
*kernel/sched.c:
void context_switch(...) { ... } /* increment the counter by one here */
void wake_up_new_task(...) { ... } /* ->ctxt_switch_counts = 0ULL; here */
*include/linux/sched.h:
struct task_struct { ... } /* add the 64-bit u64 ctxt_switch_counts; here */
Please do it, and we can do better than the CFS fair scheduler.
I will explain why later.
O:)
On Sun, 24 Feb 2008 04:08:38 +0100
"J.C. Pizarro" <[email protected]> wrote:
> We will need a 64-bit counter of slow context switches,
> one for each newly created task (e.g. u64 ctxt_switch_counts;).
Please send a patch ...
> I will explain why later.
... and explain exactly why the kernel needs this extra code.
Asking somebody else to add functionality (or is it bloat? we really
cannot tell) to the kernel for you, and not even explaining why is
not going to convince anyone.
For more hints on how to get functionality into the Linux kernel,
please read:
http://kernelnewbies.org/UpstreamMerge
--
All rights reversed.
On 2008/2/24, Rik van Riel <[email protected]> wrote:
> On Sun, 24 Feb 2008 04:08:38 +0100
> "J.C. Pizarro" <[email protected]> wrote:
>
> > We will need a 64-bit counter of slow context switches,
> > one for each newly created task (e.g. u64 ctxt_switch_counts;).
>
>
> Please send a patch ...
diff -ur linux-2.6_git-20080224.orig/include/linux/sched.h linux-2.6_git-20080224/include/linux/sched.h
--- linux-2.6_git-20080224.orig/include/linux/sched.h	2008-02-24 01:04:18.000000000 +0100
+++ linux-2.6_git-20080224/include/linux/sched.h	2008-02-24 04:50:18.000000000 +0100
@@ -1007,6 +1007,12 @@
 	struct hlist_head preempt_notifiers;
 #endif
 
+	unsigned long long ctxt_switch_counts; /* 64-bit count of switches */
+	/* TODO:
+	 * Implement a poller/clock for the CPU scheduler that only reads
+	 * these context switch counts of the runqueue's tasks.
+	 * It is no problem if this poller/clock is not implemented. */
+
 	/*
 	 * fpu_counter contains the number of consecutive context switches
 	 * that the FPU is used. If this is over a threshold, the lazy fpu
diff -ur linux-2.6_git-20080224.orig/kernel/sched.c linux-2.6_git-20080224/kernel/sched.c
--- linux-2.6_git-20080224.orig/kernel/sched.c	2008-02-24 01:04:19.000000000 +0100
+++ linux-2.6_git-20080224/kernel/sched.c	2008-02-24 04:33:57.000000000 +0100
@@ -2008,6 +2008,8 @@
 	BUG_ON(p->state != TASK_RUNNING);
 	update_rq_clock(rq);
 
+	p->ctxt_switch_counts = 0ULL; /* task's 64-bit counter starts at 0 */
+
 	p->prio = effective_prio(p);
 
 	if (!p->sched_class->task_new || !current->se.on_rq) {
@@ -2189,8 +2191,14 @@
 context_switch(struct rq *rq, struct task_struct *prev,
 	       struct task_struct *next)
 {
+	unsigned long flags;
+	struct rq *rq_prev;
 	struct mm_struct *mm, *oldmm;
 
+	rq_prev = task_rq_lock(prev, &flags); /* lock the prev task */
+	prev->ctxt_switch_counts++; /* increment the task's 64-bit counter */
+	task_rq_unlock(rq_prev, &flags); /* unlock the prev task */
+
 	prepare_task_switch(rq, prev, next);
 	mm = next->mm;
 	oldmm = prev->active_mm;
> > I will explain your later why of it.
>
>
> ... and explain exactly why the kernel needs this extra code.
One reason: to gain interactivity, which is something the CFS fair
scheduler lacks.
o:)
On Sun, 24 Feb 2008 05:08:46 +0100
"J.C. Pizarro" <[email protected]> wrote:
OK, one last reply on the (overly optimistic?) assumption that you are not a troll.
> +++ linux-2.6_git-20080224/include/linux/sched.h	2008-02-24 04:50:18.000000000 +0100
> @@ -1007,6 +1007,12 @@
>  	struct hlist_head preempt_notifiers;
>  #endif
> 
> +	unsigned long long ctxt_switch_counts; /* 64-bit count of switches */
> +	/* TODO:
> +	 * Implement a poller/clock for the CPU scheduler that only reads
> +	 * these context switch counts of the runqueue's tasks.
> +	 * It is no problem if this poller/clock is not implemented. */
So you're introducing a statistic, but have not yet written any code
that uses it?
> +	p->ctxt_switch_counts = 0ULL; /* task's 64-bit counter starts at 0 */
Because we can all read C, there is no need to tell people in comments
what the code does. Comments are there to explain why the code does
things, if an explanation is needed.
> > > I will explain why later.
> >
> >
> > ... and explain exactly why the kernel needs this extra code.
>
> One reason: to gain interactivity, which is something the CFS fair
> scheduler lacks.
Your patch does not actually help interactivity, because all it does
is add an irq spinlock in a hot path (bad idea) and a counter which
nothing reads.
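For illustration: schedule() in kernels of this era already calls
context_switch() with the runqueue lock held and interrupts disabled, and
only the local CPU switches prev out, so the increment would need no extra
locking at all. A minimal sketch of that variant, keeping the field name
from the patch above:

static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	struct mm_struct *mm, *oldmm;

	/* rq->lock is held and IRQs are off here; a plain ++ is enough */
	prev->ctxt_switch_counts++;

	prepare_task_switch(rq, prev, next);
	mm = next->mm;
	oldmm = prev->active_mm;
	/* ... rest of context_switch() unchanged ... */
}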
--
All rights reversed.
On Sun, 2008-02-24 at 05:08 +0100, J.C. Pizarro wrote:
> One reason: to gain interactivity, which is something the CFS fair
> scheduler lacks.
A bug report would be a much better first step toward resolution of any
interactivity issues you're seeing than posts which do nothing but
suggest that there may be a problem.
First define the problem, _then_ fix it.
-Mike
Good morning :)
On 2008/2/24, Rik van Riel <[email protected]> wrote:
> OK, one last reply on the (overly optimistic?) assumption that you are not a troll.
> > +++ linux-2.6_git-20080224/include/linux/sched.h	2008-02-24 04:50:18.000000000 +0100
> > @@ -1007,6 +1007,12 @@
> >  	struct hlist_head preempt_notifiers;
> >  #endif
> > 
> > +	unsigned long long ctxt_switch_counts; /* 64-bit count of switches */
> > +	/* TODO:
> > +	 * Implement a poller/clock for the CPU scheduler that only reads
> > +	 * these context switch counts of the runqueue's tasks.
> > +	 * It is no problem if this poller/clock is not implemented. */
>
> So you're introducing a statistic, but have not yet written any code
> that uses it?
It's a statistic, yes, but it's a very important parameter for the CPU scheduler.
The CPU scheduler would know the number of context switches of each task
instead of taking blind decisions forever!
Statistically, there are tasks X with more context switches and tasks Y
with fewer context switches over the last fixed-size interval, using the
historical formula "(1 - alpha)*prev + alpha*current", 0 < alpha < 1.
(Measure this value V as a velocity too, in context switches per second.)
Give more weight to X than to Y, for the extra interactivity that X wants
(X will have a higher V and Y a lower V).
With one exception, to avoid tasks staying humble forever: apply a sin(x)-like
oscillation after a long humble period (modifying the weights afterwards).
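A minimal user-space sketch of that decay formula (not kernel code), in
fixed-point arithmetic since the kernel avoids floating point; FP_ONE,
ALPHA_FP and the sample numbers are illustrative assumptions:

#include <stdio.h>

#define FP_ONE   1024	/* fixed-point 1.0 */
#define ALPHA_FP  256	/* alpha = 0.25 */

/* normalized = (1 - alpha)*prev + alpha*current, all in fixed point */
static unsigned long long decay(unsigned long long prev_v,
				unsigned long long cur_v)
{
	return ((FP_ONE - ALPHA_FP) * prev_v + ALPHA_FP * cur_v) / FP_ONE;
}

int main(void)
{
	unsigned long long v = 0;	/* smoothed velocity V, switches/sec */
	int i;

	/* a task that suddenly starts context-switching 100 times/second */
	for (i = 0; i < 10; i++) {
		v = decay(v, 100);
		printf("poll %d: smoothed V = %llu\n", i, v);
	}
	return 0;
}

(V climbs toward 100 while damping transients; alpha controls how quickly
the history is forgotten.)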
The missing code has to be implemented among everybody, because:
1. Users don't want to lose interactivity on an overloaded CPU.
2. There is a lot of badly organized CPU scheduler code that I don't
want to touch.
> > +	p->ctxt_switch_counts = 0ULL; /* task's 64-bit counter starts at 0 */
>
> Because we can all read C, there is no need to tell people in comments
> what the code does. Comments are there to explain why the code does
> things, if an explanation is needed.
OK.
> > > > I will explain why later.
> > >
> > > ... and explain exactly why the kernel needs this extra code.
> >
> > One reason: to gain interactivity, which is something the CFS fair
> > scheduler lacks.
> Your patch does not actually help interactivity, because all it does
> is add an irq spinlock in a hot path (bad idea) and a counter which
> nothing reads.
Then remove the lock/unlock of the task that I had put in;
I'm not sure whether it's safe, because I haven't read all of the locking paths.
On 2008/2/24, Mike Galbraith <[email protected]> wrote:
> > One reason: to gain interactivity, which is something the CFS fair
> > scheduler lacks.
>
> A bug report would be a much better first step toward resolution of any
> interactivity issues you're seeing than posts which do nothing but
> suggest that there may be a problem.
>
> First define the problem, _then_ fix it.
It's a blind, eternal problem in the overloaded-CPU scenario on desktops.
On Sun, 2008-02-24 at 14:12 +0100, J.C. Pizarro wrote:
> On 2008/2/24, Mike Galbraith <[email protected]> wrote:
> > > One reason: to gain interactivity, which is something the CFS fair
> > > scheduler lacks.
> >
> > A bug report would be a much better first step toward resolution of any
> > interactivity issues you're seeing than posts which do nothing but
> > suggest that there may be a problem.
> >
> > First define the problem, _then_ fix it.
>
> It's a blind, eternal problem in the overloaded-CPU scenario on desktops.
Define the problem... please? I won't repeat myself again.
-Mike
On Sun, 24 Feb 2008 14:12:47 +0100 "J.C. Pizarro" <[email protected]> wrote:
> It's a statistic, yes, but it's a very important parameter for the CPU scheduler.
> The CPU scheduler would know the number of context switches of each task
> instead of taking blind decisions forever!
We already have these:
unsigned long nvcsw, nivcsw; /* context switch counts */
in the task_struct.
On 2008/2/25, Andrew Morton <[email protected]> wrote:
> On Sun, 24 Feb 2008 14:12:47 +0100 "J.C. Pizarro" <[email protected]> wrote:
>
> > It's a statistic, yes, but it's a very important parameter for the CPU scheduler.
> > The CPU scheduler would know the number of context switches of each task
> > instead of taking blind decisions forever!
>
>
> We already have these:
>
> unsigned long nvcsw, nivcsw; /* context switch counts */
>
> in the task_struct.
1. They use "unsigned long" instead of "unsigned long long".
2. They use "= 0;" instead of "= 0ULL";
3. They don't do ++ (increment by one per context switch).
4. I don't like the separation of voluntary and involuntary context switches,
and I don't understand the utility of this separation.
The tsk->nvcsw & tsk->nivcsw mean something different from what I proposed.
It's simple: when kernel/sched.c:context_switch(..) is called,
do ++, but they don't.
I propose to you:
1. unsigned long long tsk->ncsw = 0ULL; and tsk->ncsw++;
2. unsigned long long tsk->last_registered_ncsw = tsk->ncsw; when polling.
3. long tsk->vcsw = ( tsk->ncsw - tsk->last_registered_ncsw ) / ( t2 - t1 );
/* velocity of the task (context switches per second);
t1 != t2, in seconds, so there is no division by zero */
4. long tsk->last_registered_vcsw = tsk->vcsw;
5. long tsk->normalized_vcsw =
(1 - alpha)*tsk->last_registered_vcsw + alpha*tsk->vcsw; /* 0 < alpha < 1 */
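Put together as plain C, a minimal sketch of the bookkeeping in steps 1-5;
the struct and function names are illustrative, and alpha = 1/4 is handled
in fixed point because kernel code avoids floating point:

struct csw_stats {
	unsigned long long ncsw;                 /* 1. ++ on every switch    */
	unsigned long long last_registered_ncsw; /* 2. snapshot at last poll */
	long vcsw;                               /* 3. switches per second   */
	long last_registered_vcsw;               /* 4. previous velocity     */
	long normalized_vcsw;                    /* 5. smoothed velocity     */
};

#define ALPHA_NUM 1	/* alpha = ALPHA_NUM/ALPHA_DEN = 1/4, 0 < alpha < 1 */
#define ALPHA_DEN 4

/* Called by the (still unwritten) poller at time t2; t1 is the previous
 * poll time, in seconds.  Skips the update when t2 == t1 (no zerodiv). */
static void poll_csw_stats(struct csw_stats *s, long t1, long t2)
{
	if (t2 <= t1)
		return;
	s->vcsw = (long)((s->ncsw - s->last_registered_ncsw) /
			 (unsigned long long)(t2 - t1));
	/* normalized = (1 - alpha)*previous + alpha*current */
	s->normalized_vcsw =
		((ALPHA_DEN - ALPHA_NUM) * s->last_registered_vcsw +
		 ALPHA_NUM * s->vcsw) / ALPHA_DEN;
	s->last_registered_vcsw = s->vcsw;
	s->last_registered_ncsw = s->ncsw;
}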
Sincerely yours ;)
On 2/26/08, J.C. Pizarro <[email protected]> wrote:
> On 2008/2/25, Andrew Morton <[email protected]> wrote:
> > On Sun, 24 Feb 2008 14:12:47 +0100 "J.C. Pizarro" <[email protected]> wrote:
> >
> > > It's a statistic, yes, but it's a very important parameter for the CPU scheduler.
> > > The CPU scheduler would know the number of context switches of each task
> > > instead of taking blind decisions forever!
> >
> >
> > We already have these:
> >
> > unsigned long nvcsw, nivcsw; /* context switch counts */
> >
> > in the task_struct.
>
> 1. They use "unsigned long" instead of "unsigned long long".
> 2. They use "= 0;" instead of "= 0ULL";
Very funny.
> 3. They don't do ++ (increment by one per context switch).
No, they do; read schedule() already.
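For reference, a paraphrased sketch of the 2.6-era logic in
kernel/sched.c:schedule() (details elided; not a verbatim copy):

	unsigned long *switch_count;

	/* decide up front which counter the coming switch will bump */
	switch_count = &prev->nivcsw;		/* involuntary by default */
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE))
		switch_count = &prev->nvcsw;	/* prev blocked voluntarily */

	/* ... pick 'next' via the scheduling classes ... */

	if (likely(prev != next)) {
		rq->nr_switches++;
		++*switch_count;		/* one ++ per context switch */
		context_switch(rq, prev, next);
	}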
> 4. I don't like the separation of voluntary and involuntary context switches,
> and I don't understand the utility of this separation.
Ah, that's why you don't like it.
> The tsk->nvcsw & tsk->nivcsw mean something different from what I proposed.
>
> It's simple: when kernel/sched.c:context_switch(..) is called,
> do ++, but they don't.
>
> I propose to you:
> 1. unsigned long long tsk->ncsw = 0ULL; and tsk->ncsw++;
> 2. unsigned long long tsk->last_registered_ncsw = tsk->ncsw; when polling.
> 3. long tsk->vcsw = ( tsk->ncsw - tsk->last_registered_ncsw ) / ( t2 - t1 );
> /* velocity of the task (context switches per second);
> t1 != t2, in seconds, so there is no division by zero */
> 4. long tsk->last_registered_vcsw = tsk->vcsw;
> 5. long tsk->normalized_vcsw =
> (1 - alpha)*tsk->last_registered_vcsw + alpha*tsk->vcsw; /* 0 < alpha < 1 */
6. Profit.
As I understood the idea of CFS, all interactivity heuristics were bitbucketed,
so you'll be adding them back (you won't, of course, because you can't be arsed
to send a patch).
So the best course of action is to describe the workload and setup (distro,
relevant .config items, and so on) on which CFS behaves poorly.