Declare nr_uninterruptible as a signed long to avoid garbage values
seen in cat /proc/sched_debug when a task is moved to the run queue of
a newly online core. This is part of a global counter where only the
total sum over all CPUs matters.
Signed-off-by: Diwakar Tundlam <[email protected]>
---
kernel/sched/core.c | 7 ++++---
kernel/sched/sched.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d5eef6..7a64b5b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2114,7 +2114,8 @@ unsigned long nr_running(void)
unsigned long nr_uninterruptible(void)
{
- unsigned long i, sum = 0;
+ unsigned long i;
+ long sum = 0;
for_each_possible_cpu(i)
sum += cpu_rq(i)->nr_uninterruptible;
@@ -2123,7 +2124,7 @@ unsigned long nr_uninterruptible(void)
* Since we read the counters lockless, it might be slightly
* inaccurate. Do not allow it to go below zero though:
*/
- if (unlikely((long)sum < 0))
+ if (unlikely(sum < 0))
sum = 0;
return sum;
@@ -2174,7 +2175,7 @@ static long calc_load_fold_active(struct rq *this_rq)
long nr_active, delta = 0;
nr_active = this_rq->nr_running;
- nr_active += (long) this_rq->nr_uninterruptible;
+ nr_active += this_rq->nr_uninterruptible;
if (nr_active != this_rq->calc_load_active) {
delta = nr_active - this_rq->calc_load_active;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fb3acba..2668b07 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -385,7 +385,7 @@ struct rq {
* one CPU and if it got migrated afterwards it may decrease
* it on another CPU. Always updated under the runqueue lock:
*/
- unsigned long nr_uninterruptible;
+ long nr_uninterruptible;
struct task_struct *curr, *idle, *stop;
unsigned long next_balance;
--
1.7.4.1
On Tue, 2012-05-08 at 14:39 -0700, Diwakar Tundlam wrote:
> Declare nr_uninterruptible as a signed long to avoid garbage values
> seen in cat /proc/sched_debug when a task is moved to the run queue of
> a newly online core. This is part of a global counter where only the
> total sum over all CPUs matters.
Its late here, but do explain how any of this makes any difference what
so ever? Since all we do with that field is add/sub the whole sign issue
shouldn't matter one way or the other.
Sorry to bug you when it is late for you..
You're right, there is no real difference at all.
Only cosmetic difference when you look at the output of cat /proc/sched_debug.
But I suddenly realized maybe the increment/decrement of nr_interruptible is reversed.
Maybe that's the source of the problem: decrement in activate task and increment in deactivate task !!
Code snip:
/*
* activate_task - move a task to the runqueue.
*/
static void activate_task(struct rq *rq, struct task_struct *p, int flags)
{
if (task_contributes_to_load(p))
rq->nr_uninterruptible--; <<<<< why decrement in activate task
enqueue_task(rq, p, flags);
inc_nr_running(rq);
}
/*
* deactivate_task - remove a task from the runqueue.
*/
static void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
{
if (task_contributes_to_load(p))
rq->nr_uninterruptible++; <<<<< why increment in deactivate task
dequeue_task(rq, p, flags);
dec_nr_running(rq);
}
Thanks,
--Diwakar.
-----Original Message-----
From: Peter Zijlstra [mailto:[email protected]]
Sent: Tuesday, May 08, 2012 2:57 PM
To: Diwakar Tundlam
Cc: 'Ingo Molnar'; 'David Rientjes'; '[email protected]'; Peter De Schrijver
Subject: Re: [PATCH] sched: Make nr_uninterruptible count a signed value
On Tue, 2012-05-08 at 14:39 -0700, Diwakar Tundlam wrote:
> Declare nr_uninterruptible as a signed long to avoid garbage values
> seen in cat /proc/sched_debug when a task is moved to the run queue of
> a newly online core. This is part of a global counter where only the
> total sum over all CPUs matters.
Its late here, but do explain how any of this makes any difference what so ever? Since all we do with that field is add/sub the whole sign issue shouldn't matter one way or the other.
On Tue, 2012-05-08 at 15:14 -0700, Diwakar Tundlam wrote:
> Sorry to bug you when it is late for you..
>
Nah, I'm the idiot still behind the screen after midnight, its just the
brain that's slightly slower and needs more hints.
> You're right, there is no real difference at all.
> Only cosmetic difference when you look at the output of
> cat /proc/sched_debug.
Not sure I see that.. the printf is still using %Ld (signed) so the
output shouldn't matter regardless of if the variable is unsigned long
or long.
>
> But I suddenly realized maybe the increment/decrement of
> nr_interruptible is reversed.
> Maybe that's the source of the problem: decrement in activate task and
> increment in deactivate task !!
No that's right. nr_uninterruptible counts the number of tasks in
uninterruptible sleep, so deactivate_task puts a task to sleep, so we
need to increment the number of sleeping tasks, activate_task wakes a
task up so we need to decrement the number of sleeping tasks.
I think the problem you're having is that we don't match the cpu where
we inc and dec the counter, and that's fully on purpose since its rather
expensive -- it would require atomics.
On Wed, 2012-05-09 at 00:27 +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-08 at 15:14 -0700, Diwakar Tundlam wrote:
> > Sorry to bug you when it is late for you..
> >
> Nah, I'm the idiot still behind the screen after midnight, its just the
> brain that's slightly slower and needs more hints.
>
> > You're right, there is no real difference at all.
> > Only cosmetic difference when you look at the output of
> > cat /proc/sched_debug.
>
> Not sure I see that.. the printf is still using %Ld (signed) so the
> output shouldn't matter regardless of if the variable is unsigned long
> or long.
> >
> > But I suddenly realized maybe the increment/decrement of
> > nr_interruptible is reversed.
> > Maybe that's the source of the problem: decrement in activate task and
> > increment in deactivate task !!
>
> No that's right. nr_uninterruptible counts the number of tasks in
> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
> need to increment the number of sleeping tasks, activate_task wakes a
> task up so we need to decrement the number of sleeping tasks.
>
> I think the problem you're having is that we don't match the cpu where
> we inc and dec the counter, and that's fully on purpose since its rather
> expensive -- it would require atomics.
>
FWIW the way to properly read the sched_debug output is something like:
# grep nr_uninterruptible /proc/sched_debug
.nr_uninterruptible : -1305
.nr_uninterruptible : 336
.nr_uninterruptible : -229
.nr_uninterruptible : 276
.nr_uninterruptible : 105
.nr_uninterruptible : 157
.nr_uninterruptible : -2782
.nr_uninterruptible : 325
.nr_uninterruptible : -471
.nr_uninterruptible : 9
.nr_uninterruptible : 205
.nr_uninterruptible : 88
.nr_uninterruptible : 7
.nr_uninterruptible : 912
.nr_uninterruptible : 188
.nr_uninterruptible : 66
.nr_uninterruptible : 87
.nr_uninterruptible : 45
.nr_uninterruptible : 194
.nr_uninterruptible : 1178
.nr_uninterruptible : 185
.nr_uninterruptible : 143
.nr_uninterruptible : 136
.nr_uninterruptible : 145
# awk '/nr_uninterruptible/ {t += $3} END {print t}' /proc/sched_debug
0
The per-cpu value is meaningless, only the sum over all cpus is a
meaningful number.
> No that's right. nr_uninterruptible counts the number of tasks in
> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
> need to increment the number of sleeping tasks, activate_task wakes a
> task up so we need to decrement the number of sleeping tasks.
Yep, I looked at the code for task_contributes_to_load() and I understand what it is all about.
The ++ and -- are correct, I see it now.
On the -ve values, strangely inspite of %Ld in the print statement, in my kernel, I see high unsigned values instead of -ve values for nr_uninterruptible.
But the sum is always 0, though.
Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
As seen in my output of sched_debug.
Your machine is probably natively 64-bit.
$ adb shell cat /proc/sched_debug |egrep 'cpu#|nr_'
cpu#0
.nr_running : 1
.nr_switches : 16233
.nr_load_updates : 2529
.nr_uninterruptible : 4294967272 <<<<< 0xffffffe8 == (-24)
.nr_spread_over : 18
.nr_running : 0
.nr_spread_over : 101
.nr_running : 1
.rt_nr_running : 0
.rt_nr_running : 0
.rt_nr_running : 0
cpu#1
.nr_running : 1
.nr_switches : 7891
.nr_load_updates : 2124
.nr_uninterruptible : 18
.nr_spread_over : 121
.nr_running : 1
.rt_nr_running : 0
.rt_nr_running : 0
.rt_nr_running : 0
cpu#3
.nr_running : 1
.nr_switches : 13896
.nr_load_updates : 1179
.nr_uninterruptible : 6
.nr_spread_over : 106
.nr_running : 1
.rt_nr_running : 0
.rt_nr_running : 0
.rt_nr_running : 0
Thanks,
--Diwakar.
-----Original Message-----
From: Peter Zijlstra [mailto:[email protected]]
Sent: Tuesday, May 08, 2012 3:30 PM
To: Diwakar Tundlam
Cc: 'Ingo Molnar'; 'David Rientjes'; '[email protected]'; Peter De Schrijver
Subject: RE: [PATCH] sched: Make nr_uninterruptible count a signed value
On Wed, 2012-05-09 at 00:27 +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-08 at 15:14 -0700, Diwakar Tundlam wrote:
> > Sorry to bug you when it is late for you..
> >
> Nah, I'm the idiot still behind the screen after midnight, its just
> the brain that's slightly slower and needs more hints.
>
> > You're right, there is no real difference at all.
> > Only cosmetic difference when you look at the output of cat
> > /proc/sched_debug.
>
> Not sure I see that.. the printf is still using %Ld (signed) so the
> output shouldn't matter regardless of if the variable is unsigned long
> or long.
> >
> > But I suddenly realized maybe the increment/decrement of
> > nr_interruptible is reversed.
> > Maybe that's the source of the problem: decrement in activate task
> > and increment in deactivate task !!
>
> No that's right. nr_uninterruptible counts the number of tasks in
> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
> need to increment the number of sleeping tasks, activate_task wakes a
> task up so we need to decrement the number of sleeping tasks.
>
> I think the problem you're having is that we don't match the cpu where
> we inc and dec the counter, and that's fully on purpose since its
> rather expensive -- it would require atomics.
>
FWIW the way to properly read the sched_debug output is something like:
# grep nr_uninterruptible /proc/sched_debug
.nr_uninterruptible : -1305
.nr_uninterruptible : 336
.nr_uninterruptible : -229
.nr_uninterruptible : 276
.nr_uninterruptible : 105
.nr_uninterruptible : 157
.nr_uninterruptible : -2782
.nr_uninterruptible : 325
.nr_uninterruptible : -471
.nr_uninterruptible : 9
.nr_uninterruptible : 205
.nr_uninterruptible : 88
.nr_uninterruptible : 7
.nr_uninterruptible : 912
.nr_uninterruptible : 188
.nr_uninterruptible : 66
.nr_uninterruptible : 87
.nr_uninterruptible : 45
.nr_uninterruptible : 194
.nr_uninterruptible : 1178
.nr_uninterruptible : 185
.nr_uninterruptible : 143
.nr_uninterruptible : 136
.nr_uninterruptible : 145
# awk '/nr_uninterruptible/ {t += $3} END {print t}' /proc/sched_debug
0
The per-cpu value is meaningless, only the sum over all cpus is a meaningful number.
On Tue, 2012-05-08 at 15:46 -0700, Diwakar Tundlam wrote:
> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
> As seen in my output of sched_debug.
Ah, quite possible. %Ld is indeed %lld and the value is long, not long
long. So the proper fix is to fudge that printk statement somehow.
> Your machine is probably natively 64-bit.
Yeah, I gave up on 32bit computing a while ago..
On 05/09/2012 06:46 AM, Diwakar Tundlam wrote:
>> No that's right. nr_uninterruptible counts the number of tasks in
>> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
>> need to increment the number of sleeping tasks, activate_task wakes a
>> task up so we need to decrement the number of sleeping tasks.
>
> Yep, I looked at the code for task_contributes_to_load() and I understand what it is all about.
> The ++ and -- are correct, I see it now.
>
> On the -ve values, strangely inspite of %Ld in the print statement, in my kernel, I see high unsigned values instead of -ve values for nr_uninterruptible.
>
> But the sum is always 0, though.
>
> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
> As seen in my output of sched_debug.
This may do some help.
Regards,
Michael Wang
---
kernel/sched/debug.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 09acaa1..ab9d53f 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -270,7 +270,7 @@ static void print_cpu(struct seq_file *m, int cpu)
rq->load.weight);
P(nr_switches);
P(nr_load_updates);
- P(nr_uninterruptible);
+ P((signed long)nr_uninterruptible);
PN(next_balance);
P(curr->pid);
PN(clock);
--
1.7.4.1
>> - P(nr_uninterruptible);
>> + P((signed long)nr_uninterruptible);
I thought of it too, but it won't compile because the P macro expands to dereferencing rq->nr_uninterruptible, so with your change, it will show up as rq->(signed long)nr_uninterruptible which is a syntax error.
--Diwakar.
-----Original Message-----
From: Michael Wang [mailto:[email protected]]
Sent: Wednesday, May 09, 2012 12:49 AM
To: Diwakar Tundlam
Cc: 'Peter Zijlstra'; 'Ingo Molnar'; 'David Rientjes'; '[email protected]'; Peter De Schrijver
Subject: Re: [PATCH] sched: Make nr_uninterruptible count a signed value
On 05/09/2012 06:46 AM, Diwakar Tundlam wrote:
>> No that's right. nr_uninterruptible counts the number of tasks in
>> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
>> need to increment the number of sleeping tasks, activate_task wakes a
>> task up so we need to decrement the number of sleeping tasks.
>
> Yep, I looked at the code for task_contributes_to_load() and I understand what it is all about.
> The ++ and -- are correct, I see it now.
>
> On the -ve values, strangely inspite of %Ld in the print statement, in my kernel, I see high unsigned values instead of -ve values for nr_uninterruptible.
>
> But the sum is always 0, though.
>
> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
> As seen in my output of sched_debug.
This may do some help.
Regards,
Michael Wang
---
kernel/sched/debug.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 09acaa1..ab9d53f 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -270,7 +270,7 @@ static void print_cpu(struct seq_file *m, int cpu)
rq->load.weight);
P(nr_switches);
P(nr_load_updates);
- P(nr_uninterruptible);
+ P((signed long)nr_uninterruptible);
PN(next_balance);
P(curr->pid);
PN(clock);
--
1.7.4.1
The issue is because when promoting from 32-bit unsigned to 64-bit signed, the compiler first does SIZE increase and then the sign change. If it did SIGN change first and then size increase (with sign-extension) and it will work correctly.
But then, the compiler is probably doing the right thing here because the starting point is an unsigned value and when it is assigned a negative value, the compiler is perfectly justified in converting it to a large positive value which is what happened here.
The theoretical correct fix is to change the declaration to signed long nr_uninterruptible simply because it can sometimes become -ve.
The other fix is to fix the print statement - but it feels wrong.
It would be good to fix the declaration...
Thanks,
--Diwakar.
-----Original Message-----
From: Peter Zijlstra [mailto:[email protected]]
Sent: Wednesday, May 09, 2012 1:12 AM
To: Diwakar Tundlam
Cc: 'Ingo Molnar'; 'David Rientjes'; '[email protected]'; Peter De Schrijver
Subject: RE: [PATCH] sched: Make nr_uninterruptible count a signed value
On Tue, 2012-05-08 at 15:46 -0700, Diwakar Tundlam wrote:
> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
> As seen in my output of sched_debug.
Ah, quite possible. %Ld is indeed %lld and the value is long, not long long. So the proper fix is to fudge that printk statement somehow.
> Your machine is probably natively 64-bit.
Yeah, I gave up on 32bit computing a while ago..
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On 05/09/2012 04:11 PM, Peter Zijlstra wrote:
> On Tue, 2012-05-08 at 15:46 -0700, Diwakar Tundlam wrote:
>> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
>> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
>> As seen in my output of sched_debug.
>
> Ah, quite possible. %Ld is indeed %lld and the value is long, not long
> long. So the proper fix is to fudge that printk statement somehow.
Could we use SEQ_printf for nr_uninterruptible, just like we did on
rq->load.weight?
Regards,
Michael Wang
>
>> Your machine is probably natively 64-bit.
>
> Yeah, I gave up on 32bit computing a while ago..
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Or something like the below.. I recently changed nr_running to int (see
c82513e51) so P(nr_running) would also benefit.
---
kernel/sched/debug.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 31e4f61..954fabf 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -260,8 +260,14 @@ static void print_cpu(struct seq_file *m, int cpu)
SEQ_printf(m, "\ncpu#%d\n", cpu);
#endif
-#define P(x) \
- SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x))
+#define P(x) \
+do { \
+ if (sizeof(rq->x) == 4) \
+ SEQ_printf(m, " .%-30s: %ld\n", #x, (long)(rq->x)); \
+ else \
+ SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x));\
+} while (0)
+
#define PN(x) \
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rq->x))
On 05/10/2012 02:55 AM, Diwakar Tundlam wrote:
>>> - P(nr_uninterruptible);
>>> + P((signed long)nr_uninterruptible);
>
> I thought of it too, but it won't compile because the P macro expands to dereferencing rq->nr_uninterruptible, so with your change, it will show up as rq->(signed long)nr_uninterruptible which is a syntax error.
>
oh, I see, so looks like it's a little hard to avoid some ugly change...
--Michael Wang
> --Diwakar.
>
> -----Original Message-----
> From: Michael Wang [mailto:[email protected]]
> Sent: Wednesday, May 09, 2012 12:49 AM
> To: Diwakar Tundlam
> Cc: 'Peter Zijlstra'; 'Ingo Molnar'; 'David Rientjes'; '[email protected]'; Peter De Schrijver
> Subject: Re: [PATCH] sched: Make nr_uninterruptible count a signed value
>
> On 05/09/2012 06:46 AM, Diwakar Tundlam wrote:
>
>>> No that's right. nr_uninterruptible counts the number of tasks in
>>> uninterruptible sleep, so deactivate_task puts a task to sleep, so we
>>> need to increment the number of sleeping tasks, activate_task wakes a
>>> task up so we need to decrement the number of sleeping tasks.
>>
>> Yep, I looked at the code for task_contributes_to_load() and I understand what it is all about.
>> The ++ and -- are correct, I see it now.
>>
>> On the -ve values, strangely inspite of %Ld in the print statement, in my kernel, I see high unsigned values instead of -ve values for nr_uninterruptible.
>>
>> But the sum is always 0, though.
>>
>> Maybe it is an artifact of 32-bit machine displaying 64-bit print format.
>> An (unsigned long)(-24) promoted to (signed long long) ends up as 4294967272.
>> As seen in my output of sched_debug.
>
>
> This may do some help.
>
> Regards,
> Michael Wang
> ---
> kernel/sched/debug.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 09acaa1..ab9d53f 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -270,7 +270,7 @@ static void print_cpu(struct seq_file *m, int cpu)
> rq->load.weight);
> P(nr_switches);
> P(nr_load_updates);
> - P(nr_uninterruptible);
> + P((signed long)nr_uninterruptible);
> PN(next_balance);
> P(curr->pid);
> PN(clock);
> --
> 1.7.4.1
>
On 05/10/2012 05:46 PM, Peter Zijlstra wrote:
> Or something like the below.. I recently changed nr_running to int (see
> c82513e51) so P(nr_running) would also benefit.
>
> ---
> kernel/sched/debug.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 31e4f61..954fabf 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -260,8 +260,14 @@ static void print_cpu(struct seq_file *m, int cpu)
> SEQ_printf(m, "\ncpu#%d\n", cpu);
> #endif
>
> -#define P(x) \
> - SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x))
> +#define P(x) \
> +do { \
> + if (sizeof(rq->x) == 4) \
> + SEQ_printf(m, " .%-30s: %ld\n", #x, (long)(rq->x)); \
Oh, yes, I haven't noticed that sizeof could also be a check point.
So now we can use P(x) freely and don't need to worry about anything.
Regards,
Michael Wang
> + else \
> + SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(rq->x));\
> +} while (0)
> +
> #define PN(x) \
> SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rq->x))
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>