2002-12-04 12:36:43

by Erich Focht

[permalink] [raw]
Subject: per cpu time statistics

Andrew, Bill,

I had to learn from Michael Hohnbaum that you've eliminated the per
CPU time statistics in 2.5.50 (akpm changeset from Nov. 26). Reading
the cset comments I understood that the motivation was to save
8*NR_CPUS bytes of memory in the task_struct. Maybe that was really an
issue at the time when Bill suggested the patch (July), but in the
mean time we got configurable NR_CPUS (October) and that small amount
of additional memory really doesn't matter. Most people running SMP
have 2 CPUs.

I wasn't aware of the patch and the RFC from Bill, otherwise I would
have "shouted" long time ago. My fault... BTW: did that RFC go to the
LSE mailing list, too? I can't remember. But that's the place were I'd
expect to find people interested in such issues.

When digging in the kernel archives I found:

wli> On Tue, Jul 16, 2002 at 11:12:32AM +0100, Alan Cox wrote:
wli> > A PS: to that. I'm not opposed to removing them. I'd prefer them left
wli> > around in the kernel debugging options though
wli>
wli> In that case, I can make it conditional on something like
wli> CONFIG_DEBUG_SCHED, which option of course would go in the "Kernel Hacking"
wli> section.
wli>
wli> Cheers,
wli> Bill

I find this idea better than just eliminating the only way of finding
out on which CPUs a task has spent its time. This is an essential
question when investigating the performance on SMP and NUMA systems.

For those who miss this feature I'm attaching a patch doing what wli
suggested. The config option is CONFIG_CPUS_STAT and can be found in
the "Kernel Hacking" menu, as wli suggested. Just didn't like
DEBUG_SCHED, we want to monitor the statistics and this is not
necessarily related to bugs in the scheduler. Also added as last line
in /proc/pid/cpu the current CPU of the task. It's often needed and
/proc/pid/stat is much too cryptic.

Regards,

Erich


Attachments:
cputimes_stat-2.5.50.patch (11.54 kB)

2002-12-04 17:40:02

by Andrew Morton

[permalink] [raw]
Subject: Re: per cpu time statistics

Erich Focht wrote:
>
> Andrew, Bill,
>
> I had to learn from Michael Hohnbaum that you've eliminated the per
> CPU time statistics in 2.5.50 (akpm changeset from Nov. 26). Reading
> the cset comments I understood that the motivation was to save
> 8*NR_CPUS bytes of memory in the task_struct. Maybe that was really an
> issue at the time when Bill suggested the patch (July), but in the
> mean time we got configurable NR_CPUS (October) and that small amount
> of additional memory really doesn't matter. Most people running SMP
> have 2 CPUs.

It's mainly the big ia32 boxes which a) have a lot of CPUs and
b) have a lot of memory and c) run a lot of tasks. They're
gasping for normal-zone memory.

I'm half-inclined to just revert the whole thing and put the stats
back, rather than adding yet another obscure config option. But
your patch is certainly very tidy...

2002-12-04 18:23:19

by William Lee Irwin III

[permalink] [raw]
Subject: Re: per cpu time statistics

Erich Focht wrote:
>> I had to learn from Michael Hohnbaum that you've eliminated the per
>> CPU time statistics in 2.5.50 (akpm changeset from Nov. 26). Reading
>> the cset comments I understood that the motivation was to save
>> 8*NR_CPUS bytes of memory in the task_struct. Maybe that was really an
>> issue at the time when Bill suggested the patch (July), but in the
>> mean time we got configurable NR_CPUS (October) and that small amount
>> of additional memory really doesn't matter. Most people running SMP
>> have 2 CPUs.

On Wed, Dec 04, 2002 at 09:47:26AM -0800, Andrew Morton wrote:
> It's mainly the big ia32 boxes which a) have a lot of CPUs and
> b) have a lot of memory and c) run a lot of tasks. They're
> gasping for normal-zone memory.
> I'm half-inclined to just revert the whole thing and put the stats
> back, rather than adding yet another obscure config option. But
> your patch is certainly very tidy...


I'm (obviously) in favor of Erich's patch. =)

Still gasping for ZONE_NORMAL,
Bill

2002-12-05 10:52:00

by Erich Focht

[permalink] [raw]
Subject: Re: per cpu time statistics

On Wednesday 04 December 2002 18:47, Andrew Morton wrote:
> It's mainly the big ia32 boxes which a) have a lot of CPUs and
> b) have a lot of memory and c) run a lot of tasks. They're
> gasping for normal-zone memory.

There's a world beyond 32 bits, with plenty of normal-zone memory ;-)

> I'm half-inclined to just revert the whole thing and put the stats
> back, rather than adding yet another obscure config option. But
> your patch is certainly very tidy...

My patch is basically the reverting patch plus changed ifdefs and a
bunch of Kconfig entries. I'd be happy to get this feature back, no
matter how it is implemented. I think it is necessary for performance
analysis on HT, NUMA, SMT systems.

Regards,
Erich



2002-12-05 11:07:19

by William Lee Irwin III

[permalink] [raw]
Subject: Re: per cpu time statistics

On Thu, Dec 05, 2002 at 11:57:29AM +0100, Erich Focht wrote:
> My patch is basically the reverting patch plus changed ifdefs and a
> bunch of Kconfig entries. I'd be happy to get this feature back, no
> matter how it is implemented. I think it is necessary for performance
> analysis on HT, NUMA, SMT systems.

32-bit is not irrelevant, but there's certainly a 64-bit land where
the critical 32-bit correctness issue becomes a minor 64-bit
performance issue. This is the nature of extended 32-bit addressing.
By and large what is a severe and world-breaking correctness issue for
32-bit with extended addressing is a performance issue for the rest.

And so I feel we are all in harmony; the scheduler statistics are in
fact valuable on all platforms, it's just an question of basic "should
this overhead be required or optional?"


Bill

2002-12-05 16:56:39

by Bill Davidsen

[permalink] [raw]
Subject: Re: per cpu time statistics

On Thu, 5 Dec 2002, William Lee Irwin III wrote:

> And so I feel we are all in harmony; the scheduler statistics are in
> fact valuable on all platforms, it's just an question of basic "should
> this overhead be required or optional?"

Unless there's a downside, optional. While I might want to instrument a
kernel to follow a problem, I see no gain for most people, regardless of
number of processors.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-12-06 17:22:46

by Michael Hohnbaum

[permalink] [raw]
Subject: Re: per cpu time statistics

On Wed, 2002-12-04 at 04:43, Erich Focht wrote:
> Andrew, Bill,
>
> I had to learn from Michael Hohnbaum that you've eliminated the per
> CPU time statistics in 2.5.50 (akpm changeset from Nov. 26).
...
>
> For those who miss this feature I'm attaching a patch doing what wli
> suggested. The config option is CONFIG_CPUS_STAT and can be found in
> the "Kernel Hacking" menu, as wli suggested. Just didn't like
> DEBUG_SCHED, we want to monitor the statistics and this is not
> necessarily related to bugs in the scheduler. Also added as last line
> in /proc/pid/cpu the current CPU of the task. It's often needed and
> /proc/pid/stat is much too cryptic.

I use a set of tests provided by Erich that use the cpu information from
the task. This information is crucial for understanding how processes
are dispatched across CPUs (and nodes on NUMA boxes). I've applied
Erich's patch and it restores this data, making his tests useful. Could
this patch be considered for inclusion? Please.

Michael
>
> Regards,
>
> Erich
> ----

--

Michael Hohnbaum 503-578-5486
[email protected] T/L 775-5486

2002-12-06 17:49:14

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [Lse-tech] Re: per cpu time statistics

On Wed, 2002-12-04 at 04:43, Erich Focht wrote:
>> I had to learn from Michael Hohnbaum that you've eliminated the per
>> CPU time statistics in 2.5.50 (akpm changeset from Nov. 26).
> ...
>> For those who miss this feature I'm attaching a patch doing what wli
>> suggested. The config option is CONFIG_CPUS_STAT and can be found in
>> the "Kernel Hacking" menu, as wli suggested. Just didn't like
>> DEBUG_SCHED, we want to monitor the statistics and this is not
>> necessarily related to bugs in the scheduler. Also added as last line
>> in /proc/pid/cpu the current CPU of the task. It's often needed and
>> /proc/pid/stat is much too cryptic.

On Fri, Dec 06, 2002 at 09:31:43AM -0800, Michael Hohnbaum wrote:
> I use a set of tests provided by Erich that use the cpu information from
> the task. This information is crucial for understanding how processes
> are dispatched across CPUs (and nodes on NUMA boxes). I've applied
> Erich's patch and it restores this data, making his tests useful. Could
> this patch be considered for inclusion? Please.

We've already come to a resolution on this one (going with Erich's patch),
or at least I got that impression.

Bill