2002-07-31 02:43:51

by David Luyer

Subject: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

In Linux 2.4.19ac3rc3 on IBM x330/x340 SMP systems we're seeing this:

luyer@praxis8:~$ ps auxwww | tail -1
luyer 1025 0.0 0.0 1276 352 pts/2 S Aug06 0:00 tail -1
luyer@praxis8:~$ date
Wed Jul 31 12:35:16 EST 2002

luyer@praxis8:~$ cat /proc/$$/stat
1053 (bash) S 1052 1053 1053 34818 1056 0 99 56 294 99 1 0 0 0 15 0 0 0 49574810 2244608 316 4294967295 134512640 134997952 3221225056 3221224264 1074760249 0 65536 3686404 1266761467 3222376853 0 0 17 0
luyer@praxis8:~$ cat /proc/uptime
495803.96 481602.41
luyer@praxis8:~$ cat /proc/stat
cpu 1570707 3 1853018 95737544
cpu0 685268 1 876356 48019011
cpu1 885439 2 976662 47718533
page 1720960 27277642
swap 25 534
intr 244887271 49580636 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1742620 0 16 16 193563981 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (8,0):(988585,67688,1019318,920897,8472052) (8,1):(1832129,206174,2422602,1625955,46083232)
ctxt 81010049
btime 1027587797
processes 98440

Note the time skew in the "ps" output (the clock time is correct).

This is happening on all ten IBM x330/x340's we're running this in SMP
on this kernel.

We have fourteen Intel ISP 2150 servers running UP with IO-APIC on SMP
motherboards which are not experiencing this issue.

David.
--
David Luyer Phone: +61 3 9674 7525
Network Development Manager P A C I F I C Fax: +61 3 9699 8693
Pacific Internet (Australia) I N T E R N E T Mobile: +61 4 1111 BYTE
http://www.pacific.net.au/ NASDAQ: PCNTF


2002-07-31 10:58:36

by David Luyer

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

I wrote:

> In Linux 2.4.19ac3rc3 on IBM x330/x340 SMP systems we're seeing this:
>
> luyer@praxis8:~$ ps auxwww | tail -1
> luyer 1025 0.0 0.0 1276 352 pts/2 S Aug06 0:00 tail -1
> luyer@praxis8:~$ date
> Wed Jul 31 12:35:16 EST 2002

(UP systems are fine, SMP have this problem)

Reason:

luyer@praxis8:~$ ps --info 2>&1 | grep Hertz
EUID=111 TTY=136,3 Hertz=50
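
As a rough check that Hertz=50 is enough to produce the Aug06 date, here is a minimal sketch. It assumes ps derives a start time as boot time plus starttime/Hertz, and that btime from /proc/stat stands in for the boot time; the helper name process_start_epoch is hypothetical and is not taken from procps.

```c
#include <assert.h>

/* Hypothetical helper (not from procps): epoch start time of a process,
   assuming start = boot time + starttime / Hertz, where starttime is
   field 22 of /proc/PID/stat, counted in jiffies since boot. */
long process_start_epoch(long btime, unsigned long start_jiffies,
                         unsigned long hertz)
{
    assert(hertz > 0);
    return btime + (long)(start_jiffies / hertz);
}
```

With the values posted in the first message (btime 1027587797, starttime 49574810), Hertz=100 gives 1028083545, which is 31 July 2002, while Hertz=50 gives 1028579293, which is 5 August 2002 around 20:28 UTC, or 6 August at UTC+10. Halving Hertz doubles the apparent time since boot, which would push the start date into the future.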

procps is getting the hertz value wrong, it's computing it as:

h = (unsigned long)( (double)jiffies/seconds/smp_num_cpus );

but we're only getting timer interrupts on CPU 0, and hence
jiffies is only incrementing once per 100th of a second.

luyer@praxis8:~/procps/procps-2.0.7.orig/proc$ cat /proc/interrupts
CPU0 CPU1
0: 52459351 0 local-APIC-edge timer
1: 0 2 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
24: 883655 863043 IO-APIC-level ips
26: 7 9 IO-APIC-level aic7xxx
27: 8 8 IO-APIC-level aic7xxx
28: 97880608 96542591 IO-APIC-level eth0
NMI: 0 0
LOC: 52456889 52456887
ERR: 0
MIS: 0

procps version is 2.0.7 (Debian 3.0).

Where's the mistake -- should timer interrupts be on both
CPUs (I think this is the problem), or is procps miscalculating
Hz (seems less likely, someone would have noticed by now...)?

David.

2002-07-31 12:09:20

by Alan

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

> procps version is 2.0.7 (Debian 3.0).
>
> Where's the mistake -- should timer interrupts be on both
> CPUs (I think this is the problem), or is procps miscalculating
> Hz (seems less likely, someone would have noticed by now...)?

HZ on x86 for user space is defined as 100. Its a procps problem

2002-07-31 12:56:18

by David Luyer

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

Alan Cox wrote:
> > procps version is 2.0.7 (Debian 3.0).
> >
> > Where's the mistake -- should timer interrupts be on both
> > CPUs (I think this is the problem), or is procps miscalculating
> > Hz (seems less likely, someone would have noticed by now...)?
>
> HZ on x86 for user space is defined as 100. Its a procps problem

Slight error in my initial diagnosis of why procps is getting Hertz
wrong tho. It's not because timer interrupts are only happening
on one CPU. It's because it thinks I have 4 CPUs per system, when
really I only have 2 CPUs per system.

It's taking jiffies from the sum of the figures on the first line
of /proc/stat and dividing by the uptime in seconds from /proc/uptime
multiplied by the number of CPUs. The system has two CPUs, #0 and #1,
and is reporting _SC_NPROCESSORS_CONF as 4 (the count used by procps
as the number of CPUs).

Looks like even if it is procps's fault for not just using HZ==100,
the kernel is leading it astray by claiming I have twice as many
CPUs as I really do.
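
The arithmetic can be checked against the figures posted earlier in the thread. This is a minimal sketch of the quoted formula only; the helper name estimate_hertz is hypothetical, and the real procps code may round or clamp the result differently.

```c
#include <assert.h>

/* Hypothetical helper mirroring the formula quoted from procps:
   h = jiffies / seconds / smp_num_cpus, truncated to an integer. */
unsigned long estimate_hertz(unsigned long jiffies_sum, double uptime_seconds,
                             long ncpus)
{
    assert(uptime_seconds > 0.0 && ncpus > 0);
    return (unsigned long)((double)jiffies_sum / uptime_seconds / (double)ncpus);
}
```

The first "cpu" line of /proc/stat above sums to 1570707 + 3 + 1853018 + 95737544 = 99161272 jiffies, and /proc/uptime reports 495803.96 seconds. Dividing by 4 CPUs gives 50, matching the Hertz=50 that ps reported; dividing by the real count of 2 gives 100.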

luyer@praxis8:~$ make cpus
cc cpus.c -o cpus
luyer@praxis8:~$ cat cpus.c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* sysconf() returns a long, so print it with %ld */
    printf("%ld\n", sysconf(_SC_NPROCESSORS_CONF));
    return 0;
}
luyer@praxis8:~$ ./cpus
4
luyer@praxis8:~$ grep 'processor ' /proc/cpuinfo
processor : 0
processor : 1
luyer@praxis8:~$ dmesg | grep -E 'Initializing CPU|CPU #. not responding'
Initializing CPU#0
Initializing CPU#1
CPU #3 not responding - cannot use it.

David.

2002-07-31 13:15:04

by David Luyer

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

> On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> > printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> > }
> > luyer@praxis8:~$ ./cpus
> > 4
> > luyer@praxis8:~$ grep 'processor ' /proc/cpuinfo
> > processor : 0
> > processor : 1
>
> In which case I suggest you file a glibc bug. sysconf looks
> at the /proc stuff as I understand it

Great, got it, thanks: sysconf(_SC_NPROCESSORS_CONF) parses
/proc/cpuinfo using a simple parser:

#ifndef GET_NPROCS_PARSER
# define GET_NPROCS_PARSER(FP, BUFFER, RESULT) \
  do \
    { \
      (RESULT) = 0; \
      /* Read all lines and count the lines starting with the string \
         "processor". We don't have to fear extremely long lines since \
         the kernel will not generate them. 8192 bytes are really \
         enough. */ \
      while (fgets_unlocked (BUFFER, sizeof (BUFFER), FP) != NULL) \
        if (strncmp (BUFFER, "processor", 9) == 0) \
          ++(RESULT); \
    } \
  while (0)
#endif

It's being tricked by this:

luyer@praxis8:~$ cat /proc/cpuinfo | grep '^processor'
processor : 0
processor id : 0
processor : 1
processor id : 0

The "processor id" line, only present with SMP enabled, is being counted
as a processor.
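
A stricter match would avoid the miscount. The following is only a sketch under stated assumptions, not glibc's actual fix: the function names are hypothetical, and it assumes a CPU entry is the word "processor" followed by optional blanks and a colon.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: 1 if the line is "processor", optional blanks, ':' */
int is_processor_line(const char *line)
{
    if (strncmp(line, "processor", 9) != 0)
        return 0;
    line += 9;
    while (*line == ' ' || *line == '\t')
        line++;
    return *line == ':';
}

/* Count CPU entries in cpuinfo-style text. A nonzero prefix_only mimics
   the 9-character prefix comparison; zero uses the stricter test above. */
int count_cpus(const char *text, int prefix_only)
{
    int count = 0;

    assert(text != NULL);
    while (*text != '\0') {
        if (prefix_only ? strncmp(text, "processor", 9) == 0
                        : is_processor_line(text))
            count++;
        /* advance to the start of the next line, if there is one */
        const char *newline = strchr(text, '\n');
        if (newline == NULL)
            break;
        text = newline + 1;
    }
    return count;
}
```

Fed the four lines shown above, the prefix-only comparison counts 4 entries while the stricter test counts 2.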

David.

2002-07-31 13:06:47

by Alan

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> }
> luyer@praxis8:~$ ./cpus
> 4
> luyer@praxis8:~$ grep 'processor ' /proc/cpuinfo
> processor : 0
> processor : 1

In which case I suggest you file a glibc bug. sysconf looks at the /proc
stuff as I understand it

2002-07-31 13:27:52

by Dana Lacoste

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> }
> luyer@praxis8:~$ ./cpus
> 4

I ran your test program on a Compaq DL360 and an IBM x330
and both showed '2' for the CPU count (2.4.18 stock, glibc 2.2.3)

Just a point of reference to help narrow the problem area down :)

Dana Lacoste
Ottawa, Canada

2002-07-31 13:35:20

by David Luyer

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

Dana Lacoste wrote:
> On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> > printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> > }
> > luyer@praxis8:~$ ./cpus
> > 4
>
> I ran your test program on a Compaq DL360 and an IBM x330
> and both showed '2' for the CPU count (2.4.18 stock, glibc 2.2.3)
>
> Just a point of reference to help narrow the problem area down :)

Yes, the problem is in the -ac train only. It's the "processor id"
field that has been added to /proc/cpuinfo which is confusing libc's
way of counting CPUs.

That's a libc bug. But there's also a kernel bug with that field
it appears.

The kernel bug: the "processor id" fields are both printing zero.

Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
directly out of phys_proc_id as at the time it's called, but
smpboot.c declares phys_proc_id as __initdata (either that, or
phys_proc_id is actually zero for both CPUs?).

David.

2002-07-31 13:45:26

by Alan

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Wed, 2002-07-31 at 14:38, David Luyer wrote:
> Yes, the problem is in the -ac train only. It's the "processor id"
> field that has been added to /proc/cpuinfo which is confusing libc's
> way of counting CPUs.
>
> That's a libc bug. But there's also a kernel bug with that field
> it appears.

Currently yes - it got broken during the Summit rearrangements

> The kernel bug: the "processor id" fields are both printing zero.
>
> Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
> directly out of phys_proc_id as at the time it's called, but
> smpboot.c declares phys_proc_id as __initdata (either that, or
> phys_proc_id is actually zero for both CPUs?).

The former is the problem. Thanks for spotting it. As to the text
string, I'll have a chat with Ulrich about it and see what he thinks

2002-07-31 13:53:57

by David Luyer

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

> > Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
> > directly out of phys_proc_id as at the time it's called, but
> > smpboot.c declares phys_proc_id as __initdata (either that, or
> > phys_proc_id is actually zero for both CPUs?).
>
> The former is the problem. Thanks for spotting it. As to the text
> string, I'll have a chat with Ulrich about it and see what he thinks

The former and the latter possibly: the only assignment I see for
phys_proc_id is when hyperthreading is happening (in fact, requires
all of X86_FEATURE_HT, !disable_x86_ht and smp_num_siblings > 1);
down the end of kernel/setup.c init_intel().

David.

2002-07-31 16:11:56

by Jonathan Lundell

Subject: NMI watchdog, die(), & console_loglevel

The i386 NMI watchdog handler prints a message, sets console_loglevel
to 0 (no output to console), and then kills the current task
(arch/i386/kernel/nmi.c:nmi_watchdog_tick()); it then leaves the
console turned off.

die(), on the other hand, starts out by setting console_loglevel to
15 (print everything), and leaves it there.

Neither behavior seems particularly appropriate, and taken together
they seem at least inconsistent. What's the justification, if any,
and wouldn't it be better to leave console_loglevel alone and set an
appropriate message loglevel? (Not that I'd claim for an instant that
message loglevels are used consistently; have a look at the various
applications of KERN_EMERG, for example.)
--
/Jonathan Lundell.

2002-07-31 19:11:10

by Albert D. Cahalan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

Alan Cox writes:
> On Wed, 2002-07-31 at 13:59, David Luyer wrote:

>> printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
>> }
>> luyer@praxis8:~$ ./cpus
>> 4
>> luyer@praxis8:~$ grep 'processor ' /proc/cpuinfo
>> processor : 0
>> processor : 1
>
> In which case I suggest you file a glibc bug. sysconf looks at the /proc
> stuff as I understand it

First you blame ps. Then you blame libc. How about you
place the fault right where it belongs?

Counting processors in /proc/cpuinfo is a joke of an ABI.

Add a proper ABI now, and userspace can transition to it
over the next 4 years.

2002-07-31 23:17:18

by Alan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Wed, 2002-07-31 at 20:14, Albert D. Cahalan wrote:
> Alan Cox writes:
> > On Wed, 2002-07-31 at 13:59, David Luyer wrote:
>
> >> printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> >> }
> >> luyer@praxis8:~$ ./cpus
> >> 4
> >> luyer@praxis8:~$ grep 'processor ' /proc/cpuinfo
> >> processor : 0
> >> processor : 1
> >
> > In which case I suggest you file a glibc bug. sysconf looks at the /proc
> > stuff as I understand it
>
> First you blame ps. Then you blame libc. How about you
> place the fault right where it belongs?

ps is certainly buggy. HZ is 100. ps grovelling around in /proc is bogus
to say the least. That code wasn't exactly well written.

> Counting processors in /proc/cpuinfo is a joke of an ABI.
>
> Add a proper ABI now, and userspace can transition to it
> over the next 4 years.

Which is what I've been talking to Ulrich about.

2002-07-31 23:40:19

by Lincoln Dale

Subject: RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

At 10:59 PM 31/07/2002 +1000, David Luyer wrote:
>Alan Cox wrote:
> > > procps version is 2.0.7 (Debian 3.0).
> > >
> > > Where's the mistake -- should timer interrupts be on both
> > > CPUs (I think this is the problem), or is procps miscalculating
> > > Hz (seems less likely, someone would have noticed by now...)?
> >
> > HZ on x86 for user space is defined as 100. Its a procps problem
>
>Slight error in my initial diagnosis of why procps is getting Hertz
>wrong tho. It's not because timer interrupts are only happening
>on one CPU. It's because it thinks I have 4 CPUs per system, when
>really I only have 2 CPUs per system.

procps is still wrong.

HZ on x86 is 100 by default.
that isn't 100 per CPU, but 100 per second, regardless of whether the timer
interrupt is distributed between CPUs or serviced on a single CPU.


cheers,

lincoln.

2002-07-31 23:46:01

by Dave Jones

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Thu, Aug 01, 2002 at 01:37:17AM +0100, Alan Cox wrote:
> > Add a proper ABI now, and userspace can transition to it
> > over the next 4 years.
>
> Which is what I've been talking to Ulrich about.

I thought this was the idea behind sysconf(_SC_NPROCESSORS_CONF) ?

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-08-01 00:11:10

by Alan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Thu, 2002-08-01 at 00:49, Dave Jones wrote:
> On Thu, Aug 01, 2002 at 01:37:17AM +0100, Alan Cox wrote:
> > > Add a proper ABI now, and userspace can transition to it
> > > over the next 4 years.
> >
> > Which is what I've been talking to Ulrich about.
>
> I thought this was the idea behind sysconf(_SC_NPROCESSORS_CONF) ?

sysconf is implemented in glibc. Right now this is done by poking around
in /proc/cpuinfo. The kernel doesn't export the data very nicely. With
2.5 and Rusty's hot swappable processors we need to export the data even
more explicitly.

2002-08-01 00:16:02

by Dave Jones

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Thu, Aug 01, 2002 at 02:30:57AM +0100, Alan Cox wrote:
> sysconf is implemented in glibc. Right now this is done by poking around
> in /proc/cpuinfo.

Gotcha, that's what I feared.

> The kernel doesn't export the data very nicely. With
> 2.5 and Rusty's hot swappable processors we need to export the data even
> more explicitly.

driverfs objects perhaps ? Or something more lightweight ?

Dave.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-08-01 01:30:41

by Albert D. Cahalan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

Lincoln Dale writes:
> At 10:59 PM 31/07/2002 +1000, David Luyer wrote:
> >Alan Cox wrote:

>>> HZ on x86 for user space is defined as 100. Its a procps problem
>>
>> Slight error in my initial diagnosis of why procps is getting Hertz
>> wrong tho. It's not because timer interrupts are only happening
>> on one CPU. It's because it thinks I have 4 CPUs per system, when
>> really I only have 2 CPUs per system.
>
> procps is still wrong.
>
> HZ on x86 is 100 by default.
> that isn't 100 per CPU, but 100 per second, regardless of whether the timer
> interrupt is distributed between CPUs or serviced on a single CPU.

No shit. Now, how do you create a ps executable that handles
a 2.4.xx kernel with a modified HZ value? People did this all
the time. I got many bug reports from these people, so don't
go saying they don't exist. Remember: one executable, running
on both of these:

2.2.xx i386 as shipped by Linus
2.4.xx i386 with HZ modified

Come on, write the code if you think it's so easy.
You get bonus points for supporting 2.0.xx kernels
and the IA-64 kernel with that same executable.

Maybe you think I should tell these people to go to Hell?
In that case, what about the Alpha systems that ran HZ at
1200 instead of 1024?

I really wonder why people love to torment me for having the
decency to support systems that aren't 100% Linus-compliant.
Do you people burn idols for Linus, or only kiss his butt?

2002-08-01 03:32:52

by Martin J. Bligh

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

> No shit. Now, how do you create a ps executable that handles
> a 2.4.xx kernel with a modified HZ value? People did this all
> the time. I got many bug reports from these people, so don't
> go saying they don't exist. Remember: one executable, running
> on both of these:
>
> <rant deleted>

Is it somehow impossible to just export HZ in /proc, and read it?
Doesn't seem too hard to me.

M.

2002-08-01 08:46:31

by Benjamin Herrenschmidt

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

>2.2.xx i386 as shipped by Linus
>2.4.xx i386 with HZ modified
>
>Come on, write the code if you think it's so easy.
>You get bonus points for supporting 2.0.xx kernels
>and the IA-64 kernel with that same executable.
>
>Maybe you think I should tell these people to go to Hell?
>In that case, what about the Alpha systems that ran HZ at
>1200 instead of 1024?

Isn't HZ value passed down to userland via the ELF aux table ?

(At least the "userland visible" one, which isn't the kernel
internal one in recent 2.5's, oh well...)

That's a reason I don't understand why Linus did this separation
between "userland visible" HZ and kernel internal HZ. I would have
just changed the kernel HZ and let userland be fixed to use the
value passed via the aux table instead of hard coding it.

Ben.


2002-08-01 11:38:39

by Lincoln Dale

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

At 09:33 PM 31/07/2002 -0400, Albert D. Cahalan wrote:
> > HZ on x86 is 100 by default.
> > that isn't 100 per CPU, but 100 per second, regardless of whether the timer
> > interrupt is distributed between CPUs or serviced on a single CPU.
>
>No shit. Now, how do you create a ps executable that handles
>a 2.4.xx kernel with a modified HZ value? People did this all
>the time. I got many bug reports from these people, so don't
>go saying they don't exist. Remember: one executable, running
>on both of these:

thanks for the rant. most entertaining. for what its worth, i wasn't
trolling.

>2.2.xx i386 as shipped by Linus
>2.4.xx i386 with HZ modified

(i assume you mean 2.4.xx i386 as shipped by Linus)

>Come on, write the code if you think it's so easy.
>You get bonus points for supporting 2.0.xx kernels
>and the IA-64 kernel with that same executable.

i suspect you're confusing me with someone else.

in either case, for ELF executables, the kernel puts the CLOCKS_PER_TICK on
the stack when loading an elf binary.
this is defined to be HZ on all platforms except ia32 where its set to
100. one would hope that if you redefine HZ to something else, you also
remember to redefine CLOCKS_PER_TICK to that same value too.

my tree uses CLOCKS_PER_TICK set to HZ for x86 too. i also use a tree with
HZ set to 1000 for a packet-latency-inducer packet-scheduler i use.

the following code determines the value of CLOCKS_PER_TICK in a reliable
manner on the hosts i have here (2.4.xx, 2.5.xx, ia32):
i don't have any alpha or ia64 boxes here, but i'm confident it'll still
give you the correct result.


--
#include <stdio.h>
#include <unistd.h>

#define AT_CLKTCK 17 /* Frequency of times() */

int main(int argc, char *argv[])
{
        int i = 0;

        /* sysconf() returns a long */
        fprintf(stderr, "sysconf says %ld ticks per second\n",
                sysconf(_SC_CLK_TCK));

        /* loop through command-line and args */
        while (argv[i] != NULL)
                i++;

        /* loop through environment variables */
        i++;
        while (argv[i] != NULL)
                i++;

        /* now at elf variables: (type, value) pairs, each one word wide */
        i++;
        while (argv[i] != NULL) {
                if ((long)argv[i] != AT_CLKTCK) {
                        fprintf(stderr, "(elf header entry %ld has value %ld)\n",
                                (long)argv[i], (long)argv[i + 1]);
                } else {
                        /* got it */
                        fprintf(stderr, "eureka, elf header says we have %ld "
                                "ticks per second\n", (long)argv[i + 1]);
                        break;
                }
                i += 2;
        }
        return 0;
}
--

the code doesn't work on a 2.2.16 box here, given 2.2.16 doesn't have
AT_CLKTCK, but i believe that is incidental to this discussion.


cheers,

lincoln.

2002-08-01 12:55:57

by Alan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Thu, 2002-08-01 at 02:33, Albert D. Cahalan wrote:
> > HZ on x86 is 100 by default.
> > that isn't 100 per CPU, but 100 per second, regardless of whether the timer
> > interrupt is distributed between CPUs or serviced on a single CPU.
>
> No shit. Now, how do you create a ps executable that handles
> a 2.4.xx kernel with a modified HZ value? People did this all

HZ in /proc is still 100 on a correctly modified 2.4 kernel. If people
can't get the modifications right it isnt your fault.


2002-08-01 12:57:07

by Alan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

On Thu, 2002-08-01 at 04:34, Martin J. Bligh wrote:

> Is it somehow impossible to just export HZ in /proc, and read it?
> Doesn't seem too hard to me.

Its "100" for x86. HZ is a constant. Thats why the kernel has to keep
the values in terms of HZ published in the same format

2002-08-01 18:23:16

by Albert D. Cahalan

Subject: Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew

Lincoln Dale writes:
> At 09:33 PM 31/07/2002 -0400, Albert D. Cahalan wrote:

>> No shit. Now, how do you create a ps executable that handles
>> a 2.4.xx kernel with a modified HZ value? People did this all
>> the time. I got many bug reports from these people, so don't
>> go saying they don't exist. Remember: one executable, running
>> on both of these:
>
> thanks for the rant. most entertaining. for what its worth, i wasn't
> trolling.
>
>> 2.2.xx i386 as shipped by Linus
>> 2.4.xx i386 with HZ modified
>
> (i assume you mean 2.4.xx i386 as shipped by Linus)

No.

"Debian GNU/Linux 3.0 released July 19th, 2002
...
This version of Debian supports the 2.2 and 2.4
releases of the Linux kernel."

>> Come on, write the code if you think it's so easy.
>> You get bonus points for supporting 2.0.xx kernels
>> and the IA-64 kernel with that same executable.
>
> i suspect you're confusing me with someone else.

Yes and no. You seem to express a common opinion.
Unlike the others, you may have provided a more
reliable hack than the one currently used.

> in either case, for ELF executables, the kernel puts the CLOCKS_PER_TICK on
> the stack when loading an elf binary.
> this is defined to be HZ on all platforms except ia32 where its set to
> 100. one would hope that if you redefine HZ to something else, you also
> remember to redefine CLOCKS_PER_TICK to that same value too.

Uh... that's not good. It makes AT_CLKTCK unreliable on i386, cris,
mips, and mips64. I'll have to think about your "one would hope".

> the following code determines the value of CLOCKS_PER_TICK in a reliable
> manner on the hosts i have here (2.4.xx, 2.5.xx, ia32):
> i don't have any alpha or ia64 boxes here, but i'm confident it'll still
> give you the correct result.

Thank you very much. I'll have to try this on a 64-bit box.
It works on 32-bit ppc with the 2.4.16 kernel.

> the code doesn't work on a 2.2.16 box here, given 2.2.16 doesn't have
> AT_CLKTCK, but i believe that is incidental to this discussion.

Not really, but I might rely on sysconf() when AT_CLKTCK is missing.
Then I can tolerate:

a. any unmodified kernel, except alpha arch @ 1200HZ and user-mode @ 20HZ
b. any 2.4.xx kernel with HZ==CLOCKS_PER_SEC, even with an old libc
c. any 2.6.xx kernel, even with an old libc

That might be good enough. Asking users to run 2.4.xx if they want
to play with HZ is pretty reasonable. Asking them to run 2.5.xx,
or hack up the proc filesystem, is not.
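
The fallback order described above (use AT_CLKTCK when present, otherwise sysconf()) might look roughly like the sketch below. It assumes a C library that provides getauxval() in <sys/auxv.h>, an interface that postdates this thread, and the helper name clock_ticks_per_second is hypothetical. It is an illustration of the idea, not the code procps adopted.

```c
#include <assert.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Hypothetical helper: prefer the AT_CLKTCK aux-vector entry and fall back
   to sysconf(_SC_CLK_TCK) when the kernel did not supply one. */
long clock_ticks_per_second(void)
{
    unsigned long from_auxv = getauxval(AT_CLKTCK); /* 0 if the entry is absent */
    long ticks = from_auxv != 0 ? (long)from_auxv : sysconf(_SC_CLK_TCK);

    assert(ticks > 0); /* neither source gave a usable value */
    return ticks;
}
```

Reading the entry through the C library avoids walking past the end of argv by hand, at the cost of depending on a newer libc than the ones discussed in this thread.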