2001-11-06 19:42:09

by Bob Matthews

Subject: 2.4.14-pre8 stress testing

We've been doing some stress testing on 2.4.14-pre8 here. So far the
results look positive, with the exception of an SMP PAE kernel on one of
our 8-ways.

- A UP kernel with no highmem passed all our tests on 256M/512M and
800M/1600M RAM/Swap combos.

- An SMP kernel with highmem=4G passed all our tests on two 1G/2G dual
processor configurations.

The problem kernel is statically linked, with HIGHMEM=64G, SMP, NFS
client and server V3, eepro100, and the sym53c8xx driver. The machine
is an 8xPIII configured as 8G/16G.

The machine ran the test suite for ~17 hours, and then gradually began
to slow down to the point where key presses at a virtual console took
many seconds to echo. Eventually, the machine became unresponsive. The
test harness clock is still ticking, and I can change VC's, but that's
about it. Magic Sysrq doesn't give me anything except the name of the
corresponding command. The machine does not appear to have generated
any oops output.

If I can provide you with any more info, please let me know.
--
Bob Matthews
Red Hat, Inc.


2001-11-06 21:09:56

by Manfred Spraul

Subject: Re: 2.4.14-pre8 stress testing

> Magic Sysrq doesn't give me anything except the name of the
> corresponding command. The machine does not appear to have generated
> any oops output.

Was just one command name printed, or multiple commands?
The sysrq handlers are protected by a spinlock.
If multiple command names were printed, it means that the sysrq handlers
themselves returned, and that printk works.

I bet that the console loglevel got corrupted.
The sysrq handler should run with forced loglevel 7, like the print of
the command name.

Did you try SysRq+7?

--
Manfred
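
The loglevel theory above can be modelled roughly as follows. This is a Python sketch, not kernel code. It assumes the usual printk rule that a message reaches the console only when its level is numerically lower than the console loglevel, that the command name is printed with the loglevel forced to 7, and that the handler's own output is filtered by whatever loglevel is currently set. The names `reaches_console` and `simulate_sysrq`, and the handler message level of 4, are hypothetical and chosen only for illustration.

```python
KERN_INFO = 6  # printk level used for informational messages


def reaches_console(msg_level: int, console_loglevel: int) -> bool:
    """Assumed rule: show a message only if its level is below console_loglevel."""
    return msg_level < console_loglevel


def simulate_sysrq(console_loglevel: int, handler_msg_level: int = 4) -> dict:
    """Model one SysRq key press under the assumptions stated above."""
    # The command name is printed with the loglevel forced to 7.
    header_shown = reaches_console(KERN_INFO, 7)
    # Assumption: handler output is filtered by the current console loglevel.
    body_shown = reaches_console(handler_msg_level, console_loglevel)
    return {"header": header_shown, "body": body_shown}


if __name__ == "__main__":
    print(simulate_sysrq(console_loglevel=0))  # header only, body suppressed
    print(simulate_sysrq(console_loglevel=7))  # header and body both shown
```

Under this model, a console loglevel that had dropped to 0 would give exactly the reported symptom (command name but no handler output), and pressing SysRq+7 would be expected to bring the handler output back.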

2001-11-06 21:29:25

by Bob Matthews

Subject: Re: 2.4.14-pre8 stress testing

Manfred Spraul wrote:
>
> > Magic Sysrq doesn't give me anything except the name of the
> > corresponding command. The machine does not appear to have generated
> > any oops output.
>
> Was just one command name printed, or multiple commands?
> The sysrq handlers are protected by a spinlock.
> If multiple command names were printed, it means that the sysrq handlers
> themselves returned, and that printk works.

Multiple command names were printed, i.e.

<alt><SysRq>T produces SysRq: Show State, but nothing more
<alt><SysRq>P produces SysRq: Show Regs, but nothing else, etc.

>
> I bet that the console loglevel got corrupted.
> The sysrq handler should run with forced loglevel 7, like the print of
> the command name.
>
> Did you try SysRq+7?

I tried resetting the loglevel to 7. Same results.


--
Bob Matthews
Red Hat, Inc.
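
The inference drawn from the repeated command names can be illustrated with a small Python sketch. It assumes, as stated in the thread, that the sysrq handlers run under a single lock, and additionally assumes that the command name is printed while that lock is held. `press_sysrq` and `demo` are hypothetical helpers, and a `threading.Lock` with a non-blocking acquire stands in for the kernel spinlock so that the sketch terminates instead of spinning.

```python
import threading


def press_sysrq(lock, name, handler_returns, log):
    """Model one key press: take the lock, print the name, run the handler."""
    # A real spinlock would spin here forever; we record the stall instead.
    if not lock.acquire(blocking=False):
        log.append("<no output: lock still held>")
        return
    log.append("SysRq: " + name)
    # The lock is released only if the handler actually returns.
    if handler_returns:
        lock.release()


def demo(first_handler_returns):
    """Press two SysRq keys in a row and return what would be printed."""
    lock = threading.Lock()
    log = []
    press_sysrq(lock, "Show State", first_handler_returns, log)
    press_sysrq(lock, "Show Regs", True, log)
    return log


if __name__ == "__main__":
    print(demo(True))   # both command names appear
    print(demo(False))  # second key press produces nothing
```

In this model a second command name can only appear if the first handler returned and dropped the lock, which matches the observation that both "Show State" and "Show Regs" were printed.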

2001-11-07 15:59:50

by Bob Matthews

Subject: Re: 2.4.14-pre8 stress testing

Bob Matthews wrote:

> The problem kernel is statically linked, with HIGHMEM=64G, SMP, NFS
> client and server V3, eepro100, and the sym53c8xx driver. The machine
> is an 8xPIII configured as 8G/16G.
>
> The machine ran the test suite for ~17 hours, and then gradually began
> to slow down to the point where key presses at a virtual console took
> many seconds to echo. Eventually, the machine became unresponsive. The
> test harness clock is still ticking, and I can change VC's, but that's
> about it. Magic Sysrq doesn't give me anything except the name of the
> corresponding command. The machine does not appear to have generated
> any oops output.

Well, this is very strange indeed. Sometime during the night, this
machine came back to life and continued executing the test suite.
According to the logs, it ran the tests for approximately 7 hours, then
became unresponsive again. It is currently in the "zombie" state
described above: Magic Sysrq doesn't give anything but the name of the
command, and a shell I have opened on a VC won't echo any characters. I
guess we'll see if it resurrects itself again.

Manfred mentioned that Sysrq tries to take the task queue spinlock. Is
there some segment of code in the kernel which would cause a process to
grab the task queue spinlock and hold it for a long time under heavy
memory contention?

I should mention that the job mix induced by the test suite would be
considered unreasonable in a normal environment.

--
Bob Matthews
Red Hat, Inc.