Subject: Better testing of hardware (was: Defective Red Hat)

Part of the issue is that there exists no "easy to use" standardized test
software. Full 32-bit concurrent use of many devices can reveal problems
that users do not often see in normal applications.

One major hardware review site found stability problems with the Intel
Pentium III 1.13 GHz processor that ultimately led to Intel delaying the
release -- it passed all tests but not a compile of the Linux Kernel! This
was on more than 3 different processors.
http://www.tomshardware.com/cpu/00q3/0008281/pentiumiii-04.html

A Linux Kernel compile test does a really good job of testing the hard disk,
RAM, and CPU... as it executes all types of instructions and the final
output depends on all prior steps completing correctly. On a really fast
system (> 900 MHz) it might make sense to run it twice, once to "warm up" the
CPU and other components. Most "benchmarks" just test speed, not the actual
stability or data integrity (they write results to a device but don't check
for data corruption, or they test only one device at a time, not all at
once).
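
To make the "check for data corruption" point concrete, here is a minimal
sketch in Python of a write-then-verify pass. The names write_pattern,
digest_file and pattern_chunks are made up for illustration and are not
part of any existing test suite. It writes a deterministic pattern to a
file, then reads the file back and compares digests. On a machine with
plenty of RAM the read-back may be served from the page cache, so by
itself this exercises RAM and CPU more than the disk.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 16  # 64 KiB per write; an arbitrary choice for this sketch


def pattern_chunks(size_bytes, seed=b"hwtest"):
    """Yield deterministic pseudo-random chunks totalling size_bytes."""
    counter = 0
    remaining = size_bytes
    while remaining > 0:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        data = (block * (CHUNK // len(block)))[:remaining]
        yield data
        remaining -= len(data)
        counter += 1


def write_pattern(path, size_bytes):
    """Write the pattern to path and return the digest of what was sent."""
    sent = hashlib.sha256()
    with open(path, "wb") as f:
        for data in pattern_chunks(size_bytes):
            sent.update(data)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return sent.hexdigest()


def digest_file(path):
    """Read the file back and return the digest of what actually arrived."""
    seen = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            seen.update(data)
    return seen.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "pattern.bin")
        expected = write_pattern(target, 8 << 20)  # 8 MiB
        ok = digest_file(target) == expected
        print("OK" if ok else "CORRUPTION DETECTED")
```

A plain speed benchmark would stop after write_pattern; the second pass is
what turns it into an integrity test.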

What a Linux kernel compile DOESN'T test is the network interfaces and video
cards.

Yes, there are "stand alone" test programs -- but something that uses the
actual OS interfaces and drivers (like a kernel compile) is the best way!

I think the Linux community could really leapfrog Microsoft, which suffers
from the same problem. Many OS-reported problems stem from hardware that is
marginal (especially CPU, RAM, and PCI/AGP bus)... it works at most levels, but
throw in some heavy tasks... and odd software faults show up. A very
simple but well-designed test program run for 15 minutes would detect such
problems. It is just foolish that Microsoft hasn't delivered this... as it
has to cost them 100x more to deal with it as a support problem!

You will find that most overclockers run their favorite game in a loop for 10
or 20 minutes as the best test they have found. This often exercises
Video+RAM+CPU+Sound board (PCI) at full tilt. What is needed is a
_standardized test_ that really goes after everything (including network).
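
As a rough sketch of the "all at once" idea, the Python below runs several
self-checking workloads concurrently and reports which ones failed. All
names (cpu_check, memory_check, run_concurrently) are hypothetical, and
only CPU and RAM are covered; disk, video and network checks would have to
be added as further entries. Threads are used for simplicity, so parts of
the work are serialized by the interpreter; a real tool would more likely
be written in C.

```python
import hashlib
import threading


def cpu_check(rounds=200, size=1 << 16):
    """Run the same deterministic hash chain twice; the results must match."""
    def chain():
        data = bytes(size)
        for _ in range(rounds):
            data = hashlib.sha256(data).digest() * (size // 32)
        return hashlib.sha256(data).hexdigest()
    return chain() == chain()


def memory_check(size=1 << 22, passes=4):
    """Fill a buffer with alternating bit patterns and verify it on read-back."""
    for p in range(passes):
        pattern = bytes([0x55 if p % 2 == 0 else 0xAA])
        buf = bytearray(pattern * size)
        if buf.count(pattern) != size:
            return False
    return True


def run_concurrently(checks):
    """Start every check in its own thread; return {name: passed}."""
    results = {}

    def runner(name, fn):
        try:
            results[name] = bool(fn())
        except Exception:
            results[name] = False  # a crash counts as a failure

    threads = [threading.Thread(target=runner, args=item)
               for item in checks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    report = run_concurrently({"cpu": cpu_check, "memory": memory_check})
    for name, passed in sorted(report.items()):
        print(name, "PASS" if passed else "FAIL")
```

The design point is that every workload knows its own expected answer, so
a marginal component shows up as a FAIL line instead of as an unexplained
application crash later.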

What "system test" programs exist for Linux today? Any active projects?
Just imagine a good "consumer distro" that has this as part of the setup!

I come from an OS/2, WinNT, Win2K background... believe me, the problem has
been here in the "PC platform" all along... and every OS vendor (and even
application vendor) pays for this oversight. Linux really could take the
lead! Before every kernel problem report, require "supertest" to be run for
10 minutes.

Stephen Gutknecht


-----Original Message-----
From: David Lang [mailto:[email protected]]
Sent: Tuesday, November 21, 2000 2:05 PM
To: David Riley
Cc: [email protected]
Subject: Re: Defective Red Hat Distribution poorly represents Linux


David, usually when it turns out that Linux finds hardware problems the
underlying cause is that Linux makes more effective use of the component,
and as such something that was marginal under Windows fails under Linux as
the correct timing is used.


2000-11-21 22:17:22

by Dan Hollis

Subject: Re: Better testing of hardware (was: Defective Red Hat)

On Tue, 21 Nov 2000, Stephen Gutknecht (linux-kernel) wrote:
> What a Linux kernel compile DOESN'T test is the network interfaces and video
> cards.

Kernel compile over NFS while playing unreal tournament in X ;)

-Dan

2000-11-22 00:59:21

by FORT David

Subject: Re: Better testing of hardware (was: Defective Red Hat)

"Stephen Gutknecht (linux-kernel)" wrote:

> Part of the issue is that there exists no "easy to use" standardized test
> software. Full 32-bit concurrent use of many devices can reveal problems
> that users do not often see in normal applications.
>
> One major hardware review site found stability problems with the Intel
> Pentium III 1.13 GHz processor that ultimately led to Intel delaying the
> release -- it passed all tests but not a compile of the Linux Kernel! This
> was on more than 3 different processors.
> http://www.tomshardware.com/cpu/00q3/0008281/pentiumiii-04.html
>
> A Linux Kernel compile test does a really good job of testing the hard disk,
> RAM, and CPU... as it executes all types of instructions and the final
> output depends on all prior steps completing correctly. On a really fast
> system (> 900 MHz) it might make sense to run it twice, once to "warm up" the
> CPU and other components. Most "benchmarks" just test speed, not the actual
> stability or data integrity (they write results to a device but don't check
> for data corruption, or they test only one device at a time, not all at
> once).
>
> What a Linux kernel compile DOESN'T test is the network interfaces and video
> cards.
>
>

Compiling over NFS with compilation lines producing some kind of openGL
animation ?

--
%-------------------------------------------------------------------------%
% FORT David, %
% 7 avenue de la morvandière 0240726275 %
% 44470 Thouare, France [email protected] %
% ICU:78064991 AIM: enlighted popo [email protected] %
%--LINUX-HTTPD-PIOGENE----------------------------------------------------%
% -datamining <-/ | .~. %
% -networking/flashed PHP3 coming soon | /V\ L I N U X %
% -opensource | // \\ >Fear the Penguin< %
% -GNOME/enlightenment/GIMP | /( )\ %
% feel enlighted.... | ^^-^^ %
% http://ibonneace.dnsalias.org/ when connected %
%-------------------------------------------------------------------------%



2000-11-22 07:58:01

by Eric W. Biederman

Subject: Re: Better testing of hardware (was: Defective Red Hat)

"Stephen Gutknecht (linux-kernel)" <[email protected]> writes:

> A Linux Kernel compile test does a really good job of testing the hard disk,
> RAM, and CPU... as it executes all types of instructions and the final
> output depends on all prior steps completing correctly. On a really fast
> system (> 900 MHz) it might make sense to run it twice, once to "warm up" the
> CPU and other components. Most "benchmarks" just test speed, not the actual
> stability or data integrity (they write results to a device but don't check
> for data corruption, or they test only one device at a time, not all at
> once).

Also note that a Linux kernel compile stresses memory because
gcc's data structures are very pointer-heavy. This means that
memory corruption is most likely to flip a bit in a pointer, and cause
a bad pointer.
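
A toy illustration of why that matters, in Python rather than anything
taken from gcc: the linked list below stores a value and a "next" index in
each node (build_list and traverse are made-up names). Flipping one bit in
a value gives a quietly wrong answer, while flipping one bit in a link
sends the traversal out of bounds and fails immediately, which is the
kind of visible failure a pointer-heavy workload tends to produce.

```python
def build_list(n):
    """nodes[i] = [value, next_index]; a next_index of -1 ends the list."""
    return [[i * 10, i + 1 if i + 1 < n else -1] for i in range(n)]


def traverse(nodes):
    """Sum all values by following the links; raise on an invalid link."""
    total, index, steps = 0, 0, 0
    while index != -1:
        if not 0 <= index < len(nodes) or steps > len(nodes):
            raise RuntimeError("bad pointer: %d" % index)
        value, index = nodes[index]
        total += value
        steps += 1
    return total


if __name__ == "__main__":
    print(traverse(build_list(8)))   # intact list

    data_hit = build_list(8)
    data_hit[3][0] ^= 1 << 2         # one flipped bit in a value
    print(traverse(data_hit))        # wrong sum, but no error raised

    link_hit = build_list(8)
    link_hit[3][1] ^= 1 << 20        # one flipped bit in a link
    try:
        traverse(link_hit)
    except RuntimeError as err:
        print("crashed:", err)
```

In this sketch the flipped bit is a high one; a flip in a low bit of a
link could instead skip a node silently, so pointer corruption is likely,
not guaranteed, to be noticed.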

Eric

2000-11-23 10:19:14

by Pavel Machek

Subject: Re: Better testing of hardware (was: Defective Red Hat)

Hi!

> You will find that most overclockers run their favorite game in a loop for 10
> or 20 minutes as the best test they have found. This often exercises
> Video+RAM+CPU+Sound board (PCI) at full tilt. What is needed is a
> _standardized test_ that really goes after everything (including network).

You don't need to test network that much: if your network card garbles
packets under high load, it is ok. TCP checksums should catch that.

(OTOH, on a really broken serial cable (no flow control and machines
definitely miss characters sometimes), I can occasionally see
corruption even with TCP. Ouch.)
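
One possible reason corruption can slip past TCP is that its checksum is
only a 16-bit ones' complement sum. The Python sketch below implements
that sum as described in RFC 1071 over a bare payload (it leaves out the
pseudo-header and TCP header that the real checksum also covers; the
function name and sample payload are made up). A single flipped bit always
changes the result, but two swapped 16-bit words do not, because addition
does not care about order.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = b"kernel compile over NFS!"
    good = internet_checksum(payload)

    # Single bit flip in the first byte: detected.
    flipped = bytes([payload[0] ^ 0x01]) + payload[1:]
    print("bit flip detected:", internet_checksum(flipped) != good)

    # Swap the first two 16-bit words: data differs, checksum does not.
    swapped = payload[2:4] + payload[0:2] + payload[4:]
    print("word swap detected:", internet_checksum(swapped) != good)
```

With enough damaged packets, some will eventually carry errors that cancel
out like this, which would fit the occasional corruption seen on a very
lossy link.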
Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]